* [PATCH 00/60] xen: add core scheduling support
@ 2019-05-28 10:32 ` Juergen Gross
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Roger Pau Monné

Add support for core- and socket-scheduling in the Xen hypervisor.

Via the boot parameter sched-gran=core (or sched-gran=socket) it is
possible to change the scheduling granularity from cpu (the default)
to whole cores or even sockets.
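
For illustration, a grub2 boot entry enabling core scheduling might look
like this (a hypothetical sketch; paths and the remaining options are
placeholders for the respective installation):

  multiboot2 /boot/xen.gz sched-gran=core <further Xen options>
  module2 /boot/vmlinuz <dom0 kernel options>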

All logical cpus (threads) of a core or socket are always scheduled
together. This means that on a given core only vcpus of the same domain
will be active at any time, and those vcpus will always be scheduled at
the same time.

This is achieved by switching the scheduler to no longer treat vcpus as
the primary objects to schedule, but "schedule units" instead. Each
schedule unit consists of as many vcpus as each core has threads on the
current system. The vcpu->unit relation is fixed.
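
As a rough illustration of the concept, a schedule unit might look like
this (a simplified sketch derived from the patch titles below, not the
exact layout used in the series):

  struct sched_unit {
      struct domain     *domain;        /* owning domain */
      struct vcpu       *vcpu_list;     /* vcpus contained in this unit */
      struct sched_unit *next_in_list;  /* linked list of units */
      unsigned int       unit_id;
      bool               is_running;    /* unit currently scheduled? */
      /* further fields omitted */
  };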

I have done some very basic performance testing: on a 4 cpu system
(2 cores with 2 threads each) I did a "make -j 4" build of the Xen
hypervisor. This test has been run in dom0, once with no other guest
active and once with another guest with 4 vcpus running the same test.
The results are (in each case elapsed time, system time, user time):

sched-gran=cpu,    no other guest: 116.10 177.65 207.84
sched-gran=core,   no other guest: 114.04 175.47 207.45
sched-gran=cpu,    other guest:    202.30 334.21 384.63
sched-gran=core,   other guest:    207.24 293.04 371.37

The performance tests have been performed with credit2 only; the other
schedulers have been tested just briefly, to verify that a domain can be
created in a cpupool using each of them.

Cpupools have been moderately tested (cpu add/remove, create, destroy,
move domain).

Cpu on-/offlining has been moderately tested, too.

The complete patch series is available under:

  git://github.com/jgross1/xen/ sched-v1
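
It can be fetched e.g. via:

  git clone -b sched-v1 git://github.com/jgross1/xen/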

Changes in V1:
- cpupools are working now
- cpu on-/offlining working now
- all schedulers working now
- renamed "items" to "units"
- introduction of "idle scheduler"
- several new patches (see individual patches, mostly splits of
  former patches or cpupool and cpu on-/offlining support)
- all review comments addressed
- some minor changes (see individual patches)

Changes in RFC V2:
- ARM is building now
- HVM domains are working now
- idling will always be done with idle_vcpu active
- other small changes (see individual patches)

Juergen Gross (60):
  xen/sched: only allow schedulers with all mandatory functions
    available
  xen/sched: add inline wrappers for calling per-scheduler functions
  xen/sched: let sched_switch_sched() return new lock address
  xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  xen/sched: alloc struct sched_unit for each vcpu
  xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  xen/sched: build a linked list of struct sched_unit
  xen/sched: introduce struct sched_resource
  xen/sched: let pick_cpu return a scheduler resource
  xen/sched: switch schedule_data.curr to point at sched_unit
  xen/sched: move per cpu scheduler private data into struct
    sched_resource
  xen/sched: switch vcpu_schedule_lock to unit_schedule_lock
  xen/sched: move some per-vcpu items to struct sched_unit
  xen/sched: add scheduler helpers hiding vcpu
  xen/sched: add domain pointer to struct sched_unit
  xen/sched: add id to struct sched_unit
  xen/sched: rename scheduler related perf counters
  xen/sched: switch struct task_slice from vcpu to sched_unit
  xen/sched: add is_running indicator to struct sched_unit
  xen/sched: make null scheduler vcpu agnostic.
  xen/sched: make rt scheduler vcpu agnostic.
  xen/sched: make credit scheduler vcpu agnostic.
  xen/sched: make credit2 scheduler vcpu agnostic.
  xen/sched: make arinc653 scheduler vcpu agnostic.
  xen: add sched_unit_pause_nosync() and sched_unit_unpause()
  xen: let vcpu_create() select processor
  xen/sched: use sched_resource cpu instead smp_processor_id in
    schedulers
  xen/sched: switch schedule() from vcpus to sched_units
  xen/sched: switch sched_move_irqs() to take sched_unit as parameter
  xen: switch from for_each_vcpu() to for_each_sched_unit()
  xen/sched: add runstate counters to struct sched_unit
  xen/sched: rework and rename vcpu_force_reschedule()
  xen/sched: Change vcpu_migrate_*() to operate on schedule unit
  xen/sched: move struct task_slice into struct sched_unit
  xen/sched: add code to sync scheduling of all vcpus of a sched unit
  xen/sched: introduce unit_runnable_state()
  xen/sched: add support for multiple vcpus per sched unit where missing
  x86: make loading of GDT at context switch more modular
  x86: optimize loading of GDT at context switch
  xen/sched: modify cpupool_domain_cpumask() to be an unit mask
  xen/sched: support allocating multiple vcpus into one sched unit
  xen/sched: add a scheduler_percpu_init() function
  xen/sched: add a percpu resource index
  xen/sched: add fall back to idle vcpu when scheduling unit
  xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
  xen/sched: carve out freeing sched_unit memory into dedicated function
  xen/sched: move per-cpu variable scheduler to struct sched_resource
  xen/sched: move per-cpu variable cpupool to struct sched_resource
  xen/sched: reject switching smt on/off with core scheduling active
  xen/sched: prepare per-cpupool scheduling granularity
  xen/sched: use one schedule lock for all free cpus
  xen/sched: populate cpupool0 only after all cpus are up
  xen/sched: remove cpu from pool0 before removing it
  xen/sched: add minimalistic idle scheduler for free cpus
  xen/sched: split schedule_cpu_switch()
  xen/sched: protect scheduling resource via rcu
  xen/sched: support multiple cpus per scheduling resource
  xen/sched: support differing granularity in schedule_cpu_[add/rm]()
  xen/sched: support core scheduling for moving cpus to/from cpupools
  xen/sched: add scheduling granularity enum

 xen/arch/arm/domain.c             |    2 +-
 xen/arch/arm/domain_build.c       |   13 +-
 xen/arch/x86/acpi/cpu_idle.c      |    1 -
 xen/arch/x86/cpu/common.c         |    3 +
 xen/arch/x86/cpu/mcheck/mce.c     |    1 -
 xen/arch/x86/cpu/mcheck/mctelem.c |    1 -
 xen/arch/x86/dom0_build.c         |   10 +-
 xen/arch/x86/domain.c             |   93 +-
 xen/arch/x86/hvm/dom0_build.c     |    9 +-
 xen/arch/x86/pv/dom0_build.c      |   10 +-
 xen/arch/x86/pv/emul-priv-op.c    |    1 +
 xen/arch/x86/pv/shim.c            |    4 +-
 xen/arch/x86/pv/traps.c           |    5 +-
 xen/arch/x86/setup.c              |    1 -
 xen/arch/x86/smpboot.c            |    1 -
 xen/arch/x86/sysctl.c             |    3 +-
 xen/arch/x86/traps.c              |    9 +-
 xen/common/cpupool.c              |  326 ++++---
 xen/common/domain.c               |   34 +-
 xen/common/domctl.c               |   23 +-
 xen/common/keyhandler.c           |    4 +-
 xen/common/sched_arinc653.c       |  270 +++---
 xen/common/sched_credit.c         |  783 ++++++++-------
 xen/common/sched_credit2.c        | 1134 +++++++++++-----------
 xen/common/sched_null.c           |  443 +++++----
 xen/common/sched_rt.c             |  555 +++++------
 xen/common/schedule.c             | 1923 +++++++++++++++++++++++++++++--------
 xen/common/softirq.c              |    6 +-
 xen/common/wait.c                 |    4 +-
 xen/include/asm-arm/current.h     |    1 +
 xen/include/asm-x86/cpuidle.h     |   11 -
 xen/include/asm-x86/current.h     |    7 +-
 xen/include/asm-x86/desc.h        |    1 +
 xen/include/asm-x86/dom0_build.h  |    3 +-
 xen/include/asm-x86/smp.h         |    3 +
 xen/include/xen/domain.h          |    3 +-
 xen/include/xen/perfc_defn.h      |   32 +-
 xen/include/xen/sched-if.h        |  444 +++++++--
 xen/include/xen/sched.h           |   99 +-
 xen/include/xen/softirq.h         |    1 +
 40 files changed, 3905 insertions(+), 2372 deletions(-)

-- 
2.16.4


* [PATCH 01/60] xen/sched: only allow schedulers with all mandatory functions available
@ 2019-05-28 10:32   ` Juergen Gross
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Some functions of struct scheduler are mandatory. Verify in the
scheduler initialization loop that those are present and drop any
scheduler not complying.
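
With this check in place a scheduler lacking e.g. the .do_schedule hook
would be rejected at boot with a message along the lines of (hypothetical
output):

  scheduler credit misses .do_schedule, dropped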

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/schedule.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 66f1e2611b..72d8be3906 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1726,9 +1726,33 @@ void __init scheduler_init(void)
 
     for ( i = 0; i < NUM_SCHEDULERS; i++)
     {
+#define sched_test_func(f)                               \
+        if ( !schedulers[i]->f )                         \
+        {                                                \
+            printk("scheduler %s misses .%s, dropped\n", \
+                   schedulers[i]->opt_name, #f);         \
+            schedulers[i] = NULL;                        \
+        }
+
+        sched_test_func(init);
+        sched_test_func(deinit);
+        sched_test_func(pick_cpu);
+        sched_test_func(alloc_vdata);
+        sched_test_func(free_vdata);
+        sched_test_func(switch_sched);
+        sched_test_func(do_schedule);
+
+#undef sched_test_func
+
         if ( schedulers[i]->global_init && schedulers[i]->global_init() < 0 )
+        {
+            printk("scheduler %s failed initialization, dropped\n",
+                   schedulers[i]->opt_name);
             schedulers[i] = NULL;
-        else if ( !ops.name && !strcmp(schedulers[i]->opt_name, opt_sched) )
+        }
+
+        if ( schedulers[i] && !ops.name &&
+             !strcmp(schedulers[i]->opt_name, opt_sched) )
             ops = *schedulers[i];
     }
 
-- 
2.16.4


* [PATCH 02/60] xen/sched: add inline wrappers for calling per-scheduler functions
@ 2019-05-28 10:32   ` Juergen Gross
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Instead of using the SCHED_OP() macro to call the different
scheduler-specific functions, add inline wrappers for that purpose.
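
As an example (taken from the hunks below), a call like

  SCHED_OP(vcpu_scheduler(v), wake, v);

becomes

  sched_wake(vcpu_scheduler(v), v);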

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch (Andrew Cooper)
V1: use conditional operator (Jan Beulich, Dario Faggioli)
    drop no longer needed ASSERT()s
---
 xen/common/schedule.c      | 104 ++++++++++++++---------------
 xen/include/xen/sched-if.h | 160 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 199 insertions(+), 65 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 72d8be3906..4da970c543 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -73,10 +73,6 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
-#define SCHED_OP(opsptr, fn, ...)                                          \
-         (( (opsptr)->fn != NULL ) ? (opsptr)->fn(opsptr, ##__VA_ARGS__ )  \
-          : (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
-
 static inline struct scheduler *dom_scheduler(const struct domain *d)
 {
     if ( likely(d->cpupool != NULL) )
@@ -267,8 +263,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
-                     d->sched_priv);
+    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), v, d->sched_priv);
     if ( v->sched_priv == NULL )
         return 1;
 
@@ -289,7 +284,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        SCHED_OP(dom_scheduler(d), insert_vcpu, v);
+        sched_insert_vcpu(dom_scheduler(d), v);
     }
 
     return 0;
@@ -330,7 +325,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata, v, domdata);
+        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, v, domdata);
         if ( vcpu_priv[v->vcpu_id] == NULL )
         {
             for_each_vcpu ( d, v )
@@ -348,7 +343,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        SCHED_OP(old_ops, remove_vcpu, v);
+        sched_remove_vcpu(old_ops, v);
     }
 
     d->cpupool = c;
@@ -383,9 +378,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
-        SCHED_OP(c->sched, insert_vcpu, v);
+        sched_insert_vcpu(c->sched, v);
 
-        SCHED_OP(old_ops, free_vdata, vcpudata);
+        sched_free_vdata(old_ops, vcpudata);
     }
 
     domain_update_node_affinity(d);
@@ -406,8 +401,8 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    SCHED_OP(vcpu_scheduler(v), remove_vcpu, v);
-    SCHED_OP(vcpu_scheduler(v), free_vdata, v->sched_priv);
+    sched_remove_vcpu(vcpu_scheduler(v), v);
+    sched_free_vdata(vcpu_scheduler(v), v->sched_priv);
 }
 
 int sched_init_domain(struct domain *d, int poolid)
@@ -458,7 +453,7 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        SCHED_OP(vcpu_scheduler(v), sleep, v);
+        sched_sleep(vcpu_scheduler(v), v);
     }
 }
 
@@ -499,7 +494,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(vcpu_scheduler(v), wake, v);
+        sched_wake(vcpu_scheduler(v), v);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -552,19 +547,16 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 
     /*
      * Actual CPU switch to new CPU.  This is safe because the lock
-     * pointer cant' change while the current lock is held.
+     * pointer can't change while the current lock is held.
      */
-    if ( vcpu_scheduler(v)->migrate )
-        SCHED_OP(vcpu_scheduler(v), migrate, v, new_cpu);
-    else
-        v->processor = new_cpu;
+    sched_migrate(vcpu_scheduler(v), v, new_cpu);
 }
 
 /*
  * Initiating migration
  *
  * In order to migrate, we need the vcpu in question to have stopped
- * running and had SCHED_OP(sleep) called (to take it off any
+ * running and had sched_sleep() called (to take it off any
  * runqueues, for instance); and if it is currently running, it needs
  * to be scheduled out.  Finally, we need to hold the scheduling locks
  * for both the processor we're migrating from, and the processor
@@ -635,7 +627,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
+            new_cpu = sched_pick_cpu(vcpu_scheduler(v), v);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -740,7 +732,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
+        v->processor = sched_pick_cpu(vcpu_scheduler(v), v);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -852,7 +844,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, v, hard, soft);
+    sched_adjust_affinity(dom_scheduler(v->domain), v, hard, soft);
 
     if ( hard )
         cpumask_copy(v->cpu_hard_affinity, hard);
@@ -1027,7 +1019,7 @@ long vcpu_yield(void)
     struct vcpu * v=current;
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    SCHED_OP(vcpu_scheduler(v), yield, v);
+    sched_yield(vcpu_scheduler(v), v);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1352,7 +1344,7 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 
     /* NB: the pluggable scheduler code needs to take care
      * of locking by itself. */
-    if ( (ret = SCHED_OP(dom_scheduler(d), adjust, d, op)) == 0 )
+    if ( (ret = sched_adjust_dom(dom_scheduler(d), d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
     return ret;
@@ -1376,7 +1368,7 @@ long sched_adjust_global(struct xen_sysctl_scheduler_op *op)
         return -ESRCH;
 
     rc = ((op->sched_id == pool->sched->sched_id)
-          ? SCHED_OP(pool->sched, adjust_global, op) : -EINVAL);
+          ? sched_adjust_cpupool(pool->sched, op) : -EINVAL);
 
     cpupool_put(pool);
 
@@ -1530,7 +1522,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
+    sched_context_saved(vcpu_scheduler(prev), prev);
 
     vcpu_migrate_finish(prev);
 }
@@ -1599,8 +1591,8 @@ static int cpu_schedule_up(unsigned int cpu)
          */
         ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
-                                    idle->domain->sched_priv);
+        idle->sched_priv = sched_alloc_vdata(&ops, idle,
+                                             idle->domain->sched_priv);
         if ( idle->sched_priv == NULL )
             return -ENOMEM;
     }
@@ -1612,7 +1604,7 @@ static int cpu_schedule_up(unsigned int cpu)
      * (e.g., inside free_pdata, from cpu_schedule_down() called
      * during CPU_UP_CANCELLED) that contains an IS_ERR value.
      */
-    sched_priv = SCHED_OP(&ops, alloc_pdata, cpu);
+    sched_priv = sched_alloc_pdata(&ops, cpu);
     if ( IS_ERR(sched_priv) )
         return PTR_ERR(sched_priv);
 
@@ -1626,8 +1618,8 @@ static void cpu_schedule_down(unsigned int cpu)
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
-    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
-    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
+    sched_free_pdata(sched, sd->sched_priv, cpu);
+    sched_free_vdata(sched, idle_vcpu[cpu]->sched_priv);
 
     idle_vcpu[cpu]->sched_priv = NULL;
     sd->sched_priv = NULL;
@@ -1679,7 +1671,7 @@ static int cpu_schedule_callback(
     {
     case CPU_STARTING:
         if ( system_state != SYS_STATE_resume )
-            SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+            sched_init_pdata(sched, sd->sched_priv, cpu);
         break;
     case CPU_UP_PREPARE:
         if ( system_state != SYS_STATE_resume )
@@ -1698,7 +1690,7 @@ static int cpu_schedule_callback(
         rc = cpu_disable_scheduler(cpu);
         BUG_ON(rc);
         rcu_read_unlock(&domlist_read_lock);
-        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
+        sched_deinit_pdata(sched, sd->sched_priv, cpu);
         cpu_schedule_down(cpu);
         break;
     case CPU_UP_CANCELED:
@@ -1775,7 +1767,7 @@ void __init scheduler_init(void)
     register_cpu_notifier(&cpu_schedule_nfb);
 
     printk("Using scheduler: %s (%s)\n", ops.name, ops.opt_name);
-    if ( SCHED_OP(&ops, init) )
+    if ( sched_init(&ops) )
         panic("scheduler returned error on init\n");
 
     if ( sched_ratelimit_us &&
@@ -1797,9 +1789,9 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
-    this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
+    this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
-    SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
+    sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
 }
 
 /*
@@ -1842,26 +1834,26 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     /*
      * To setup the cpu for the new scheduler we need:
      *  - a valid instance of per-CPU scheduler specific data, as it is
-     *    allocated by SCHED_OP(alloc_pdata). Note that we do not want to
-     *    initialize it yet (i.e., we are not calling SCHED_OP(init_pdata)).
-     *    That will be done by the target scheduler, in SCHED_OP(switch_sched),
+     *    allocated by sched_alloc_pdata(). Note that we do not want to
+     *    initialize it yet (i.e., we are not calling sched_init_pdata()).
+     *    That will be done by the target scheduler, in sched_switch_sched(),
      *    in proper ordering and with locking.
      *  - a valid instance of per-vCPU scheduler specific data, for the idle
      *    vCPU of cpu. That is what the target scheduler will use for the
      *    sched_priv field of the per-vCPU info of the idle domain.
      */
     idle = idle_vcpu[cpu];
-    ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    ppriv = sched_alloc_pdata(new_ops, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    vpriv = sched_alloc_vdata(new_ops, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
-        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
+        sched_free_pdata(new_ops, ppriv, cpu);
         return -ENOMEM;
     }
 
-    SCHED_OP(old_ops, tick_suspend, cpu);
+    sched_do_tick_suspend(old_ops, cpu);
 
     /*
      * The actual switch, including (if necessary) the rerouting of the
@@ -1879,17 +1871,17 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 
     vpriv_old = idle->sched_priv;
     ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
-    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
+    sched_switch_sched(new_ops, cpu, ppriv, vpriv);
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irq(old_lock);
 
-    SCHED_OP(new_ops, tick_resume, cpu);
+    sched_do_tick_resume(new_ops, cpu);
 
-    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
+    sched_deinit_pdata(old_ops, ppriv_old, cpu);
 
-    SCHED_OP(old_ops, free_vdata, vpriv_old);
-    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
+    sched_free_vdata(old_ops, vpriv_old);
+    sched_free_pdata(old_ops, ppriv_old, cpu);
 
  out:
     per_cpu(cpupool, cpu) = c;
@@ -1921,7 +1913,7 @@ struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)
     if ( (sched = xmalloc(struct scheduler)) == NULL )
         return NULL;
     memcpy(sched, schedulers[i], sizeof(*sched));
-    if ( (*perr = SCHED_OP(sched, init)) != 0 )
+    if ( (*perr = sched_init(sched)) != 0 )
     {
         xfree(sched);
         sched = NULL;
@@ -1933,7 +1925,7 @@ struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr)
 void scheduler_free(struct scheduler *sched)
 {
     BUG_ON(sched == &ops);
-    SCHED_OP(sched, deinit);
+    sched_deinit(sched);
     xfree(sched);
 }
 
@@ -1950,7 +1942,7 @@ void schedule_dump(struct cpupool *c)
         sched = c->sched;
         cpus = c->cpu_valid;
         printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
-        SCHED_OP(sched, dump_settings);
+        sched_dump_settings(sched);
     }
     else
     {
@@ -1962,7 +1954,7 @@ void schedule_dump(struct cpupool *c)
     {
         printk("CPUs info:\n");
         for_each_cpu (i, cpus)
-            SCHED_OP(sched, dump_cpu_state, i);
+            sched_dump_cpu_state(sched, i);
     }
 }
 
@@ -1972,7 +1964,7 @@ void sched_tick_suspend(void)
     unsigned int cpu = smp_processor_id();
 
     sched = per_cpu(scheduler, cpu);
-    SCHED_OP(sched, tick_suspend, cpu);
+    sched_do_tick_suspend(sched, cpu);
     rcu_idle_enter(cpu);
     rcu_idle_timer_start();
 }
@@ -1985,7 +1977,7 @@ void sched_tick_resume(void)
     rcu_idle_timer_stop();
     rcu_idle_exit(cpu);
     sched = per_cpu(scheduler, cpu);
-    SCHED_OP(sched, tick_resume, cpu);
+    sched_do_tick_resume(sched, cpu);
 }
 
 void wait(void)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 92bc7a0365..b3c3e189d9 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -185,26 +185,168 @@ struct scheduler {
     void         (*tick_resume)     (const struct scheduler *, unsigned int);
 };
 
+static inline int sched_init(struct scheduler *s)
+{
+    return s->init(s);
+}
+
+static inline void sched_deinit(struct scheduler *s)
+{
+    s->deinit(s);
+}
+
+static inline void sched_switch_sched(struct scheduler *s, unsigned int cpu,
+                                      void *pdata, void *vdata)
+{
+    s->switch_sched(s, cpu, pdata, vdata);
+}
+
+static inline void sched_dump_settings(const struct scheduler *s)
+{
+    if ( s->dump_settings )
+        s->dump_settings(s);
+}
+
+static inline void sched_dump_cpu_state(const struct scheduler *s, int cpu)
+{
+    if ( s->dump_cpu_state )
+        s->dump_cpu_state(s, cpu);
+}
+
+static inline void sched_do_tick_suspend(const struct scheduler *s, int cpu)
+{
+    if ( s->tick_suspend )
+        s->tick_suspend(s, cpu);
+}
+
+static inline void sched_do_tick_resume(const struct scheduler *s, int cpu)
+{
+    if ( s->tick_resume )
+        s->tick_resume(s, cpu);
+}
+
 static inline void *sched_alloc_domdata(const struct scheduler *s,
                                         struct domain *d)
 {
-    if ( s->alloc_domdata )
-        return s->alloc_domdata(s, d);
-    else
-        return NULL;
+    return s->alloc_domdata ? s->alloc_domdata(s, d) : NULL;
 }
 
 static inline void sched_free_domdata(const struct scheduler *s,
                                       void *data)
 {
+    ASSERT(s->free_domdata || !data);
     if ( s->free_domdata )
         s->free_domdata(s, data);
+}
+
+static inline void *sched_alloc_pdata(const struct scheduler *s, int cpu)
+{
+    return s->alloc_pdata ? s->alloc_pdata(s, cpu) : NULL;
+}
+
+static inline void sched_free_pdata(const struct scheduler *s, void *data,
+                                    int cpu)
+{
+    ASSERT(s->free_pdata || !data);
+    if ( s->free_pdata )
+        s->free_pdata(s, data, cpu);
+}
+
+static inline void sched_init_pdata(const struct scheduler *s, void *data,
+                                    int cpu)
+{
+    if ( s->init_pdata )
+        s->init_pdata(s, data, cpu);
+}
+
+static inline void sched_deinit_pdata(const struct scheduler *s, void *data,
+                                      int cpu)
+{
+    if ( s->deinit_pdata )
+        s->deinit_pdata(s, data, cpu);
+}
+
+static inline void *sched_alloc_vdata(const struct scheduler *s, struct vcpu *v,
+                                      void *dom_data)
+{
+    return s->alloc_vdata(s, v, dom_data);
+}
+
+static inline void sched_free_vdata(const struct scheduler *s, void *data)
+{
+    s->free_vdata(s, data);
+}
+
+static inline void sched_insert_vcpu(const struct scheduler *s, struct vcpu *v)
+{
+    if ( s->insert_vcpu )
+        s->insert_vcpu(s, v);
+}
+
+static inline void sched_remove_vcpu(const struct scheduler *s, struct vcpu *v)
+{
+    if ( s->remove_vcpu )
+        s->remove_vcpu(s, v);
+}
+
+static inline void sched_sleep(const struct scheduler *s, struct vcpu *v)
+{
+    if ( s->sleep )
+        s->sleep(s, v);
+}
+
+static inline void sched_wake(const struct scheduler *s, struct vcpu *v)
+{
+    if ( s->wake )
+        s->wake(s, v);
+}
+
+static inline void sched_yield(const struct scheduler *s, struct vcpu *v)
+{
+    if ( s->yield )
+        s->yield(s, v);
+}
+
+static inline void sched_context_saved(const struct scheduler *s,
+                                       struct vcpu *v)
+{
+    if ( s->context_saved )
+        s->context_saved(s, v);
+}
+
+static inline void sched_migrate(const struct scheduler *s, struct vcpu *v,
+                                 unsigned int cpu)
+{
+    if ( s->migrate )
+        s->migrate(s, v, cpu);
     else
-        /*
-         * Check that if there isn't a free_domdata hook, we haven't got any
-         * data we're expected to deal with.
-         */
-        ASSERT(!data);
+        v->processor = cpu;
+}
+
+static inline int sched_pick_cpu(const struct scheduler *s, struct vcpu *v)
+{
+    return s->pick_cpu(s, v);
+}
+
+static inline void sched_adjust_affinity(const struct scheduler *s,
+                                         struct vcpu *v,
+                                         const cpumask_t *hard,
+                                         const cpumask_t *soft)
+{
+    if ( s->adjust_affinity )
+        s->adjust_affinity(s, v, hard, soft);
+}
+
+static inline int sched_adjust_dom(const struct scheduler *s, struct domain *d,
+                                   struct xen_domctl_scheduler_op *op)
+{
+    return s->adjust ? s->adjust(s, d, op) : 0;
+}
+
+static inline int sched_adjust_cpupool(const struct scheduler *s,
+                                       struct xen_sysctl_scheduler_op *op)
+{
+    return s->adjust_global ? s->adjust_global(s, op) : 0;
 }
 
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
-- 
2.16.4


+                                   struct xen_domctl_scheduler_op *op)
+{
+    return s->adjust ? s->adjust(s, d, op) : 0;
+}
+
+static inline int sched_adjust_cpupool(const struct scheduler *s,
+                                       struct xen_sysctl_scheduler_op *op)
+{
+    return s->adjust_global ? s->adjust_global(s, op) : 0;
 }
 
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Instead of setting the scheduler percpu lock address in each of the
switch_sched instances of the different schedulers, do that in
schedule_cpu_switch(), which is the single caller of that function.
For that purpose, let sched_switch_sched() just return the new lock
address.

This also allows schedule_cpu_switch() to set the new struct scheduler
and struct schedule_data values in the percpu area, instead of leaving
that to the individual schedulers.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 14 ++------------
 xen/common/sched_credit.c   | 13 ++-----------
 xen/common/sched_credit2.c  | 15 +++------------
 xen/common/sched_null.c     | 16 ++++------------
 xen/common/sched_rt.c       | 12 ++----------
 xen/common/schedule.c       | 18 +++++++++++++++---
 xen/include/xen/sched-if.h  |  9 +++++----
 7 files changed, 33 insertions(+), 64 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index a4c6d00b81..72b988ea5f 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -630,7 +630,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
  * @param pdata     scheduler specific PCPU data (we don't have any)
  * @param vdata     scheduler specific VCPU data of the idle vcpu
  */
-static void
+static spinlock_t *
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
@@ -641,17 +641,7 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     idle_vcpu[cpu]->sched_priv = vdata;
 
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
-
-    /*
-     * (Re?)route the lock to its default location. We actually do not use
-     * it, but if we leave it pointing to where it does now (i.e., the
-     * runqueue lock for this PCPU in the default scheduler), we'd be
-     * causing unnecessary contention on that lock (in cases where it is
-     * shared among multiple PCPUs, like in Credit2 and RTDS).
-     */
-    sd->schedule_lock = &sd->_lock;
+    return &sd->_lock;
 }
 
 /**
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 7b7facbace..621c408ed0 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -631,7 +631,7 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 }
 
 /* Change the scheduler of cpu to us (Credit). */
-static void
+static spinlock_t *
 csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                     void *pdata, void *vdata)
 {
@@ -653,16 +653,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     init_pdata(prv, pdata, cpu);
     spin_unlock(&prv->lock);
 
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
-
-    /*
-     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
-     * if it is free (and it can be) we want that anyone that manages
-     * taking it, finds all the initializations we've done above in place.
-     */
-    smp_mb();
-    sd->schedule_lock = &sd->_lock;
+    return &sd->_lock;
 }
 
 #ifndef NDEBUG
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 9c1c3b4e08..8e4381d8a7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3855,7 +3855,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 }
 
 /* Change the scheduler of cpu to us (Credit2). */
-static void
+static spinlock_t *
 csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                      void *pdata, void *vdata)
 {
@@ -3888,18 +3888,9 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      */
     ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
 
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
-
-    /*
-     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
-     * if it is free (and it can be) we want that anyone that manages
-     * taking it, find all the initializations we've done above in place.
-     */
-    smp_mb();
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
-
     write_unlock(&prv->lock);
+
+    return &prv->rqd[rqi].lock;
 }
 
 static void
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index a59dbb2692..10427b37ab 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -381,8 +381,9 @@ static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
 }
 
 /* Change the scheduler of cpu to us (null). */
-static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
-                              void *pdata, void *vdata)
+static spinlock_t *null_switch_sched(struct scheduler *new_ops,
+                                     unsigned int cpu,
+                                     void *pdata, void *vdata)
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct null_private *prv = null_priv(new_ops);
@@ -401,16 +402,7 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     init_pdata(prv, cpu);
 
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
-
-    /*
-     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
-     * if it is free (and it can be) we want that anyone that manages
-     * taking it, finds all the initializations we've done above in place.
-     */
-    smp_mb();
-    sd->schedule_lock = &sd->_lock;
+    return &sd->_lock;
 }
 
 static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index f1b81f0373..0acfc3d702 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -729,7 +729,7 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 }
 
 /* Change the scheduler of cpu to us (RTDS). */
-static void
+static spinlock_t *
 rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                 void *pdata, void *vdata)
 {
@@ -761,16 +761,8 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     }
 
     idle_vcpu[cpu]->sched_priv = vdata;
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
 
-    /*
-     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
-     * if it is free (and it can be) we want that anyone that manages
-     * taking it, find all the initializations we've done above in place.
-     */
-    smp_mb();
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+    return &prv->lock;
 }
 
 static void
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4da970c543..6dc96b3cd4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1812,7 +1812,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
-    spinlock_t * old_lock;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    spinlock_t *old_lock, *new_lock;
 
     /*
      * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
@@ -1870,8 +1871,19 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     old_lock = pcpu_schedule_lock_irq(cpu);
 
     vpriv_old = idle->sched_priv;
-    ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
-    sched_switch_sched(new_ops, cpu, ppriv, vpriv);
+    ppriv_old = sd->sched_priv;
+    new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
+
+    per_cpu(scheduler, cpu) = new_ops;
+    sd->sched_priv = ppriv;
+
+    /*
+     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
+     * if it is free (and it can be) we want that anyone that manages
+     * taking it, finds all the initializations we've done above in place.
+     */
+    smp_mb();
+    sd->schedule_lock = new_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irq(old_lock);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index b3c3e189d9..b8e2b2e49e 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -153,7 +153,7 @@ struct scheduler {
     /* Idempotent. */
     void         (*free_domdata)   (const struct scheduler *, void *);
 
-    void         (*switch_sched)   (struct scheduler *, unsigned int,
+    spinlock_t * (*switch_sched)   (struct scheduler *, unsigned int,
                                     void *, void *);
 
     /* Activate / deactivate vcpus in a cpu pool */
@@ -195,10 +195,11 @@ static inline void sched_deinit(struct scheduler *s)
     s->deinit(s);
 }
 
-static inline void sched_switch_sched(struct scheduler *s, unsigned int cpu,
-                                      void *pdata, void *vdata)
+static inline spinlock_t *sched_switch_sched(struct scheduler *s,
+                                             unsigned int cpu,
+                                             void *pdata, void *vdata)
 {
-    s->switch_sched(s, cpu, pdata, vdata);
+    return s->switch_sched(s, cpu, pdata, vdata);
 }
 
 static inline void sched_dump_settings(const struct scheduler *s)
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 04/60] xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In order to prepare for core- and socket-scheduling, use a new struct
sched_unit instead of struct vcpu in the interfaces of the different
schedulers.

Rename the per-scheduler functions insert_vcpu and remove_vcpu to
insert_unit and remove_unit to reflect the changed parameter. In the
schedulers, rename the local functions that have been switched to
sched_unit accordingly.

For now the new struct contains only a vcpu pointer and is allocated
on the stack. This will be changed later.
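
As a minimal sketch (matching the schedule.c hunks below; example_wake
is a hypothetical name), a caller now wraps the vcpu in an on-stack
unit before invoking a scheduler hook:

    /* Hypothetical caller, following the pattern used in vcpu_wake(): */
    void example_wake(struct vcpu *v)
    {
        struct sched_unit unit = { .vcpu = v };

        sched_wake(vcpu_scheduler(v), &unit);
    }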

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- move definition of struct sched_unit to sched.h (Andrew Cooper)
V1:
- rename "item" to "unit" (George Dunlap)
---
 xen/common/sched_arinc653.c | 30 +++++++++------
 xen/common/sched_credit.c   | 41 ++++++++++++--------
 xen/common/sched_credit2.c  | 57 ++++++++++++++++------------
 xen/common/sched_null.c     | 39 ++++++++++++-------
 xen/common/sched_rt.c       | 33 +++++++++-------
 xen/common/schedule.c       | 53 ++++++++++++++++++--------
 xen/include/xen/sched-if.h  | 92 ++++++++++++++++++++++++++-------------------
 xen/include/xen/sched.h     |  4 ++
 8 files changed, 218 insertions(+), 131 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 72b988ea5f..283730b3f8 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -376,13 +376,16 @@ a653sched_deinit(struct scheduler *ops)
  * This function allocates scheduler-specific data for a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
+ * @param unit      Pointer to struct sched_unit
  *
  * @return          Pointer to the allocated data
  */
 static void *
-a653sched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+a653sched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                      void *dd)
 {
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
+    struct vcpu *vc = unit->vcpu;
     arinc653_vcpu_t *svc;
     unsigned int entry;
     unsigned long flags;
@@ -458,11 +461,13 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
  * Xen scheduler callback function to sleep a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param vc        Pointer to the VCPU structure for the current domain
+ * @param unit      Pointer to struct sched_unit
  */
 static void
-a653sched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
+
     if ( AVCPU(vc) != NULL )
         AVCPU(vc)->awake = 0;
 
@@ -478,11 +483,13 @@ a653sched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  * Xen scheduler callback function to wake up a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param vc        Pointer to the VCPU structure for the current domain
+ * @param unit      Pointer to struct sched_unit
  */
 static void
-a653sched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
+
     if ( AVCPU(vc) != NULL )
         AVCPU(vc)->awake = 1;
 
@@ -597,13 +604,14 @@ a653sched_do_schedule(
  * Xen scheduler callback function to select a CPU for the VCPU to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param v         Pointer to the VCPU structure for the current domain
+ * @param unit      Pointer to struct sched_unit
  *
  * @return          Number of selected physical CPU
  */
 static int
-a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
+a653sched_pick_cpu(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     cpumask_t *online;
     unsigned int cpu;
 
@@ -702,11 +710,11 @@ static const struct scheduler sched_arinc653_def = {
     .free_vdata     = a653sched_free_vdata,
     .alloc_vdata    = a653sched_alloc_vdata,
 
-    .insert_vcpu    = NULL,
-    .remove_vcpu    = NULL,
+    .insert_unit    = NULL,
+    .remove_unit    = NULL,
 
-    .sleep          = a653sched_vcpu_sleep,
-    .wake           = a653sched_vcpu_wake,
+    .sleep          = a653sched_unit_sleep,
+    .wake           = a653sched_unit_wake,
     .yield          = NULL,
     .context_saved  = NULL,
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 621c408ed0..eba4db38bb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -859,15 +859,16 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 }
 
 static int
-csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+csched_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(vc);
 
     /*
      * We have been called by vcpu_migrate() (in schedule.c), as part
      * of the process of seeing if vc can be migrated to another pcpu.
      * We make a note about this in svc->flags so that later, in
-     * csched_vcpu_wake() (still called from vcpu_migrate()) we won't
+     * csched_unit_wake() (still called from vcpu_migrate()) we won't
      * get boosted, which we don't deserve as we are "only" migrating.
      */
     set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
@@ -995,8 +996,10 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 }
 
 static void *
-csched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                   void *dd)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -1016,8 +1019,9 @@ csched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
 }
 
 static void
-csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu *svc = vc->sched_priv;
     spinlock_t *lock;
 
@@ -1026,7 +1030,7 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched_cpu_pick(ops, vc);
+    vc->processor = csched_cpu_pick(ops, unit);
 
     spin_unlock_irq(lock);
 
@@ -1051,9 +1055,10 @@ csched_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 static void
-csched_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched_private *prv = CSCHED_PRIV(ops);
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     struct csched_dom * const sdom = svc->sdom;
 
@@ -1078,8 +1083,9 @@ csched_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     unsigned int cpu = vc->processor;
 
@@ -1102,8 +1108,9 @@ csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     bool_t migrating;
 
@@ -1163,8 +1170,9 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_yield(const struct scheduler *ops, struct vcpu *vc)
+csched_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
 
     /* Let the scheduler know that this vcpu is trying to yield */
@@ -1217,9 +1225,10 @@ csched_dom_cntl(
 }
 
 static void
-csched_aff_cntl(const struct scheduler *ops, struct vcpu *v,
+csched_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                 const cpumask_t *hard, const cpumask_t *soft)
 {
+    struct vcpu *v = unit->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(v);
 
     if ( !hard )
@@ -1748,7 +1757,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
                  *   have runnable vCPUs in its runqueue, but that's not a
                  *   problem because:
-                 *   + if racing with csched_vcpu_insert() or csched_vcpu_wake(),
+                 *   + if racing with csched_unit_insert() or csched_unit_wake(),
                  *     __runq_tickle() will be called afterwords, so the vCPU
                  *     won't get stuck in the runqueue for too long;
                  *   + if racing with csched_runq_steal(), it may be that a
@@ -2260,12 +2269,12 @@ static const struct scheduler sched_credit_def = {
 
     .global_init    = csched_global_init,
 
-    .insert_vcpu    = csched_vcpu_insert,
-    .remove_vcpu    = csched_vcpu_remove,
+    .insert_unit    = csched_unit_insert,
+    .remove_unit    = csched_unit_remove,
 
-    .sleep          = csched_vcpu_sleep,
-    .wake           = csched_vcpu_wake,
-    .yield          = csched_vcpu_yield,
+    .sleep          = csched_unit_sleep,
+    .wake           = csched_unit_wake,
+    .yield          = csched_unit_yield,
 
     .adjust         = csched_dom_cntl,
     .adjust_affinity= csched_aff_cntl,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 8e4381d8a7..def8d87484 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -273,7 +273,7 @@
  * CSFLAG_delayed_runq_add: Do we need to add this to the runqueue once it'd done
  * being context switched out?
  * + Set when scheduling out in csched2_schedule() if prev is runnable
- * + Set in csched2_vcpu_wake if it finds CSFLAG_scheduled set
+ * + Set in csched2_unit_wake if it finds CSFLAG_scheduled set
  * + Read in csched2_context_saved().  If set, it adds prev to the runqueue and
  *   clears the bit.
  */
@@ -623,14 +623,14 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
  * This logic is entirely implemented in runq_tickle(), and that is enough.
  * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_vcpu_wake(), which calls
+ *  - when a vcpu wakes up, it calls csched2_unit_wake(), which calls
  *    runq_tickle();
  *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
- *    csched2_vcpu_migrate() (which calls migrate()) and csched2_vcpu_wake().
+ *    csched2_unit_migrate() (which calls migrate()) and csched2_unit_wake().
  *    csched2_cpu_pick() looks for the least loaded runq and return just any
- *    of its processors. Then, csched2_vcpu_migrate() just moves the vcpu to
+ *    of its processors. Then, csched2_unit_migrate() just moves the vcpu to
  *    the chosen runq, and it is again runq_tickle(), called by
- *    csched2_vcpu_wake() that actually decides what pcpu to use within the
+ *    csched2_unit_wake() that actually decides what pcpu to use within the
  *    chosen runq;
  *  - when a migration is initiated in sched_credit2.c, by calling  migrate()
  *    directly, that again temporarily use a random pcpu from the new runq,
@@ -2026,8 +2026,10 @@ csched2_vcpu_check(struct vcpu *vc)
 #endif
 
 static void *
-csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                    void *dd)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -2069,8 +2071,9 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
 }
 
 static void
-csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
 
     ASSERT(!is_idle_vcpu(vc));
@@ -2091,8 +2094,9 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     unsigned int cpu = vc->processor;
     s_time_t now;
@@ -2146,16 +2150,18 @@ out:
 }
 
 static void
-csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
+csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(v);
 
     __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
 }
 
 static void
-csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
+csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
     s_time_t now = NOW();
@@ -2196,9 +2202,10 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 
 #define MAX_LOAD (STIME_MAX)
 static int
-csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+csched2_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_private *prv = csched2_priv(ops);
+    struct vcpu *vc = unit->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
@@ -2733,9 +2740,10 @@ retry:
 }
 
 static void
-csched2_vcpu_migrate(
-    const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
+csched2_unit_migrate(
+    const struct scheduler *ops, struct sched_unit *unit, unsigned int new_cpu)
 {
+    struct vcpu *vc = unit->vcpu;
     struct domain *d = vc->domain;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     struct csched2_runqueue_data *trqd;
@@ -2996,9 +3004,10 @@ csched2_dom_cntl(
 }
 
 static void
-csched2_aff_cntl(const struct scheduler *ops, struct vcpu *v,
+csched2_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                  const cpumask_t *hard, const cpumask_t *soft)
 {
+    struct vcpu *v = unit->vcpu;
     struct csched2_vcpu *svc = csched2_vcpu(v);
 
     if ( !hard )
@@ -3096,8 +3105,9 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 }
 
 static void
-csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu *svc = vc->sched_priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -3108,7 +3118,7 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     /* csched2_cpu_pick() expects the pcpu lock to be held */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched2_cpu_pick(ops, vc);
+    vc->processor = csched2_cpu_pick(ops, unit);
 
     spin_unlock_irq(lock);
 
@@ -3135,8 +3145,9 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 static void
-csched2_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     spinlock_t *lock;
 
@@ -4074,19 +4085,19 @@ static const struct scheduler sched_credit2_def = {
 
     .global_init    = csched2_global_init,
 
-    .insert_vcpu    = csched2_vcpu_insert,
-    .remove_vcpu    = csched2_vcpu_remove,
+    .insert_unit    = csched2_unit_insert,
+    .remove_unit    = csched2_unit_remove,
 
-    .sleep          = csched2_vcpu_sleep,
-    .wake           = csched2_vcpu_wake,
-    .yield          = csched2_vcpu_yield,
+    .sleep          = csched2_unit_sleep,
+    .wake           = csched2_unit_wake,
+    .yield          = csched2_unit_yield,
 
     .adjust         = csched2_dom_cntl,
     .adjust_affinity= csched2_aff_cntl,
     .adjust_global  = csched2_sys_cntl,
 
     .pick_cpu       = csched2_cpu_pick,
-    .migrate        = csched2_vcpu_migrate,
+    .migrate        = csched2_unit_migrate,
     .do_schedule    = csched2_schedule,
     .context_saved  = csched2_context_saved,
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 10427b37ab..bf52c84b0f 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -194,8 +194,9 @@ static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static void *null_alloc_vdata(const struct scheduler *ops,
-                              struct vcpu *v, void *dd)
+                              struct sched_unit *unit, void *dd)
 {
+    struct vcpu *v = unit->vcpu;
     struct null_vcpu *nvc;
 
     nvc = xzalloc(struct null_vcpu);
@@ -405,8 +406,10 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
     return &sd->_lock;
 }
 
-static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
+static void null_unit_insert(const struct scheduler *ops,
+                             struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
     unsigned int cpu;
@@ -497,8 +500,10 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     spin_unlock(&prv->waitq_lock);
 }
 
-static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
+static void null_unit_remove(const struct scheduler *ops,
+                             struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
     spinlock_t *lock;
@@ -528,8 +533,11 @@ static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
     SCHED_STAT_CRANK(vcpu_remove);
 }
 
-static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
+static void null_unit_wake(const struct scheduler *ops,
+                           struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
+
     ASSERT(!is_idle_vcpu(v));
 
     if ( unlikely(curr_on_cpu(v->processor) == v) )
@@ -554,8 +562,11 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
     cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 }
 
-static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
+static void null_unit_sleep(const struct scheduler *ops,
+                            struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
+
     ASSERT(!is_idle_vcpu(v));
 
     /* If v is not assigned to a pCPU, or is not running, no need to bother */
@@ -565,15 +576,17 @@ static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
     SCHED_STAT_CRANK(vcpu_sleep);
 }
 
-static int null_cpu_pick(const struct scheduler *ops, struct vcpu *v)
+static int null_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *v = unit->vcpu;
     ASSERT(!is_idle_vcpu(v));
     return pick_cpu(null_priv(ops), v);
 }
 
-static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
-                              unsigned int new_cpu)
+static void null_unit_migrate(const struct scheduler *ops,
+                              struct sched_unit *unit, unsigned int new_cpu)
 {
+    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
 
@@ -880,13 +893,13 @@ const struct scheduler sched_null_def = {
     .alloc_domdata  = null_alloc_domdata,
     .free_domdata   = null_free_domdata,
 
-    .insert_vcpu    = null_vcpu_insert,
-    .remove_vcpu    = null_vcpu_remove,
+    .insert_unit    = null_unit_insert,
+    .remove_unit    = null_unit_remove,
 
-    .wake           = null_vcpu_wake,
-    .sleep          = null_vcpu_sleep,
+    .wake           = null_unit_wake,
+    .sleep          = null_unit_sleep,
     .pick_cpu       = null_cpu_pick,
-    .migrate        = null_vcpu_migrate,
+    .migrate        = null_unit_migrate,
     .do_schedule    = null_schedule,
 
     .dump_cpu_state = null_dump_pcpu,
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 0acfc3d702..63ec732157 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -136,7 +136,7 @@
  * RTDS_delayed_runq_add: Do we need to add this to the RunQ/DepletedQ
  * once it's done being context switching out?
  * + Set when scheduling out in rt_schedule() if prev is runable
- * + Set in rt_vcpu_wake if it finds RTDS_scheduled set
+ * + Set in rt_unit_wake if it finds RTDS_scheduled set
  * + Read in rt_context_saved(). If set, it adds prev to the Runqueue/DepletedQ
  *   and clears the bit.
  */
@@ -637,8 +637,9 @@ replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
  * and available cpus
  */
 static int
-rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+rt_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     cpumask_t cpus;
     cpumask_t *online;
     int cpu;
@@ -838,8 +839,9 @@ rt_free_domdata(const struct scheduler *ops, void *data)
 }
 
 static void *
-rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -881,8 +883,9 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
  * dest. cpupool.
  */
 static void
-rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu *svc = rt_vcpu(vc);
     s_time_t now;
     spinlock_t *lock;
@@ -890,7 +893,7 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* This is safe because vc isn't yet being scheduled */
-    vc->processor = rt_cpu_pick(ops, vc);
+    vc->processor = rt_cpu_pick(ops, unit);
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -914,8 +917,9 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
  * Remove rt_vcpu svc from the old scheduler in source cpupool.
  */
 static void
-rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -1134,8 +1138,9 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
  * The lock is already grabbed in schedule.c, no need to lock here
  */
 static void
-rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
@@ -1249,8 +1254,9 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
  * TODO: what if these two vcpus belongs to the same domain?
  */
 static void
-rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now;
     bool_t missed;
@@ -1319,8 +1325,9 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
  * and then pick the highest priority vcpu from runq to run
  */
 static void
-rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
+    struct vcpu *vc = unit->vcpu;
     struct rt_vcpu *svc = rt_vcpu(vc);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
@@ -1549,15 +1556,15 @@ static const struct scheduler sched_rtds_def = {
     .free_domdata   = rt_free_domdata,
     .alloc_vdata    = rt_alloc_vdata,
     .free_vdata     = rt_free_vdata,
-    .insert_vcpu    = rt_vcpu_insert,
-    .remove_vcpu    = rt_vcpu_remove,
+    .insert_unit    = rt_unit_insert,
+    .remove_unit    = rt_unit_remove,
 
     .adjust         = rt_dom_cntl,
 
     .pick_cpu       = rt_cpu_pick,
     .do_schedule    = rt_schedule,
-    .sleep          = rt_vcpu_sleep,
-    .wake           = rt_vcpu_wake,
+    .sleep          = rt_unit_sleep,
+    .wake           = rt_unit_wake,
     .context_saved  = rt_context_saved,
 };
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 6dc96b3cd4..8a8e73fe57 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -252,6 +252,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
+    struct sched_unit unit = { .vcpu = v };
 
     v->processor = processor;
 
@@ -263,7 +264,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), v, d->sched_priv);
+    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), &unit, d->sched_priv);
     if ( v->sched_priv == NULL )
         return 1;
 
@@ -284,7 +285,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        sched_insert_vcpu(dom_scheduler(d), v);
+        sched_insert_unit(dom_scheduler(d), &unit);
     }
 
     return 0;
@@ -305,6 +306,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     void *vcpudata;
     struct scheduler *old_ops;
     void *old_domdata;
+    struct sched_unit unit;
 
     for_each_vcpu ( d, v )
     {
@@ -325,7 +327,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, v, domdata);
+        unit.vcpu = v;
+        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, &unit, domdata);
         if ( vcpu_priv[v->vcpu_id] == NULL )
         {
             for_each_vcpu ( d, v )
@@ -343,7 +346,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        sched_remove_vcpu(old_ops, v);
+        unit.vcpu = v;
+        sched_remove_unit(old_ops, &unit);
     }
 
     d->cpupool = c;
@@ -354,6 +358,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
+        unit.vcpu = v;
         vcpudata = v->sched_priv;
 
         migrate_timer(&v->periodic_timer, new_p);
@@ -378,7 +383,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
-        sched_insert_vcpu(c->sched, v);
+        sched_insert_unit(c->sched, &unit);
 
         sched_free_vdata(old_ops, vcpudata);
     }
@@ -396,12 +401,14 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 void sched_destroy_vcpu(struct vcpu *v)
 {
+    struct sched_unit unit = { .vcpu = v };
+
     kill_timer(&v->periodic_timer);
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    sched_remove_vcpu(vcpu_scheduler(v), v);
+    sched_remove_unit(vcpu_scheduler(v), &unit);
     sched_free_vdata(vcpu_scheduler(v), v->sched_priv);
 }
 
@@ -446,6 +453,8 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
+    struct sched_unit unit = { .vcpu = v };
+
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
@@ -453,7 +462,7 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        sched_sleep(vcpu_scheduler(v), v);
+        sched_sleep(vcpu_scheduler(v), &unit);
     }
 }
 
@@ -485,6 +494,7 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
+    struct sched_unit unit = { .vcpu = v };
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
@@ -494,7 +504,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        sched_wake(vcpu_scheduler(v), v);
+        sched_wake(vcpu_scheduler(v), &unit);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -533,6 +543,7 @@ void vcpu_unblock(struct vcpu *v)
 static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 {
     unsigned int old_cpu = v->processor;
+    struct sched_unit unit = { .vcpu = v };
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
@@ -549,7 +560,7 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      * Actual CPU switch to new CPU.  This is safe because the lock
      * pointer can't change while the current lock is held.
      */
-    sched_migrate(vcpu_scheduler(v), v, new_cpu);
+    sched_migrate(vcpu_scheduler(v), &unit, new_cpu);
 }
 
 /*
@@ -591,6 +602,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
+    struct sched_unit unit = { .vcpu = v };
 
     /*
      * If the vcpu is currently running, this will be handled by
@@ -627,7 +639,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = sched_pick_cpu(vcpu_scheduler(v), v);
+            new_cpu = sched_pick_cpu(vcpu_scheduler(v), &unit);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -697,6 +709,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_unit unit = { .vcpu = v };
 
         ASSERT(!vcpu_runnable(v));
 
@@ -732,7 +745,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = sched_pick_cpu(vcpu_scheduler(v), v);
+        v->processor = sched_pick_cpu(vcpu_scheduler(v), &unit);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -844,7 +857,9 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    sched_adjust_affinity(dom_scheduler(v->domain), v, hard, soft);
+    struct sched_unit unit = { .vcpu = v };
+
+    sched_adjust_affinity(dom_scheduler(v->domain), &unit, hard, soft);
 
     if ( hard )
         cpumask_copy(v->cpu_hard_affinity, hard);
@@ -1017,9 +1032,10 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
+    struct sched_unit unit = { .vcpu = v };
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    sched_yield(vcpu_scheduler(v), v);
+    sched_yield(vcpu_scheduler(v), &unit);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1514,6 +1530,8 @@ static void schedule(void)
 
 void context_saved(struct vcpu *prev)
 {
+    struct sched_unit unit = { .vcpu = prev };
+
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
@@ -1522,7 +1540,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    sched_context_saved(vcpu_scheduler(prev), prev);
+    sched_context_saved(vcpu_scheduler(prev), &unit);
 
     vcpu_migrate_finish(prev);
 }
@@ -1578,6 +1596,7 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
+        struct sched_unit unit = { .vcpu = idle };
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1591,7 +1610,7 @@ static int cpu_schedule_up(unsigned int cpu)
          */
         ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = sched_alloc_vdata(&ops, idle,
+        idle->sched_priv = sched_alloc_vdata(&ops, &unit,
                                              idle->domain->sched_priv);
         if ( idle->sched_priv == NULL )
             return -ENOMEM;
@@ -1808,6 +1827,7 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
+    struct sched_unit unit;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1844,10 +1864,11 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      *    sched_priv field of the per-vCPU info of the idle domain.
      */
     idle = idle_vcpu[cpu];
+    unit.vcpu = idle;
     ppriv = sched_alloc_pdata(new_ops, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    vpriv = sched_alloc_vdata(new_ops, idle, idle->domain->sched_priv);
+    vpriv = sched_alloc_vdata(new_ops, &unit, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
         sched_free_pdata(new_ops, ppriv, cpu);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index b8e2b2e49e..8ddbeb4ffd 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -141,8 +141,8 @@ struct scheduler {
     void         (*deinit)         (struct scheduler *);
 
     void         (*free_vdata)     (const struct scheduler *, void *);
-    void *       (*alloc_vdata)    (const struct scheduler *, struct vcpu *,
-                                    void *);
+    void *       (*alloc_vdata)    (const struct scheduler *,
+                                    struct sched_unit *, void *);
     void         (*free_pdata)     (const struct scheduler *, void *, int);
     void *       (*alloc_pdata)    (const struct scheduler *, int);
     void         (*init_pdata)     (const struct scheduler *, void *, int);
@@ -156,24 +156,32 @@ struct scheduler {
     spinlock_t * (*switch_sched)   (struct scheduler *, unsigned int,
                                     void *, void *);
 
-    /* Activate / deactivate vcpus in a cpu pool */
-    void         (*insert_vcpu)    (const struct scheduler *, struct vcpu *);
-    void         (*remove_vcpu)    (const struct scheduler *, struct vcpu *);
-
-    void         (*sleep)          (const struct scheduler *, struct vcpu *);
-    void         (*wake)           (const struct scheduler *, struct vcpu *);
-    void         (*yield)          (const struct scheduler *, struct vcpu *);
-    void         (*context_saved)  (const struct scheduler *, struct vcpu *);
+    /* Activate / deactivate units in a cpu pool */
+    void         (*insert_unit)    (const struct scheduler *,
+                                    struct sched_unit *);
+    void         (*remove_unit)    (const struct scheduler *,
+                                    struct sched_unit *);
+
+    void         (*sleep)          (const struct scheduler *,
+                                    struct sched_unit *);
+    void         (*wake)           (const struct scheduler *,
+                                    struct sched_unit *);
+    void         (*yield)          (const struct scheduler *,
+                                    struct sched_unit *);
+    void         (*context_saved)  (const struct scheduler *,
+                                    struct sched_unit *);
 
     struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
                                       bool_t tasklet_work_scheduled);
 
-    int          (*pick_cpu)       (const struct scheduler *, struct vcpu *);
-    void         (*migrate)        (const struct scheduler *, struct vcpu *,
-                                    unsigned int);
+    int          (*pick_cpu)       (const struct scheduler *,
+                                    struct sched_unit *);
+    void         (*migrate)        (const struct scheduler *,
+                                    struct sched_unit *, unsigned int);
     int          (*adjust)         (const struct scheduler *, struct domain *,
                                     struct xen_domctl_scheduler_op *);
-    void         (*adjust_affinity)(const struct scheduler *, struct vcpu *,
+    void         (*adjust_affinity)(const struct scheduler *,
+                                    struct sched_unit *,
                                     const struct cpumask *,
                                     const struct cpumask *);
     int          (*adjust_global)  (const struct scheduler *,
@@ -267,10 +275,10 @@ static inline void sched_deinit_pdata(const struct scheduler *s, void *data,
         s->deinit_pdata(s, data, cpu);
 }
 
-static inline void *sched_alloc_vdata(const struct scheduler *s, struct vcpu *v,
-                                      void *dom_data)
+static inline void *sched_alloc_vdata(const struct scheduler *s,
+                                      struct sched_unit *unit, void *dom_data)
 {
-    return s->alloc_vdata(s, v, dom_data);
+    return s->alloc_vdata(s, unit, dom_data);
 }
 
 static inline void sched_free_vdata(const struct scheduler *s, void *data)
@@ -278,64 +286,70 @@ static inline void sched_free_vdata(const struct scheduler *s, void *data)
     s->free_vdata(s, data);
 }
 
-static inline void sched_insert_vcpu(const struct scheduler *s, struct vcpu *v)
+static inline void sched_insert_unit(const struct scheduler *s,
+                                     struct sched_unit *unit)
 {
-    if ( s->insert_vcpu )
-        s->insert_vcpu(s, v);
+    if ( s->insert_unit )
+        s->insert_unit(s, unit);
 }
 
-static inline void sched_remove_vcpu(const struct scheduler *s, struct vcpu *v)
+static inline void sched_remove_unit(const struct scheduler *s,
+                                     struct sched_unit *unit)
 {
-    if ( s->remove_vcpu )
-        s->remove_vcpu(s, v);
+    if ( s->remove_unit )
+        s->remove_unit(s, unit);
 }
 
-static inline void sched_sleep(const struct scheduler *s, struct vcpu *v)
+static inline void sched_sleep(const struct scheduler *s,
+                               struct sched_unit *unit)
 {
     if ( s->sleep )
-        s->sleep(s, v);
+        s->sleep(s, unit);
 }
 
-static inline void sched_wake(const struct scheduler *s, struct vcpu *v)
+static inline void sched_wake(const struct scheduler *s,
+                              struct sched_unit *unit)
 {
     if ( s->wake )
-        s->wake(s, v);
+        s->wake(s, unit);
 }
 
-static inline void sched_yield(const struct scheduler *s, struct vcpu *v)
+static inline void sched_yield(const struct scheduler *s,
+                               struct sched_unit *unit)
 {
     if ( s->yield )
-        s->yield(s, v);
+        s->yield(s, unit);
 }
 
 static inline void sched_context_saved(const struct scheduler *s,
-                                       struct vcpu *v)
+                                       struct sched_unit *unit)
 {
     if ( s->context_saved )
-        s->context_saved(s, v);
+        s->context_saved(s, unit);
 }
 
-static inline void sched_migrate(const struct scheduler *s, struct vcpu *v,
-                                 unsigned int cpu)
+static inline void sched_migrate(const struct scheduler *s,
+                                 struct sched_unit *unit, unsigned int cpu)
 {
     if ( s->migrate )
-        s->migrate(s, v, cpu);
+        s->migrate(s, unit, cpu);
     else
-        v->processor = cpu;
+        unit->vcpu->processor = cpu;
 }
 
-static inline int sched_pick_cpu(const struct scheduler *s, struct vcpu *v)
+static inline int sched_pick_cpu(const struct scheduler *s,
+                                 struct sched_unit *unit)
 {
-    return s->pick_cpu(s, v);
+    return s->pick_cpu(s, unit);
 }
 
 static inline void sched_adjust_affinity(const struct scheduler *s,
-                                         struct vcpu *v,
+                                         struct sched_unit *unit,
                                          const cpumask_t *hard,
                                          const cpumask_t *soft)
 {
     if ( s->adjust_affinity )
-        s->adjust_affinity(s, v, hard, soft);
+        s->adjust_affinity(s, unit, hard, soft);
 }
 
 static inline int sched_adjust_dom(const struct scheduler *s, struct domain *d,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2201faca6b..72a17758a1 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -275,6 +275,10 @@ struct vcpu
     struct arch_vcpu arch;
 };
 
+struct sched_unit {
+    struct vcpu           *vcpu;
+};
+
 /* Per-domain lock can be recursively acquired in fault handlers. */
 #define domain_lock(d) spin_lock_recursive(&(d)->domain_lock)
 #define domain_unlock(d) spin_unlock_recursive(&(d)->domain_lock)
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread
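
A note on the sched-if.h hunks above: the inline wrappers make every per-scheduler
hook optional. A scheduler such as arinc653 can leave .yield or .context_saved as
NULL and the wrapper degrades to a no-op, while sched_migrate() falls back to a
plain processor assignment. The following is a minimal, self-contained sketch of
that pattern with invented names (it is not the Xen code itself):

    /* Hypothetical illustration of the NULL-checked wrapper pattern
     * used in sched-if.h; all names here are invented. */
    #include <stdio.h>

    struct unit { int cpu; };

    struct ops {
        void (*wake)(struct unit *);
        void (*migrate)(struct unit *, unsigned int);
    };

    static inline void do_wake(const struct ops *o, struct unit *u)
    {
        if ( o->wake )              /* hook is optional */
            o->wake(u);
    }

    static inline void do_migrate(const struct ops *o, struct unit *u,
                                  unsigned int cpu)
    {
        if ( o->migrate )
            o->migrate(u, cpu);
        else
            u->cpu = cpu;           /* default, as in sched_migrate() */
    }

    int main(void)
    {
        struct ops null_ops = { 0 };   /* scheduler leaving both hooks unset */
        struct unit u = { .cpu = 0 };

        do_wake(&null_ops, &u);        /* silently a no-op */
        do_migrate(&null_ops, &u, 3);  /* falls back to direct assignment */
        printf("cpu=%d\n", u.cpu);     /* prints cpu=3 */
        return 0;
    }

Callers such as vcpu_wake() thus never need to know which hooks a given
scheduler implements.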

* [PATCH 05/60] xen/sched: alloc struct sched_unit for each vcpu
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Allocate a struct sched_unit for each vcpu. This removes the need to
build one on the stack at each call site in schedule.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
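In outline, the ownership change below amounts to the following simplified
sketch (not the literal patched code: calloc()/free() stand in for Xen's
xzalloc()/xfree(), and the scheduler-private allocation is omitted):

    #include <stdlib.h>

    struct vcpu;
    struct sched_unit { struct vcpu *vcpu; };

    struct vcpu {
        struct sched_unit *sched_unit;
        /* ... */
    };

    static int init_vcpu(struct vcpu *v)
    {
        struct sched_unit *unit = calloc(1, sizeof(*unit));

        if ( unit == NULL )
            return 1;             /* as sched_init_vcpu() reports failure */
        unit->vcpu = v;
        v->sched_unit = unit;     /* callers now pass v->sched_unit ... */
        return 0;
    }

    static void destroy_vcpu(struct vcpu *v)
    {
        free(v->sched_unit);      /* ... and it is freed exactly once */
        v->sched_unit = NULL;
    }

    int main(void)
    {
        struct vcpu v = { 0 };

        if ( init_vcpu(&v) )
            return 1;
        destroy_vcpu(&v);
        return 0;
    }

The unit now lives for the vcpu's whole lifetime, so the per-call stack
copies introduced in the previous patch can all be dropped.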
 xen/common/schedule.c   | 67 +++++++++++++++++++++++--------------------------
 xen/include/xen/sched.h |  2 ++
 2 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8a8e73fe57..3ff88096d8 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -252,10 +252,15 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
-    struct sched_unit unit = { .vcpu = v };
+    struct sched_unit *unit;
 
     v->processor = processor;
 
+    if ( (unit = xzalloc(struct sched_unit)) == NULL )
+        return 1;
+    v->sched_unit = unit;
+    unit->vcpu = v;
+
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -264,9 +269,13 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), &unit, d->sched_priv);
+    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), unit, d->sched_priv);
     if ( v->sched_priv == NULL )
+    {
+        v->sched_unit = NULL;
+        xfree(unit);
         return 1;
+    }
 
     /*
      * Initialize affinity settings. The idler, and potentially
@@ -285,7 +294,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        sched_insert_unit(dom_scheduler(d), &unit);
+        sched_insert_unit(dom_scheduler(d), unit);
     }
 
     return 0;
@@ -306,7 +315,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     void *vcpudata;
     struct scheduler *old_ops;
     void *old_domdata;
-    struct sched_unit unit;
 
     for_each_vcpu ( d, v )
     {
@@ -327,8 +335,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        unit.vcpu = v;
-        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, &unit, domdata);
+        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, v->sched_unit,
+                                                  domdata);
         if ( vcpu_priv[v->vcpu_id] == NULL )
         {
             for_each_vcpu ( d, v )
@@ -346,8 +354,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        unit.vcpu = v;
-        sched_remove_unit(old_ops, &unit);
+        sched_remove_unit(old_ops, v->sched_unit);
     }
 
     d->cpupool = c;
@@ -358,7 +365,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
-        unit.vcpu = v;
         vcpudata = v->sched_priv;
 
         migrate_timer(&v->periodic_timer, new_p);
@@ -383,7 +389,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
-        sched_insert_unit(c->sched, &unit);
+        sched_insert_unit(c->sched, v->sched_unit);
 
         sched_free_vdata(old_ops, vcpudata);
     }
@@ -401,15 +407,17 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 void sched_destroy_vcpu(struct vcpu *v)
 {
-    struct sched_unit unit = { .vcpu = v };
+    struct sched_unit *unit = v->sched_unit;
 
     kill_timer(&v->periodic_timer);
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    sched_remove_unit(vcpu_scheduler(v), &unit);
+    sched_remove_unit(vcpu_scheduler(v), unit);
     sched_free_vdata(vcpu_scheduler(v), v->sched_priv);
+    xfree(unit);
+    v->sched_unit = NULL;
 }
 
 int sched_init_domain(struct domain *d, int poolid)
@@ -453,8 +461,6 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
-    struct sched_unit unit = { .vcpu = v };
-
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
@@ -462,7 +468,7 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        sched_sleep(vcpu_scheduler(v), &unit);
+        sched_sleep(vcpu_scheduler(v), v->sched_unit);
     }
 }
 
@@ -494,7 +500,6 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
-    struct sched_unit unit = { .vcpu = v };
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
@@ -504,7 +509,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        sched_wake(vcpu_scheduler(v), &unit);
+        sched_wake(vcpu_scheduler(v), v->sched_unit);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -543,7 +548,6 @@ void vcpu_unblock(struct vcpu *v)
 static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 {
     unsigned int old_cpu = v->processor;
-    struct sched_unit unit = { .vcpu = v };
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
@@ -560,7 +564,7 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      * Actual CPU switch to new CPU.  This is safe because the lock
      * pointer can't change while the current lock is held.
      */
-    sched_migrate(vcpu_scheduler(v), &unit, new_cpu);
+    sched_migrate(vcpu_scheduler(v), v->sched_unit, new_cpu);
 }
 
 /*
@@ -602,7 +606,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
-    struct sched_unit unit = { .vcpu = v };
 
     /*
      * If the vcpu is currently running, this will be handled by
@@ -639,7 +642,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = sched_pick_cpu(vcpu_scheduler(v), &unit);
+            new_cpu = sched_pick_cpu(vcpu_scheduler(v), v->sched_unit);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -709,7 +712,6 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
-        struct sched_unit unit = { .vcpu = v };
 
         ASSERT(!vcpu_runnable(v));
 
@@ -745,7 +747,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = sched_pick_cpu(vcpu_scheduler(v), &unit);
+        v->processor = sched_pick_cpu(vcpu_scheduler(v), v->sched_unit);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -857,9 +859,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct sched_unit unit = { .vcpu = v };
-
-    sched_adjust_affinity(dom_scheduler(v->domain), &unit, hard, soft);
+    sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft);
 
     if ( hard )
         cpumask_copy(v->cpu_hard_affinity, hard);
@@ -1032,10 +1032,9 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
-    struct sched_unit unit = { .vcpu = v };
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    sched_yield(vcpu_scheduler(v), &unit);
+    sched_yield(vcpu_scheduler(v), v->sched_unit);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1530,8 +1529,6 @@ static void schedule(void)
 
 void context_saved(struct vcpu *prev)
 {
-    struct sched_unit unit = { .vcpu = prev };
-
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
@@ -1540,7 +1537,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    sched_context_saved(vcpu_scheduler(prev), &unit);
+    sched_context_saved(vcpu_scheduler(prev), prev->sched_unit);
 
     vcpu_migrate_finish(prev);
 }
@@ -1596,7 +1593,6 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
-        struct sched_unit unit = { .vcpu = idle };
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1610,7 +1606,7 @@ static int cpu_schedule_up(unsigned int cpu)
          */
         ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = sched_alloc_vdata(&ops, &unit,
+        idle->sched_priv = sched_alloc_vdata(&ops, idle->sched_unit,
                                              idle->domain->sched_priv);
         if ( idle->sched_priv == NULL )
             return -ENOMEM;
@@ -1827,7 +1823,6 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
-    struct sched_unit unit;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1864,11 +1859,11 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      *    sched_priv field of the per-vCPU info of the idle domain.
      */
     idle = idle_vcpu[cpu];
-    unit.vcpu = idle;
     ppriv = sched_alloc_pdata(new_ops, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    vpriv = sched_alloc_vdata(new_ops, &unit, idle->domain->sched_priv);
+    vpriv = sched_alloc_vdata(new_ops, idle->sched_unit,
+                              idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
         sched_free_pdata(new_ops, ppriv, cpu);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 72a17758a1..e8ea42e867 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -140,6 +140,7 @@ void evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;
+struct sched_unit;
 
 struct vcpu
 {
@@ -160,6 +161,7 @@ struct vcpu
 
     struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
 
+    struct sched_unit *sched_unit;
     void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
-- 
2.16.4




* [PATCH 06/60] xen/sched: move per-vcpu scheduler private data pointer to sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

This prepares for making the different schedulers vcpu-agnostic.
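
As a minimal user-space sketch (illustrative only; simplified stand-in
types, not the real Xen structures), the resulting two-level lookup is:

    /* Toy model: scheduler-private per-vcpu data is now reached via
     * vcpu->sched_unit->priv instead of vcpu->sched_priv. */
    #include <stdio.h>

    struct sched_unit {
        struct vcpu *vcpu;
        void        *priv;               /* scheduler private data */
    };

    struct vcpu {
        int                vcpu_id;
        struct sched_unit *sched_unit;
    };

    struct csched_vcpu {                 /* stand-in for a scheduler's vdata */
        int credit;
    };

    static struct csched_vcpu *csched_vcpu_of(const struct vcpu *v)
    {
        return v->sched_unit->priv;      /* was: v->sched_priv */
    }

    int main(void)
    {
        struct csched_vcpu vdata = { .credit = 256 };
        struct sched_unit unit = { .priv = &vdata };
        struct vcpu v = { .vcpu_id = 0, .sched_unit = &unit };

        unit.vcpu = &v;
        printf("vcpu%d credit=%d\n", v.vcpu_id, csched_vcpu_of(&v)->credit);
        return 0;
    }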

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  4 ++--
 xen/common/sched_credit.c   |  6 +++---
 xen/common/sched_credit2.c  | 10 +++++-----
 xen/common/sched_null.c     |  4 ++--
 xen/common/sched_rt.c       |  4 ++--
 xen/common/schedule.c       | 24 ++++++++++++------------
 xen/include/xen/sched.h     |  2 +-
 7 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 283730b3f8..840891318e 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -53,7 +53,7 @@
  * Return a pointer to the ARINC 653-specific scheduler data information
  * associated with the given VCPU (vc)
  */
-#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_priv)
+#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_unit->priv)
 
 /**
  * Return the global scheduler private data given the scheduler ops pointer
@@ -647,7 +647,7 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_unit->priv = vdata;
 
     return &sd->_lock;
 }
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index eba4db38bb..4cfef189aa 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -83,7 +83,7 @@
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
     ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
-#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
+#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_unit->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
 
@@ -641,7 +641,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     ASSERT(svc && is_idle_vcpu(svc->vcpu));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_unit->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -1022,7 +1022,7 @@ static void
 csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc = vc->sched_priv;
+    struct csched_vcpu *svc = unit->priv;
     spinlock_t *lock;
 
     BUG_ON( is_idle_vcpu(vc) );
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index def8d87484..86b44dc6cf 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -572,7 +572,7 @@ static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
 
 static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
 {
-    return v->sched_priv;
+    return v->sched_unit->priv;
 }
 
 static inline struct csched2_dom *csched2_dom(const struct domain *d)
@@ -970,7 +970,7 @@ _runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
 static void
 runq_assign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
@@ -997,7 +997,7 @@ _runq_deassign(struct csched2_vcpu *svc)
 static void
 runq_deassign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == c2rqd(ops, vc->processor));
 
@@ -3108,7 +3108,7 @@ static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -3887,7 +3887,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_unit->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index bf52c84b0f..b622c4d7dc 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -117,7 +117,7 @@ static inline struct null_private *null_priv(const struct scheduler *ops)
 
 static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
 {
-    return v->sched_priv;
+    return v->sched_unit->priv;
 }
 
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
@@ -392,7 +392,7 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
 
     ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_unit->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 63ec732157..700ee9353e 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -235,7 +235,7 @@ static inline struct rt_private *rt_priv(const struct scheduler *ops)
 
 static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
 {
-    return vcpu->sched_priv;
+    return vcpu->sched_unit->priv;
 }
 
 static inline struct list_head *rt_runq(const struct scheduler *ops)
@@ -761,7 +761,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
         dprintk(XENLOG_DEBUG, "RTDS: timer initialized on cpu %u\n", cpu);
     }
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_unit->priv = vdata;
 
     return &prv->lock;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3ff88096d8..86a43f7192 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -269,8 +269,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = sched_alloc_vdata(dom_scheduler(d), unit, d->sched_priv);
-    if ( v->sched_priv == NULL )
+    unit->priv = sched_alloc_vdata(dom_scheduler(d), unit, d->sched_priv);
+    if ( unit->priv == NULL )
     {
         v->sched_unit = NULL;
         xfree(unit);
@@ -365,7 +365,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
-        vcpudata = v->sched_priv;
+        vcpudata = v->sched_unit->priv;
 
         migrate_timer(&v->periodic_timer, new_p);
         migrate_timer(&v->singleshot_timer, new_p);
@@ -383,7 +383,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        v->sched_priv = vcpu_priv[v->vcpu_id];
+        v->sched_unit->priv = vcpu_priv[v->vcpu_id];
         if ( !d->is_dying )
             sched_move_irqs(v);
 
@@ -415,7 +415,7 @@ void sched_destroy_vcpu(struct vcpu *v)
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
     sched_remove_unit(vcpu_scheduler(v), unit);
-    sched_free_vdata(vcpu_scheduler(v), v->sched_priv);
+    sched_free_vdata(vcpu_scheduler(v), unit->priv);
     xfree(unit);
     v->sched_unit = NULL;
 }
@@ -1593,6 +1593,7 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
+        struct sched_unit *unit = idle->sched_unit;
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1604,11 +1605,10 @@ static int cpu_schedule_up(unsigned int cpu)
          * with a different scheduler, it is schedule_cpu_switch(), invoked
          * later, that will set things up as appropriate.
          */
-        ASSERT(idle->sched_priv == NULL);
+        ASSERT(unit->priv == NULL);
 
-        idle->sched_priv = sched_alloc_vdata(&ops, idle->sched_unit,
-                                             idle->domain->sched_priv);
-        if ( idle->sched_priv == NULL )
+        unit->priv = sched_alloc_vdata(&ops, unit, idle->domain->sched_priv);
+        if ( unit->priv == NULL )
             return -ENOMEM;
     }
     if ( idle_vcpu[cpu] == NULL )
@@ -1634,9 +1634,9 @@ static void cpu_schedule_down(unsigned int cpu)
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
     sched_free_pdata(sched, sd->sched_priv, cpu);
-    sched_free_vdata(sched, idle_vcpu[cpu]->sched_priv);
+    sched_free_vdata(sched, idle_vcpu[cpu]->sched_unit->priv);
 
-    idle_vcpu[cpu]->sched_priv = NULL;
+    idle_vcpu[cpu]->sched_unit->priv = NULL;
     sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
@@ -1886,7 +1886,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      */
     old_lock = pcpu_schedule_lock_irq(cpu);
 
-    vpriv_old = idle->sched_priv;
+    vpriv_old = idle->sched_unit->priv;
     ppriv_old = sd->sched_priv;
     new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e8ea42e867..f549ad60d1 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -162,7 +162,6 @@ struct vcpu
     struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
 
     struct sched_unit *sched_unit;
-    void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
 #ifndef CONFIG_COMPAT
@@ -279,6 +278,7 @@ struct vcpu
 
 struct sched_unit {
     struct vcpu           *vcpu;
+    void                  *priv;      /* scheduler private data */
 };
 
 /* Per-domain lock can be recursively acquired in fault handlers. */
-- 
2.16.4




* [PATCH 07/60] xen/sched: build a linked list of struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

In order to make it easy to iterate over the sched_unit elements of a
domain, build a singly linked list of them and add an iterator for it.
The new list is guarded by the same mechanisms as the vcpu linked list,
as it is modified only via vcpu_create() or vcpu_destroy().

For completeness, add another iterator, for_each_sched_unit_vcpu(),
which iterates over all vcpus of a sched_unit (right now only one).
This will be needed later for larger scheduling granularities (e.g.
cores).
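
As a usage sketch (illustrative only; the helper name is hypothetical,
not part of the patch), walking all units of a domain and all vcpus of
each unit would look like:

    /* Illustrative only: dump each unit of a domain and the vcpus in it. */
    static void dump_domain_units(const struct domain *d)
    {
        struct sched_unit *unit;
        struct vcpu *v;

        for_each_sched_unit ( d, unit )
            for_each_sched_unit_vcpu ( unit, v )
                printk("d%dv%d: unit priv %p\n",
                       d->domain_id, v->vcpu_id, unit->priv);
    }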

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c   | 56 ++++++++++++++++++++++++++++++++++++++++++-------
 xen/include/xen/sched.h |  9 ++++++++
 2 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 86a43f7192..49d25489ef 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -249,6 +249,52 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
+static void sched_free_unit(struct sched_unit *unit)
+{
+    struct sched_unit *prev_unit;
+    struct domain *d = unit->vcpu->domain;
+
+    if ( d->sched_unit_list == unit )
+        d->sched_unit_list = unit->next_in_list;
+    else
+    {
+        for_each_sched_unit ( d, prev_unit )
+        {
+            if ( prev_unit->next_in_list == unit )
+            {
+                prev_unit->next_in_list = unit->next_in_list;
+                break;
+            }
+        }
+    }
+
+    unit->vcpu->sched_unit = NULL;
+    xfree(unit);
+}
+
+static struct sched_unit *sched_alloc_unit(struct vcpu *v)
+{
+    struct sched_unit *unit, **prev_unit;
+    struct domain *d = v->domain;
+
+    if ( (unit = xzalloc(struct sched_unit)) == NULL )
+        return NULL;
+
+    v->sched_unit = unit;
+    unit->vcpu = v;
+
+    for ( prev_unit = &d->sched_unit_list; *prev_unit;
+          prev_unit = &(*prev_unit)->next_in_list )
+        if ( (*prev_unit)->next_in_list &&
+             (*prev_unit)->next_in_list->vcpu->vcpu_id > v->vcpu_id )
+            break;
+
+    unit->next_in_list = *prev_unit;
+    *prev_unit = unit;
+
+    return unit;
+}
+
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
@@ -256,10 +302,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 
     v->processor = processor;
 
-    if ( (unit = xzalloc(struct sched_unit)) == NULL )
+    if ( (unit = sched_alloc_unit(v)) == NULL )
         return 1;
-    v->sched_unit = unit;
-    unit->vcpu = v;
 
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
@@ -272,8 +316,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     unit->priv = sched_alloc_vdata(dom_scheduler(d), unit, d->sched_priv);
     if ( unit->priv == NULL )
     {
-        v->sched_unit = NULL;
-        xfree(unit);
+        sched_free_unit(unit);
         return 1;
     }
 
@@ -416,8 +459,7 @@ void sched_destroy_vcpu(struct vcpu *v)
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
     sched_remove_unit(vcpu_scheduler(v), unit);
     sched_free_vdata(vcpu_scheduler(v), unit->priv);
-    xfree(unit);
-    v->sched_unit = NULL;
+    sched_free_unit(unit);
 }
 
 int sched_init_domain(struct domain *d, int poolid)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index f549ad60d1..4da1ab201d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -279,8 +279,16 @@ struct vcpu
 struct sched_unit {
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
+    struct sched_unit     *next_in_list;
 };
 
+#define for_each_sched_unit(d, e)                                         \
+    for ( (e) = (d)->sched_unit_list; (e) != NULL; (e) = (e)->next_in_list )
+
+#define for_each_sched_unit_vcpu(i, v)                                    \
+    for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_unit == (i);         \
+          (v) = (v)->next_in_list )
+
 /* Per-domain lock can be recursively acquired in fault handlers. */
 #define domain_lock(d) spin_lock_recursive(&(d)->domain_lock)
 #define domain_unlock(d) spin_unlock_recursive(&(d)->domain_lock)
@@ -339,6 +347,7 @@ struct domain
 
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
+    struct sched_unit *sched_unit_list;
     struct cpupool  *cpupool;
 
     struct domain   *next_in_list;
-- 
2.16.4




* [PATCH 08/60] xen/sched: introduce struct sched_resource
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Add a scheduling abstraction layer between physical processors and the
schedulers by introducing a struct sched_resource. Each running
scheduler unit is active on such a scheduler resource. For the time
being there is one struct sched_resource per cpu, but in the future
there might be only one per core or socket.
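
A sketch of the intended access pattern (illustrative only; the helper
name is hypothetical, but the body mirrors the assignments this patch
adds to the individual schedulers):

    /* Illustrative only: moving a vcpu to a cpu now also records the
     * sched_resource backing that cpu in the vcpu's unit. */
    static void assign_unit_to_cpu(struct vcpu *v, unsigned int cpu)
    {
        v->processor = cpu;
        v->sched_unit->res = get_sched_res(cpu);
    }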

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: add accessor functions
---
 xen/common/sched_credit.c  |  2 ++
 xen/common/sched_credit2.c |  7 +++++++
 xen/common/sched_null.c    |  3 +++
 xen/common/sched_rt.c      |  2 ++
 xen/common/schedule.c      | 15 +++++++++++++++
 xen/include/xen/sched-if.h | 18 ++++++++++++++++++
 xen/include/xen/sched.h    |  3 +++
 7 files changed, 50 insertions(+)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 4cfef189aa..0cd139ec29 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1031,6 +1031,7 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     lock = vcpu_schedule_lock_irq(vc);
 
     vc->processor = csched_cpu_pick(ops, unit);
+    unit->res = get_sched_res(vc->processor);
 
     spin_unlock_irq(lock);
 
@@ -1667,6 +1668,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
             WARN_ON(vc->is_urgent);
             runq_remove(speer);
             vc->processor = cpu;
+            vc->sched_unit->res = get_sched_res(cpu);
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 86b44dc6cf..c92a090905 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2519,6 +2519,7 @@ static void migrate(const struct scheduler *ops,
                     &trqd->active);
         svc->vcpu->processor = cpumask_cycle(trqd->pick_bias,
                                              cpumask_scratch_cpu(cpu));
+        svc->vcpu->sched_unit->res = get_sched_res(svc->vcpu->processor);
         trqd->pick_bias = svc->vcpu->processor;
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
 
@@ -2774,6 +2775,7 @@ csched2_unit_migrate(
         }
         _runq_deassign(svc);
         vc->processor = new_cpu;
+        unit->res = get_sched_res(new_cpu);
         return;
     }
 
@@ -2794,7 +2796,10 @@ csched2_unit_migrate(
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, now);
     else
+    {
         vc->processor = new_cpu;
+        unit->res = get_sched_res(new_cpu);
+    }
 }
 
 static int
@@ -3119,6 +3124,7 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     lock = vcpu_schedule_lock_irq(vc);
 
     vc->processor = csched2_cpu_pick(ops, unit);
+    unit->res = get_sched_res(vc->processor);
 
     spin_unlock_irq(lock);
 
@@ -3596,6 +3602,7 @@ csched2_schedule(
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             snext->vcpu->processor = cpu;
+            snext->vcpu->sched_unit->res = get_sched_res(cpu);
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index b622c4d7dc..7c01d0272d 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -343,6 +343,7 @@ static void vcpu_assign(struct null_private *prv, struct vcpu *v,
 {
     per_cpu(npc, cpu).vcpu = v;
     v->processor = cpu;
+    v->sched_unit->res = get_sched_res(cpu);
     cpumask_clear_cpu(cpu, &prv->cpus_free);
 
     dprintk(XENLOG_G_INFO, "%d <-- %pv\n", cpu, v);
@@ -421,6 +422,7 @@ static void null_unit_insert(const struct scheduler *ops,
  retry:
 
     cpu = v->processor = pick_cpu(prv, v);
+    unit->res = get_sched_res(cpu);
 
     spin_unlock(lock);
 
@@ -667,6 +669,7 @@ static void null_unit_migrate(const struct scheduler *ops,
      * by this, will be fixed-up during resume.
      */
     v->processor = new_cpu;
+    unit->res = get_sched_res(new_cpu);
 }
 
 #ifndef NDEBUG
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 700ee9353e..715a7cd218 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -894,6 +894,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     /* This is safe because vc isn't yet being scheduled */
     vc->processor = rt_cpu_pick(ops, unit);
+    unit->res = get_sched_res(vc->processor);
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -1124,6 +1125,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         if ( snext->vcpu->processor != cpu )
         {
             snext->vcpu->processor = cpu;
+            snext->vcpu->sched_unit->res = get_sched_res(cpu);
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 49d25489ef..32ce248f24 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -63,6 +63,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
+DEFINE_PER_CPU(struct sched_resource *, sched_res);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -305,6 +306,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     if ( (unit = sched_alloc_unit(v)) == NULL )
         return 1;
 
+    unit->res = get_sched_res(processor);
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -419,6 +421,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
         v->processor = new_p;
+        v->sched_unit->res = get_sched_res(new_p);
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -787,9 +790,11 @@ void restore_vcpu_affinity(struct domain *d)
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+        v->sched_unit->res = get_sched_res(v->processor);
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = sched_pick_cpu(vcpu_scheduler(v), v->sched_unit);
+        v->sched_unit->res = get_sched_res(v->processor);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -1618,6 +1623,13 @@ static int cpu_schedule_up(unsigned int cpu)
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     void *sched_priv;
+    struct sched_resource *res;
+
+    res = xzalloc(struct sched_resource);
+    if ( res == NULL )
+        return -ENOMEM;
+    res->processor = cpu;
+    set_sched_res(cpu, res);
 
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
@@ -1682,6 +1694,9 @@ static void cpu_schedule_down(unsigned int cpu)
     sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
+
+    set_sched_res(cpu, NULL);
+    xfree(sd);
 }
 
 static int cpu_schedule_callback(
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 8ddbeb4ffd..bc7f223aad 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -44,9 +44,24 @@ struct schedule_data {
 
 #define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
 
+struct sched_resource {
+    unsigned int processor;
+};
+
 DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
+DECLARE_PER_CPU(struct sched_resource *, sched_res);
+
+static inline struct sched_resource *get_sched_res(unsigned int cpu)
+{
+    return per_cpu(sched_res, cpu);
+}
+
+static inline void set_sched_res(unsigned int cpu, struct sched_resource *res)
+{
+    per_cpu(sched_res, cpu) = res;
+}
 
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
@@ -334,7 +349,10 @@ static inline void sched_migrate(const struct scheduler *s,
     if ( s->migrate )
         s->migrate(s, unit, cpu);
     else
+    {
         unit->vcpu->processor = cpu;
+        unit->res = per_cpu(sched_res, cpu);
+    }
 }
 
 static inline int sched_pick_cpu(const struct scheduler *s,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 4da1ab201d..d3a1a31c86 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -276,10 +276,13 @@ struct vcpu
     struct arch_vcpu arch;
 };
 
+struct sched_resource;
+
 struct sched_unit {
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
+    struct sched_resource *res;
 };
 
 #define for_each_sched_unit(d, e)                                         \
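
For illustration, a standalone sketch of the accessor pattern this patch
introduces, with a plain array standing in for Xen's per-cpu variables
(NR_CPUS and the sched_res array here are hypothetical stand-ins, not the
real per-cpu machinery):

#define NR_CPUS 256

struct sched_resource {
    unsigned int processor;
};

/* stand-in for DEFINE_PER_CPU(struct sched_resource *, sched_res) */
static struct sched_resource *sched_res[NR_CPUS];

static inline struct sched_resource *get_sched_res(unsigned int cpu)
{
    return sched_res[cpu];
}

static inline void set_sched_res(unsigned int cpu, struct sched_resource *res)
{
    sched_res[cpu] = res;
}

cpu_schedule_up() allocates a resource and publishes it via set_sched_res();
cpu_schedule_down() frees it and clears the pointer again, so that
get_sched_res() only ever yields a live resource or NULL.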
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 09/60] xen/sched: let pick_cpu return a scheduler resource
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Instead of returning a physical cpu number, let pick_cpu() return a
scheduler resource. Rename pick_cpu() to pick_resource() to reflect
that change.
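
As a minimal, self-contained sketch of the interface change (simplified,
hypothetical stand-in types rather than the full Xen headers):

struct scheduler;
struct sched_unit;

struct sched_resource {
    unsigned int processor;
};

/* old callback shape: hand back a raw cpu number */
typedef int (*pick_cpu_fn)(const struct scheduler *, struct sched_unit *);

/* new callback shape: hand back the resource itself; callers that
 * still need a cpu number read ->processor, as vcpu_migrate_finish()
 * does in the hunk below. */
typedef struct sched_resource *
        (*pick_resource_fn)(const struct scheduler *, struct sched_unit *);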

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c  | 12 ++++++------
 xen/common/sched_credit.c    | 16 ++++++++--------
 xen/common/sched_credit2.c   | 22 +++++++++++-----------
 xen/common/sched_null.c      | 20 +++++++++++---------
 xen/common/sched_rt.c        | 18 +++++++++---------
 xen/common/schedule.c        | 10 ++++++----
 xen/include/xen/perfc_defn.h |  2 +-
 xen/include/xen/sched-if.h   | 12 ++++++------
 8 files changed, 58 insertions(+), 54 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 840891318e..383df750b0 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -601,15 +601,15 @@ a653sched_do_schedule(
 }
 
 /**
- * Xen scheduler callback function to select a CPU for the VCPU to run on
+ * Xen scheduler callback function to select a resource for the VCPU to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param unit      Pointer to struct sched_unit
  *
- * @return          Number of selected physical CPU
+ * @return          Scheduler resource to run on
  */
-static int
-a653sched_pick_cpu(const struct scheduler *ops, struct sched_unit *unit)
+static struct sched_resource *
+a653sched_pick_resource(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
     cpumask_t *online;
@@ -627,7 +627,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct sched_unit *unit)
          || (cpu >= nr_cpu_ids) )
         cpu = vc->processor;
 
-    return cpu;
+    return get_sched_res(cpu);
 }
 
 /**
@@ -720,7 +720,7 @@ static const struct scheduler sched_arinc653_def = {
 
     .do_schedule    = a653sched_do_schedule,
 
-    .pick_cpu       = a653sched_pick_cpu,
+    .pick_resource  = a653sched_pick_resource,
 
     .switch_sched   = a653_switch_sched,
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 0cd139ec29..7f7596b3bf 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -858,8 +858,8 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
-static int
-csched_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
+static struct sched_resource *
+csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(vc);
@@ -872,7 +872,7 @@ csched_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
      * get boosted, which we don't deserve as we are "only" migrating.
      */
     set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
-    return _csched_cpu_pick(ops, vc, 1);
+    return get_sched_res(_csched_cpu_pick(ops, vc, 1));
 }
 
 static inline void
@@ -972,7 +972,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
         /*
          * If it's been active a while, check if we'd be better off
          * migrating it to run elsewhere (see multi-core and multi-thread
-         * support in csched_cpu_pick()).
+         * support in csched_res_pick()).
          */
         new_cpu = _csched_cpu_pick(ops, current, 0);
 
@@ -1027,11 +1027,11 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
+    /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched_cpu_pick(ops, unit);
-    unit->res = get_sched_res(vc->processor);
+    unit->res = csched_res_pick(ops, unit);
+    vc->processor = unit->res->processor;
 
     spin_unlock_irq(lock);
 
@@ -2282,7 +2282,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_affinity= csched_aff_cntl,
     .adjust_global  = csched_sys_cntl,
 
-    .pick_cpu       = csched_cpu_pick,
+    .pick_resource  = csched_res_pick,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c92a090905..b71802bba2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -625,9 +625,9 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
  * runq, _always_ happens by means of tickling:
  *  - when a vcpu wakes up, it calls csched2_unit_wake(), which calls
  *    runq_tickle();
- *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
+ *  - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *    csched2_unit_migrate() (which calls migrate()) and csched2_unit_wake().
- *    csched2_cpu_pick() looks for the least loaded runq and return just any
+ *    csched2_res_pick() looks for the least loaded runq and returns just any
  *    of its processors. Then, csched2_unit_migrate() just moves the vcpu to
  *    the chosen runq, and it is again runq_tickle(), called by
  *    csched2_unit_wake() that actually decides what pcpu to use within the
@@ -676,7 +676,7 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * In csched2_res_pick(), it may not be possible to actually look at remote
  * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
  * we pick, in order of decreasing preference:
  *  1) svc's current pcpu, if it is part of svc's soft affinity;
@@ -2201,8 +2201,8 @@ csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 #define MAX_LOAD (STIME_MAX)
-static int
-csched2_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
+static struct sched_resource *
+csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_private *prv = csched2_priv(ops);
     struct vcpu *vc = unit->vcpu;
@@ -2214,7 +2214,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
-    SCHED_STAT_CRANK(pick_cpu);
+    SCHED_STAT_CRANK(pick_resource);
 
     /* Locking:
      * - Runqueue lock of vc->processor is already locked
@@ -2423,7 +2423,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
                     (unsigned char *)&d);
     }
 
-    return new_cpu;
+    return get_sched_res(new_cpu);
 }
 
 /* Working state of the load-balancing algorithm */
@@ -3120,11 +3120,11 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     ASSERT(!is_idle_vcpu(vc));
     ASSERT(list_empty(&svc->runq_elem));
 
-    /* csched2_cpu_pick() expects the pcpu lock to be held */
+    /* csched2_res_pick() expects the pcpu lock to be held */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched2_cpu_pick(ops, unit);
-    unit->res = get_sched_res(vc->processor);
+    unit->res = csched2_res_pick(ops, unit);
+    vc->processor = unit->res->processor;
 
     spin_unlock_irq(lock);
 
@@ -4103,7 +4103,7 @@ static const struct scheduler sched_credit2_def = {
     .adjust_affinity= csched2_aff_cntl,
     .adjust_global  = csched2_sys_cntl,
 
-    .pick_cpu       = csched2_cpu_pick,
+    .pick_resource  = csched2_res_pick,
     .migrate        = csched2_unit_migrate,
     .do_schedule    = csched2_schedule,
     .context_saved  = csched2_context_saved,
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 7c01d0272d..1ea8643a9c 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -269,9 +269,11 @@ static void null_free_domdata(const struct scheduler *ops, void *data)
  *
  * So this is not part of any hot path.
  */
-static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
+static struct sched_resource *
+pick_res(struct null_private *prv, struct sched_unit *unit)
 {
     unsigned int bs;
+    struct vcpu *v = unit->vcpu;
     unsigned int cpu = v->processor, new_cpu;
     cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
 
@@ -335,7 +337,7 @@ static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
         __trace_var(TRC_SNULL_PICKED_CPU, 1, sizeof(d), &d);
     }
 
-    return new_cpu;
+    return get_sched_res(new_cpu);
 }
 
 static void vcpu_assign(struct null_private *prv, struct vcpu *v,
@@ -421,8 +423,8 @@ static void null_unit_insert(const struct scheduler *ops,
     lock = vcpu_schedule_lock_irq(v);
  retry:
 
-    cpu = v->processor = pick_cpu(prv, v);
-    unit->res = get_sched_res(cpu);
+    unit->res = pick_res(prv, unit);
+    cpu = v->processor = unit->res->processor;
 
     spin_unlock(lock);
 
@@ -578,11 +580,11 @@ static void null_unit_sleep(const struct scheduler *ops,
     SCHED_STAT_CRANK(vcpu_sleep);
 }
 
-static int null_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
+static struct sched_resource *
+null_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
-    ASSERT(!is_idle_vcpu(v));
-    return pick_cpu(null_priv(ops), v);
+    ASSERT(!is_idle_vcpu(unit->vcpu));
+    return pick_res(null_priv(ops), unit);
 }
 
 static void null_unit_migrate(const struct scheduler *ops,
@@ -901,7 +903,7 @@ const struct scheduler sched_null_def = {
 
     .wake           = null_unit_wake,
     .sleep          = null_unit_sleep,
-    .pick_cpu       = null_cpu_pick,
+    .pick_resource  = null_res_pick,
     .migrate        = null_unit_migrate,
     .do_schedule    = null_schedule,
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 715a7cd218..b80bc02357 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -632,12 +632,12 @@ replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
 }
 
 /*
- * Pick a valid CPU for the vcpu vc
- * Valid CPU of a vcpu is intesection of vcpu's affinity
- * and available cpus
+ * Pick a valid resource for the vcpu vc
+ * Valid resource of a vcpu is the intersection of vcpu's affinity
+ * and available resources
  */
-static int
-rt_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
+static struct sched_resource *
+rt_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
     cpumask_t cpus;
@@ -652,7 +652,7 @@ rt_cpu_pick(const struct scheduler *ops, struct sched_unit *unit)
             : cpumask_cycle(vc->processor, &cpus);
     ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
 
-    return cpu;
+    return get_sched_res(cpu);
 }
 
 /*
@@ -893,8 +893,8 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* This is safe because vc isn't yet being scheduled */
-    vc->processor = rt_cpu_pick(ops, unit);
-    unit->res = get_sched_res(vc->processor);
+    unit->res = rt_res_pick(ops, unit);
+    vc->processor = unit->res->processor;
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -1563,7 +1563,7 @@ static const struct scheduler sched_rtds_def = {
 
     .adjust         = rt_dom_cntl,
 
-    .pick_cpu       = rt_cpu_pick,
+    .pick_resource  = rt_res_pick,
     .do_schedule    = rt_schedule,
     .sleep          = rt_unit_sleep,
     .wake           = rt_unit_wake,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 32ce248f24..5dfdaaece2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -687,7 +687,8 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = sched_pick_cpu(vcpu_scheduler(v), v->sched_unit);
+            new_cpu = sched_pick_resource(vcpu_scheduler(v),
+                                          v->sched_unit)->processor;
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -793,8 +794,9 @@ void restore_vcpu_affinity(struct domain *d)
         v->sched_unit->res = get_sched_res(v->processor);
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = sched_pick_cpu(vcpu_scheduler(v), v->sched_unit);
-        v->sched_unit->res = get_sched_res(v->processor);
+        v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
+                                                 v->sched_unit);
+        v->processor = v->sched_unit->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -1800,7 +1802,7 @@ void __init scheduler_init(void)
 
         sched_test_func(init);
         sched_test_func(deinit);
-        sched_test_func(pick_cpu);
+        sched_test_func(pick_resource);
         sched_test_func(alloc_vdata);
         sched_test_func(free_vdata);
         sched_test_func(switch_sched);
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index ef6f86b91e..1ad4384080 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -69,7 +69,7 @@ PERFCOUNTER(migrate_on_runq,        "csched2: migrate_on_runq")
 PERFCOUNTER(migrate_no_runq,        "csched2: migrate_no_runq")
 PERFCOUNTER(runtime_min_timer,      "csched2: runtime_min_timer")
 PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
-PERFCOUNTER(pick_cpu,               "csched2: pick_cpu")
+PERFCOUNTER(pick_resource,          "csched2: pick_resource")
 PERFCOUNTER(need_fallback_cpu,      "csched2: need_fallback_cpu")
 PERFCOUNTER(migrated,               "csched2: migrated")
 PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc7f223aad..dc149bc102 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -189,8 +189,8 @@ struct scheduler {
     struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
                                       bool_t tasklet_work_scheduled);
 
-    int          (*pick_cpu)       (const struct scheduler *,
-                                    struct sched_unit *);
+    struct sched_resource * (*pick_resource) (const struct scheduler *,
+                                              struct sched_unit *);
     void         (*migrate)        (const struct scheduler *,
                                     struct sched_unit *, unsigned int);
     int          (*adjust)         (const struct scheduler *, struct domain *,
@@ -351,14 +351,14 @@ static inline void sched_migrate(const struct scheduler *s,
     else
     {
         unit->vcpu->processor = cpu;
-        unit->res = per_cpu(sched_res, cpu);
+        unit->res = get_sched_res(cpu);
     }
 }
 
-static inline int sched_pick_cpu(const struct scheduler *s,
-                                 struct sched_unit *unit)
+static inline struct sched_resource *sched_pick_resource(
+    const struct scheduler *s, struct sched_unit *unit)
 {
-    return s->pick_cpu(s, unit);
+    return s->pick_resource(s, unit);
 }
 
 static inline void sched_adjust_affinity(const struct scheduler *s,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 10/60] xen/sched: switch schedule_data.curr to point at sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In preparation for core scheduling, let the percpu pointer
schedule_data.curr point to a struct sched_unit instead of the related
vcpu. At the same time, rename the per-vcpu scheduler-specific structs
to per-unit ones.
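
A minimal sketch of the resulting relationship (simplified, hypothetical
struct layouts):

struct sched_unit;

struct vcpu {
    struct sched_unit *sched_unit;
};

struct sched_unit {
    struct vcpu *vcpu;
};

struct schedule_data {
    struct sched_unit *curr;    /* previously: struct vcpu *curr */
};

Comparisons against the currently scheduled entity change accordingly:
e.g. in a653sched_unit_sleep() below, "curr == vc" becomes
"curr == unit", with unit being vc->sched_unit.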

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |   2 +-
 xen/common/sched_credit.c   | 101 +++++++++++++-------------
 xen/common/sched_credit2.c  | 168 ++++++++++++++++++++++----------------------
 xen/common/sched_null.c     |  44 ++++++------
 xen/common/sched_rt.c       | 118 +++++++++++++++----------------
 xen/common/schedule.c       |   8 ++-
 xen/include/xen/sched-if.h  |   2 +-
 7 files changed, 220 insertions(+), 223 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 383df750b0..7f26e6a0b0 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == vc )
+    if ( per_cpu(schedule_data, vc->processor).curr == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 7f7596b3bf..bc87b813dc 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -83,7 +83,7 @@
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
     ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
-#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_unit->priv)
+#define CSCHED_UNIT(unit)   ((struct csched_unit *) (unit)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
 
@@ -160,7 +160,7 @@ struct csched_pcpu {
 /*
  * Virtual CPU
  */
-struct csched_vcpu {
+struct csched_unit {
     struct list_head runq_elem;
     struct list_head active_vcpu_elem;
 
@@ -231,15 +231,15 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_vcpu *svc)
+__vcpu_on_runq(struct csched_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched_vcpu *
+static inline struct csched_unit *
 __runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched_vcpu, runq_elem);
+    return list_entry(elem, struct csched_unit, runq_elem);
 }
 
 /* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
@@ -271,7 +271,7 @@ dec_nr_runnable(unsigned int cpu)
 }
 
 static inline void
-__runq_insert(struct csched_vcpu *svc)
+__runq_insert(struct csched_unit *svc)
 {
     unsigned int cpu = svc->vcpu->processor;
     const struct list_head * const runq = RUNQ(cpu);
@@ -281,7 +281,7 @@ __runq_insert(struct csched_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        const struct csched_vcpu * const iter_svc = __runq_elem(iter);
+        const struct csched_unit * const iter_svc = __runq_elem(iter);
         if ( svc->pri > iter_svc->pri )
             break;
     }
@@ -302,34 +302,34 @@ __runq_insert(struct csched_vcpu *svc)
 }
 
 static inline void
-runq_insert(struct csched_vcpu *svc)
+runq_insert(struct csched_unit *svc)
 {
     __runq_insert(svc);
     inc_nr_runnable(svc->vcpu->processor);
 }
 
 static inline void
-__runq_remove(struct csched_vcpu *svc)
+__runq_remove(struct csched_unit *svc)
 {
     BUG_ON( !__vcpu_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
-runq_remove(struct csched_vcpu *svc)
+runq_remove(struct csched_unit *svc)
 {
     dec_nr_runnable(svc->vcpu->processor);
     __runq_remove(svc);
 }
 
-static void burn_credits(struct csched_vcpu *svc, s_time_t now)
+static void burn_credits(struct csched_unit *svc, s_time_t now)
 {
     s_time_t delta;
     uint64_t val;
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_VCPU(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(svc->vcpu->processor)) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -347,10 +347,10 @@ boolean_param("tickle_one_idle_cpu", opt_tickle_one_idle);
 
 DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
-static inline void __runq_tickle(struct csched_vcpu *new)
+static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = new->vcpu->processor;
-    struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
+    struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
     int balance_step, idlers_empty;
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
+    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -637,7 +637,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
-    struct csched_vcpu *svc = vdata;
+    struct csched_unit *svc = vdata;
 
     ASSERT(svc && is_idle_vcpu(svc->vcpu));
 
@@ -660,7 +660,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 static inline void
 __csched_vcpu_check(struct vcpu *vc)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(vc->sched_unit);
     struct csched_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -862,7 +862,7 @@ static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(vc);
+    struct csched_unit *svc = CSCHED_UNIT(unit);
 
     /*
      * We have been called by vcpu_migrate() (in schedule.c), as part
@@ -876,7 +876,7 @@ csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
+__csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
@@ -906,7 +906,7 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
 
 static inline void
 __csched_vcpu_acct_stop_locked(struct csched_private *prv,
-    struct csched_vcpu *svc)
+    struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -931,7 +931,7 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
 static void
 csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(current);
+    struct csched_unit * const svc = CSCHED_UNIT(current->sched_unit);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
     ASSERT( current->processor == cpu );
@@ -1000,10 +1000,10 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                    void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched_vcpu);
+    svc = xzalloc(struct csched_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -1022,7 +1022,7 @@ static void
 csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc = unit->priv;
+    struct csched_unit *svc = unit->priv;
     spinlock_t *lock;
 
     BUG_ON( is_idle_vcpu(vc) );
@@ -1048,7 +1048,7 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched_vcpu *svc = priv;
+    struct csched_unit *svc = priv;
 
     BUG_ON( !list_empty(&svc->runq_elem) );
 
@@ -1059,8 +1059,7 @@ static void
 csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     struct csched_dom * const sdom = svc->sdom;
 
     SCHED_STAT_CRANK(vcpu_remove);
@@ -1087,14 +1086,14 @@ static void
 csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     unsigned int cpu = vc->processor;
 
     SCHED_STAT_CRANK(vcpu_sleep);
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( curr_on_cpu(cpu) == vc )
+    if ( curr_on_cpu(cpu) == unit )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
@@ -1112,12 +1111,12 @@ static void
 csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     bool_t migrating;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1173,8 +1172,7 @@ csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
 
     /* Let the scheduler know that this vcpu is trying to yield */
     set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
@@ -1229,8 +1227,7 @@ static void
 csched_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                 const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(v);
+    struct csched_unit *svc = CSCHED_UNIT(unit);
 
     if ( !hard )
         return;
@@ -1333,7 +1330,7 @@ csched_runq_sort(struct csched_private *prv, unsigned int cpu)
 {
     struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
     struct list_head *runq, *elem, *next, *last_under;
-    struct csched_vcpu *svc_elem;
+    struct csched_unit *svc_elem;
     spinlock_t *lock;
     unsigned long flags;
     int sort_epoch;
@@ -1379,7 +1376,7 @@ csched_acct(void* dummy)
     unsigned long flags;
     struct list_head *iter_vcpu, *next_vcpu;
     struct list_head *iter_sdom, *next_sdom;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
     struct csched_dom *sdom;
     uint32_t credit_total;
     uint32_t weight_total;
@@ -1502,7 +1499,7 @@ csched_acct(void* dummy)
 
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
-            svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_vcpu, struct csched_unit, active_vcpu_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1606,12 +1603,12 @@ csched_tick(void *_cpu)
     set_timer(&spc->ticker, NOW() + MICROSECS(prv->tick_period_us) );
 }
 
-static struct csched_vcpu *
+static struct csched_unit *
 csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 {
     const struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
-    struct csched_vcpu *speer;
+    struct csched_unit *speer;
     struct list_head *iter;
     struct vcpu *vc;
 
@@ -1621,7 +1618,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu))) )
+    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1683,12 +1680,12 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     return NULL;
 }
 
-static struct csched_vcpu *
+static struct csched_unit *
 csched_load_balance(struct csched_private *prv, int cpu,
-    struct csched_vcpu *snext, bool_t *stolen)
+    struct csched_unit *snext, bool_t *stolen)
 {
     struct cpupool *c = per_cpu(cpupool, cpu);
-    struct csched_vcpu *speer;
+    struct csched_unit *speer;
     cpumask_t workers;
     cpumask_t *online;
     int peer_cpu, first_cpu, peer_node, bstep;
@@ -1837,9 +1834,9 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_vcpu * const scurr = CSCHED_VCPU(current);
+    struct csched_unit * const scurr = CSCHED_UNIT(current->sched_unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct csched_vcpu *snext;
+    struct csched_unit *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
@@ -1955,7 +1952,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_VCPU(idle_vcpu[cpu]);
+        snext = CSCHED_UNIT(idle_vcpu[cpu]->sched_unit);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -2007,7 +2004,7 @@ out:
 }
 
 static void
-csched_dump_vcpu(struct csched_vcpu *svc)
+csched_dump_vcpu(struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -2043,7 +2040,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     struct list_head *runq, *iter;
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
     spinlock_t *lock;
     unsigned long flags;
     int loop;
@@ -2067,7 +2064,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = CSCHED_VCPU(curr_on_cpu(cpu));
+    svc = CSCHED_UNIT(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -2136,10 +2133,10 @@ csched_dump(const struct scheduler *ops)
 
         list_for_each( iter_svc, &sdom->active_vcpu )
         {
-            struct csched_vcpu *svc;
+            struct csched_unit *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_svc, struct csched_unit, active_vcpu_elem);
             lock = vcpu_schedule_lock(svc->vcpu);
 
             printk("\t%3d: ", ++loop);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index b71802bba2..36201815e3 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -176,7 +176,7 @@
  *     load balancing;
  *  + serializes runqueue operations (removing and inserting vcpus);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_vcpu for the vcpu in the
+ *  + protects vcpu parameters in csched2_unit for the vcpu in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -511,7 +511,7 @@ struct csched2_pcpu {
 /*
  * Virtual CPU
  */
-struct csched2_vcpu {
+struct csched2_unit {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
     struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
@@ -570,9 +570,9 @@ static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
     return per_cpu(schedule_data, cpu).sched_priv;
 }
 
-static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
+static inline struct csched2_unit *csched2_unit(const struct sched_unit *unit)
 {
-    return v->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline struct csched2_dom *csched2_dom(const struct domain *d)
@@ -594,7 +594,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
 }
 
 /* Does the domain of this vCPU have a cap? */
-static inline bool has_cap(const struct csched2_vcpu *svc)
+static inline bool has_cap(const struct csched2_unit *svc)
 {
     return svc->budget != STIME_MAX;
 }
@@ -688,7 +688,7 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
  * note that at least 5 is guaranteed to _always_ return at least one pcpu.
  */
-static int get_fallback_cpu(struct csched2_vcpu *svc)
+static int get_fallback_cpu(struct csched2_unit *svc)
 {
     struct vcpu *v = svc->vcpu;
     unsigned int bs;
@@ -773,7 +773,7 @@ static int get_fallback_cpu(struct csched2_vcpu *svc)
  * FIXME: Do pre-calculated division?
  */
 static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
-                          struct csched2_vcpu *svc)
+                          struct csched2_unit *svc)
 {
     uint64_t val = time * rqd->max_weight + svc->residual;
 
@@ -781,7 +781,7 @@ static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
     svc->credit -= val;
 }
 
-static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_vcpu *svc)
+static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_unit *svc)
 {
     return credit * svc->weight / rqd->max_weight;
 }
@@ -790,14 +790,14 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_vcpu *svc)
+static inline int vcpu_on_runq(struct csched2_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched2_vcpu * runq_elem(struct list_head *elem)
+static inline struct csched2_unit * runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched2_vcpu, runq_elem);
+    return list_entry(elem, struct csched2_unit, runq_elem);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
@@ -915,7 +915,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
         list_for_each( iter, &rqd->svc )
         {
-            struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_unit * svc = list_entry(iter, struct csched2_unit, rqd_elem);
 
             if ( svc->weight > max_weight )
                 max_weight = svc->weight;
@@ -940,7 +940,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
 /* Add and remove from runqueue assignment (not active run queue) */
 static void
-_runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
+_runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 {
 
     svc->rqd = rqd;
@@ -970,7 +970,7 @@ _runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
 static void
 runq_assign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
@@ -978,7 +978,7 @@ runq_assign(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-_runq_deassign(struct csched2_vcpu *svc)
+_runq_deassign(struct csched2_unit *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
@@ -997,7 +997,7 @@ _runq_deassign(struct csched2_vcpu *svc)
 static void
 runq_deassign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == c2rqd(ops, vc->processor));
 
@@ -1199,7 +1199,7 @@ update_runq_load(const struct scheduler *ops,
 
 static void
 update_svc_load(const struct scheduler *ops,
-                struct csched2_vcpu *svc, int change, s_time_t now)
+                struct csched2_unit *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t delta, vcpu_load;
@@ -1259,7 +1259,7 @@ update_svc_load(const struct scheduler *ops,
 static void
 update_load(const struct scheduler *ops,
             struct csched2_runqueue_data *rqd,
-            struct csched2_vcpu *svc, int change, s_time_t now)
+            struct csched2_unit *svc, int change, s_time_t now)
 {
     trace_var(TRC_CSCHED2_UPDATE_LOAD, 1, 0,  NULL);
 
@@ -1269,7 +1269,7 @@ update_load(const struct scheduler *ops,
 }
 
 static void
-runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 {
     struct list_head *iter;
     unsigned int cpu = svc->vcpu->processor;
@@ -1288,7 +1288,7 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        struct csched2_vcpu * iter_svc = runq_elem(iter);
+        struct csched2_unit * iter_svc = runq_elem(iter);
 
         if ( svc->credit > iter_svc->credit )
             break;
@@ -1312,13 +1312,13 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
     }
 }
 
-static inline void runq_remove(struct csched2_vcpu *svc)
+static inline void runq_remove(struct csched2_unit *svc)
 {
     ASSERT(vcpu_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
 
-void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_unit *, s_time_t);
 
 static inline void
 tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
@@ -1334,7 +1334,7 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  * whether or not it already run for more than the ratelimit, to which we
  * apply some tolerance).
  */
-static inline bool is_preemptable(const struct csched2_vcpu *svc,
+static inline bool is_preemptable(const struct csched2_unit *svc,
                                     s_time_t now, s_time_t ratelimit)
 {
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
@@ -1360,10 +1360,10 @@ static inline bool is_preemptable(const struct csched2_vcpu *svc,
  * Within the same class, the highest difference of credit.
  */
 static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
-                             struct csched2_vcpu *new, unsigned int cpu)
+                             struct csched2_unit *new, unsigned int cpu)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_unit * cur = csched2_unit(curr_on_cpu(cpu));
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
@@ -1432,7 +1432,7 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
  * pick up some work, so it would be wrong to consider it idle.
  */
 static void
-runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
+runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
@@ -1587,7 +1587,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)));
+    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
@@ -1614,7 +1614,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
  * Credit-related code
  */
 static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
-                         struct csched2_vcpu *snext)
+                         struct csched2_unit *snext)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct list_head *iter;
@@ -1644,10 +1644,10 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
     list_for_each( iter, &rqd->svc )
     {
         unsigned int svc_cpu;
-        struct csched2_vcpu * svc;
+        struct csched2_unit * svc;
         int start_credit;
 
-        svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+        svc = list_entry(iter, struct csched2_unit, rqd_elem);
         svc_cpu = svc->vcpu->processor;
 
         ASSERT(!is_idle_vcpu(svc->vcpu));
@@ -1657,7 +1657,7 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1709,11 +1709,11 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 }
 
 void burn_credits(struct csched2_runqueue_data *rqd,
-                  struct csched2_vcpu *svc, s_time_t now)
+                  struct csched2_unit *svc, s_time_t now)
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_vcpu(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_unit(curr_on_cpu(svc->vcpu->processor)));
 
     if ( unlikely(is_idle_vcpu(svc->vcpu)) )
     {
@@ -1763,7 +1763,7 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_vcpu *svc)
+static void park_vcpu(struct csched2_unit *svc)
 {
     struct vcpu *v = svc->vcpu;
 
@@ -1792,7 +1792,7 @@ static void park_vcpu(struct csched2_vcpu *svc)
     list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
 }
 
-static bool vcpu_grab_budget(struct csched2_vcpu *svc)
+static bool vcpu_grab_budget(struct csched2_unit *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1839,7 +1839,7 @@ static bool vcpu_grab_budget(struct csched2_vcpu *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
+vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1882,7 +1882,7 @@ vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
 static void
 unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
 {
-    struct csched2_vcpu *svc, *tmp;
+    struct csched2_unit *svc, *tmp;
     spinlock_t *lock;
 
     list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
@@ -2004,7 +2004,7 @@ static void replenish_domain_budget(void* data)
 static inline void
 csched2_vcpu_check(struct vcpu *vc)
 {
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(vc->sched_unit);
     struct csched2_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -2030,10 +2030,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                     void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu *svc;
+    struct csched2_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched2_vcpu);
+    svc = xzalloc(struct csched2_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -2074,12 +2074,12 @@ static void
 csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
 
     ASSERT(!is_idle_vcpu(vc));
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == unit )
     {
         tickle_cpu(vc->processor, svc->rqd);
     }
@@ -2097,7 +2097,7 @@ static void
 csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     unsigned int cpu = vc->processor;
     s_time_t now;
 
@@ -2105,7 +2105,7 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     ASSERT(!is_idle_vcpu(vc));
 
-    if ( unlikely(curr_on_cpu(cpu) == vc) )
+    if ( unlikely(curr_on_cpu(cpu) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
@@ -2152,8 +2152,7 @@ out:
 static void
 csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(v);
+    struct csched2_unit * const svc = csched2_unit(unit);
 
     __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
 }
@@ -2162,7 +2161,7 @@ static void
 csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
@@ -2208,7 +2207,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     struct vcpu *vc = unit->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = vc->processor;
-    struct csched2_vcpu *svc = csched2_vcpu(vc);
+    struct csched2_unit *svc = csched2_unit(unit);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
 
@@ -2430,15 +2429,15 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 typedef struct {
     /* NB: Modified by consider() */
     s_time_t load_delta;
-    struct csched2_vcpu * best_push_svc, *best_pull_svc;
+    struct csched2_unit * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
     struct csched2_runqueue_data *orqd;                  
 } balance_state_t;
 
 static void consider(balance_state_t *st, 
-                     struct csched2_vcpu *push_svc,
-                     struct csched2_vcpu *pull_svc)
+                     struct csched2_unit *push_svc,
+                     struct csched2_unit *pull_svc)
 {
     s_time_t l_load, o_load, delta;
 
@@ -2471,8 +2470,8 @@ static void consider(balance_state_t *st,
 
 
 static void migrate(const struct scheduler *ops,
-                    struct csched2_vcpu *svc, 
-                    struct csched2_runqueue_data *trqd, 
+                    struct csched2_unit *svc,
+                    struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
@@ -2541,7 +2540,7 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_vcpu *svc,
+static bool vcpu_is_migrateable(struct csched2_unit *svc,
                                   struct csched2_runqueue_data *rqd)
 {
     struct vcpu *v = svc->vcpu;
@@ -2691,7 +2690,7 @@ retry:
     /* Reuse load delta (as we're trying to minimize it) */
     list_for_each( push_iter, &st.lrqd->svc )
     {
-        struct csched2_vcpu * push_svc = list_entry(push_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_unit * push_svc = list_entry(push_iter, struct csched2_unit, rqd_elem);
 
         update_svc_load(ops, push_svc, 0, now);
 
@@ -2700,7 +2699,7 @@ retry:
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
-            struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
             
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
@@ -2719,7 +2718,7 @@ retry:
 
     list_for_each( pull_iter, &st.orqd->svc )
     {
-        struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
         
         if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
             continue;
@@ -2746,7 +2745,7 @@ csched2_unit_migrate(
 {
     struct vcpu *vc = unit->vcpu;
     struct domain *d = vc->domain;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
 
@@ -2847,7 +2846,7 @@ csched2_dom_cntl(
             /* Update weights for vcpus, and max_weight for runqueues on which they reside */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_unit *svc = csched2_unit(v->sched_unit);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
@@ -2861,7 +2860,7 @@ csched2_dom_cntl(
         /* Cap */
         if ( op->u.credit2.cap != 0 )
         {
-            struct csched2_vcpu *svc;
+            struct csched2_unit *svc;
             spinlock_t *lock;
 
             /* Cap is only valid if it's below 100 * nr_of_vCPUS */
@@ -2885,7 +2884,7 @@ csched2_dom_cntl(
              */
             for_each_vcpu ( d, v )
             {
-                svc = csched2_vcpu(v);
+                svc = csched2_unit(v->sched_unit);
                 lock = vcpu_schedule_lock(svc->vcpu);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
@@ -2928,14 +2927,14 @@ csched2_dom_cntl(
                  */
                 for_each_vcpu ( d, v )
                 {
-                    svc = csched2_vcpu(v);
+                    svc = csched2_unit(v->sched_unit);
                     lock = vcpu_schedule_lock(svc->vcpu);
                     if ( v->is_running )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu) == v);
+                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
 
                         /*
                          * We are triggering a reschedule on the vCPU's
@@ -2975,7 +2974,7 @@ csched2_dom_cntl(
             /* Disable budget accounting for all the vCPUs. */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_unit *svc = csched2_unit(v->sched_unit);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 svc->budget = STIME_MAX;
@@ -3012,8 +3011,7 @@ static void
 csched2_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                  const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched2_vcpu *svc = csched2_vcpu(v);
+    struct csched2_unit *svc = csched2_unit(unit);
 
     if ( !hard )
         return;
@@ -3113,7 +3111,7 @@ static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu *svc = unit->priv;
+    struct csched2_unit *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -3145,7 +3143,7 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched2_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched2_vcpu *svc = priv;
+    struct csched2_unit *svc = priv;
 
     xfree(svc);
 }
@@ -3154,7 +3152,7 @@ static void
 csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(vc));
@@ -3175,7 +3173,7 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 /* How long should we let this vcpu run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
-                struct csched2_vcpu *snext, s_time_t now)
+                struct csched2_unit *snext, s_time_t now)
 {
     s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
@@ -3220,7 +3218,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      */
     if ( ! list_empty(runq) )
     {
-        struct csched2_vcpu *swait = runq_elem(runq->next);
+        struct csched2_unit *swait = runq_elem(runq->next);
 
         if ( ! is_idle_vcpu(swait->vcpu)
              && swait->credit > 0 )
@@ -3271,14 +3269,14 @@ csched2_runtime(const struct scheduler *ops, int cpu,
 /*
  * Find a candidate.
  */
-static struct csched2_vcpu *
+static struct csched2_unit *
 runq_candidate(struct csched2_runqueue_data *rqd,
-               struct csched2_vcpu *scurr,
+               struct csched2_unit *scurr,
                int cpu, s_time_t now,
                unsigned int *skipped)
 {
     struct list_head *iter, *temp;
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_unit *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = false, soft_aff_preempt = false;
 
@@ -3359,12 +3357,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
     {
-        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+        struct csched2_unit * svc = list_entry(iter, struct csched2_unit, runq_elem);
 
         if ( unlikely(tb_init_done) )
         {
@@ -3463,8 +3461,8 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_vcpu * const scurr = csched2_vcpu(current);
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_unit * const scurr = csched2_unit(current->sched_unit);
+    struct csched2_unit *snext = NULL;
     unsigned int skipped_vcpus = 0;
     struct task_slice ret;
     bool tickled;
@@ -3540,7 +3538,7 @@ csched2_schedule(
     {
         __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
     }
     else
         snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
@@ -3643,7 +3641,7 @@ csched2_schedule(
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_vcpu *svc)
+csched2_dump_vcpu(struct csched2_private *prv, struct csched2_unit *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
             svc->vcpu->domain->domain_id,
@@ -3667,7 +3665,7 @@ static inline void
 dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct csched2_vcpu *svc;
+    struct csched2_unit *svc;
 
     printk("CPU[%02d] runq=%d, sibling=%*pb, core=%*pb\n",
            cpu, c2r(cpu),
@@ -3675,7 +3673,7 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    svc = csched2_vcpu(curr_on_cpu(cpu));
+    svc = csched2_unit(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -3748,7 +3746,7 @@ csched2_dump(const struct scheduler *ops)
 
         for_each_vcpu( sdom->dom, v )
         {
-            struct csched2_vcpu * const svc = csched2_vcpu(v);
+            struct csched2_unit * const svc = csched2_unit(v->sched_unit);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(svc->vcpu);
@@ -3777,7 +3775,7 @@ csched2_dump(const struct scheduler *ops)
         printk("RUNQ:\n");
         list_for_each( iter, runq )
         {
-            struct csched2_vcpu *svc = runq_elem(iter);
+            struct csched2_unit *svc = runq_elem(iter);
 
             if ( svc )
             {
@@ -3878,7 +3876,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                      void *pdata, void *vdata)
 {
     struct csched2_private *prv = csched2_priv(new_ops);
-    struct csched2_vcpu *svc = vdata;
+    struct csched2_unit *svc = vdata;
     unsigned rqi;
 
     ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 1ea8643a9c..defcdfcf1e 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -94,7 +94,7 @@ DEFINE_PER_CPU(struct null_pcpu, npc);
 /*
  * Virtual CPU
  */
-struct null_vcpu {
+struct null_unit {
     struct list_head waitq_elem;
     struct vcpu *vcpu;
 };
@@ -115,9 +115,9 @@ static inline struct null_private *null_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
+static inline struct null_unit *null_unit(const struct sched_unit *unit)
 {
-    return v->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
@@ -197,9 +197,9 @@ static void *null_alloc_vdata(const struct scheduler *ops,
                               struct sched_unit *unit, void *dd)
 {
     struct vcpu *v = unit->vcpu;
-    struct null_vcpu *nvc;
+    struct null_unit *nvc;
 
-    nvc = xzalloc(struct null_vcpu);
+    nvc = xzalloc(struct null_unit);
     if ( nvc == NULL )
         return NULL;
 
@@ -213,7 +213,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
 
 static void null_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct null_vcpu *nvc = priv;
+    struct null_unit *nvc = priv;
 
     xfree(nvc);
 }
@@ -391,7 +391,7 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct null_private *prv = null_priv(new_ops);
-    struct null_vcpu *nvc = vdata;
+    struct null_unit *nvc = vdata;
 
     ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
 
@@ -414,7 +414,7 @@ static void null_unit_insert(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
     unsigned int cpu;
     spinlock_t *lock;
 
@@ -471,9 +471,9 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 {
     unsigned int bs;
     unsigned int cpu = v->processor;
-    struct null_vcpu *wvc;
+    struct null_unit *wvc;
 
-    ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
+    ASSERT(list_empty(&null_unit(v->sched_unit)->waitq_elem));
 
     vcpu_deassign(prv, v, cpu);
 
@@ -509,7 +509,7 @@ static void null_unit_remove(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(v));
@@ -544,13 +544,13 @@ static void null_unit_wake(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    if ( unlikely(curr_on_cpu(v->processor) == v) )
+    if ( unlikely(curr_on_cpu(v->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
     }
 
-    if ( unlikely(!list_empty(&null_vcpu(v)->waitq_elem)) )
+    if ( unlikely(!list_empty(&null_unit(unit)->waitq_elem)) )
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
@@ -574,7 +574,7 @@ static void null_unit_sleep(const struct scheduler *ops,
     ASSERT(!is_idle_vcpu(v));
 
     /* If v is not assigned to a pCPU, or is not running, no need to bother */
-    if ( curr_on_cpu(v->processor) == v )
+    if ( curr_on_cpu(v->processor) == unit )
         cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 
     SCHED_STAT_CRANK(vcpu_sleep);
@@ -592,7 +592,7 @@ static void null_unit_migrate(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
 
     ASSERT(!is_idle_vcpu(v));
 
@@ -677,7 +677,7 @@ static void null_unit_migrate(const struct scheduler *ops,
 #ifndef NDEBUG
 static inline void null_vcpu_check(struct vcpu *v)
 {
-    struct null_vcpu * const nvc = null_vcpu(v);
+    struct null_unit * const nvc = null_unit(v->sched_unit);
     struct null_dom * const ndom = v->domain->sched_priv;
 
     BUG_ON(nvc->vcpu != v);
@@ -707,7 +707,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *wvc;
+    struct null_unit *wvc;
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
@@ -790,7 +790,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     return ret;
 }
 
-static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
+static inline void dump_vcpu(struct null_private *prv, struct null_unit *nvc)
 {
     printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
             nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
@@ -800,7 +800,7 @@ static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
 static void null_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc;
+    struct null_unit *nvc;
     spinlock_t *lock;
     unsigned long flags;
 
@@ -815,7 +815,7 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
     printk("\n");
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    nvc = null_vcpu(curr_on_cpu(cpu));
+    nvc = null_unit(curr_on_cpu(cpu));
     if ( nvc && !is_idle_vcpu(nvc->vcpu) )
     {
         printk("\trun: ");
@@ -849,7 +849,7 @@ static void null_dump(const struct scheduler *ops)
         printk("\tDomain: %d\n", ndom->dom->domain_id);
         for_each_vcpu( ndom->dom, v )
         {
-            struct null_vcpu * const nvc = null_vcpu(v);
+            struct null_unit * const nvc = null_unit(v->sched_unit);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(nvc->vcpu);
@@ -867,7 +867,7 @@ static void null_dump(const struct scheduler *ops)
     spin_lock(&prv->waitq_lock);
     list_for_each( iter, &prv->waitq )
     {
-        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
+        struct null_unit *nvc = list_entry(iter, struct null_unit, waitq_elem);
 
         if ( loop++ != 0 )
             printk(", ");
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index b80bc02357..8caec5c5dc 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -195,7 +195,7 @@ struct rt_private {
 /*
  * Virtual CPU
  */
-struct rt_vcpu {
+struct rt_unit {
     struct list_head q_elem;     /* on the runq/depletedq list */
     struct list_head replq_elem; /* on the replenishment events list */
 
@@ -233,9 +233,9 @@ static inline struct rt_private *rt_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
+static inline struct rt_unit *rt_unit(const struct sched_unit *unit)
 {
-    return vcpu->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline struct list_head *rt_runq(const struct scheduler *ops)
@@ -253,7 +253,7 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
-static inline bool has_extratime(const struct rt_vcpu *svc)
+static inline bool has_extratime(const struct rt_unit *svc)
 {
     return svc->flags & RTDS_extratime;
 }
@@ -263,25 +263,25 @@ static inline bool has_extratime(const struct rt_vcpu *svc)
  * and the replenishment events queue.
  */
 static int
-vcpu_on_q(const struct rt_vcpu *svc)
+vcpu_on_q(const struct rt_unit *svc)
 {
    return !list_empty(&svc->q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_unit *
 q_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, q_elem);
+    return list_entry(elem, struct rt_unit, q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_unit *
 replq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, replq_elem);
+    return list_entry(elem, struct rt_unit, replq_elem);
 }
 
 static int
-vcpu_on_replq(const struct rt_vcpu *svc)
+vcpu_on_replq(const struct rt_unit *svc)
 {
     return !list_empty(&svc->replq_elem);
 }
@@ -291,7 +291,7 @@ vcpu_on_replq(const struct rt_vcpu *svc)
  * Otherwise, return value < 0
  */
 static s_time_t
-compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+compare_vcpu_priority(const struct rt_unit *v1, const struct rt_unit *v2)
 {
     int prio = v2->priority_level - v1->priority_level;
 
@@ -305,7 +305,7 @@ compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
  * Debug related code, dump vcpu/cpu information
  */
 static void
-rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
+rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
 {
     cpumask_t *cpupool_mask, *mask;
 
@@ -352,13 +352,13 @@ static void
 rt_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
     printk("CPU[%02d]\n", cpu);
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = rt_vcpu(curr_on_cpu(cpu));
+    svc = rt_unit(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         rt_dump_vcpu(ops, svc);
@@ -371,7 +371,7 @@ rt_dump(const struct scheduler *ops)
 {
     struct list_head *runq, *depletedq, *replq, *iter;
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     struct rt_dom *sdom;
     unsigned long flags;
 
@@ -415,7 +415,7 @@ rt_dump(const struct scheduler *ops)
 
         for_each_vcpu ( sdom->dom, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_unit(v->sched_unit);
             rt_dump_vcpu(ops, svc);
         }
     }
@@ -429,7 +429,7 @@ rt_dump(const struct scheduler *ops)
  * it needs to be updated to the deadline of the current period
  */
 static void
-rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
+rt_update_deadline(s_time_t now, struct rt_unit *svc)
 {
     ASSERT(now >= svc->cur_deadline);
     ASSERT(svc->period != 0);
@@ -500,8 +500,8 @@ deadline_queue_remove(struct list_head *queue, struct list_head *elem)
 }
 
 static inline bool
-deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
-                      struct rt_vcpu *svc, struct list_head *elem,
+deadline_queue_insert(struct rt_unit * (*qelem)(struct list_head *),
+                      struct rt_unit *svc, struct list_head *elem,
                       struct list_head *queue)
 {
     struct list_head *iter;
@@ -509,7 +509,7 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
 
     list_for_each ( iter, queue )
     {
-        struct rt_vcpu * iter_svc = (*qelem)(iter);
+        struct rt_unit * iter_svc = (*qelem)(iter);
         if ( compare_vcpu_priority(svc, iter_svc) > 0 )
             break;
         pos++;
@@ -523,14 +523,14 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
   deadline_queue_insert(&replq_elem, ##__VA_ARGS__)
 
 static inline void
-q_remove(struct rt_vcpu *svc)
+q_remove(struct rt_unit *svc)
 {
     ASSERT( vcpu_on_q(svc) );
     list_del_init(&svc->q_elem);
 }
 
 static inline void
-replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_remove(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *replq = rt_replq(ops);
@@ -547,7 +547,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
          */
         if ( !list_empty(replq) )
         {
-            struct rt_vcpu *svc_next = replq_elem(replq->next);
+            struct rt_unit *svc_next = replq_elem(replq->next);
             set_timer(&prv->repl_timer, svc_next->cur_deadline);
         }
         else
@@ -561,7 +561,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
  * Insert svc without budget in DepletedQ unsorted;
  */
 static void
-runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *runq = rt_runq(ops);
@@ -579,7 +579,7 @@ runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 }
 
 static void
-replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_insert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct list_head *replq = rt_replq(ops);
     struct rt_private *prv = rt_priv(ops);
@@ -601,10 +601,10 @@ replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
  * changed.
  */
 static void
-replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_reinsert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct list_head *replq = rt_replq(ops);
-    struct rt_vcpu *rearm_svc = svc;
+    struct rt_unit *rearm_svc = svc;
     bool_t rearm = 0;
 
     ASSERT( vcpu_on_replq(svc) );
@@ -735,7 +735,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                 void *pdata, void *vdata)
 {
     struct rt_private *prv = rt_priv(new_ops);
-    struct rt_vcpu *svc = vdata;
+    struct rt_unit *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
 
@@ -842,10 +842,10 @@ static void *
 rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct rt_vcpu);
+    svc = xzalloc(struct rt_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -870,7 +870,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 static void
 rt_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct rt_vcpu *svc = priv;
+    struct rt_unit *svc = priv;
 
     xfree(svc);
 }
@@ -886,7 +886,7 @@ static void
 rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_unit *svc = rt_unit(unit);
     s_time_t now;
     spinlock_t *lock;
 
@@ -915,13 +915,13 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 /*
- * Remove rt_vcpu svc from the old scheduler in source cpupool.
+ * Remove rt_unit svc from the old scheduler in source cpupool.
  */
 static void
 rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -943,7 +943,7 @@ rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
  * Burn budget in nanosecond granularity
  */
 static void
-burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
+burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
 {
     s_time_t delta;
 
@@ -1007,13 +1007,13 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
  * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
  * lock is grabbed before calling this function
  */
-static struct rt_vcpu *
+static struct rt_unit *
 runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 {
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter;
-    struct rt_vcpu *svc = NULL;
-    struct rt_vcpu *iter_svc = NULL;
+    struct rt_unit *svc = NULL;
+    struct rt_unit *iter_svc = NULL;
     cpumask_t cpu_common;
     cpumask_t *online;
 
@@ -1064,8 +1064,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
 {
     const int cpu = smp_processor_id();
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *const scurr = rt_vcpu(current);
-    struct rt_vcpu *snext = NULL;
+    struct rt_unit *const scurr = rt_unit(current->sched_unit);
+    struct rt_unit *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
 
     /* TRACE */
@@ -1091,13 +1091,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_vcpu(idle_vcpu[cpu]);
+        snext = rt_unit(idle_vcpu[cpu]->sched_unit);
     }
     else
     {
         snext = runq_pick(ops, cpumask_of(cpu));
         if ( snext == NULL )
-            snext = rt_vcpu(idle_vcpu[cpu]);
+            snext = rt_unit(idle_vcpu[cpu]->sched_unit);
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_vcpu(current) &&
@@ -1143,12 +1143,12 @@ static void
 rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
 
     BUG_ON( is_idle_vcpu(vc) );
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( vcpu_on_q(svc) )
     {
@@ -1178,11 +1178,11 @@ rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
  * lock is grabbed before calling this function
  */
 static void
-runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *latest_deadline_vcpu = NULL; /* lowest priority */
-    struct rt_vcpu *iter_svc;
+    struct rt_unit *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_unit *iter_svc;
     struct vcpu *iter_vc;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
@@ -1203,14 +1203,14 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu);
+        iter_vc = curr_on_cpu(cpu)->vcpu;
         if ( is_idle_vcpu(iter_vc) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_vcpu(iter_vc);
+        iter_svc = rt_unit(iter_vc->sched_unit);
         if ( latest_deadline_vcpu == NULL ||
              compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
             latest_deadline_vcpu = iter_svc;
@@ -1259,13 +1259,13 @@ static void
 rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
     s_time_t now;
     bool_t missed;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1330,7 +1330,7 @@ static void
 rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_unit *svc = rt_unit(unit);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
@@ -1361,7 +1361,7 @@ rt_dom_cntl(
     struct xen_domctl_scheduler_op *op)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     struct vcpu *v;
     unsigned long flags;
     int rc = 0;
@@ -1385,7 +1385,7 @@ rt_dom_cntl(
         spin_lock_irqsave(&prv->lock, flags);
         for_each_vcpu ( d, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_unit(v->sched_unit);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1411,7 +1411,7 @@ rt_dom_cntl(
             if ( op->cmd == XEN_DOMCTL_SCHEDOP_getvcpuinfo )
             {
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_unit(d->vcpu[local_sched.vcpuid]->sched_unit);
                 local_sched.u.rtds.budget = svc->budget / MICROSECS(1);
                 local_sched.u.rtds.period = svc->period / MICROSECS(1);
                 if ( has_extratime(svc) )
@@ -1439,7 +1439,7 @@ rt_dom_cntl(
                 }
 
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_unit(d->vcpu[local_sched.vcpuid]->sched_unit);
                 svc->period = period;
                 svc->budget = budget;
                 if ( local_sched.u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra )
@@ -1472,7 +1472,7 @@ static void repl_timer_handler(void *data){
     struct list_head *replq = rt_replq(ops);
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter, *tmp;
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     LIST_HEAD(tmp_replq);
 
     spin_lock_irq(&prv->lock);
@@ -1514,10 +1514,10 @@ static void repl_timer_handler(void *data){
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu &&
+        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_unit &&
              !list_empty(runq) )
         {
-            struct rt_vcpu *next_on_runq = q_elem(runq->next);
+            struct rt_unit *next_on_runq = q_elem(runq->next);
 
             if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5dfdaaece2..c0cec0b6ca 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -334,7 +334,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = v;
+        per_cpu(schedule_data, v->processor).curr = unit;
         v->is_running = 1;
     }
     else
@@ -1513,7 +1513,7 @@ static void schedule(void)
 
     next = next_slice.task;
 
-    sd->curr = next;
+    sd->curr = next->sched_unit;
 
     if ( next_slice.time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + next_slice.time);
@@ -1636,7 +1636,6 @@ static int cpu_schedule_up(unsigned int cpu)
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sd->_lock;
-    sd->curr = idle_vcpu[cpu];
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
@@ -1670,6 +1669,8 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
+    sd->curr = idle_vcpu[cpu]->sched_unit;
+
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
      * (e.g., inside free_pdata, from cpu_schedule_down() called
@@ -1863,6 +1864,7 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
+    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_unit;
     this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
     sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index dc149bc102..9b7855e775 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -36,7 +36,7 @@ extern int sched_ratelimit_us;
 struct schedule_data {
     spinlock_t         *schedule_lock,
                        _lock;
-    struct vcpu        *curr;           /* current task                    */
+    struct sched_unit  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
-- 
2.16.4
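
The re-typing above is easiest to see in isolation. Below is a minimal
C sketch of the changed interface, with simplified struct bodies and a
hypothetical helper (vcpu_is_curr) standing in for the many call sites
the patch touches; the real definitions in xen/include/xen/sched-if.h
carry more members:

  #include <stdbool.h>

  struct vcpu;

  /* Simplified: the real sched_unit has further members. */
  struct sched_unit {
      struct vcpu *vcpu;         /* the unit's (currently single) vcpu */
      void        *priv;         /* scheduler-private per-unit data */
  };

  struct schedule_data {
      struct sched_unit *curr;   /* was: struct vcpu *curr */
  };

  /* Hypothetical helper: code that compared "curr == v" before the
   * patch now compares the unit's vcpu instead. */
  static inline bool vcpu_is_curr(const struct schedule_data *sd,
                                  const struct vcpu *v)
  {
      return sd->curr->vcpu == v;
  }

The same shift explains the recurring curr_on_cpu(cpu)->vcpu
dereferences in the per-scheduler hunks above.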



* [Xen-devel] [PATCH 10/60] xen/sched: switch schedule_data.curr to point at sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In preparation for core scheduling, let the percpu pointer
schedule_data.curr point to a struct sched_unit instead of the related
vcpu. At the same time, rename the per-vcpu scheduler-specific structs
to per-unit ones.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |   2 +-
 xen/common/sched_credit.c   | 101 +++++++++++++-------------
 xen/common/sched_credit2.c  | 168 ++++++++++++++++++++++----------------------
 xen/common/sched_null.c     |  44 ++++++------
 xen/common/sched_rt.c       | 118 +++++++++++++++----------------
 xen/common/schedule.c       |   8 ++-
 xen/include/xen/sched-if.h  |   2 +-
 7 files changed, 220 insertions(+), 223 deletions(-)
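
Before the diff, a compact sketch of the mechanical rename pattern that
repeats across all five schedulers below (names as in sched_credit.c;
the "old:"/"new:" comment lines are illustrative summaries, not any
single hunk):

  /* The per-vcpu private struct becomes a per-unit one: */
  struct csched_unit {                 /* was: struct csched_vcpu */
      struct vcpu *vcpu;               /* up-pointer, kept for now */
      /* ... further scheduler-specific fields ... */
  };

  /* The accessor is keyed on the sched_unit instead of the vcpu: */
  #define CSCHED_UNIT(unit)  ((struct csched_unit *)(unit)->priv)

  /* Call sites move from the vcpu to its unit:
   *   old: svc = CSCHED_VCPU(v);
   *   new: svc = CSCHED_UNIT(v->sched_unit);
   */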

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 383df750b0..7f26e6a0b0 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == vc )
+    if ( per_cpu(schedule_data, vc->processor).curr == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 7f7596b3bf..bc87b813dc 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -83,7 +83,7 @@
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
     ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
-#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_unit->priv)
+#define CSCHED_UNIT(unit)   ((struct csched_unit *) (unit)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
 
@@ -160,7 +160,7 @@ struct csched_pcpu {
 /*
  * Virtual CPU
  */
-struct csched_vcpu {
+struct csched_unit {
     struct list_head runq_elem;
     struct list_head active_vcpu_elem;
 
@@ -231,15 +231,15 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_vcpu *svc)
+__vcpu_on_runq(struct csched_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched_vcpu *
+static inline struct csched_unit *
 __runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched_vcpu, runq_elem);
+    return list_entry(elem, struct csched_unit, runq_elem);
 }
 
 /* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
@@ -271,7 +271,7 @@ dec_nr_runnable(unsigned int cpu)
 }
 
 static inline void
-__runq_insert(struct csched_vcpu *svc)
+__runq_insert(struct csched_unit *svc)
 {
     unsigned int cpu = svc->vcpu->processor;
     const struct list_head * const runq = RUNQ(cpu);
@@ -281,7 +281,7 @@ __runq_insert(struct csched_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        const struct csched_vcpu * const iter_svc = __runq_elem(iter);
+        const struct csched_unit * const iter_svc = __runq_elem(iter);
         if ( svc->pri > iter_svc->pri )
             break;
     }
@@ -302,34 +302,34 @@ __runq_insert(struct csched_vcpu *svc)
 }
 
 static inline void
-runq_insert(struct csched_vcpu *svc)
+runq_insert(struct csched_unit *svc)
 {
     __runq_insert(svc);
     inc_nr_runnable(svc->vcpu->processor);
 }
 
 static inline void
-__runq_remove(struct csched_vcpu *svc)
+__runq_remove(struct csched_unit *svc)
 {
     BUG_ON( !__vcpu_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
-runq_remove(struct csched_vcpu *svc)
+runq_remove(struct csched_unit *svc)
 {
     dec_nr_runnable(svc->vcpu->processor);
     __runq_remove(svc);
 }
 
-static void burn_credits(struct csched_vcpu *svc, s_time_t now)
+static void burn_credits(struct csched_unit *svc, s_time_t now)
 {
     s_time_t delta;
     uint64_t val;
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_VCPU(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(svc->vcpu->processor)) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -347,10 +347,10 @@ boolean_param("tickle_one_idle_cpu", opt_tickle_one_idle);
 
 DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
-static inline void __runq_tickle(struct csched_vcpu *new)
+static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = new->vcpu->processor;
-    struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
+    struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
     int balance_step, idlers_empty;
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
+    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -637,7 +637,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
-    struct csched_vcpu *svc = vdata;
+    struct csched_unit *svc = vdata;
 
     ASSERT(svc && is_idle_vcpu(svc->vcpu));
 
@@ -660,7 +660,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 static inline void
 __csched_vcpu_check(struct vcpu *vc)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(vc->sched_unit);
     struct csched_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -862,7 +862,7 @@ static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(vc);
+    struct csched_unit *svc = CSCHED_UNIT(unit);
 
     /*
      * We have been called by vcpu_migrate() (in schedule.c), as part
@@ -876,7 +876,7 @@ csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
+__csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
@@ -906,7 +906,7 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
 
 static inline void
 __csched_vcpu_acct_stop_locked(struct csched_private *prv,
-    struct csched_vcpu *svc)
+    struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -931,7 +931,7 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
 static void
 csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(current);
+    struct csched_unit * const svc = CSCHED_UNIT(current->sched_unit);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
     ASSERT( current->processor == cpu );
@@ -1000,10 +1000,10 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                    void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched_vcpu);
+    svc = xzalloc(struct csched_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -1022,7 +1022,7 @@ static void
 csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu *svc = unit->priv;
+    struct csched_unit *svc = unit->priv;
     spinlock_t *lock;
 
     BUG_ON( is_idle_vcpu(vc) );
@@ -1048,7 +1048,7 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched_vcpu *svc = priv;
+    struct csched_unit *svc = priv;
 
     BUG_ON( !list_empty(&svc->runq_elem) );
 
@@ -1059,8 +1059,7 @@ static void
 csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     struct csched_dom * const sdom = svc->sdom;
 
     SCHED_STAT_CRANK(vcpu_remove);
@@ -1087,14 +1086,14 @@ static void
 csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     unsigned int cpu = vc->processor;
 
     SCHED_STAT_CRANK(vcpu_sleep);
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( curr_on_cpu(cpu) == vc )
+    if ( curr_on_cpu(cpu) == unit )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
@@ -1112,12 +1111,12 @@ static void
 csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     bool_t migrating;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1173,8 +1172,7 @@ csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
 
     /* Let the scheduler know that this vcpu is trying to yield */
     set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
@@ -1229,8 +1227,7 @@ static void
 csched_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                 const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(v);
+    struct csched_unit *svc = CSCHED_UNIT(unit);
 
     if ( !hard )
         return;
@@ -1333,7 +1330,7 @@ csched_runq_sort(struct csched_private *prv, unsigned int cpu)
 {
     struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
     struct list_head *runq, *elem, *next, *last_under;
-    struct csched_vcpu *svc_elem;
+    struct csched_unit *svc_elem;
     spinlock_t *lock;
     unsigned long flags;
     int sort_epoch;
@@ -1379,7 +1376,7 @@ csched_acct(void* dummy)
     unsigned long flags;
     struct list_head *iter_vcpu, *next_vcpu;
     struct list_head *iter_sdom, *next_sdom;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
     struct csched_dom *sdom;
     uint32_t credit_total;
     uint32_t weight_total;
@@ -1502,7 +1499,7 @@ csched_acct(void* dummy)
 
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
-            svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_vcpu, struct csched_unit, active_vcpu_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1606,12 +1603,12 @@ csched_tick(void *_cpu)
     set_timer(&spc->ticker, NOW() + MICROSECS(prv->tick_period_us) );
 }
 
-static struct csched_vcpu *
+static struct csched_unit *
 csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 {
     const struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
-    struct csched_vcpu *speer;
+    struct csched_unit *speer;
     struct list_head *iter;
     struct vcpu *vc;
 
@@ -1621,7 +1618,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu))) )
+    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1683,12 +1680,12 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     return NULL;
 }
 
-static struct csched_vcpu *
+static struct csched_unit *
 csched_load_balance(struct csched_private *prv, int cpu,
-    struct csched_vcpu *snext, bool_t *stolen)
+    struct csched_unit *snext, bool_t *stolen)
 {
     struct cpupool *c = per_cpu(cpupool, cpu);
-    struct csched_vcpu *speer;
+    struct csched_unit *speer;
     cpumask_t workers;
     cpumask_t *online;
     int peer_cpu, first_cpu, peer_node, bstep;
@@ -1837,9 +1834,9 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_vcpu * const scurr = CSCHED_VCPU(current);
+    struct csched_unit * const scurr = CSCHED_UNIT(current->sched_unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct csched_vcpu *snext;
+    struct csched_unit *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
@@ -1955,7 +1952,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_VCPU(idle_vcpu[cpu]);
+        snext = CSCHED_UNIT(idle_vcpu[cpu]->sched_unit);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -2007,7 +2004,7 @@ out:
 }
 
 static void
-csched_dump_vcpu(struct csched_vcpu *svc)
+csched_dump_vcpu(struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -2043,7 +2040,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     struct list_head *runq, *iter;
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc;
-    struct csched_vcpu *svc;
+    struct csched_unit *svc;
     spinlock_t *lock;
     unsigned long flags;
     int loop;
@@ -2067,7 +2064,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = CSCHED_VCPU(curr_on_cpu(cpu));
+    svc = CSCHED_UNIT(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -2136,10 +2133,10 @@ csched_dump(const struct scheduler *ops)
 
         list_for_each( iter_svc, &sdom->active_vcpu )
         {
-            struct csched_vcpu *svc;
+            struct csched_unit *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_svc, struct csched_unit, active_vcpu_elem);
             lock = vcpu_schedule_lock(svc->vcpu);
 
             printk("\t%3d: ", ++loop);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index b71802bba2..36201815e3 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -176,7 +176,7 @@
  *     load balancing;
  *  + serializes runqueue operations (removing and inserting vcpus);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_vcpu for the vcpu in the
+ *  + protects vcpu parameters in csched2_unit for the vcpu in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -511,7 +511,7 @@ struct csched2_pcpu {
 /*
  * Virtual CPU
  */
-struct csched2_vcpu {
+struct csched2_unit {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
     struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
@@ -570,9 +570,9 @@ static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
     return per_cpu(schedule_data, cpu).sched_priv;
 }
 
-static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
+static inline struct csched2_unit *csched2_unit(const struct sched_unit *unit)
 {
-    return v->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline struct csched2_dom *csched2_dom(const struct domain *d)
@@ -594,7 +594,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
 }
 
 /* Does the domain of this vCPU have a cap? */
-static inline bool has_cap(const struct csched2_vcpu *svc)
+static inline bool has_cap(const struct csched2_unit *svc)
 {
     return svc->budget != STIME_MAX;
 }
@@ -688,7 +688,7 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
  * note that at least 5 is guaranteed to _always_ return at least one pcpu.
  */
-static int get_fallback_cpu(struct csched2_vcpu *svc)
+static int get_fallback_cpu(struct csched2_unit *svc)
 {
     struct vcpu *v = svc->vcpu;
     unsigned int bs;
@@ -773,7 +773,7 @@ static int get_fallback_cpu(struct csched2_vcpu *svc)
  * FIXME: Do pre-calculated division?
  */
 static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
-                          struct csched2_vcpu *svc)
+                          struct csched2_unit *svc)
 {
     uint64_t val = time * rqd->max_weight + svc->residual;
 
@@ -781,7 +781,7 @@ static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
     svc->credit -= val;
 }
 
-static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_vcpu *svc)
+static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_unit *svc)
 {
     return credit * svc->weight / rqd->max_weight;
 }
@@ -790,14 +790,14 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_vcpu *svc)
+static inline int vcpu_on_runq(struct csched2_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched2_vcpu * runq_elem(struct list_head *elem)
+static inline struct csched2_unit * runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched2_vcpu, runq_elem);
+    return list_entry(elem, struct csched2_unit, runq_elem);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
@@ -915,7 +915,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
         list_for_each( iter, &rqd->svc )
         {
-            struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_unit * svc = list_entry(iter, struct csched2_unit, rqd_elem);
 
             if ( svc->weight > max_weight )
                 max_weight = svc->weight;
@@ -940,7 +940,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
 /* Add and remove from runqueue assignment (not active run queue) */
 static void
-_runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
+_runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 {
 
     svc->rqd = rqd;
@@ -970,7 +970,7 @@ _runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
 static void
 runq_assign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
@@ -978,7 +978,7 @@ runq_assign(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-_runq_deassign(struct csched2_vcpu *svc)
+_runq_deassign(struct csched2_unit *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
@@ -997,7 +997,7 @@ _runq_deassign(struct csched2_vcpu *svc)
 static void
 runq_deassign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = vc->sched_unit->priv;
 
     ASSERT(svc->rqd == c2rqd(ops, vc->processor));
 
@@ -1199,7 +1199,7 @@ update_runq_load(const struct scheduler *ops,
 
 static void
 update_svc_load(const struct scheduler *ops,
-                struct csched2_vcpu *svc, int change, s_time_t now)
+                struct csched2_unit *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t delta, vcpu_load;
@@ -1259,7 +1259,7 @@ update_svc_load(const struct scheduler *ops,
 static void
 update_load(const struct scheduler *ops,
             struct csched2_runqueue_data *rqd,
-            struct csched2_vcpu *svc, int change, s_time_t now)
+            struct csched2_unit *svc, int change, s_time_t now)
 {
     trace_var(TRC_CSCHED2_UPDATE_LOAD, 1, 0,  NULL);
 
@@ -1269,7 +1269,7 @@ update_load(const struct scheduler *ops,
 }
 
 static void
-runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 {
     struct list_head *iter;
     unsigned int cpu = svc->vcpu->processor;
@@ -1288,7 +1288,7 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        struct csched2_vcpu * iter_svc = runq_elem(iter);
+        struct csched2_unit * iter_svc = runq_elem(iter);
 
         if ( svc->credit > iter_svc->credit )
             break;
@@ -1312,13 +1312,13 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
     }
 }
 
-static inline void runq_remove(struct csched2_vcpu *svc)
+static inline void runq_remove(struct csched2_unit *svc)
 {
     ASSERT(vcpu_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
 
-void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_unit *, s_time_t);
 
 static inline void
 tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
@@ -1334,7 +1334,7 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  * whether or not it already run for more than the ratelimit, to which we
  * apply some tolerance).
  */
-static inline bool is_preemptable(const struct csched2_vcpu *svc,
+static inline bool is_preemptable(const struct csched2_unit *svc,
                                     s_time_t now, s_time_t ratelimit)
 {
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
@@ -1360,10 +1360,10 @@ static inline bool is_preemptable(const struct csched2_vcpu *svc,
  * Within the same class, the highest difference of credit.
  */
 static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
-                             struct csched2_vcpu *new, unsigned int cpu)
+                             struct csched2_unit *new, unsigned int cpu)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_unit * cur = csched2_unit(curr_on_cpu(cpu));
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
@@ -1432,7 +1432,7 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
  * pick up some work, so it would be wrong to consider it idle.
  */
 static void
-runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
+runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
@@ -1587,7 +1587,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)));
+    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
@@ -1614,7 +1614,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
  * Credit-related code
  */
 static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
-                         struct csched2_vcpu *snext)
+                         struct csched2_unit *snext)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct list_head *iter;
@@ -1644,10 +1644,10 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
     list_for_each( iter, &rqd->svc )
     {
         unsigned int svc_cpu;
-        struct csched2_vcpu * svc;
+        struct csched2_unit * svc;
         int start_credit;
 
-        svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+        svc = list_entry(iter, struct csched2_unit, rqd_elem);
         svc_cpu = svc->vcpu->processor;
 
         ASSERT(!is_idle_vcpu(svc->vcpu));
@@ -1657,7 +1657,7 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1709,11 +1709,11 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 }
 
 void burn_credits(struct csched2_runqueue_data *rqd,
-                  struct csched2_vcpu *svc, s_time_t now)
+                  struct csched2_unit *svc, s_time_t now)
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_vcpu(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_unit(curr_on_cpu(svc->vcpu->processor)));
 
     if ( unlikely(is_idle_vcpu(svc->vcpu)) )
     {
@@ -1763,7 +1763,7 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_vcpu *svc)
+static void park_vcpu(struct csched2_unit *svc)
 {
     struct vcpu *v = svc->vcpu;
 
@@ -1792,7 +1792,7 @@ static void park_vcpu(struct csched2_vcpu *svc)
     list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
 }
 
-static bool vcpu_grab_budget(struct csched2_vcpu *svc)
+static bool vcpu_grab_budget(struct csched2_unit *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1839,7 +1839,7 @@ static bool vcpu_grab_budget(struct csched2_vcpu *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
+vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1882,7 +1882,7 @@ vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
 static void
 unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
 {
-    struct csched2_vcpu *svc, *tmp;
+    struct csched2_unit *svc, *tmp;
     spinlock_t *lock;
 
     list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
@@ -2004,7 +2004,7 @@ static void replenish_domain_budget(void* data)
 static inline void
 csched2_vcpu_check(struct vcpu *vc)
 {
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(vc->sched_unit);
     struct csched2_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -2030,10 +2030,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                     void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu *svc;
+    struct csched2_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched2_vcpu);
+    svc = xzalloc(struct csched2_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -2074,12 +2074,12 @@ static void
 csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
 
     ASSERT(!is_idle_vcpu(vc));
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == unit )
     {
         tickle_cpu(vc->processor, svc->rqd);
     }
@@ -2097,7 +2097,7 @@ static void
 csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     unsigned int cpu = vc->processor;
     s_time_t now;
 
@@ -2105,7 +2105,7 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     ASSERT(!is_idle_vcpu(vc));
 
-    if ( unlikely(curr_on_cpu(cpu) == vc) )
+    if ( unlikely(curr_on_cpu(cpu) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
@@ -2152,8 +2152,7 @@ out:
 static void
 csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(v);
+    struct csched2_unit * const svc = csched2_unit(unit);
 
     __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
 }
@@ -2162,7 +2161,7 @@ static void
 csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
@@ -2208,7 +2207,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     struct vcpu *vc = unit->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = vc->processor;
-    struct csched2_vcpu *svc = csched2_vcpu(vc);
+    struct csched2_unit *svc = csched2_unit(unit);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
 
@@ -2430,15 +2429,15 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 typedef struct {
     /* NB: Modified by consider() */
     s_time_t load_delta;
-    struct csched2_vcpu * best_push_svc, *best_pull_svc;
+    struct csched2_unit * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
     struct csched2_runqueue_data *orqd;                  
 } balance_state_t;
 
 static void consider(balance_state_t *st, 
-                     struct csched2_vcpu *push_svc,
-                     struct csched2_vcpu *pull_svc)
+                     struct csched2_unit *push_svc,
+                     struct csched2_unit *pull_svc)
 {
     s_time_t l_load, o_load, delta;
 
@@ -2471,8 +2470,8 @@ static void consider(balance_state_t *st,
 
 
 static void migrate(const struct scheduler *ops,
-                    struct csched2_vcpu *svc, 
-                    struct csched2_runqueue_data *trqd, 
+                    struct csched2_unit *svc,
+                    struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
@@ -2541,7 +2540,7 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_vcpu *svc,
+static bool vcpu_is_migrateable(struct csched2_unit *svc,
                                   struct csched2_runqueue_data *rqd)
 {
     struct vcpu *v = svc->vcpu;
@@ -2691,7 +2690,7 @@ retry:
     /* Reuse load delta (as we're trying to minimize it) */
     list_for_each( push_iter, &st.lrqd->svc )
     {
-        struct csched2_vcpu * push_svc = list_entry(push_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_unit * push_svc = list_entry(push_iter, struct csched2_unit, rqd_elem);
 
         update_svc_load(ops, push_svc, 0, now);
 
@@ -2700,7 +2699,7 @@ retry:
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
-            struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
             
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
@@ -2719,7 +2718,7 @@ retry:
 
     list_for_each( pull_iter, &st.orqd->svc )
     {
-        struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
         
         if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
             continue;
@@ -2746,7 +2745,7 @@ csched2_unit_migrate(
 {
     struct vcpu *vc = unit->vcpu;
     struct domain *d = vc->domain;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
 
@@ -2847,7 +2846,7 @@ csched2_dom_cntl(
             /* Update weights for vcpus, and max_weight for runqueues on which they reside */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_unit *svc = csched2_unit(v->sched_unit);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
@@ -2861,7 +2860,7 @@ csched2_dom_cntl(
         /* Cap */
         if ( op->u.credit2.cap != 0 )
         {
-            struct csched2_vcpu *svc;
+            struct csched2_unit *svc;
             spinlock_t *lock;
 
             /* Cap is only valid if it's below 100 * nr_of_vCPUS */
@@ -2885,7 +2884,7 @@ csched2_dom_cntl(
              */
             for_each_vcpu ( d, v )
             {
-                svc = csched2_vcpu(v);
+                svc = csched2_unit(v->sched_unit);
                 lock = vcpu_schedule_lock(svc->vcpu);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
@@ -2928,14 +2927,14 @@ csched2_dom_cntl(
                  */
                 for_each_vcpu ( d, v )
                 {
-                    svc = csched2_vcpu(v);
+                    svc = csched2_unit(v->sched_unit);
                     lock = vcpu_schedule_lock(svc->vcpu);
                     if ( v->is_running )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu) == v);
+                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
 
                         /*
                          * We are triggering a reschedule on the vCPU's
@@ -2975,7 +2974,7 @@ csched2_dom_cntl(
             /* Disable budget accounting for all the vCPUs. */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_unit *svc = csched2_unit(v->sched_unit);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 svc->budget = STIME_MAX;
@@ -3012,8 +3011,7 @@ static void
 csched2_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
                  const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = unit->vcpu;
-    struct csched2_vcpu *svc = csched2_vcpu(v);
+    struct csched2_unit *svc = csched2_unit(unit);
 
     if ( !hard )
         return;
@@ -3113,7 +3111,7 @@ static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu *svc = unit->priv;
+    struct csched2_unit *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -3145,7 +3143,7 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched2_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched2_vcpu *svc = priv;
+    struct csched2_unit *svc = priv;
 
     xfree(svc);
 }
@@ -3154,7 +3152,7 @@ static void
 csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(vc));
@@ -3175,7 +3173,7 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 /* How long should we let this vcpu run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
-                struct csched2_vcpu *snext, s_time_t now)
+                struct csched2_unit *snext, s_time_t now)
 {
     s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
@@ -3220,7 +3218,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      */
     if ( ! list_empty(runq) )
     {
-        struct csched2_vcpu *swait = runq_elem(runq->next);
+        struct csched2_unit *swait = runq_elem(runq->next);
 
         if ( ! is_idle_vcpu(swait->vcpu)
              && swait->credit > 0 )
@@ -3271,14 +3269,14 @@ csched2_runtime(const struct scheduler *ops, int cpu,
 /*
  * Find a candidate.
  */
-static struct csched2_vcpu *
+static struct csched2_unit *
 runq_candidate(struct csched2_runqueue_data *rqd,
-               struct csched2_vcpu *scurr,
+               struct csched2_unit *scurr,
                int cpu, s_time_t now,
                unsigned int *skipped)
 {
     struct list_head *iter, *temp;
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_unit *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = false, soft_aff_preempt = false;
 
@@ -3359,12 +3357,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
     {
-        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+        struct csched2_unit * svc = list_entry(iter, struct csched2_unit, runq_elem);
 
         if ( unlikely(tb_init_done) )
         {
@@ -3463,8 +3461,8 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_vcpu * const scurr = csched2_vcpu(current);
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_unit * const scurr = csched2_unit(current->sched_unit);
+    struct csched2_unit *snext = NULL;
     unsigned int skipped_vcpus = 0;
     struct task_slice ret;
     bool tickled;
@@ -3540,7 +3538,7 @@ csched2_schedule(
     {
         __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
     }
     else
         snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
@@ -3643,7 +3641,7 @@ csched2_schedule(
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_vcpu *svc)
+csched2_dump_vcpu(struct csched2_private *prv, struct csched2_unit *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
             svc->vcpu->domain->domain_id,
@@ -3667,7 +3665,7 @@ static inline void
 dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct csched2_vcpu *svc;
+    struct csched2_unit *svc;
 
     printk("CPU[%02d] runq=%d, sibling=%*pb, core=%*pb\n",
            cpu, c2r(cpu),
@@ -3675,7 +3673,7 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    svc = csched2_vcpu(curr_on_cpu(cpu));
+    svc = csched2_unit(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -3748,7 +3746,7 @@ csched2_dump(const struct scheduler *ops)
 
         for_each_vcpu( sdom->dom, v )
         {
-            struct csched2_vcpu * const svc = csched2_vcpu(v);
+            struct csched2_unit * const svc = csched2_unit(v->sched_unit);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(svc->vcpu);
@@ -3777,7 +3775,7 @@ csched2_dump(const struct scheduler *ops)
         printk("RUNQ:\n");
         list_for_each( iter, runq )
         {
-            struct csched2_vcpu *svc = runq_elem(iter);
+            struct csched2_unit *svc = runq_elem(iter);
 
             if ( svc )
             {
@@ -3878,7 +3876,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                      void *pdata, void *vdata)
 {
     struct csched2_private *prv = csched2_priv(new_ops);
-    struct csched2_vcpu *svc = vdata;
+    struct csched2_unit *svc = vdata;
     unsigned rqi;
 
     ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 1ea8643a9c..defcdfcf1e 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -94,7 +94,7 @@ DEFINE_PER_CPU(struct null_pcpu, npc);
 /*
  * Virtual CPU
  */
-struct null_vcpu {
+struct null_unit {
     struct list_head waitq_elem;
     struct vcpu *vcpu;
 };
@@ -115,9 +115,9 @@ static inline struct null_private *null_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
+static inline struct null_unit *null_unit(const struct sched_unit *unit)
 {
-    return v->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
@@ -197,9 +197,9 @@ static void *null_alloc_vdata(const struct scheduler *ops,
                               struct sched_unit *unit, void *dd)
 {
     struct vcpu *v = unit->vcpu;
-    struct null_vcpu *nvc;
+    struct null_unit *nvc;
 
-    nvc = xzalloc(struct null_vcpu);
+    nvc = xzalloc(struct null_unit);
     if ( nvc == NULL )
         return NULL;
 
@@ -213,7 +213,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
 
 static void null_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct null_vcpu *nvc = priv;
+    struct null_unit *nvc = priv;
 
     xfree(nvc);
 }
@@ -391,7 +391,7 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct null_private *prv = null_priv(new_ops);
-    struct null_vcpu *nvc = vdata;
+    struct null_unit *nvc = vdata;
 
     ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
 
@@ -414,7 +414,7 @@ static void null_unit_insert(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
     unsigned int cpu;
     spinlock_t *lock;
 
@@ -471,9 +471,9 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 {
     unsigned int bs;
     unsigned int cpu = v->processor;
-    struct null_vcpu *wvc;
+    struct null_unit *wvc;
 
-    ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
+    ASSERT(list_empty(&null_unit(v->sched_unit)->waitq_elem));
 
     vcpu_deassign(prv, v, cpu);
 
@@ -509,7 +509,7 @@ static void null_unit_remove(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(v));
@@ -544,13 +544,13 @@ static void null_unit_wake(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    if ( unlikely(curr_on_cpu(v->processor) == v) )
+    if ( unlikely(curr_on_cpu(v->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
     }
 
-    if ( unlikely(!list_empty(&null_vcpu(v)->waitq_elem)) )
+    if ( unlikely(!list_empty(&null_unit(unit)->waitq_elem)) )
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
@@ -574,7 +574,7 @@ static void null_unit_sleep(const struct scheduler *ops,
     ASSERT(!is_idle_vcpu(v));
 
     /* If v is not assigned to a pCPU, or is not running, no need to bother */
-    if ( curr_on_cpu(v->processor) == v )
+    if ( curr_on_cpu(v->processor) == unit )
         cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 
     SCHED_STAT_CRANK(vcpu_sleep);
@@ -592,7 +592,7 @@ static void null_unit_migrate(const struct scheduler *ops,
 {
     struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_unit *nvc = null_unit(unit);
 
     ASSERT(!is_idle_vcpu(v));
 
@@ -677,7 +677,7 @@ static void null_unit_migrate(const struct scheduler *ops,
 #ifndef NDEBUG
 static inline void null_vcpu_check(struct vcpu *v)
 {
-    struct null_vcpu * const nvc = null_vcpu(v);
+    struct null_unit * const nvc = null_unit(v->sched_unit);
     struct null_dom * const ndom = v->domain->sched_priv;
 
     BUG_ON(nvc->vcpu != v);
@@ -707,7 +707,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *wvc;
+    struct null_unit *wvc;
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
@@ -790,7 +790,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     return ret;
 }
 
-static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
+static inline void dump_vcpu(struct null_private *prv, struct null_unit *nvc)
 {
     printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
             nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
@@ -800,7 +800,7 @@ static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
 static void null_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc;
+    struct null_unit *nvc;
     spinlock_t *lock;
     unsigned long flags;
 
@@ -815,7 +815,7 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
     printk("\n");
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    nvc = null_vcpu(curr_on_cpu(cpu));
+    nvc = null_unit(curr_on_cpu(cpu));
     if ( nvc && !is_idle_vcpu(nvc->vcpu) )
     {
         printk("\trun: ");
@@ -849,7 +849,7 @@ static void null_dump(const struct scheduler *ops)
         printk("\tDomain: %d\n", ndom->dom->domain_id);
         for_each_vcpu( ndom->dom, v )
         {
-            struct null_vcpu * const nvc = null_vcpu(v);
+            struct null_unit * const nvc = null_unit(v->sched_unit);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(nvc->vcpu);
@@ -867,7 +867,7 @@ static void null_dump(const struct scheduler *ops)
     spin_lock(&prv->waitq_lock);
     list_for_each( iter, &prv->waitq )
     {
-        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
+        struct null_unit *nvc = list_entry(iter, struct null_unit, waitq_elem);
 
         if ( loop++ != 0 )
             printk(", ");
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index b80bc02357..8caec5c5dc 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -195,7 +195,7 @@ struct rt_private {
 /*
  * Virtual CPU
  */
-struct rt_vcpu {
+struct rt_unit {
     struct list_head q_elem;     /* on the runq/depletedq list */
     struct list_head replq_elem; /* on the replenishment events list */
 
@@ -233,9 +233,9 @@ static inline struct rt_private *rt_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
+static inline struct rt_unit *rt_unit(const struct sched_unit *unit)
 {
-    return vcpu->sched_unit->priv;
+    return unit->priv;
 }
 
 static inline struct list_head *rt_runq(const struct scheduler *ops)
@@ -253,7 +253,7 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
-static inline bool has_extratime(const struct rt_vcpu *svc)
+static inline bool has_extratime(const struct rt_unit *svc)
 {
     return svc->flags & RTDS_extratime;
 }
@@ -263,25 +263,25 @@ static inline bool has_extratime(const struct rt_vcpu *svc)
  * and the replenishment events queue.
  */
 static int
-vcpu_on_q(const struct rt_vcpu *svc)
+vcpu_on_q(const struct rt_unit *svc)
 {
    return !list_empty(&svc->q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_unit *
 q_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, q_elem);
+    return list_entry(elem, struct rt_unit, q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_unit *
 replq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, replq_elem);
+    return list_entry(elem, struct rt_unit, replq_elem);
 }
 
 static int
-vcpu_on_replq(const struct rt_vcpu *svc)
+vcpu_on_replq(const struct rt_unit *svc)
 {
     return !list_empty(&svc->replq_elem);
 }
@@ -291,7 +291,7 @@ vcpu_on_replq(const struct rt_vcpu *svc)
  * Otherwise, return value < 0
  */
 static s_time_t
-compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+compare_vcpu_priority(const struct rt_unit *v1, const struct rt_unit *v2)
 {
     int prio = v2->priority_level - v1->priority_level;
 
@@ -305,7 +305,7 @@ compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
  * Debug related code, dump vcpu/cpu information
  */
 static void
-rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
+rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
 {
     cpumask_t *cpupool_mask, *mask;
 
@@ -352,13 +352,13 @@ static void
 rt_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
     printk("CPU[%02d]\n", cpu);
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = rt_vcpu(curr_on_cpu(cpu));
+    svc = rt_unit(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         rt_dump_vcpu(ops, svc);
@@ -371,7 +371,7 @@ rt_dump(const struct scheduler *ops)
 {
     struct list_head *runq, *depletedq, *replq, *iter;
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     struct rt_dom *sdom;
     unsigned long flags;
 
@@ -415,7 +415,7 @@ rt_dump(const struct scheduler *ops)
 
         for_each_vcpu ( sdom->dom, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_unit(v->sched_unit);
             rt_dump_vcpu(ops, svc);
         }
     }
@@ -429,7 +429,7 @@ rt_dump(const struct scheduler *ops)
  * it needs to be updated to the deadline of the current period
  */
 static void
-rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
+rt_update_deadline(s_time_t now, struct rt_unit *svc)
 {
     ASSERT(now >= svc->cur_deadline);
     ASSERT(svc->period != 0);
@@ -500,8 +500,8 @@ deadline_queue_remove(struct list_head *queue, struct list_head *elem)
 }
 
 static inline bool
-deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
-                      struct rt_vcpu *svc, struct list_head *elem,
+deadline_queue_insert(struct rt_unit * (*qelem)(struct list_head *),
+                      struct rt_unit *svc, struct list_head *elem,
                       struct list_head *queue)
 {
     struct list_head *iter;
@@ -509,7 +509,7 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
 
     list_for_each ( iter, queue )
     {
-        struct rt_vcpu * iter_svc = (*qelem)(iter);
+        struct rt_unit * iter_svc = (*qelem)(iter);
         if ( compare_vcpu_priority(svc, iter_svc) > 0 )
             break;
         pos++;
@@ -523,14 +523,14 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
   deadline_queue_insert(&replq_elem, ##__VA_ARGS__)
 
 static inline void
-q_remove(struct rt_vcpu *svc)
+q_remove(struct rt_unit *svc)
 {
     ASSERT( vcpu_on_q(svc) );
     list_del_init(&svc->q_elem);
 }
 
 static inline void
-replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_remove(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *replq = rt_replq(ops);
@@ -547,7 +547,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
          */
         if ( !list_empty(replq) )
         {
-            struct rt_vcpu *svc_next = replq_elem(replq->next);
+            struct rt_unit *svc_next = replq_elem(replq->next);
             set_timer(&prv->repl_timer, svc_next->cur_deadline);
         }
         else
@@ -561,7 +561,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
  * Insert svc without budget in DepletedQ unsorted;
  */
 static void
-runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *runq = rt_runq(ops);
@@ -579,7 +579,7 @@ runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 }
 
 static void
-replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_insert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct list_head *replq = rt_replq(ops);
     struct rt_private *prv = rt_priv(ops);
@@ -601,10 +601,10 @@ replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
  * changed.
  */
 static void
-replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_reinsert(const struct scheduler *ops, struct rt_unit *svc)
 {
     struct list_head *replq = rt_replq(ops);
-    struct rt_vcpu *rearm_svc = svc;
+    struct rt_unit *rearm_svc = svc;
     bool_t rearm = 0;
 
     ASSERT( vcpu_on_replq(svc) );
@@ -735,7 +735,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                 void *pdata, void *vdata)
 {
     struct rt_private *prv = rt_priv(new_ops);
-    struct rt_vcpu *svc = vdata;
+    struct rt_unit *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
 
@@ -842,10 +842,10 @@ static void *
 rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct rt_vcpu);
+    svc = xzalloc(struct rt_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -870,7 +870,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 static void
 rt_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct rt_vcpu *svc = priv;
+    struct rt_unit *svc = priv;
 
     xfree(svc);
 }
@@ -886,7 +886,7 @@ static void
 rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_unit *svc = rt_unit(unit);
     s_time_t now;
     spinlock_t *lock;
 
@@ -915,13 +915,13 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 /*
- * Remove rt_vcpu svc from the old scheduler in source cpupool.
+ * Remove rt_unit svc from the old scheduler in source cpupool.
  */
 static void
 rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -943,7 +943,7 @@ rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
  * Burn budget in nanosecond granularity
  */
 static void
-burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
+burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
 {
     s_time_t delta;
 
@@ -1007,13 +1007,13 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
  * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
  * lock is grabbed before calling this function
  */
-static struct rt_vcpu *
+static struct rt_unit *
 runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 {
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter;
-    struct rt_vcpu *svc = NULL;
-    struct rt_vcpu *iter_svc = NULL;
+    struct rt_unit *svc = NULL;
+    struct rt_unit *iter_svc = NULL;
     cpumask_t cpu_common;
     cpumask_t *online;
 
@@ -1064,8 +1064,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
 {
     const int cpu = smp_processor_id();
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *const scurr = rt_vcpu(current);
-    struct rt_vcpu *snext = NULL;
+    struct rt_unit *const scurr = rt_unit(current->sched_unit);
+    struct rt_unit *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
 
     /* TRACE */
@@ -1091,13 +1091,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_vcpu(idle_vcpu[cpu]);
+        snext = rt_unit(idle_vcpu[cpu]->sched_unit);
     }
     else
     {
         snext = runq_pick(ops, cpumask_of(cpu));
         if ( snext == NULL )
-            snext = rt_vcpu(idle_vcpu[cpu]);
+            snext = rt_unit(idle_vcpu[cpu]->sched_unit);
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_vcpu(current) &&
@@ -1143,12 +1143,12 @@ static void
 rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
 
     BUG_ON( is_idle_vcpu(vc) );
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( vcpu_on_q(svc) )
     {
@@ -1178,11 +1178,11 @@ rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
  * lock is grabbed before calling this function
  */
 static void
-runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *latest_deadline_vcpu = NULL; /* lowest priority */
-    struct rt_vcpu *iter_svc;
+    struct rt_unit *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_unit *iter_svc;
     struct vcpu *iter_vc;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
@@ -1203,14 +1203,14 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu);
+        iter_vc = curr_on_cpu(cpu)->vcpu;
         if ( is_idle_vcpu(iter_vc) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_vcpu(iter_vc);
+        iter_svc = rt_unit(iter_vc->sched_unit);
         if ( latest_deadline_vcpu == NULL ||
              compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
             latest_deadline_vcpu = iter_svc;
@@ -1259,13 +1259,13 @@ static void
 rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_unit * const svc = rt_unit(unit);
     s_time_t now;
     bool_t missed;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1330,7 +1330,7 @@ static void
 rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_unit *svc = rt_unit(unit);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
@@ -1361,7 +1361,7 @@ rt_dom_cntl(
     struct xen_domctl_scheduler_op *op)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     struct vcpu *v;
     unsigned long flags;
     int rc = 0;
@@ -1385,7 +1385,7 @@ rt_dom_cntl(
         spin_lock_irqsave(&prv->lock, flags);
         for_each_vcpu ( d, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_unit(v->sched_unit);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1411,7 +1411,7 @@ rt_dom_cntl(
             if ( op->cmd == XEN_DOMCTL_SCHEDOP_getvcpuinfo )
             {
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_unit(d->vcpu[local_sched.vcpuid]->sched_unit);
                 local_sched.u.rtds.budget = svc->budget / MICROSECS(1);
                 local_sched.u.rtds.period = svc->period / MICROSECS(1);
                 if ( has_extratime(svc) )
@@ -1439,7 +1439,7 @@ rt_dom_cntl(
                 }
 
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_unit(d->vcpu[local_sched.vcpuid]->sched_unit);
                 svc->period = period;
                 svc->budget = budget;
                 if ( local_sched.u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra )
@@ -1472,7 +1472,7 @@ static void repl_timer_handler(void *data){
     struct list_head *replq = rt_replq(ops);
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter, *tmp;
-    struct rt_vcpu *svc;
+    struct rt_unit *svc;
     LIST_HEAD(tmp_replq);
 
     spin_lock_irq(&prv->lock);
@@ -1514,10 +1514,10 @@ static void repl_timer_handler(void *data){
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu &&
+        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_unit &&
              !list_empty(runq) )
         {
-            struct rt_vcpu *next_on_runq = q_elem(runq->next);
+            struct rt_unit *next_on_runq = q_elem(runq->next);
 
             if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
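
The conversion in the hunks above is mechanical: scheduler-private data
now hangs off the sched_unit instead of the vcpu, and curr_on_cpu()
yields a unit, so the vcpu sits one dereference away. A minimal sketch
of the pattern (struct layout trimmed; the real definitions live in
sched_rt.c and sched-if.h):

    /* Sketch only: simplified to what the pattern needs. */
    struct sched_unit { struct vcpu *vcpu; void *priv; };

    static inline struct rt_unit *rt_unit(const struct sched_unit *unit)
    {
        return unit->priv;               /* per-unit scheduler data */
    }

    /* old: svc = rt_vcpu(v);        new: svc = rt_unit(v->sched_unit);
     * old: vc  = curr_on_cpu(cpu);  new: vc  = curr_on_cpu(cpu)->vcpu;
     */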
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5dfdaaece2..c0cec0b6ca 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -334,7 +334,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = v;
+        per_cpu(schedule_data, v->processor).curr = unit;
         v->is_running = 1;
     }
     else
@@ -1513,7 +1513,7 @@ static void schedule(void)
 
     next = next_slice.task;
 
-    sd->curr = next;
+    sd->curr = next->sched_unit;
 
     if ( next_slice.time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + next_slice.time);
@@ -1636,7 +1636,6 @@ static int cpu_schedule_up(unsigned int cpu)
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sd->_lock;
-    sd->curr = idle_vcpu[cpu];
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
@@ -1670,6 +1669,8 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
+    sd->curr = idle_vcpu[cpu]->sched_unit;
+
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
      * (e.g., inside free_pdata, from cpu_schedule_down() called
@@ -1863,6 +1864,7 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
+    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_unit;
     this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
     sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
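
Note the ordering enforced above: with curr now a sched_unit pointer,
the idle unit can only be installed after the idle vcpu (and with it
the unit) has been created, hence the assignment moves below the
idle_vcpu[cpu] check in cpu_schedule_up() and is added explicitly in
scheduler_init(). Roughly (a sketch, not the exact code):

    /* sd is the per-cpu struct schedule_data for 'cpu'. */
    if ( idle_vcpu[cpu] == NULL )
        return -ENOMEM;                     /* no idle vcpu, no idle unit */
    sd->curr = idle_vcpu[cpu]->sched_unit;  /* point at the unit, not the vcpu */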
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index dc149bc102..9b7855e775 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -36,7 +36,7 @@ extern int sched_ratelimit_us;
 struct schedule_data {
     spinlock_t         *schedule_lock,
                        _lock;
-    struct vcpu        *curr;           /* current task                    */
+    struct sched_unit  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
-- 
2.16.4


* [PATCH 11/60] xen/sched: move per cpu scheduler private data into struct sched_resource
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Tim Deegan, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Dario Faggioli, Roger Pau Monné

This prepares for supporting larger scheduling granularities, e.g. core
scheduling.

While at it, move sched_has_urgent_vcpu() from include/asm-x86/cpuidle.h
into schedule.c, removing the need to include sched-if.h in cpuidle.h
and multiple other C sources.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: move sched_has_urgent_vcpu()
---
 xen/arch/x86/acpi/cpu_idle.c      |  1 -
 xen/arch/x86/cpu/mcheck/mce.c     |  1 -
 xen/arch/x86/cpu/mcheck/mctelem.c |  1 -
 xen/arch/x86/setup.c              |  1 -
 xen/arch/x86/smpboot.c            |  1 -
 xen/common/sched_arinc653.c       |  4 +--
 xen/common/sched_credit.c         | 12 +++----
 xen/common/sched_credit2.c        | 21 +++++++------
 xen/common/sched_null.c           |  6 ++--
 xen/common/sched_rt.c             |  9 +++---
 xen/common/schedule.c             | 66 ++++++++++++++++++++++-----------------
 xen/include/asm-x86/cpuidle.h     | 11 -------
 xen/include/xen/sched-if.h        | 20 +++++-------
 xen/include/xen/sched.h           |  1 +
 14 files changed, 73 insertions(+), 82 deletions(-)
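
The structural change is small: the fields of the per-cpu struct
schedule_data are folded into struct sched_resource, and all users
switch from per_cpu(schedule_data, cpu) to get_sched_res(cpu). A
condensed sketch of the result (the accessor is assumed from the
earlier sched_resource patch):

    struct sched_resource {
        spinlock_t        *schedule_lock, _lock;
        struct sched_unit *curr;          /* current task     */
        void              *sched_priv;
        struct timer       s_timer;       /* scheduling timer */
        atomic_t           urgent_count;  /* urgent vcpus     */
        unsigned int       processor;
    };

    static inline struct sched_resource *get_sched_res(unsigned int cpu)
    {
        return per_cpu(sched_res, cpu);
    }

This also lets cpu_schedule_up() allocate the whole structure with
xzalloc() instead of relying on a static per-cpu instance.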

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 8846722bca..9f66f70986 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -38,7 +38,6 @@
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/trace.h>
-#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <asm/cache.h>
 #include <asm/io.h>
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 30cdb06401..726db75180 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -10,7 +10,6 @@
 #include <xen/errno.h>
 #include <xen/console.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/cpumask.h>
 #include <xen/event.h>
 #include <xen/guest_access.h>
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index 3bb13e5265..012a9b95e5 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -18,7 +18,6 @@
 #include <xen/smp.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/cpumask.h>
 #include <xen/event.h>
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 0ed94a613a..5ad66667ef 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -3,7 +3,6 @@
 #include <xen/err.h>
 #include <xen/grant_table.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/domain.h>
 #include <xen/serial.h>
 #include <xen/softirq.h>
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 274865a705..153bfbb4b7 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -25,7 +25,6 @@
 #include <xen/domain.h>
 #include <xen/domain_page.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <xen/delay.h>
 #include <xen/softirq.h>
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 7f26e6a0b0..6e7b2c9968 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == unit )
+    if ( get_sched_res(vc->processor)->curr == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
@@ -642,7 +642,7 @@ static spinlock_t *
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     arinc653_vcpu_t *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index bc87b813dc..fe4fc5abb1 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -82,7 +82,7 @@
 #define CSCHED_PRIV(_ops)   \
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
-    ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
+    ((struct csched_pcpu *)get_sched_res(_c)->sched_priv)
 #define CSCHED_UNIT(unit)   ((struct csched_unit *) (unit)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
@@ -248,7 +248,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     /*
      * We're peeking at cpu's runq, we must hold the proper lock.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
            is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
@@ -257,7 +257,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
 static inline void
 inc_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     CSCHED_PCPU(cpu)->nr_runnable++;
 
 }
@@ -265,7 +265,7 @@ inc_nr_runnable(unsigned int cpu)
 static inline void
 dec_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(CSCHED_PCPU(cpu)->nr_runnable >= 1);
     CSCHED_PCPU(cpu)->nr_runnable--;
 }
@@ -615,7 +615,7 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
 
     /*
      * This is called either during boot, resume or hotplug, in
@@ -635,7 +635,7 @@ static spinlock_t *
 csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                     void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_unit *svc = vdata;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 36201815e3..99e993b32f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -567,7 +567,7 @@ static inline struct csched2_private *csched2_priv(const struct scheduler *ops)
 
 static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
 {
-    return per_cpu(schedule_data, cpu).sched_priv;
+    return get_sched_res(cpu)->sched_priv;
 }
 
 static inline struct csched2_unit *csched2_unit(const struct sched_unit *unit)
@@ -1276,7 +1276,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     ASSERT(!vcpu_on_runq(svc));
     ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
@@ -1797,7 +1797,7 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     if ( svc->budget > 0 )
         return true;
@@ -1844,7 +1844,7 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(list_empty(parked));
 
     /* budget_lock nests inside runqueue lock. */
@@ -2101,7 +2101,7 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
     unsigned int cpu = vc->processor;
     s_time_t now;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     ASSERT(!is_idle_vcpu(vc));
 
@@ -2229,7 +2229,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      * just grab the prv lock.  Instead, we'll have to trylock, and
      * do something else reasonable if we fail.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     if ( !read_trylock(&prv->lock) )
     {
@@ -2569,7 +2569,7 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
      * on either side may be empty).
      */
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     st.lrqd = c2rqd(ops, cpu);
 
     update_runq_load(ops, st.lrqd, 0, now);
@@ -3475,7 +3475,7 @@ csched2_schedule(
     rqd = c2rqd(ops, cpu);
     BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
 
@@ -3863,7 +3863,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 
     rqi = init_pdata(prv, pdata, cpu);
     /* Move the scheduler lock to the new runq lock. */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    get_sched_res(cpu)->schedule_lock = &prv->rqd[rqi].lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock(old_lock);
@@ -3877,6 +3877,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct csched2_private *prv = csched2_priv(new_ops);
     struct csched2_unit *svc = vdata;
+    struct sched_resource *sd = get_sched_res(cpu);
     unsigned rqi;
 
     ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
@@ -3902,7 +3903,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * this scheduler, and so it's safe to have taken it /before/ our
      * private global lock.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
+    ASSERT(sd->schedule_lock != &prv->rqd[rqi].lock);
 
     write_unlock(&prv->lock);
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index defcdfcf1e..9df6f867aa 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -168,7 +168,7 @@ static void init_pdata(struct null_private *prv, unsigned int cpu)
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
 
     /* alloc_pdata is not implemented, so we want this to be NULL. */
     ASSERT(!pdata);
@@ -277,7 +277,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
     unsigned int cpu = v->processor, new_cpu;
     cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     for_each_affinity_balance_step( bs )
     {
@@ -389,7 +389,7 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
                                      unsigned int cpu,
                                      void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct null_private *prv = null_priv(new_ops);
     struct null_unit *nvc = vdata;
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 8caec5c5dc..cee0d69d54 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -75,7 +75,7 @@
 /*
  * Locking:
  * A global system lock is used to protect the RunQ and DepletedQ.
- * The global lock is referenced by schedule_data.schedule_lock
+ * The global lock is referenced by sched_res->schedule_lock
  * from all physical cpus.
  *
  * The lock is already grabbed when calling wake/sleep/schedule functions
@@ -176,7 +176,7 @@ static void repl_timer_handler(void *data);
 
 /*
  * System-wide private data, include global RunQueue/DepletedQ
- * Global lock is referenced by schedule_data.schedule_lock from all
+ * Global lock is referenced by sched_res->schedule_lock from all
  * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
  */
 struct rt_private {
@@ -723,7 +723,7 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     }
 
     /* Move the scheduler lock to our global runqueue lock.  */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+    get_sched_res(cpu)->schedule_lock = &prv->lock;
 
     /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -736,6 +736,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct rt_private *prv = rt_priv(new_ops);
     struct rt_unit *svc = vdata;
+    struct sched_resource *sd = get_sched_res(cpu);
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
 
@@ -745,7 +746,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * another scheduler, but that is how things need to be, for
      * preventing races.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->lock);
+    ASSERT(sd->schedule_lock != &prv->lock);
 
     /*
      * If we are the absolute first cpu being switched toward this
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c0cec0b6ca..ea53bd7183 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -61,7 +61,6 @@ static void vcpu_singleshot_timer_fn(void *data);
 static void poll_timer_fn(void *data);
 
 /* This is global for now so that private implementations can reach it */
-DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
 
@@ -157,7 +156,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              !test_bit(v->vcpu_id, v->domain->poll_mask) )
         {
             v->is_urgent = 0;
-            atomic_dec(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_dec(&get_sched_res(v->processor)->urgent_count);
         }
     }
     else
@@ -166,7 +165,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
         {
             v->is_urgent = 1;
-            atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_inc(&get_sched_res(v->processor)->urgent_count);
         }
     }
 }
@@ -177,7 +176,7 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
 
     ASSERT(v->runstate.state != new_state);
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
 
     vcpu_urgent_count_update(v);
 
@@ -334,7 +333,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = unit;
+        get_sched_res(v->processor)->curr = unit;
         v->is_running = 1;
     }
     else
@@ -459,7 +458,7 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
-        atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
+        atomic_dec(&get_sched_res(v->processor)->urgent_count);
     sched_remove_unit(vcpu_scheduler(v), unit);
     sched_free_vdata(vcpu_scheduler(v), unit->priv);
     sched_free_unit(unit);
@@ -506,7 +505,7 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
     {
@@ -601,8 +600,8 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      */
     if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
     {
-        atomic_inc(&per_cpu(schedule_data, new_cpu).urgent_count);
-        atomic_dec(&per_cpu(schedule_data, old_cpu).urgent_count);
+        atomic_inc(&get_sched_res(new_cpu)->urgent_count);
+        atomic_dec(&get_sched_res(old_cpu)->urgent_count);
     }
 
     /*
@@ -668,20 +667,20 @@ static void vcpu_migrate_finish(struct vcpu *v)
          * are not correct any longer after evaluating old and new cpu holding
          * the locks.
          */
-        old_lock = per_cpu(schedule_data, old_cpu).schedule_lock;
-        new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
+        old_lock = get_sched_res(old_cpu)->schedule_lock;
+        new_lock = get_sched_res(new_cpu)->schedule_lock;
 
         sched_spin_lock_double(old_lock, new_lock, &flags);
 
         old_cpu = v->processor;
-        if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
+        if ( old_lock == get_sched_res(old_cpu)->schedule_lock )
         {
             /*
              * If we selected a CPU on the previous iteration, check if it
              * remains suitable for running this vCPU.
              */
             if ( pick_called &&
-                 (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+                 (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -689,7 +688,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
             /* Select a new CPU. */
             new_cpu = sched_pick_resource(vcpu_scheduler(v),
                                           v->sched_unit)->processor;
-            if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+            if ( (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
             pick_called = 1;
@@ -1472,7 +1471,7 @@ static void schedule(void)
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
     bool_t                tasklet_work_scheduled = 0;
-    struct schedule_data *sd;
+    struct sched_resource *sd;
     spinlock_t           *lock;
     struct task_slice     next_slice;
     int cpu = smp_processor_id();
@@ -1481,7 +1480,7 @@ static void schedule(void)
 
     SCHED_STAT_CRANK(sched_run);
 
-    sd = &this_cpu(schedule_data);
+    sd = get_sched_res(cpu);
 
     /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
@@ -1623,15 +1622,14 @@ static void poll_timer_fn(void *data)
 
 static int cpu_schedule_up(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd;
     void *sched_priv;
-    struct sched_resource *res;
 
-    res = xzalloc(struct sched_resource);
-    if ( res == NULL )
+    sd = xzalloc(struct sched_resource);
+    if ( sd == NULL )
         return -ENOMEM;
-    res->processor = cpu;
-    set_sched_res(cpu, res);
+    sd->processor = cpu;
+    set_sched_res(cpu, sd);
 
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
@@ -1687,7 +1685,7 @@ static int cpu_schedule_up(unsigned int cpu)
 
 static void cpu_schedule_down(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
     sched_free_pdata(sched, sd->sched_priv, cpu);
@@ -1707,7 +1705,7 @@ static int cpu_schedule_callback(
 {
     unsigned int cpu = (unsigned long)hcpu;
     struct scheduler *sched = per_cpu(scheduler, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     int rc = 0;
 
     /*
@@ -1864,10 +1862,10 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
-    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_unit;
-    this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
-    sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
+    get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
+    get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
+    BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
+    sched_init_pdata(&ops, get_sched_res(0)->sched_priv, 0);
 }
 
 /*
@@ -1888,7 +1886,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     spinlock_t *old_lock, *new_lock;
 
     /*
@@ -2074,6 +2072,16 @@ void wait(void)
     schedule();
 }
 
+/*
+ * vcpu is urgent if vcpu is polling event channel
+ *
+ * if urgent vcpu exists, CPU should not enter deep C state
+ */
+int sched_has_urgent_vcpu(void)
+{
+    return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
+}
+
 #ifdef CONFIG_COMPAT
 #include "compat/schedule.c"
 #endif
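
sched_has_urgent_vcpu() keeps its semantics; it merely moves out of the
header so cpuidle.h no longer needs sched-if.h. A hypothetical caller
on the C-state selection path, just to illustrate the intended use:

    /* Hypothetical: clamp the C-state while a polling (urgent) vcpu exists. */
    static int choose_cstate(int max_cstate)
    {
        if ( sched_has_urgent_vcpu() )
            return 1;        /* stay shallow to keep wakeup latency low */
        return max_cstate;
    }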
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 488f708305..5d7dffd228 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -4,7 +4,6 @@
 #include <xen/cpuidle.h>
 #include <xen/notifier.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 
 extern struct acpi_processor_power *processor_powers[];
 
@@ -27,14 +26,4 @@ void update_idle_stats(struct acpi_processor_power *,
 void update_last_cx_stat(struct acpi_processor_power *,
                          struct acpi_processor_cx *, uint64_t);
 
-/*
- * vcpu is urgent if vcpu is polling event channel
- *
- * if urgent vcpu exists, CPU should not enter deep C state
- */
-static inline int sched_has_urgent_vcpu(void)
-{
-    return atomic_read(&this_cpu(schedule_data).urgent_count);
-}
-
 #endif /* __X86_ASM_CPUIDLE_H__ */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9b7855e775..0443fe1d7e 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -33,22 +33,18 @@ extern int sched_ratelimit_us;
  * For cache betterness, keep the actual lock in the same cache area
  * as the rest of the struct.  Just have the scheduler point to the
  * one it wants (This may be the one right in front of it).*/
-struct schedule_data {
+struct sched_resource {
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
+    unsigned int        processor;
 };
 
-#define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
-
-struct sched_resource {
-    unsigned int processor;
-};
+#define curr_on_cpu(c)    (get_sched_res(c)->curr)
 
-DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
@@ -79,7 +75,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
-        spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock; \
+        spinlock_t *lock = get_sched_res(cpu)->schedule_lock; \
         /* \
          * v->processor may change when grabbing the lock; but \
          * per_cpu(v->processor) may also change, if changing cpu pool \
@@ -89,7 +85,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * lock may be the same; this will succeed in that case. \
          */ \
         spin_lock##irq(lock, ## arg); \
-        if ( likely(lock == per_cpu(schedule_data, cpu).schedule_lock) ) \
+        if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
             return lock; \
         spin_unlock##irq(lock, ## arg); \
     } \
@@ -99,7 +95,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 static inline void kind##_schedule_unlock##irq(spinlock_t *lock \
                                                EXTRA_TYPE(arg), param) \
 { \
-    ASSERT(lock == per_cpu(schedule_data, cpu).schedule_lock); \
+    ASSERT(lock == get_sched_res(cpu)->schedule_lock); \
     spin_unlock##irq(lock, ## arg); \
 }
 
@@ -128,11 +124,11 @@ sched_unlock(vcpu, const struct vcpu *v, v->processor, _irqrestore, flags)
 
 static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
 {
-    spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock;
+    spinlock_t *lock = get_sched_res(cpu)->schedule_lock;
 
     if ( !spin_trylock(lock) )
         return NULL;
-    if ( lock == per_cpu(schedule_data, cpu).schedule_lock )
+    if ( lock == get_sched_res(cpu)->schedule_lock )
         return lock;
     spin_unlock(lock);
     return NULL;
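
All of these helpers share one subtlety: schedule_lock is a pointer a
scheduler switch may repoint while we are spinning on the old lock, so
every acquisition re-reads the pointer and retries on mismatch. The
macro above, expanded into plain C for readability (a sketch; the real
code is macro-generated):

    static inline spinlock_t *sched_lock_cpu(unsigned int cpu)
    {
        for ( ;; )
        {
            spinlock_t *lock = get_sched_res(cpu)->schedule_lock;

            spin_lock(lock);
            /* Repointed while we waited (e.g. cpupool move)? Retry. */
            if ( lock == get_sched_res(cpu)->schedule_lock )
                return lock;
            spin_unlock(lock);
        }
    }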
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d3a1a31c86..ee316cddd7 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -900,6 +900,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
+int sched_has_urgent_vcpu(void);
 
 /*
  * Used by idle loop to decide whether there is work to do:
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [Xen-devel] [PATCH 11/60] xen/sched: move per cpu scheduler private data into struct sched_resource
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Tim Deegan, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Dario Faggioli, Roger Pau Monné

This prepares support of larger scheduling granularities, e.g. core
scheduling.

While at it move sched_has_urgent_vcpu() from include/asm-x86/cpuidle.h
into schedule.c removing the need for including sched-if.h in
cpuidle.h and multiple other C sources.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: move sched_has_urgent_vcpu()
---
 xen/arch/x86/acpi/cpu_idle.c      |  1 -
 xen/arch/x86/cpu/mcheck/mce.c     |  1 -
 xen/arch/x86/cpu/mcheck/mctelem.c |  1 -
 xen/arch/x86/setup.c              |  1 -
 xen/arch/x86/smpboot.c            |  1 -
 xen/common/sched_arinc653.c       |  4 +--
 xen/common/sched_credit.c         | 12 +++----
 xen/common/sched_credit2.c        | 21 +++++++------
 xen/common/sched_null.c           |  6 ++--
 xen/common/sched_rt.c             |  9 +++---
 xen/common/schedule.c             | 66 ++++++++++++++++++++++-----------------
 xen/include/asm-x86/cpuidle.h     | 11 -------
 xen/include/xen/sched-if.h        | 20 +++++-------
 xen/include/xen/sched.h           |  1 +
 14 files changed, 73 insertions(+), 82 deletions(-)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 8846722bca..9f66f70986 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -38,7 +38,6 @@
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/trace.h>
-#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <asm/cache.h>
 #include <asm/io.h>
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 30cdb06401..726db75180 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -10,7 +10,6 @@
 #include <xen/errno.h>
 #include <xen/console.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/cpumask.h>
 #include <xen/event.h>
 #include <xen/guest_access.h>
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index 3bb13e5265..012a9b95e5 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -18,7 +18,6 @@
 #include <xen/smp.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/cpumask.h>
 #include <xen/event.h>
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 0ed94a613a..5ad66667ef 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -3,7 +3,6 @@
 #include <xen/err.h>
 #include <xen/grant_table.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/domain.h>
 #include <xen/serial.h>
 #include <xen/softirq.h>
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 274865a705..153bfbb4b7 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -25,7 +25,6 @@
 #include <xen/domain.h>
 #include <xen/domain_page.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <xen/delay.h>
 #include <xen/softirq.h>
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 7f26e6a0b0..6e7b2c9968 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == unit )
+    if ( get_sched_res(vc->processor)->curr == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
@@ -642,7 +642,7 @@ static spinlock_t *
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     arinc653_vcpu_t *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index bc87b813dc..fe4fc5abb1 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -82,7 +82,7 @@
 #define CSCHED_PRIV(_ops)   \
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
-    ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
+    ((struct csched_pcpu *)get_sched_res(_c)->sched_priv)
 #define CSCHED_UNIT(unit)   ((struct csched_unit *) (unit)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
@@ -248,7 +248,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     /*
      * We're peeking at cpu's runq, we must hold the proper lock.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
            is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
@@ -257,7 +257,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
 static inline void
 inc_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     CSCHED_PCPU(cpu)->nr_runnable++;
 
 }
@@ -265,7 +265,7 @@ inc_nr_runnable(unsigned int cpu)
 static inline void
 dec_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(CSCHED_PCPU(cpu)->nr_runnable >= 1);
     CSCHED_PCPU(cpu)->nr_runnable--;
 }
@@ -615,7 +615,7 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
 
     /*
      * This is called either during during boot, resume or hotplug, in
@@ -635,7 +635,7 @@ static spinlock_t *
 csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                     void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_unit *svc = vdata;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 36201815e3..99e993b32f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -567,7 +567,7 @@ static inline struct csched2_private *csched2_priv(const struct scheduler *ops)
 
 static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
 {
-    return per_cpu(schedule_data, cpu).sched_priv;
+    return get_sched_res(cpu)->sched_priv;
 }
 
 static inline struct csched2_unit *csched2_unit(const struct sched_unit *unit)
@@ -1276,7 +1276,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     ASSERT(!vcpu_on_runq(svc));
     ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
@@ -1797,7 +1797,7 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     if ( svc->budget > 0 )
         return true;
@@ -1844,7 +1844,7 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(list_empty(parked));
 
     /* budget_lock nests inside runqueue lock. */
@@ -2101,7 +2101,7 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
     unsigned int cpu = vc->processor;
     s_time_t now;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     ASSERT(!is_idle_vcpu(vc));
 
@@ -2229,7 +2229,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      * just grab the prv lock.  Instead, we'll have to trylock, and
      * do something else reasonable if we fail.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     if ( !read_trylock(&prv->lock) )
     {
@@ -2569,7 +2569,7 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
      * on either side may be empty).
      */
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     st.lrqd = c2rqd(ops, cpu);
 
     update_runq_load(ops, st.lrqd, 0, now);
@@ -3475,7 +3475,7 @@ csched2_schedule(
     rqd = c2rqd(ops, cpu);
     BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
 
@@ -3863,7 +3863,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 
     rqi = init_pdata(prv, pdata, cpu);
     /* Move the scheduler lock to the new runq lock. */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    get_sched_res(cpu)->schedule_lock = &prv->rqd[rqi].lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock(old_lock);
@@ -3877,6 +3877,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct csched2_private *prv = csched2_priv(new_ops);
     struct csched2_unit *svc = vdata;
+    struct sched_resource *sd = get_sched_res(cpu);
     unsigned rqi;
 
     ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
@@ -3902,7 +3903,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * this scheduler, and so it's safe to have taken it /before/ our
      * private global lock.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
+    ASSERT(sd->schedule_lock != &prv->rqd[rqi].lock);
 
     write_unlock(&prv->lock);
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index defcdfcf1e..9df6f867aa 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -168,7 +168,7 @@ static void init_pdata(struct null_private *prv, unsigned int cpu)
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
 
     /* alloc_pdata is not implemented, so we want this to be NULL. */
     ASSERT(!pdata);
@@ -277,7 +277,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
     unsigned int cpu = v->processor, new_cpu;
     cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     for_each_affinity_balance_step( bs )
     {
@@ -389,7 +389,7 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
                                      unsigned int cpu,
                                      void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct null_private *prv = null_priv(new_ops);
     struct null_unit *nvc = vdata;
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 8caec5c5dc..cee0d69d54 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -75,7 +75,7 @@
 /*
  * Locking:
  * A global system lock is used to protect the RunQ and DepletedQ.
- * The global lock is referenced by schedule_data.schedule_lock
+ * The global lock is referenced by sched_res->schedule_lock
  * from all physical cpus.
  *
  * The lock is already grabbed when calling wake/sleep/schedule/ functions
@@ -176,7 +176,7 @@ static void repl_timer_handler(void *data);
 
 /*
  * System-wide private data, include global RunQueue/DepletedQ
- * Global lock is referenced by schedule_data.schedule_lock from all
+ * Global lock is referenced by sched_res->schedule_lock from all
  * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
  */
 struct rt_private {
@@ -723,7 +723,7 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     }
 
     /* Move the scheduler lock to our global runqueue lock.  */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+    get_sched_res(cpu)->schedule_lock = &prv->lock;
 
     /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -736,6 +736,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct rt_private *prv = rt_priv(new_ops);
     struct rt_unit *svc = vdata;
+    struct sched_resource *sd = get_sched_res(cpu);
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
 
@@ -745,7 +746,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * another scheduler, but that is how things need to be, for
      * preventing races.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->lock);
+    ASSERT(sd->schedule_lock != &prv->lock);
 
     /*
      * If we are the absolute first cpu being switched toward this
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c0cec0b6ca..ea53bd7183 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -61,7 +61,6 @@ static void vcpu_singleshot_timer_fn(void *data);
 static void poll_timer_fn(void *data);
 
 /* This is global for now so that private implementations can reach it */
-DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
 
@@ -157,7 +156,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              !test_bit(v->vcpu_id, v->domain->poll_mask) )
         {
             v->is_urgent = 0;
-            atomic_dec(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_dec(&get_sched_res(v->processor)->urgent_count);
         }
     }
     else
@@ -166,7 +165,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
         {
             v->is_urgent = 1;
-            atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_inc(&get_sched_res(v->processor)->urgent_count);
         }
     }
 }
@@ -177,7 +176,7 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
 
     ASSERT(v->runstate.state != new_state);
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
 
     vcpu_urgent_count_update(v);
 
@@ -334,7 +333,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = unit;
+        get_sched_res(v->processor)->curr = unit;
         v->is_running = 1;
     }
     else
@@ -459,7 +458,7 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
-        atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
+        atomic_dec(&get_sched_res(v->processor)->urgent_count);
     sched_remove_unit(vcpu_scheduler(v), unit);
     sched_free_vdata(vcpu_scheduler(v), unit->priv);
     sched_free_unit(unit);
@@ -506,7 +505,7 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
     {
@@ -601,8 +600,8 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      */
     if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
     {
-        atomic_inc(&per_cpu(schedule_data, new_cpu).urgent_count);
-        atomic_dec(&per_cpu(schedule_data, old_cpu).urgent_count);
+        atomic_inc(&get_sched_res(new_cpu)->urgent_count);
+        atomic_dec(&get_sched_res(old_cpu)->urgent_count);
     }
 
     /*
@@ -668,20 +667,20 @@ static void vcpu_migrate_finish(struct vcpu *v)
          * are not correct any longer after evaluating old and new cpu holding
          * the locks.
          */
-        old_lock = per_cpu(schedule_data, old_cpu).schedule_lock;
-        new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
+        old_lock = get_sched_res(old_cpu)->schedule_lock;
+        new_lock = get_sched_res(new_cpu)->schedule_lock;
 
         sched_spin_lock_double(old_lock, new_lock, &flags);
 
         old_cpu = v->processor;
-        if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
+        if ( old_lock == get_sched_res(old_cpu)->schedule_lock )
         {
             /*
              * If we selected a CPU on the previosu iteration, check if it
              * remains suitable for running this vCPU.
              */
             if ( pick_called &&
-                 (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+                 (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -689,7 +688,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
             /* Select a new CPU. */
             new_cpu = sched_pick_resource(vcpu_scheduler(v),
                                           v->sched_unit)->processor;
-            if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+            if ( (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
             pick_called = 1;
@@ -1472,7 +1471,7 @@ static void schedule(void)
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
     bool_t                tasklet_work_scheduled = 0;
-    struct schedule_data *sd;
+    struct sched_resource *sd;
     spinlock_t           *lock;
     struct task_slice     next_slice;
     int cpu = smp_processor_id();
@@ -1481,7 +1480,7 @@ static void schedule(void)
 
     SCHED_STAT_CRANK(sched_run);
 
-    sd = &this_cpu(schedule_data);
+    sd = get_sched_res(cpu);
 
     /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
@@ -1623,15 +1622,14 @@ static void poll_timer_fn(void *data)
 
 static int cpu_schedule_up(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd;
     void *sched_priv;
-    struct sched_resource *res;
 
-    res = xzalloc(struct sched_resource);
-    if ( res == NULL )
+    sd = xzalloc(struct sched_resource);
+    if ( sd == NULL )
         return -ENOMEM;
-    res->processor = cpu;
-    set_sched_res(cpu, res);
+    sd->processor = cpu;
+    set_sched_res(cpu, sd);
 
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
@@ -1687,7 +1685,7 @@ static int cpu_schedule_up(unsigned int cpu)
 
 static void cpu_schedule_down(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
     sched_free_pdata(sched, sd->sched_priv, cpu);
@@ -1707,7 +1705,7 @@ static int cpu_schedule_callback(
 {
     unsigned int cpu = (unsigned long)hcpu;
     struct scheduler *sched = per_cpu(scheduler, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     int rc = 0;
 
     /*
@@ -1864,10 +1862,10 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
-    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_unit;
-    this_cpu(schedule_data).sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
-    sched_init_pdata(&ops, this_cpu(schedule_data).sched_priv, 0);
+    get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
+    get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
+    BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
+    sched_init_pdata(&ops, get_sched_res(0)->sched_priv, 0);
 }
 
 /*
@@ -1888,7 +1886,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
     spinlock_t *old_lock, *new_lock;
 
     /*
@@ -2074,6 +2072,16 @@ void wait(void)
     schedule();
 }
 
+/*
+ * vcpu is urgent if vcpu is polling event channel
+ *
+ * if urgent vcpu exists, CPU should not enter deep C state
+ */
+int sched_has_urgent_vcpu(void)
+{
+    return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
+}
+
 #ifdef CONFIG_COMPAT
 #include "compat/schedule.c"
 #endif
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 488f708305..5d7dffd228 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -4,7 +4,6 @@
 #include <xen/cpuidle.h>
 #include <xen/notifier.h>
 #include <xen/sched.h>
-#include <xen/sched-if.h>
 
 extern struct acpi_processor_power *processor_powers[];
 
@@ -27,14 +26,4 @@ void update_idle_stats(struct acpi_processor_power *,
 void update_last_cx_stat(struct acpi_processor_power *,
                          struct acpi_processor_cx *, uint64_t);
 
-/*
- * vcpu is urgent if vcpu is polling event channel
- *
- * if urgent vcpu exists, CPU should not enter deep C state
- */
-static inline int sched_has_urgent_vcpu(void)
-{
-    return atomic_read(&this_cpu(schedule_data).urgent_count);
-}
-
 #endif /* __X86_ASM_CPUIDLE_H__ */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9b7855e775..0443fe1d7e 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -33,22 +33,18 @@ extern int sched_ratelimit_us;
  * For cache betterness, keep the actual lock in the same cache area
  * as the rest of the struct.  Just have the scheduler point to the
  * one it wants (This may be the one right in front of it).*/
-struct schedule_data {
+struct sched_resource {
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
+    unsigned int        processor;
 };
 
-#define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
-
-struct sched_resource {
-    unsigned int processor;
-};
+#define curr_on_cpu(c)    (get_sched_res(c)->curr)
 
-DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
@@ -79,7 +75,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
-        spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock; \
+        spinlock_t *lock = get_sched_res(cpu)->schedule_lock; \
         /* \
          * v->processor may change when grabbing the lock; but \
          * per_cpu(v->processor) may also change, if changing cpu pool \
@@ -89,7 +85,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * lock may be the same; this will succeed in that case. \
          */ \
         spin_lock##irq(lock, ## arg); \
-        if ( likely(lock == per_cpu(schedule_data, cpu).schedule_lock) ) \
+        if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
             return lock; \
         spin_unlock##irq(lock, ## arg); \
     } \
@@ -99,7 +95,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 static inline void kind##_schedule_unlock##irq(spinlock_t *lock \
                                                EXTRA_TYPE(arg), param) \
 { \
-    ASSERT(lock == per_cpu(schedule_data, cpu).schedule_lock); \
+    ASSERT(lock == get_sched_res(cpu)->schedule_lock); \
     spin_unlock##irq(lock, ## arg); \
 }
 
@@ -128,11 +124,11 @@ sched_unlock(vcpu, const struct vcpu *v, v->processor, _irqrestore, flags)
 
 static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
 {
-    spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock;
+    spinlock_t *lock = get_sched_res(cpu)->schedule_lock;
 
     if ( !spin_trylock(lock) )
         return NULL;
-    if ( lock == per_cpu(schedule_data, cpu).schedule_lock )
+    if ( lock == get_sched_res(cpu)->schedule_lock )
         return lock;
     spin_unlock(lock);
     return NULL;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d3a1a31c86..ee316cddd7 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -900,6 +900,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
+int sched_has_urgent_vcpu(void);
 
 /*
  * Used by idle loop to decide whether there is work to do:
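 
With schedule_data now folded into struct sched_resource, the helper
presumably becomes an out-of-line function along these lines (a sketch
inferred from the removed inline; the real body is not part of this
excerpt and may differ):

    /* Hedged sketch of the relocated helper; the actual implementation
     * lives outside this excerpt. */
    int sched_has_urgent_vcpu(void)
    {
        return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
    }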
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 12/60] xen/sched: switch vcpu_schedule_lock to unit_schedule_lock
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Rename vcpu_schedule_[un]lock[_irq]() to unit_schedule_[un]lock[_irq]()
and let them take a sched_unit pointer instead of a vcpu pointer as
a parameter.
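
An illustrative caller, before and after this patch (a sketch only;
the function name example_sleep is hypothetical, the lock helpers and
their signatures are taken from the diff below):

    /* Sketch, not part of the patch: the calling pattern it applies
     * throughout the schedulers. */
    static void example_sleep(struct vcpu *v)
    {
        struct sched_unit *unit = v->sched_unit;
        unsigned long flags;
        spinlock_t *lock;

        /* Previously: lock = vcpu_schedule_lock_irqsave(v, &flags); */
        lock = unit_schedule_lock_irqsave(unit, &flags);

        /* ... operate on the unit under its resource's lock ... */

        /* Previously: vcpu_schedule_unlock_irqrestore(lock, flags, v); */
        unit_schedule_unlock_irqrestore(lock, flags, unit);
    }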

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  | 17 ++++++++--------
 xen/common/sched_credit2.c | 40 ++++++++++++++++++-------------------
 xen/common/sched_null.c    | 14 ++++++-------
 xen/common/sched_rt.c      | 15 +++++++-------
 xen/common/schedule.c      | 50 +++++++++++++++++++++++++---------------------
 xen/include/xen/sched-if.h | 12 +++++------
 6 files changed, 76 insertions(+), 72 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index fe4fc5abb1..74c85df334 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -931,7 +931,8 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
 static void
 csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 {
-    struct csched_unit * const svc = CSCHED_UNIT(current->sched_unit);
+    struct sched_unit *currunit = current->sched_unit;
+    struct csched_unit * const svc = CSCHED_UNIT(currunit);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
     ASSERT( current->processor == cpu );
@@ -967,7 +968,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
     {
         unsigned int new_cpu;
         unsigned long flags;
-        spinlock_t *lock = vcpu_schedule_lock_irqsave(current, &flags);
+        spinlock_t *lock = unit_schedule_lock_irqsave(currunit, &flags);
 
         /*
          * If it's been active a while, check if we'd be better off
@@ -976,7 +977,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
          */
         new_cpu = _csched_cpu_pick(ops, current, 0);
 
-        vcpu_schedule_unlock_irqrestore(lock, flags, current);
+        unit_schedule_unlock_irqrestore(lock, flags, currunit);
 
         if ( new_cpu != cpu )
         {
@@ -1028,19 +1029,19 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     unit->res = csched_res_pick(ops, unit);
     vc->processor = unit->res->processor;
 
     spin_unlock_irq(lock);
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
         runq_insert(svc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 
     SCHED_STAT_CRANK(vcpu_insert);
 }
@@ -2137,12 +2138,12 @@ csched_dump(const struct scheduler *ops)
             spinlock_t *lock;
 
             svc = list_entry(iter_svc, struct csched_unit, active_vcpu_elem);
-            lock = vcpu_schedule_lock(svc->vcpu);
+            lock = unit_schedule_lock(svc->vcpu->sched_unit);
 
             printk("\t%3d: ", ++loop);
             csched_dump_vcpu(svc);
 
-            vcpu_schedule_unlock(lock, svc->vcpu);
+            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
         }
     }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 99e993b32f..562e73d99e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -171,7 +171,7 @@
  * - runqueue lock
  *  + it is per-runqueue, so:
  *   * cpus in a runqueue take the runqueue lock, when using
- *     pcpu_schedule_lock() / vcpu_schedule_lock() (and friends),
+ *     pcpu_schedule_lock() / unit_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
  *  + serializes runqueue operations (removing and inserting vcpus);
@@ -1890,7 +1890,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         unsigned long flags;
         s_time_t now;
 
-        lock = vcpu_schedule_lock_irqsave(svc->vcpu, &flags);
+        lock = unit_schedule_lock_irqsave(svc->vcpu->sched_unit, &flags);
 
         __clear_bit(_VPF_parked, &svc->vcpu->pause_flags);
         if ( unlikely(svc->flags & CSFLAG_scheduled) )
@@ -1923,7 +1923,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         }
         list_del_init(&svc->parked_elem);
 
-        vcpu_schedule_unlock_irqrestore(lock, flags, svc->vcpu);
+        unit_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_unit);
     }
 }
 
@@ -2162,7 +2162,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
-    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+    spinlock_t *lock = unit_schedule_lock_irq(unit);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
 
@@ -2194,7 +2194,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
     else if ( !is_idle_vcpu(vc) )
         update_load(ops, svc->rqd, svc, -1, now);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 
     unpark_parked_vcpus(ops, &were_parked);
 }
@@ -2847,14 +2847,14 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
 
                 ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
             }
         }
         /* Cap */
@@ -2885,7 +2885,7 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 svc = csched2_unit(v->sched_unit);
-                lock = vcpu_schedule_lock(svc->vcpu);
+                lock = unit_schedule_lock(svc->vcpu->sched_unit);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
                  * which then won't happen because, in csched2_runtime(),
@@ -2893,7 +2893,7 @@ csched2_dom_cntl(
                  */
                 svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
                                         CSCHED2_MIN_TIMER);
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
             }
 
             if ( sdom->cap == 0 )
@@ -2928,7 +2928,7 @@ csched2_dom_cntl(
                 for_each_vcpu ( d, v )
                 {
                     svc = csched2_unit(v->sched_unit);
-                    lock = vcpu_schedule_lock(svc->vcpu);
+                    lock = unit_schedule_lock(svc->vcpu->sched_unit);
                     if ( v->is_running )
                     {
                         unsigned int cpu = v->processor;
@@ -2959,7 +2959,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    vcpu_schedule_unlock(lock, svc->vcpu);
+                    unit_schedule_unlock(lock, svc->vcpu->sched_unit);
                 }
             }
 
@@ -2975,12 +2975,12 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
             }
             sdom->cap = 0;
             /*
@@ -3119,19 +3119,19 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     unit->res = csched2_res_pick(ops, unit);
     vc->processor = unit->res->processor;
 
     spin_unlock_irq(lock);
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     /* Add vcpu to runqueue of initial processor */
     runq_assign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 
     sdom->nr_vcpus++;
 
@@ -3161,11 +3161,11 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     SCHED_STAT_CRANK(vcpu_remove);
 
     /* Remove from runqueue */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     runq_deassign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 
     svc->sdom->nr_vcpus--;
 }
@@ -3749,12 +3749,12 @@ csched2_dump(const struct scheduler *ops)
             struct csched2_unit * const svc = csched2_unit(v->sched_unit);
             spinlock_t *lock;
 
-            lock = vcpu_schedule_lock(svc->vcpu);
+            lock = unit_schedule_lock(svc->vcpu->sched_unit);
 
             printk("\t%3d: ", ++loop);
             csched2_dump_vcpu(prv, svc);
 
-            vcpu_schedule_unlock(lock, svc->vcpu);
+            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
         }
     }
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 9df6f867aa..ee3a8cf064 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -317,7 +317,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
      * all the pCPUs are busy.
      *
      * In fact, there must always be something sane in v->processor, or
-     * vcpu_schedule_lock() and friends won't work. This is not a problem,
+     * unit_schedule_lock() and friends won't work. This is not a problem,
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
@@ -420,7 +420,7 @@ static void null_unit_insert(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = unit_schedule_lock_irq(unit);
  retry:
 
     unit->res = pick_res(prv, unit);
@@ -428,7 +428,7 @@ static void null_unit_insert(const struct scheduler *ops,
 
     spin_unlock(lock);
 
-    lock = vcpu_schedule_lock(v);
+    lock = unit_schedule_lock(unit);
 
     cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
@@ -514,7 +514,7 @@ static void null_unit_remove(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = unit_schedule_lock_irq(unit);
 
     /* If v is in waitqueue, just get it out of there and bail */
     if ( unlikely(!list_empty(&nvc->waitq_elem)) )
@@ -532,7 +532,7 @@ static void null_unit_remove(const struct scheduler *ops,
     _vcpu_remove(prv, v);
 
  out:
-    vcpu_schedule_unlock_irq(lock, v);
+    unit_schedule_unlock_irq(lock, unit);
 
     SCHED_STAT_CRANK(vcpu_remove);
 }
@@ -852,13 +852,13 @@ static void null_dump(const struct scheduler *ops)
             struct null_unit * const nvc = null_unit(v->sched_unit);
             spinlock_t *lock;
 
-            lock = vcpu_schedule_lock(nvc->vcpu);
+            lock = unit_schedule_lock(nvc->vcpu->sched_unit);
 
             printk("\t%3d: ", ++loop);
             dump_vcpu(prv, nvc);
             printk("\n");
 
-            vcpu_schedule_unlock(lock, nvc->vcpu);
+            unit_schedule_unlock(lock, nvc->vcpu->sched_unit);
         }
     }
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cee0d69d54..cd737131a3 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -177,7 +177,7 @@ static void repl_timer_handler(void *data);
 /*
  * System-wide private data, include global RunQueue/DepletedQ
  * Global lock is referenced by sched_res->schedule_lock from all
- * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
+ * physical cpus. It can be grabbed via unit_schedule_lock_irq()
  */
 struct rt_private {
     spinlock_t lock;            /* the global coarse-grained lock */
@@ -897,7 +897,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     unit->res = rt_res_pick(ops, unit);
     vc->processor = unit->res->processor;
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
 
     now = NOW();
     if ( now >= svc->cur_deadline )
@@ -910,7 +910,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
         if ( !vc->is_running )
             runq_insert(ops, svc);
     }
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 
     SCHED_STAT_CRANK(vcpu_insert);
 }
@@ -921,7 +921,6 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 static void
 rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit * const svc = rt_unit(unit);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -930,14 +929,14 @@ rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 
     BUG_ON( sdom == NULL );
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = unit_schedule_lock_irq(unit);
     if ( vcpu_on_q(svc) )
         q_remove(svc);
 
     if ( vcpu_on_replq(svc) )
         replq_remove(ops,svc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 }
 
 /*
@@ -1332,7 +1331,7 @@ rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct vcpu *vc = unit->vcpu;
     struct rt_unit *svc = rt_unit(unit);
-    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+    spinlock_t *lock = unit_schedule_lock_irq(unit);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
     /* not insert idle vcpu to runq */
@@ -1349,7 +1348,7 @@ rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
         replq_remove(ops, svc);
 
 out:
-    vcpu_schedule_unlock_irq(lock, vc);
+    unit_schedule_unlock_irq(lock, unit);
 }
 
 /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ea53bd7183..1321c86111 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -194,7 +194,8 @@ static inline void vcpu_runstate_change(
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
-    spinlock_t *lock = likely(v == current) ? NULL : vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = likely(v == current)
+                       ? NULL : unit_schedule_lock_irq(v->sched_unit);
     s_time_t delta;
 
     memcpy(runstate, &v->runstate, sizeof(*runstate));
@@ -203,7 +204,7 @@ void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
         runstate->time[runstate->state] += delta;
 
     if ( unlikely(lock != NULL) )
-        vcpu_schedule_unlock_irq(lock, v);
+        unit_schedule_unlock_irq(lock, v->sched_unit);
 }
 
 uint64_t get_cpu_idle_time(unsigned int cpu)
@@ -415,7 +416,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->singleshot_timer, new_p);
         migrate_timer(&v->poll_timer, new_p);
 
-        lock = vcpu_schedule_lock_irq(v);
+        lock = unit_schedule_lock_irq(v->sched_unit);
 
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
@@ -424,7 +425,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
-         * - use vcpu_schedule_unlock_irq().
+         * - use unit_schedule_unlock_irq().
          */
         spin_unlock_irq(lock);
 
@@ -523,11 +524,11 @@ void vcpu_sleep_nosync(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_SLEEP, v->domain->domain_id, v->vcpu_id);
 
-    lock = vcpu_schedule_lock_irqsave(v, &flags);
+    lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
 
     vcpu_sleep_nosync_locked(v);
 
-    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
 }
 
 void vcpu_sleep_sync(struct vcpu *v)
@@ -547,7 +548,7 @@ void vcpu_wake(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
-    lock = vcpu_schedule_lock_irqsave(v, &flags);
+    lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
 
     if ( likely(vcpu_runnable(v)) )
     {
@@ -561,7 +562,7 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
     }
 
-    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
 }
 
 void vcpu_unblock(struct vcpu *v)
@@ -629,9 +630,9 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
  * These steps are encapsulated in the following two functions; they
  * should be called like this:
  *
- *     lock = vcpu_schedule_lock_irq(v);
+ *     lock = unit_schedule_lock_irq(unit);
  *     vcpu_migrate_start(v);
- *     vcpu_schedule_unlock_irq(lock, v)
+ *     unit_schedule_unlock_irq(lock, unit)
  *     vcpu_migrate_finish(v);
  *
  * vcpu_migrate_finish() will do the work now if it can, or simply
@@ -736,12 +737,12 @@ static void vcpu_migrate_finish(struct vcpu *v)
  */
 void vcpu_force_reschedule(struct vcpu *v)
 {
-    spinlock_t *lock = vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
 
     if ( v->is_running )
         vcpu_migrate_start(v);
 
-    vcpu_schedule_unlock_irq(lock, v);
+    unit_schedule_unlock_irq(lock, v->sched_unit);
 
     vcpu_migrate_finish(v);
 }
@@ -792,7 +793,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         v->sched_unit->res = get_sched_res(v->processor);
 
-        lock = vcpu_schedule_lock_irq(v);
+        lock = unit_schedule_lock_irq(v->sched_unit);
         v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
                                                  v->sched_unit);
         v->processor = v->sched_unit->res->processor;
@@ -827,7 +828,7 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
+            spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
 
             cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
@@ -836,7 +837,7 @@ int cpu_disable_scheduler(unsigned int cpu)
                 if ( v->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -849,7 +850,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
                 continue;
             }
 
@@ -862,7 +863,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            vcpu_schedule_unlock_irqrestore(lock, flags, v);
+            unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
 
             vcpu_migrate_finish(v);
 
@@ -926,7 +927,7 @@ static int vcpu_set_affinity(
     spinlock_t *lock;
     int ret = 0;
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = unit_schedule_lock_irq(v->sched_unit);
 
     if ( v->affinity_broken )
         ret = -EBUSY;
@@ -948,7 +949,7 @@ static int vcpu_set_affinity(
         vcpu_migrate_start(v);
     }
 
-    vcpu_schedule_unlock_irq(lock, v);
+    unit_schedule_unlock_irq(lock, v->sched_unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -1080,10 +1081,10 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
-    spinlock_t *lock = vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
 
     sched_yield(vcpu_scheduler(v), v->sched_unit);
-    vcpu_schedule_unlock_irq(lock, v);
+    unit_schedule_unlock_irq(lock, v->sched_unit);
 
     SCHED_STAT_CRANK(vcpu_yield);
 
@@ -1169,7 +1170,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = unit_schedule_lock_irq(v->sched_unit);
 
     if ( cpu < 0 )
     {
@@ -1196,7 +1197,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    vcpu_schedule_unlock_irq(lock, v);
+    unit_schedule_unlock_irq(lock, v->sched_unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -1663,6 +1664,9 @@ static int cpu_schedule_up(unsigned int cpu)
         unit->priv = sched_alloc_vdata(&ops, unit, idle->domain->sched_priv);
         if ( unit->priv == NULL )
             return -ENOMEM;
+
+        /* Update the resource pointer in the idle unit. */
+        unit->res = sd;
     }
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 0443fe1d7e..20e36ea39b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -101,22 +101,22 @@ static inline void kind##_schedule_unlock##irq(spinlock_t *lock \
 
 #define EXTRA_TYPE(arg)
 sched_lock(pcpu, unsigned int cpu,     cpu, )
-sched_lock(vcpu, const struct vcpu *v, v->processor, )
+sched_lock(unit, const struct sched_unit *i, i->res->processor, )
 sched_lock(pcpu, unsigned int cpu,     cpu,          _irq)
-sched_lock(vcpu, const struct vcpu *v, v->processor, _irq)
+sched_lock(unit, const struct sched_unit *i, i->res->processor, _irq)
 sched_unlock(pcpu, unsigned int cpu,     cpu, )
-sched_unlock(vcpu, const struct vcpu *v, v->processor, )
+sched_unlock(unit, const struct sched_unit *i, i->res->processor, )
 sched_unlock(pcpu, unsigned int cpu,     cpu,          _irq)
-sched_unlock(vcpu, const struct vcpu *v, v->processor, _irq)
+sched_unlock(unit, const struct sched_unit *i, i->res->processor, _irq)
 #undef EXTRA_TYPE
 
 #define EXTRA_TYPE(arg) , unsigned long arg
 #define spin_unlock_irqsave spin_unlock_irqrestore
 sched_lock(pcpu, unsigned int cpu,     cpu,          _irqsave, *flags)
-sched_lock(vcpu, const struct vcpu *v, v->processor, _irqsave, *flags)
+sched_lock(unit, const struct sched_unit *i, i->res->processor, _irqsave, *flags)
 #undef spin_unlock_irqsave
 sched_unlock(pcpu, unsigned int cpu,     cpu,          _irqrestore, flags)
-sched_unlock(vcpu, const struct vcpu *v, v->processor, _irqrestore, flags)
+sched_unlock(unit, const struct sched_unit *i, i->res->processor, _irqrestore, flags)
 #undef EXTRA_TYPE
 
 #undef sched_unlock
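 
For reference, the unit variant above expands roughly to the following
inline (a sketch of the generated code, with the macro's explanatory
comment elided):

    static inline spinlock_t *
    unit_schedule_lock_irq(const struct sched_unit *i)
    {
        for ( ; ; )
        {
            spinlock_t *lock = get_sched_res(i->res->processor)->schedule_lock;

            spin_lock_irq(lock);
            if ( likely(lock == get_sched_res(i->res->processor)->schedule_lock) )
                return lock;
            spin_unlock_irq(lock);
        }
    }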
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Affinities are scheduler-specific attributes, so they should be per
scheduling unit. Move all affinity-related fields from struct vcpu to
struct sched_unit. While at it, switch the affinity-related functions
in sched-if.h to take a sched_unit pointer instead of a vcpu pointer
as a parameter.

vcpu->last_run_time is primarily used by sched_credit, so move it to
struct sched_unit, too.
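
A rough sketch of where the fields end up (field names are taken from
this patch; the exact declaration order and comments are assumptions):

    struct sched_unit {
        /* ... existing members ... */
        cpumask_var_t   cpu_hard_affinity;       /* allowed pcpus           */
        cpumask_var_t   cpu_hard_affinity_tmp;   /* scratch (NMI/MCE fixup) */
        cpumask_var_t   cpu_hard_affinity_saved; /* saved across pinning    */
        cpumask_var_t   cpu_soft_affinity;       /* preferred pcpus         */
        s_time_t        last_run_time;           /* mainly for sched_credit */
    };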

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/pv/emul-priv-op.c |   1 +
 xen/arch/x86/pv/traps.c        |   5 +-
 xen/arch/x86/traps.c           |   9 ++--
 xen/common/domain.c            |  19 ++-----
 xen/common/domctl.c            |  13 +++--
 xen/common/keyhandler.c        |   4 +-
 xen/common/sched_credit.c      |  20 ++++----
 xen/common/sched_credit2.c     |  42 ++++++++--------
 xen/common/sched_null.c        |  16 +++---
 xen/common/sched_rt.c          |   9 ++--
 xen/common/schedule.c          | 110 ++++++++++++++++++++++++-----------------
 xen/common/wait.c              |   4 +-
 xen/include/xen/sched-if.h     |  17 ++++---
 xen/include/xen/sched.h        |  36 +++++++-------
 14 files changed, 163 insertions(+), 142 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index b20d79c7a3..a7a24a2053 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -23,6 +23,7 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/iocap.h>
+#include <xen/sched.h>
 #include <xen/spinlock.h>
 #include <xen/trace.h>
 
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 1740784ff2..419abc3d95 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -22,6 +22,7 @@
 #include <xen/event.h>
 #include <xen/hypercall.h>
 #include <xen/lib.h>
+#include <xen/sched.h>
 #include <xen/trace.h>
 #include <xen/softirq.h>
 
@@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
      * Set the tmp value unconditionally, so that the check in the iret
      * hypercall works.
      */
-    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
-                 st->vcpu->cpu_hard_affinity);
+    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
+                 st->vcpu->sched_unit->cpu_hard_affinity);
 
     if ( (cpu != st->processor) ||
          (st->processor != st->vcpu->processor) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 05ddc39bfe..c3b39d6296 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1594,16 +1594,17 @@ static void pci_serr_softirq(void)
 void async_exception_cleanup(struct vcpu *curr)
 {
     int trap;
+    struct sched_unit *unit = curr->sched_unit;
 
     if ( !curr->async_exception_mask )
         return;
 
     /* Restore affinity.  */
-    if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
-         !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
+    if ( !cpumask_empty(unit->cpu_hard_affinity_tmp) &&
+         !cpumask_equal(unit->cpu_hard_affinity_tmp, unit->cpu_hard_affinity) )
     {
-        vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp);
-        cpumask_clear(curr->cpu_hard_affinity_tmp);
+        vcpu_set_hard_affinity(curr, unit->cpu_hard_affinity_tmp);
+        cpumask_clear(unit->cpu_hard_affinity_tmp);
     }
 
     if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 90c66079f9..c200e9024f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
 
 static void vcpu_destroy(struct vcpu *v)
 {
-    free_cpumask_var(v->cpu_hard_affinity);
-    free_cpumask_var(v->cpu_hard_affinity_tmp);
-    free_cpumask_var(v->cpu_hard_affinity_saved);
-    free_cpumask_var(v->cpu_soft_affinity);
-
     free_vcpu_struct(v);
 }
 
@@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
 
     grant_table_init_vcpu(v);
 
-    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
-         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
-        goto fail;
-
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
@@ -198,7 +187,6 @@ struct vcpu *vcpu_create(
     sched_destroy_vcpu(v);
  fail_wq:
     destroy_waitqueue_vcpu(v);
- fail:
     vcpu_destroy(v);
 
     return NULL;
@@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
          */
         for_each_vcpu ( d, v )
         {
-            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask,
+                       v->sched_unit->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->cpu_soft_affinity);
+                       v->sched_unit->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
@@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    cpumask_clear(v->cpu_hard_affinity_tmp);
+    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index bade9a63b1..bc986d131d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_getvcpuaffinity:
     {
         struct vcpu *v;
+        struct sched_unit *unit;
         struct xen_domctl_vcpuaffinity *vcpuaff = &op->u.vcpuaffinity;
 
         ret = -EINVAL;
@@ -624,6 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL )
             break;
 
+        unit = v->sched_unit;
         ret = -EINVAL;
         if ( vcpuaffinity_params_invalid(vcpuaff) )
             break;
@@ -643,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 ret = -ENOMEM;
                 break;
             }
-            cpumask_copy(old_affinity, v->cpu_hard_affinity);
+            cpumask_copy(old_affinity, unit->cpu_hard_affinity);
 
             if ( !alloc_cpumask_var(&new_affinity) )
             {
@@ -676,7 +678,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * For hard affinity, what we return is the intersection of
                  * cpupool's online mask and the new hard affinity.
                  */
-                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, online, unit->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
                                                new_affinity);
             }
@@ -705,7 +707,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * hard affinity.
                  */
                 cpumask_and(new_affinity, new_affinity, online);
-                cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, new_affinity,
+                            unit->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
                                                new_affinity);
             }
@@ -718,10 +721,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         {
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
-                                               v->cpu_hard_affinity);
+                                               unit->cpu_hard_affinity);
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
-                                               v->cpu_soft_affinity);
+                                               unit->cpu_soft_affinity);
         }
         break;
     }
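
To make the mask arithmetic in the two vcpuaffinity hunks above concrete: the
hard affinity reported back is the intersection of the cpupool's online mask
with the unit's hard affinity, and the reported soft affinity is further
clipped by the hard mask. A minimal standalone sketch (plain C; a uint64_t
bitmask stands in for Xen's cpumask_t, and all names here are invented for
illustration, not Xen API):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t toymask_t;   /* one bit per pCPU */

    /* Reported hard affinity: online pCPUs of the pool & hard mask. */
    static toymask_t report_hard(toymask_t online, toymask_t hard)
    {
        return online & hard;
    }

    /* Reported soft affinity: additionally clipped by the hard mask. */
    static toymask_t report_soft(toymask_t online, toymask_t hard,
                                 toymask_t soft)
    {
        return soft & online & hard;
    }

    int main(void)
    {
        toymask_t online = 0x0f, hard = 0x06, soft = 0x0c;

        /* Prints 0x6 and 0x4: cpu 3 is soft-preferred but not allowed. */
        printf("hard %#llx soft %#llx\n",
               (unsigned long long)report_hard(online, hard),
               (unsigned long long)report_soft(online, hard, soft));
        return 0;
    }
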
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 4f4a660b0c..1729f73af0 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
                 printk("dirty_cpu=%u", v->dirty_cpu);
             printk("\n");
             printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
-                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
-                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
+                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
+                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));
             printk("    pause_count=%d pause_flags=%lx\n",
                    atomic_read(&v->pause_count), v->pause_flags);
             arch_dump_vcpu_info(v);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 74c85df334..ffac2f4bbb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -350,6 +350,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -375,7 +376,7 @@ static inline void __runq_tickle(struct csched_unit *new)
     if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         __cpumask_set_cpu(cpu, &mask);
         goto tickle;
@@ -410,11 +411,11 @@ static inline void __runq_tickle(struct csched_unit *new)
             int new_idlers_empty;
 
             if ( balance_step == BALANCE_SOFT_AFFINITY
-                 && !has_soft_affinity(new->vcpu) )
+                 && !has_soft_affinity(unit) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
-            affinity_balance_cpumask(new->vcpu, balance_step,
+            affinity_balance_cpumask(unit, balance_step,
                                      cpumask_scratch_cpu(cpu));
             cpumask_and(cpumask_scratch_cpu(cpu),
                         cpumask_scratch_cpu(cpu), &idle_mask);
@@ -443,8 +444,7 @@ static inline void __runq_tickle(struct csched_unit *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
-                if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity,
-                                        &idle_mask) )
+                if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) )
                 {
                     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
                     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
@@ -695,7 +695,7 @@ static inline bool
 __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 {
     bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->last_run_time) < prv->vcpu_migr_delay;
+               (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(vcpu_hot);
@@ -733,7 +733,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc, balance_step, cpus);
+        affinity_balance_cpumask(vc->sched_unit, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -752,7 +752,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer vc's current processor */
@@ -1652,10 +1652,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * or counter.
          */
         if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc)) )
+                                !has_soft_affinity(vc->sched_unit)) )
             continue;
 
-        affinity_balance_cpumask(vc, balance_step, cpumask_scratch);
+        affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
         if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
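
The __csched_vcpu_is_cache_hot() hunk above now reads last_run_time from the
sched_unit rather than the vcpu; the test itself is unchanged. A minimal
sketch of that test (times in nanoseconds, as with NOW(); hypothetical names,
not the Xen API):

    #include <stdbool.h>
    #include <stdint.h>

    /* A unit is "cache hot" if it was descheduled less than the
     * configured migration delay ago; a zero delay disables the check. */
    static bool unit_is_cache_hot(uint64_t now, uint64_t last_run_time,
                                  uint64_t migr_delay)
    {
        return migr_delay && (now - last_run_time) < migr_delay;
    }

    int main(void)
    {
        /* Descheduled 0.5ms ago with a 1ms delay: still hot (exit 0). */
        return unit_is_cache_hot(1500000, 1000000, 1000000) ? 0 : 1;
    }
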
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 562e73d99e..dabd5636f5 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -699,10 +699,10 @@ static int get_fallback_cpu(struct csched2_unit *svc)
     {
         int cpu = v->processor;
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     cpupool_domain_cpumask(v->domain));
 
@@ -1390,10 +1390,10 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
@@ -1436,6 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
@@ -1473,7 +1474,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
                   cpumask_test_cpu(cpu, &rqd->idle) &&
                   !cpumask_test_cpu(cpu, &rqd->tickled)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         ipid = cpu;
         goto tickle;
@@ -1482,10 +1483,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     for_each_affinity_balance_step( bs )
     {
         /* Just skip the first step if we don't have a soft affinity */
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(new->vcpu) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
 
         /*
          * First of all, consider idle cpus, checking if we can just
@@ -1557,7 +1558,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
             ipid = cpu;
 
             /* If this is in new's soft affinity, just take it */
-            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            if ( cpumask_test_cpu(cpu, unit->cpu_soft_affinity) )
             {
                 SCHED_STAT_CRANK(tickled_busy_cpu);
                 goto tickle;
@@ -2243,7 +2244,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
         goto out;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(vc->domain));
 
     /*
@@ -2288,7 +2289,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      *
      * Find both runqueues in one pass.
      */
-    has_soft = has_soft_affinity(vc);
+    has_soft = has_soft_affinity(unit);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -2335,7 +2336,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
             cpumask_t mask;
 
             cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
-            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            if ( cpumask_intersects(&mask, unit->cpu_soft_affinity) )
             {
                 min_s_avgload = rqd_avgload;
                 min_s_rqi = i;
@@ -2357,9 +2358,9 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * Note that, to obtain the soft-affinity mask, we "just" AND what we
          * have in cpumask_scratch with unit->cpu_soft_affinity. This is
          * ok because:
-         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
-         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() already
          *   in cpumask_scratch, we save a lot by doing it like this.
          *
          * It's kind of like open coding affinity_balance_cpumask() but, in
@@ -2367,7 +2368,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * cpumask operations.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    vc->cpu_soft_affinity);
+                    unit->cpu_soft_affinity);
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_s_rqi].active);
     }
@@ -2475,6 +2476,7 @@ static void migrate(const struct scheduler *ops,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
+    struct sched_unit *unit = svc->vcpu->sched_unit;
 
     if ( unlikely(tb_init_done) )
     {
@@ -2512,7 +2514,7 @@ static void migrate(const struct scheduler *ops,
         }
         _runq_deassign(svc);
 
-        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(svc->vcpu->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
@@ -2546,7 +2548,7 @@ static bool vcpu_is_migrateable(struct csched2_unit *svc,
     struct vcpu *v = svc->vcpu;
     int cpu = svc->vcpu->processor;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
@@ -2780,7 +2782,7 @@ csched2_unit_migrate(
 
     /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
     ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
-    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
+    ASSERT(cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity));
 
     trqd = c2rqd(ops, new_cpu);
 
@@ -3320,9 +3322,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu) )
+    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
@@ -3377,7 +3379,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }
 
         /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
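
The comment in the csched2_res_pick() hunk above describes an open-coded
shortcut: cpumask_scratch already holds hard_affinity & cpupool mask, so one
further AND with the soft mask yields the effective soft-affinity set, and
the preceding has_soft check guarantees the result is non-empty. A minimal
sketch of that invariant (uint64_t bitmask as a stand-in for cpumask_t; names
invented for illustration):

    #include <assert.h>
    #include <stdint.h>

    typedef uint64_t toymask_t;

    /* scratch must already be hard & pool; narrow it to the soft set. */
    static toymask_t narrow_to_soft(toymask_t scratch, toymask_t soft)
    {
        toymask_t m = scratch & soft;

        assert(m != 0);   /* guaranteed by the earlier has_soft check */
        return m;
    }

    int main(void)
    {
        /* hard & pool = 0x3c, soft = 0x0f -> effective soft set 0x0c. */
        return narrow_to_soft(0x3c, 0x0f) == 0x0c ? 0 : 1;
    }
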
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index ee3a8cf064..56b0055a42 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -123,7 +123,8 @@ static inline struct null_unit *null_unit(const struct sched_unit *unit)
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v, balance_step, cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(v->sched_unit, balance_step,
+                             cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 cpupool_domain_cpumask(v->domain));
 
@@ -281,10 +282,10 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
 
     for_each_affinity_balance_step( bs )
     {
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpus);
 
         /*
@@ -321,7 +322,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
-    cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity);
+    cpumask_and(cpumask_scratch_cpu(cpu), cpus, unit->cpu_hard_affinity);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
 
  out:
@@ -430,7 +431,7 @@ static void null_unit_insert(const struct scheduler *ops,
 
     lock = unit_schedule_lock(unit);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     /* If the pCPU is free, we assign v to it */
@@ -488,7 +489,8 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->vcpu) )
+            if ( bs == BALANCE_SOFT_AFFINITY &&
+                 !has_soft_affinity(wvc->vcpu->sched_unit) )
                 continue;
 
             if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
@@ -767,7 +769,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu) )
+                     !has_soft_affinity(wvc->vcpu->sched_unit) )
                     continue;
 
                 if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cd737131a3..d640d87b43 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -327,7 +327,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
     mask = cpumask_scratch_cpu(svc->vcpu->processor);
 
     cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
+    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_unit->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
@@ -645,7 +645,7 @@ rt_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     int cpu;
 
     online = cpupool_domain_cpumask(vc->domain);
-    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+    cpumask_and(&cpus, online, unit->cpu_hard_affinity);
 
     cpu = cpumask_test_cpu(vc->processor, &cpus)
             ? vc->processor
@@ -1023,7 +1023,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 
         /* mask cpu_hard_affinity & cpupool & mask */
         online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, online,
+                    iter_svc->vcpu->sched_unit->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1192,7 +1193,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
         return;
 
     online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1321c86111..212c1e637f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -270,6 +270,12 @@ static void sched_free_unit(struct sched_unit *unit)
     }
 
     unit->vcpu->sched_unit = NULL;
+
+    free_cpumask_var(unit->cpu_hard_affinity);
+    free_cpumask_var(unit->cpu_hard_affinity_tmp);
+    free_cpumask_var(unit->cpu_hard_affinity_saved);
+    free_cpumask_var(unit->cpu_soft_affinity);
+
     xfree(unit);
 }
 
@@ -293,7 +299,17 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     unit->next_in_list = *prev_unit;
     *prev_unit = unit;
 
+    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
+        goto fail;
+
     return unit;
+
+ fail:
+    sched_free_unit(unit);
+    return NULL;
 }
 
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
@@ -363,7 +379,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        if ( v->affinity_broken )
+        if ( v->sched_unit->affinity_broken )
             return -EBUSY;
     }
 
@@ -682,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
 
@@ -758,6 +774,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_unit *unit = v->sched_unit;
 
         ASSERT(!vcpu_runnable(v));
 
@@ -769,15 +786,15 @@ void restore_vcpu_affinity(struct domain *d)
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool in which they are.
          */
-        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(d));
         if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
         {
-            if ( v->affinity_broken )
+            if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-                v->affinity_broken = 0;
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                unit->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
 
@@ -785,18 +802,17 @@ void restore_vcpu_affinity(struct domain *d)
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
                 sched_set_affinity(v, &cpumask_all, NULL);
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        v->sched_unit->res = get_sched_res(v->processor);
+        unit->res = get_sched_res(v->processor);
 
-        lock = unit_schedule_lock_irq(v->sched_unit);
-        v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
-                                                 v->sched_unit);
-        v->processor = v->sched_unit->res->processor;
+        lock = unit_schedule_lock_irq(unit);
+        unit->res = sched_pick_resource(vcpu_scheduler(v), unit);
+        v->processor = unit->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -828,16 +844,17 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
+            struct sched_unit *unit = v->sched_unit;
+            spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);
 
-            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
+                 cpumask_test_cpu(cpu, unit->cpu_hard_affinity) )
             {
-                if ( v->affinity_broken )
+                if ( unit->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                    unit_schedule_unlock_irqrestore(lock, flags, unit);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -850,7 +867,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
 
@@ -863,7 +880,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+            unit_schedule_unlock_irqrestore(lock, flags, unit);
 
             vcpu_migrate_finish(v);
 
@@ -892,7 +909,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
         for_each_vcpu ( d, v )
-            if ( v->affinity_broken )
+            if ( v->sched_unit->affinity_broken )
                 return -EADDRINUSE;
 
     return 0;
@@ -908,28 +925,31 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft);
+    struct sched_unit *unit = v->sched_unit;
+
+    sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft);
 
     if ( hard )
-        cpumask_copy(v->cpu_hard_affinity, hard);
+        cpumask_copy(unit->cpu_hard_affinity, hard);
     if ( soft )
-        cpumask_copy(v->cpu_soft_affinity, soft);
+        cpumask_copy(unit->cpu_soft_affinity, soft);
 
-    v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
-                                            v->cpu_soft_affinity) &&
-                            cpumask_intersects(v->cpu_soft_affinity,
-                                               v->cpu_hard_affinity);
+    unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity,
+                                               unit->cpu_soft_affinity) &&
+                               cpumask_intersects(unit->cpu_soft_affinity,
+                                                  unit->cpu_hard_affinity);
 }
 
 static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = 0;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
-    if ( v->affinity_broken )
+    if ( unit->affinity_broken )
         ret = -EBUSY;
     else
     {
@@ -937,19 +957,19 @@ static int vcpu_set_affinity(
          * Tell the scheduler we changed something about affinity,
          * and ask to re-evaluate vcpu placement.
          */
-        if ( which == v->cpu_hard_affinity )
+        if ( which == unit->cpu_hard_affinity )
         {
             sched_set_affinity(v, affinity, NULL);
         }
         else
         {
-            ASSERT(which == v->cpu_soft_affinity);
+            ASSERT(which == unit->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
         vcpu_migrate_start(v);
     }
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -968,12 +988,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
     if ( cpumask_empty(&online_affinity) )
         return -EINVAL;
 
-    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_hard_affinity);
 }
 
 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
-    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_soft_affinity);
 }
 
 /* Block the currently-executing domain until a pertinent event occurs. */
@@ -1167,28 +1187,30 @@ void watchdog_domain_destroy(struct domain *d)
 
 int vcpu_pin_override(struct vcpu *v, int cpu)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
     if ( cpu < 0 )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
+            sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+            unit->affinity_broken = 0;
             ret = 0;
         }
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
+            cpumask_copy(unit->cpu_hard_affinity_saved,
+                         unit->cpu_hard_affinity);
+            unit->affinity_broken = 1;
             sched_set_affinity(v, cpumask_of(cpu), NULL);
             ret = 0;
         }
@@ -1197,7 +1219,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -1549,7 +1571,7 @@ static void schedule(void)
         ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
          (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
         now);
-    prev->last_run_time = now;
+    prev->sched_unit->last_run_time = now;
 
     ASSERT(next->runstate.state != RUNSTATE_running);
     vcpu_runstate_change(next, RUNSTATE_running, now);
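
The sched_alloc_unit()/sched_free_unit() hunks above pair four cpumask
allocations with a single goto-fail cleanup path, which therefore has to
tolerate members that were never allocated. A minimal sketch of that pattern
(standard C; calloc/free stand in for zalloc_cpumask_var/free_cpumask_var,
and the unit's list handling is omitted):

    #include <stdlib.h>

    struct toy_unit {
        unsigned long *hard, *hard_tmp, *hard_saved, *soft;
    };

    /* Common free path: must cope with NULL members, since the alloc
     * path below may bail out after any subset of the allocations. */
    static void toy_free_unit(struct toy_unit *u)
    {
        if ( !u )
            return;
        free(u->hard);        /* free(NULL) is a no-op */
        free(u->hard_tmp);
        free(u->hard_saved);
        free(u->soft);
        free(u);
    }

    static struct toy_unit *toy_alloc_unit(size_t words)
    {
        struct toy_unit *u = calloc(1, sizeof(*u));

        if ( !u )
            return NULL;

        if ( !(u->hard = calloc(words, sizeof(long))) ||
             !(u->hard_tmp = calloc(words, sizeof(long))) ||
             !(u->hard_saved = calloc(words, sizeof(long))) ||
             !(u->soft = calloc(words, sizeof(long))) )
            goto fail;

        return u;

     fail:
        toy_free_unit(u);
        return NULL;
    }

    int main(void)
    {
        toy_free_unit(toy_alloc_unit(2));
        return 0;
    }
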
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..37e9e0d016 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
     if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
         if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
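
Both wait.c hunks follow the same sequence: stash the unit's current hard
affinity, then pin the vcpu to the wakeup CPU so it can only resume there;
the saved mask is put back once the wait completes. A minimal sketch of that
save/pin/restore dance (hypothetical names; a uint64_t bitmask instead of
cpumask_t):

    #include <stdint.h>

    typedef uint64_t toymask_t;

    struct toy_waiter {
        unsigned int wakeup_cpu;
        toymask_t    saved_affinity;
    };

    /* Before sleeping: remember the hard affinity and force wakeup on
     * this CPU only (mirrors vcpu_set_hard_affinity(cpumask_of(cpu))). */
    static void toy_prepare_to_wait(struct toy_waiter *w,
                                    unsigned int this_cpu,
                                    toymask_t *hard_affinity)
    {
        w->wakeup_cpu = this_cpu;
        w->saved_affinity = *hard_affinity;
        *hard_affinity = (toymask_t)1 << this_cpu;
    }

    /* After the wait completes: put the original affinity back. */
    static void toy_finish_wait(const struct toy_waiter *w,
                                toymask_t *hard_affinity)
    {
        *hard_affinity = w->saved_affinity;
    }

    int main(void)
    {
        toymask_t hard = 0x0f;
        struct toy_waiter w;

        toy_prepare_to_wait(&w, 2, &hard);   /* hard is now 0x4 */
        toy_finish_wait(&w, &hard);          /* hard is 0xf again */
        return hard == 0x0f ? 0 : 1;
    }
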
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 20e36ea39b..17c01abc25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  * * The hard affinity is not a subset of soft affinity
  * * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
 }
 
 /*
@@ -452,17 +452,18 @@ static inline int has_soft_affinity(const struct vcpu *v)
  * to avoid running a vcpu where it would like, but is not allowed to!
  */
 static inline void
-affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+affinity_balance_cpumask(const struct sched_unit *unit, int step,
+                         cpumask_t *mask)
 {
     if ( step == BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+        cpumask_and(mask, unit->cpu_soft_affinity, unit->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, v->cpu_hard_affinity);
+            cpumask_copy(mask, unit->cpu_hard_affinity);
     }
     else /* step == BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, v->cpu_hard_affinity);
+        cpumask_copy(mask, unit->cpu_hard_affinity);
 }
 
 #endif /* __XEN_SCHED_IF_H__ */
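
Taken together, the two sched-if.h helpers above and the soft_aff_effective
computation in sched_set_affinity() reduce to a little mask algebra: soft
affinity only "plays a role" if the hard mask is not a subset of the soft
mask but the two still overlap, and the soft balance step uses soft & hard
with a fallback to the hard mask when that intersection is empty. A minimal
sketch (uint64_t bitmask standing in for cpumask_t; names invented for
illustration):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t toymask_t;

    enum { BAL_SOFT, BAL_HARD };

    /* soft_aff_effective as set by sched_set_affinity(). */
    static bool soft_aff_effective(toymask_t hard, toymask_t soft)
    {
        return (hard & ~soft) != 0 &&   /* hard not a subset of soft... */
               (hard & soft) != 0;      /* ...yet the two intersect */
    }

    /* affinity_balance_cpumask(): the soft step narrows to soft & hard
     * and falls back to the hard mask if that would be empty. */
    static toymask_t balance_mask(toymask_t hard, toymask_t soft, int step)
    {
        if ( step == BAL_SOFT )
        {
            toymask_t m = soft & hard;
            return m ? m : hard;
        }
        return hard;   /* BAL_HARD */
    }

    int main(void)
    {
        toymask_t hard = 0x0f, soft = 0x05;

        /* Effective soft affinity; the soft step yields 0x5 (exit 0). */
        return (soft_aff_effective(hard, soft) &&
                balance_mask(hard, soft, BAL_SOFT) == 0x05) ? 0 : 1;
    }
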
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ee316cddd7..13c99a9194 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -175,9 +175,6 @@ struct vcpu
     } runstate_guest; /* guest address */
 #endif
 
-    /* last time when vCPU is scheduled out */
-    uint64_t last_run_time;
-
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
     /* Has the FPU been used since it was last saved? */
@@ -203,8 +200,6 @@ struct vcpu
     bool             defer_shutdown;
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
-    /* VCPU need affinity restored */
-    bool             affinity_broken;
 
     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -213,9 +208,6 @@ struct vcpu
     bool             hcall_compat;
 #endif
 
-    /* Does soft affinity actually play a role (given hard affinity)? */
-    bool             soft_aff_effective;
-
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
@@ -247,16 +239,6 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
-    /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
-    /* Used to restore affinity across S3. */
-    cpumask_var_t    cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t    cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -283,6 +265,22 @@ struct sched_unit {
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
     struct sched_resource *res;
+
+    /* Last time the unit was scheduled out. */
+    uint64_t               last_run_time;
+
+    /* Unit needs affinity restored. */
+    bool                   affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool                   soft_aff_effective;
+    /* Bitmask of CPUs on which this unit may run. */
+    cpumask_var_t          cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t          cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t          cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this unit prefers to run. */
+    cpumask_var_t          cpu_soft_affinity;
 };
 
 #define for_each_sched_unit(d, e)                                         \
@@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
 {
     return (is_hardware_domain(v->domain) &&
-            cpumask_weight(v->cpu_hard_affinity) == 1);
+            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
 }
 
 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.16.4


     int cpu;
 
     online = cpupool_domain_cpumask(vc->domain);
-    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+    cpumask_and(&cpus, online, unit->cpu_hard_affinity);
 
     cpu = cpumask_test_cpu(vc->processor, &cpus)
             ? vc->processor
@@ -1023,7 +1023,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 
         /* mask cpu_hard_affinity & cpupool & mask */
         online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, online,
+                    iter_svc->vcpu->sched_unit->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1192,7 +1193,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
         return;
 
     online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1321c86111..212c1e637f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -270,6 +270,12 @@ static void sched_free_unit(struct sched_unit *unit)
     }
 
     unit->vcpu->sched_unit = NULL;
+
+    free_cpumask_var(unit->cpu_hard_affinity);
+    free_cpumask_var(unit->cpu_hard_affinity_tmp);
+    free_cpumask_var(unit->cpu_hard_affinity_saved);
+    free_cpumask_var(unit->cpu_soft_affinity);
+
     xfree(unit);
 }
 
@@ -293,7 +299,17 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     unit->next_in_list = *prev_unit;
     *prev_unit = unit;
 
+    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
+        goto fail;
+
     return unit;
+
+ fail:
+    sched_free_unit(unit);
+    return NULL;
 }
 
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
@@ -363,7 +379,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        if ( v->affinity_broken )
+        if ( v->sched_unit->affinity_broken )
             return -EBUSY;
     }
 
@@ -682,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
 
@@ -758,6 +774,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_unit *unit = v->sched_unit;
 
         ASSERT(!vcpu_runnable(v));
 
@@ -769,15 +786,15 @@ void restore_vcpu_affinity(struct domain *d)
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool they are in.
          */
-        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(d));
         if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
         {
-            if ( v->affinity_broken )
+            if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-                v->affinity_broken = 0;
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                unit->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
 
@@ -785,18 +802,17 @@ void restore_vcpu_affinity(struct domain *d)
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
                 sched_set_affinity(v, &cpumask_all, NULL);
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        v->sched_unit->res = get_sched_res(v->processor);
+        unit->res = get_sched_res(v->processor);
 
-        lock = unit_schedule_lock_irq(v->sched_unit);
-        v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
-                                                 v->sched_unit);
-        v->processor = v->sched_unit->res->processor;
+        lock = unit_schedule_lock_irq(unit);
+        unit->res = sched_pick_resource(vcpu_scheduler(v), unit);
+        v->processor = unit->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -828,16 +844,17 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
+            struct sched_unit *unit = v->sched_unit;
+            spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);
 
-            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
+                 cpumask_test_cpu(cpu, unit->cpu_hard_affinity) )
             {
-                if ( v->affinity_broken )
+                if ( unit->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                    unit_schedule_unlock_irqrestore(lock, flags, unit);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -850,7 +867,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
 
@@ -863,7 +880,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+            unit_schedule_unlock_irqrestore(lock, flags, unit);
 
             vcpu_migrate_finish(v);
 
@@ -892,7 +909,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
         for_each_vcpu ( d, v )
-            if ( v->affinity_broken )
+            if ( v->sched_unit->affinity_broken )
                 return -EADDRINUSE;
 
     return 0;
@@ -908,28 +925,31 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft);
+    struct sched_unit *unit = v->sched_unit;
+
+    sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft);
 
     if ( hard )
-        cpumask_copy(v->cpu_hard_affinity, hard);
+        cpumask_copy(unit->cpu_hard_affinity, hard);
     if ( soft )
-        cpumask_copy(v->cpu_soft_affinity, soft);
+        cpumask_copy(unit->cpu_soft_affinity, soft);
 
-    v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
-                                            v->cpu_soft_affinity) &&
-                            cpumask_intersects(v->cpu_soft_affinity,
-                                               v->cpu_hard_affinity);
+    unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity,
+                                               unit->cpu_soft_affinity) &&
+                               cpumask_intersects(unit->cpu_soft_affinity,
+                                                  unit->cpu_hard_affinity);
 }
 
 static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = 0;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
-    if ( v->affinity_broken )
+    if ( unit->affinity_broken )
         ret = -EBUSY;
     else
     {
@@ -937,19 +957,19 @@ static int vcpu_set_affinity(
          * Tell the scheduler we changed something about affinity,
          * and ask to re-evaluate vcpu placement.
          */
-        if ( which == v->cpu_hard_affinity )
+        if ( which == unit->cpu_hard_affinity )
         {
             sched_set_affinity(v, affinity, NULL);
         }
         else
         {
-            ASSERT(which == v->cpu_soft_affinity);
+            ASSERT(which == unit->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
         vcpu_migrate_start(v);
     }
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -968,12 +988,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
     if ( cpumask_empty(&online_affinity) )
         return -EINVAL;
 
-    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_hard_affinity);
 }
 
 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
-    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_soft_affinity);
 }
 
 /* Block the currently-executing domain until a pertinent event occurs. */
@@ -1167,28 +1187,30 @@ void watchdog_domain_destroy(struct domain *d)
 
 int vcpu_pin_override(struct vcpu *v, int cpu)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
     if ( cpu < 0 )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
+            sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+            unit->affinity_broken = 0;
             ret = 0;
         }
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
+            cpumask_copy(unit->cpu_hard_affinity_saved,
+                         unit->cpu_hard_affinity);
+            unit->affinity_broken = 1;
             sched_set_affinity(v, cpumask_of(cpu), NULL);
             ret = 0;
         }
@@ -1197,7 +1219,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -1549,7 +1571,7 @@ static void schedule(void)
         ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
          (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
         now);
-    prev->last_run_time = now;
+    prev->sched_unit->last_run_time = now;
 
     ASSERT(next->runstate.state != RUNSTATE_running);
     vcpu_runstate_change(next, RUNSTATE_running, now);
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..37e9e0d016 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
     if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
         if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 20e36ea39b..17c01abc25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  * * The hard affinity is not a subset of soft affinity
  * * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
 }
 
 /*
@@ -452,17 +452,18 @@ static inline int has_soft_affinity(const struct vcpu *v)
  * to avoid running a vcpu where it would like, but is not allowed to!
  */
 static inline void
-affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+affinity_balance_cpumask(const struct sched_unit *unit, int step,
+                         cpumask_t *mask)
 {
     if ( step == BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+        cpumask_and(mask, unit->cpu_soft_affinity, unit->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, v->cpu_hard_affinity);
+            cpumask_copy(mask, unit->cpu_hard_affinity);
     }
     else /* step == BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, v->cpu_hard_affinity);
+        cpumask_copy(mask, unit->cpu_hard_affinity);
 }
 
 #endif /* __XEN_SCHED_IF_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ee316cddd7..13c99a9194 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -175,9 +175,6 @@ struct vcpu
     } runstate_guest; /* guest address */
 #endif
 
-    /* last time when vCPU is scheduled out */
-    uint64_t last_run_time;
-
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
     /* Has the FPU been used since it was last saved? */
@@ -203,8 +200,6 @@ struct vcpu
     bool             defer_shutdown;
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
-    /* VCPU need affinity restored */
-    bool             affinity_broken;
 
     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -213,9 +208,6 @@ struct vcpu
     bool             hcall_compat;
 #endif
 
-    /* Does soft affinity actually play a role (given hard affinity)? */
-    bool             soft_aff_effective;
-
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
@@ -247,16 +239,6 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
-    /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
-    /* Used to restore affinity across S3. */
-    cpumask_var_t    cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t    cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -283,6 +265,22 @@ struct sched_unit {
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
     struct sched_resource *res;
+
+    /* Last time when unit has been scheduled out. */
+    uint64_t               last_run_time;
+
+    /* Unit needs affinity restored. */
+    bool                   affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool                   soft_aff_effective;
+    /* Bitmask of CPUs on which this unit may run. */
+    cpumask_var_t          cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t          cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t          cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this unit prefers to run. */
+    cpumask_var_t          cpu_soft_affinity;
 };
 
 #define for_each_sched_unit(d, e)                                         \
@@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
 {
     return (is_hardware_domain(v->domain) &&
-            cpumask_weight(v->cpu_hard_affinity) == 1);
+            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
 }
 
 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.16.4



* [PATCH 14/60] xen/sched: add scheduler helpers hiding vcpu
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add the following helpers using a sched_unit as input instead of a
vcpu:

- is_idle_unit() similar to is_idle_vcpu()
- unit_runnable() like vcpu_runnable()
- sched_set_res() to set the current processor of a unit
- sched_unit_cpu() to get the current processor of a unit
- sched_{set|clear}_pause_flags[_atomic]() to modify pause_flags of the
  associated vcpu(s)
- sched_idle_unit() to get the sched_unit pointer of the idle vcpu of a
  specific physical cpu
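
To give a feel for the effect on callers, here is an illustrative
sketch (not part of the patch; example_move() is a made-up name):

    /* Move a unit to cpu without touching vcpu fields directly. */
    static void example_move(struct sched_unit *unit, unsigned int cpu)
    {
        sched_set_res(unit, get_sched_res(cpu));
        ASSERT(sched_unit_cpu(unit) == cpu);
    }

This replaces the former open coded pattern

    vc->processor = cpu;
    vc->sched_unit->res = get_sched_res(cpu);

as can be seen in the sched_credit.c hunk below.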

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |  3 +--
 xen/common/schedule.c      | 19 ++++++++--------
 xen/include/xen/sched-if.h | 56 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index ffac2f4bbb..3f002771da 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1665,8 +1665,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
             SCHED_STAT_CRANK(migrate_queued);
             WARN_ON(vc->is_urgent);
             runq_remove(speer);
-            vc->processor = cpu;
-            vc->sched_unit->res = get_sched_res(cpu);
+            sched_set_res(vc->sched_unit, get_sched_res(cpu));
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 212c1e637f..78d9108956 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -317,12 +317,11 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     struct domain *d = v->domain;
     struct sched_unit *unit;
 
-    v->processor = processor;
-
     if ( (unit = sched_alloc_unit(v)) == NULL )
         return 1;
 
-    unit->res = get_sched_res(processor);
+    sched_set_res(unit, get_sched_res(processor));
+
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -436,8 +435,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
-        v->processor = new_p;
-        v->sched_unit->res = get_sched_res(new_p);
+        sched_set_res(v->sched_unit, get_sched_res(new_p));
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -775,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
         struct sched_unit *unit = v->sched_unit;
+        struct sched_resource *res;
 
-        ASSERT(!vcpu_runnable(v));
+        ASSERT(!unit_runnable(unit));
 
         /*
          * Re-assign the initial processor as after resume we have no
@@ -807,12 +806,12 @@ void restore_vcpu_affinity(struct domain *d)
             }
         }
 
-        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        unit->res = get_sched_res(v->processor);
+        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
+        sched_set_res(unit, res);
 
         lock = unit_schedule_lock_irq(unit);
-        unit->res = sched_pick_resource(vcpu_scheduler(v), unit);
-        v->processor = unit->res->processor;
+        res = sched_pick_resource(vcpu_scheduler(v), unit);
+        sched_set_res(unit, res);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 17c01abc25..da9aa04370 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -59,6 +59,57 @@ static inline void set_sched_res(unsigned int cpu, struct sched_resource *res)
     per_cpu(sched_res, cpu) = res;
 }
 
+static inline bool is_idle_unit(const struct sched_unit *unit)
+{
+    return is_idle_vcpu(unit->vcpu);
+}
+
+static inline bool unit_runnable(const struct sched_unit *unit)
+{
+    return vcpu_runnable(unit->vcpu);
+}
+
+static inline void sched_set_res(struct sched_unit *unit,
+                                 struct sched_resource *res)
+{
+    unit->vcpu->processor = res->processor;
+    unit->res = res;
+}
+
+static inline unsigned int sched_unit_cpu(struct sched_unit *unit)
+{
+    return unit->res->processor;
+}
+
+static inline void sched_set_pause_flags(struct sched_unit *unit,
+                                         unsigned int bit)
+{
+    __set_bit(bit, &unit->vcpu->pause_flags);
+}
+
+static inline void sched_clear_pause_flags(struct sched_unit *unit,
+                                           unsigned int bit)
+{
+    __clear_bit(bit, &unit->vcpu->pause_flags);
+}
+
+static inline void sched_set_pause_flags_atomic(struct sched_unit *unit,
+                                                unsigned int bit)
+{
+    set_bit(bit, &unit->vcpu->pause_flags);
+}
+
+static inline void sched_clear_pause_flags_atomic(struct sched_unit *unit,
+                                                  unsigned int bit)
+{
+    clear_bit(bit, &unit->vcpu->pause_flags);
+}
+
+static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
+{
+    return idle_vcpu[cpu]->sched_unit;
+}
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
@@ -345,10 +396,7 @@ static inline void sched_migrate(const struct scheduler *s,
     if ( s->migrate )
         s->migrate(s, unit, cpu);
     else
-    {
-        unit->vcpu->processor = cpu;
-        unit->res = get_sched_res(cpu);
-    }
+        sched_set_res(unit, get_sched_res(cpu));
 }
 
 static inline struct sched_resource *sched_pick_resource(
-- 
2.16.4


* [PATCH 15/60] xen/sched: add domain pointer to struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add a domain pointer to struct sched_unit in order to avoid having to
dereference the unit's vcpu pointer to find the related domain.
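
As a sketch only (not part of the patch), a lookup like

    struct domain *d = unit->vcpu->domain;   /* two dereferences */

can then be written as

    struct domain *d = unit->domain;         /* direct access */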

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c   | 3 ++-
 xen/include/xen/sched.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 78d9108956..61c8e1252f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -253,7 +253,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 static void sched_free_unit(struct sched_unit *unit)
 {
     struct sched_unit *prev_unit;
-    struct domain *d = unit->vcpu->domain;
+    struct domain *d = unit->domain;
 
     if ( d->sched_unit_list == unit )
         d->sched_unit_list = unit->next_in_list;
@@ -289,6 +289,7 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 
     v->sched_unit = unit;
     unit->vcpu = v;
+    unit->domain = d;
 
     for ( prev_unit = &d->sched_unit_list; *prev_unit;
           prev_unit = &(*prev_unit)->next_in_list )
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 13c99a9194..0b927fdd55 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -261,6 +261,7 @@ struct vcpu
 struct sched_resource;
 
 struct sched_unit {
+    struct domain         *domain;
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
-- 
2.16.4



* [PATCH 16/60] xen/sched: add id to struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add an identifier to struct sched_unit. For now it will be the same as
the vcpu_id of the related vcpu.
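
Purely as an illustration of where this is presumably heading (an
assumption, not part of this patch): once a unit spans several vcpus,
the natural id is that of the unit's first vcpu, e.g.:

    /* Hypothetical: gran is the number of vcpus per unit. */
    unit->unit_id = v->vcpu_id - (v->vcpu_id % gran);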

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c   | 3 ++-
 xen/include/xen/sched.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 61c8e1252f..d1c706186f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -289,12 +289,13 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 
     v->sched_unit = unit;
     unit->vcpu = v;
+    unit->unit_id = v->vcpu_id;
     unit->domain = d;
 
     for ( prev_unit = &d->sched_unit_list; *prev_unit;
           prev_unit = &(*prev_unit)->next_in_list )
         if ( (*prev_unit)->next_in_list &&
-             (*prev_unit)->next_in_list->vcpu->vcpu_id > v->vcpu_id )
+             (*prev_unit)->next_in_list->unit_id > unit->unit_id )
             break;
 
     unit->next_in_list = *prev_unit;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0b927fdd55..c76b81ebef 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -266,6 +266,7 @@ struct sched_unit {
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
     struct sched_resource *res;
+    int                    unit_id;
 
     /* Last time when unit has been scheduled out. */
     uint64_t               last_run_time;
-- 
2.16.4



* [PATCH 17/60] xen/sched: rename scheduler related perf counters
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Rename the scheduler-related perf counters from vcpu* to unit* where
appropriate.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c    | 32 ++++++++++++++++----------------
 xen/common/sched_credit2.c   | 18 +++++++++---------
 xen/common/sched_null.c      | 18 +++++++++---------
 xen/common/sched_rt.c        | 16 ++++++++--------
 xen/include/xen/perfc_defn.h | 30 +++++++++++++++---------------
 5 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3f002771da..0d95296d6a 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -675,7 +675,7 @@ __csched_vcpu_check(struct vcpu *vc)
         BUG_ON( !is_idle_vcpu(vc) );
     }
 
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(unit_check);
 }
 #define CSCHED_VCPU_CHECK(_vc)  (__csched_vcpu_check(_vc))
 #else
@@ -698,7 +698,7 @@ __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
                (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay;
 
     if ( hot )
-        SCHED_STAT_CRANK(vcpu_hot);
+        SCHED_STAT_CRANK(unit_hot);
 
     return hot;
 }
@@ -886,7 +886,7 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
     if ( list_empty(&svc->active_vcpu_elem) )
     {
         SCHED_VCPU_STAT_CRANK(svc, state_active);
-        SCHED_STAT_CRANK(acct_vcpu_active);
+        SCHED_STAT_CRANK(acct_unit_active);
 
         sdom->active_vcpu_count++;
         list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
@@ -913,7 +913,7 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
     BUG_ON( list_empty(&svc->active_vcpu_elem) );
 
     SCHED_VCPU_STAT_CRANK(svc, state_idle);
-    SCHED_STAT_CRANK(acct_vcpu_idle);
+    SCHED_STAT_CRANK(acct_unit_idle);
 
     BUG_ON( prv->weight < sdom->weight );
     sdom->active_vcpu_count--;
@@ -1015,7 +1015,7 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
     svc->pri = is_idle_domain(vc->domain) ?
         CSCHED_PRI_IDLE : CSCHED_PRI_TS_UNDER;
     SCHED_VCPU_STATS_RESET(svc);
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(unit_alloc);
     return svc;
 }
 
@@ -1043,7 +1043,7 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     unit_schedule_unlock_irq(lock, unit);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(unit_insert);
 }
 
 static void
@@ -1063,13 +1063,13 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     struct csched_unit * const svc = CSCHED_UNIT(unit);
     struct csched_dom * const sdom = svc->sdom;
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(unit_remove);
 
     ASSERT(!__vcpu_on_runq(svc));
 
     if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
     {
-        SCHED_STAT_CRANK(vcpu_unpark);
+        SCHED_STAT_CRANK(unit_unpark);
         vcpu_unpause(svc->vcpu);
     }
 
@@ -1090,7 +1090,7 @@ csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
     struct csched_unit * const svc = CSCHED_UNIT(unit);
     unsigned int cpu = vc->processor;
 
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(unit_sleep);
 
     BUG_ON( is_idle_vcpu(vc) );
 
@@ -1119,19 +1119,19 @@ csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
     if ( unlikely(__vcpu_on_runq(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(unit_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /*
      * We temporarily boost the priority of awaking VCPUs!
@@ -1161,7 +1161,7 @@ csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
          !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
     {
         TRACE_2D(TRC_CSCHED_BOOST_START, vc->domain->domain_id, vc->vcpu_id);
-        SCHED_STAT_CRANK(vcpu_boost);
+        SCHED_STAT_CRANK(unit_boost);
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -1520,7 +1520,7 @@ csched_acct(void* dummy)
                      credit < -credit_cap &&
                      !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
                 {
-                    SCHED_STAT_CRANK(vcpu_park);
+                    SCHED_STAT_CRANK(unit_park);
                     vcpu_pause_nosync(svc->vcpu);
                 }
 
@@ -1544,7 +1544,7 @@ csched_acct(void* dummy)
                      * call to make sure the VCPU's priority is not boosted
                      * if it is woken up here.
                      */
-                    SCHED_STAT_CRANK(vcpu_unpark);
+                    SCHED_STAT_CRANK(unit_unpark);
                     vcpu_unpause(svc->vcpu);
                     clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags);
                 }
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index dabd5636f5..3fc0be4358 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2019,7 +2019,7 @@ csched2_vcpu_check(struct vcpu *vc)
     {
         BUG_ON( !is_idle_vcpu(vc) );
     }
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(unit_check);
 }
 #define CSCHED2_VCPU_CHECK(_vc)  (csched2_vcpu_check(_vc))
 #else
@@ -2066,7 +2066,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
     svc->budget_quota = 0;
     INIT_LIST_HEAD(&svc->parked_elem);
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(unit_alloc);
 
     return svc;
 }
@@ -2078,7 +2078,7 @@ csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
     struct csched2_unit * const svc = csched2_unit(unit);
 
     ASSERT(!is_idle_vcpu(vc));
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(unit_sleep);
 
     if ( curr_on_cpu(vc->processor) == unit )
     {
@@ -2108,20 +2108,20 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     if ( unlikely(curr_on_cpu(cpu) == unit) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(unit_wake_running);
         goto out;
     }
 
     if ( unlikely(vcpu_on_runq(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(unit_wake_onrunq);
         goto out;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(unit_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /* If the context hasn't been saved for this vcpu yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
@@ -3137,7 +3137,7 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     sdom->nr_vcpus++;
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(unit_insert);
 
     CSCHED2_VCPU_CHECK(vc);
 }
@@ -3160,7 +3160,7 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     ASSERT(!is_idle_vcpu(vc));
     ASSERT(list_empty(&svc->runq_elem));
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(unit_remove);
 
     /* Remove from runqueue */
     lock = unit_schedule_lock_irq(unit);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 56b0055a42..ac282f437e 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -207,7 +207,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
     INIT_LIST_HEAD(&nvc->waitq_elem);
     nvc->vcpu = v;
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(unit_alloc);
 
     return nvc;
 }
@@ -465,7 +465,7 @@ static void null_unit_insert(const struct scheduler *ops,
     }
     spin_unlock_irq(lock);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(unit_insert);
 }
 
 static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
@@ -536,7 +536,7 @@ static void null_unit_remove(const struct scheduler *ops,
  out:
     unit_schedule_unlock_irq(lock, unit);
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(unit_remove);
 }
 
 static void null_unit_wake(const struct scheduler *ops,
@@ -548,21 +548,21 @@ static void null_unit_wake(const struct scheduler *ops,
 
     if ( unlikely(curr_on_cpu(v->processor) == unit) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
 
     if ( unlikely(!list_empty(&null_unit(unit)->waitq_elem)) )
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(v)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(unit_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /* Note that we get here only for vCPUs assigned to a pCPU */
     cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
@@ -579,7 +579,7 @@ static void null_unit_sleep(const struct scheduler *ops,
     if ( curr_on_cpu(v->processor) == unit )
         cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(unit_sleep);
 }
 
 static struct sched_resource *
@@ -689,7 +689,7 @@ static inline void null_vcpu_check(struct vcpu *v)
     else
         BUG_ON(!is_idle_vcpu(v));
 
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(unit_check);
 }
 #define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
 #else
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index d640d87b43..9f18a509bd 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -863,7 +863,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(unit_alloc);
 
     return svc;
 }
@@ -912,7 +912,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     }
     unit_schedule_unlock_irq(lock, unit);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(unit_insert);
 }
 
 /*
@@ -925,7 +925,7 @@ rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(unit_remove);
 
     BUG_ON( sdom == NULL );
 
@@ -1147,7 +1147,7 @@ rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
     struct rt_unit * const svc = rt_unit(unit);
 
     BUG_ON( is_idle_vcpu(vc) );
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(unit_sleep);
 
     if ( curr_on_cpu(vc->processor) == unit )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -1268,21 +1268,21 @@ rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     if ( unlikely(curr_on_cpu(vc->processor) == unit) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
 
     /* on RunQ/DepletedQ, just updating the info is ok */
     if ( unlikely(vcpu_on_q(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(unit_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /*
      * If a deadline passed while svc was asleep/blocked, we need new
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 1ad4384080..08b182ccd9 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -21,20 +21,20 @@ PERFCOUNTER(sched_ctx,              "sched: context switches")
 PERFCOUNTER(schedule,               "sched: specific scheduler")
 PERFCOUNTER(dom_init,               "sched: dom_init")
 PERFCOUNTER(dom_destroy,            "sched: dom_destroy")
-PERFCOUNTER(vcpu_alloc,             "sched: vcpu_alloc")
-PERFCOUNTER(vcpu_insert,            "sched: vcpu_insert")
-PERFCOUNTER(vcpu_remove,            "sched: vcpu_remove")
-PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
 PERFCOUNTER(vcpu_yield,             "sched: vcpu_yield")
-PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
-PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
-PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
-PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
+PERFCOUNTER(unit_alloc,             "sched: unit_alloc")
+PERFCOUNTER(unit_insert,            "sched: unit_insert")
+PERFCOUNTER(unit_remove,            "sched: unit_remove")
+PERFCOUNTER(unit_sleep,             "sched: unit_sleep")
+PERFCOUNTER(unit_wake_running,      "sched: unit_wake_running")
+PERFCOUNTER(unit_wake_onrunq,       "sched: unit_wake_onrunq")
+PERFCOUNTER(unit_wake_runnable,     "sched: unit_wake_runnable")
+PERFCOUNTER(unit_wake_not_runnable, "sched: unit_wake_not_runnable")
 PERFCOUNTER(tickled_no_cpu,         "sched: tickled_no_cpu")
 PERFCOUNTER(tickled_idle_cpu,       "sched: tickled_idle_cpu")
 PERFCOUNTER(tickled_idle_cpu_excl,  "sched: tickled_idle_cpu_exclusive")
 PERFCOUNTER(tickled_busy_cpu,       "sched: tickled_busy_cpu")
-PERFCOUNTER(vcpu_check,             "sched: vcpu_check")
+PERFCOUNTER(unit_check,             "sched: unit_check")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,               "csched: delay")
@@ -43,11 +43,11 @@ PERFCOUNTER(acct_no_work,           "csched: acct_no_work")
 PERFCOUNTER(acct_balance,           "csched: acct_balance")
 PERFCOUNTER(acct_reorder,           "csched: acct_reorder")
 PERFCOUNTER(acct_min_credit,        "csched: acct_min_credit")
-PERFCOUNTER(acct_vcpu_active,       "csched: acct_vcpu_active")
-PERFCOUNTER(acct_vcpu_idle,         "csched: acct_vcpu_idle")
-PERFCOUNTER(vcpu_boost,             "csched: vcpu_boost")
-PERFCOUNTER(vcpu_park,              "csched: vcpu_park")
-PERFCOUNTER(vcpu_unpark,            "csched: vcpu_unpark")
+PERFCOUNTER(acct_unit_active,       "csched: acct_unit_active")
+PERFCOUNTER(acct_unit_idle,         "csched: acct_unit_idle")
+PERFCOUNTER(unit_boost,             "csched: unit_boost")
+PERFCOUNTER(unit_park,              "csched: unit_park")
+PERFCOUNTER(unit_unpark,            "csched: unit_unpark")
 PERFCOUNTER(load_balance_idle,      "csched: load_balance_idle")
 PERFCOUNTER(load_balance_over,      "csched: load_balance_over")
 PERFCOUNTER(load_balance_other,     "csched: load_balance_other")
@@ -57,7 +57,7 @@ PERFCOUNTER(steal_peer_idle,        "csched: steal_peer_idle")
 PERFCOUNTER(migrate_queued,         "csched: migrate_queued")
 PERFCOUNTER(migrate_running,        "csched: migrate_running")
 PERFCOUNTER(migrate_kicked_away,    "csched: migrate_kicked_away")
-PERFCOUNTER(vcpu_hot,               "csched: vcpu_hot")
+PERFCOUNTER(unit_hot,               "csched: unit_hot")
 
 /* credit2 specific counters */
 PERFCOUNTER(burn_credits_t2c,       "csched2: burn_credits_t2c")
-- 
2.16.4
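
For reference, each of the counters renamed here pairs a declaration in
perfc_defn.h with one or more increment sites in the schedulers. A minimal
sketch of that pattern, using names from this patch (SCHED_STAT_CRANK() is
Xen's thin wrapper that increments the named perf counter when PERF_COUNTERS
is enabled):

    /* Declaration, xen/include/xen/perfc_defn.h: */
    PERFCOUNTER(unit_wake_runnable, "sched: unit_wake_runnable")

    /* Increment site, e.g. in a scheduler's wake hook: */
    if ( likely(vcpu_runnable(vc)) )
        SCHED_STAT_CRANK(unit_wake_runnable);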


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 18/60] xen/sched: switch struct task_slice from vcpu to sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Let the schedulers put a sched_unit pointer into struct task_slice
instead of a vcpu pointer.
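
A minimal sketch of the resulting shape, extracted from the hunks below
(the struct and the assignments are taken verbatim from this patch):

    /* xen/include/xen/sched-if.h after this patch: */
    struct task_slice {
        struct sched_unit *task;      /* unit to run next (was struct vcpu *) */
        s_time_t           time;      /* slice length; -1 means no limit */
        bool_t             migrated;
    };

    /* Each do_schedule() hook now wraps its chosen vcpu: */
    ret.task = snext->vcpu->sched_unit;

    /* ...and the generic schedule() unwraps it again, for now: */
    next = next_slice.task->vcpu;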

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  8 ++++----
 xen/common/sched_credit.c   |  4 ++--
 xen/common/sched_credit2.c  |  4 ++--
 xen/common/sched_null.c     | 12 ++++++------
 xen/common/sched_rt.c       |  2 +-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  |  6 +++---
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 6e7b2c9968..0c75440bd0 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -554,9 +554,9 @@ a653sched_do_schedule(
 
     /*
      * If there are more domains to run in the current major frame, set
-     * new_task equal to the address of next domain's VCPU structure.
-     * Otherwise, set new_task equal to the address of the idle task's VCPU
-     * structure.
+     * new_task equal to the address of next domain's sched_unit structure.
+     * Otherwise, set new_task equal to the address of the idle task's
+     * sched_unit structure.
      */
     new_task = (sched_index < sched_priv->num_schedule_entries)
         ? sched_priv->schedule[sched_index].vc
@@ -592,7 +592,7 @@ a653sched_do_schedule(
      * of the selected task's VCPU structure.
      */
     ret.time = next_switch_time - now;
-    ret.task = new_task;
+    ret.task = new_task->sched_unit;
     ret.migrated = 0;
 
     BUG_ON(ret.time <= 0);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 0d95296d6a..6908e373dc 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1997,9 +1997,9 @@ out:
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
                 -1 : tslice);
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_unit;
 
-    CSCHED_VCPU_CHECK(ret.task);
+    CSCHED_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3fc0be4358..aea05edbb2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3636,9 +3636,9 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_unit;
 
-    CSCHED2_VCPU_CHECK(ret.task);
+    CSCHED2_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index ac282f437e..e490b791b8 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -738,10 +738,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = idle_vcpu[cpu];
+        ret.task = idle_vcpu[cpu]->sched_unit;
     }
     else
-        ret.task = per_cpu(npc, cpu).vcpu;
+        ret.task = per_cpu(npc, cpu).vcpu->sched_unit;
     ret.migrated = 0;
     ret.time = -1;
 
@@ -776,7 +776,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                 {
                     vcpu_assign(prv, wvc->vcpu, cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->vcpu;
+                    ret.task = wvc->vcpu->sched_unit;
                     goto unlock;
                 }
             }
@@ -785,10 +785,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(ret.task == NULL || !vcpu_runnable(ret.task)) )
-        ret.task = idle_vcpu[cpu];
+    if ( unlikely(ret.task == NULL || !unit_runnable(ret.task)) )
+        ret.task = idle_vcpu[cpu]->sched_unit;
 
-    NULL_VCPU_CHECK(ret.task);
+    NULL_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 9f18a509bd..f781e46f9f 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1131,7 +1131,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
     }
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_unit;
 
     return ret;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d1c706186f..f4aff72105 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1534,7 +1534,7 @@ static void schedule(void)
     sched = this_cpu(scheduler);
     next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
 
-    next = next_slice.task;
+    next = next_slice.task->vcpu;
 
     sd->curr = next->sched_unit;
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index da9aa04370..c5bc0b689c 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -186,9 +186,9 @@ static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
 }
 
 struct task_slice {
-    struct vcpu *task;
-    s_time_t     time;
-    bool_t       migrated;
+    struct sched_unit *task;
+    s_time_t           time;
+    bool_t             migrated;
 };
 
 struct scheduler {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 19/60] xen/sched: add is_running indicator to struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Add an is_running indicator to struct sched_unit which will be set
whenever the unit is being scheduled. Switch scheduler code to use
unit->is_running instead of vcpu->is_running for scheduling decisions.

At the same time introduce a state_entry_time field in struct
sched_unit, updated whenever the is_running indicator changes.
Use that new field in the schedulers instead of the similar vcpu field.
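
The pattern by which the schedulers consume the new fields, condensed from
the credit/credit2 and schedule.c hunks below ('now' is the s_time_t
timestamp already available at each call site):

    /* How long has the current unit been in its present state? */
    runtime = now - current->sched_unit->state_entry_time;
    if ( runtime < 0 ) /* guard kept from the old vcpu-based code */
        runtime = 0;

    /* schedule() and context_saved() keep the pair in sync: */
    next->sched_unit->is_running = 1;
    next->sched_unit->state_entry_time = now;
    ...
    prev->sched_unit->is_running = 0;
    prev->sched_unit->state_entry_time = NOW();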

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: fix arm build, don't drop v->is_running
---
 xen/common/sched_credit.c  | 12 +++++++-----
 xen/common/sched_credit2.c | 18 +++++++++---------
 xen/common/sched_rt.c      |  2 +-
 xen/common/schedule.c      | 15 +++++++++++----
 xen/include/xen/sched.h    |  4 ++++
 5 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 6908e373dc..b700cc07ce 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -714,7 +714,7 @@ __csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
      * The caller is supposed to have already checked that vc is also
      * not running.
      */
-    ASSERT(!vc->is_running);
+    ASSERT(!vc->sched_unit->is_running);
 
     return !__csched_vcpu_is_cache_hot(prv, vc) &&
            cpumask_test_cpu(dest_cpu, mask);
@@ -1038,7 +1038,8 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     lock = unit_schedule_lock_irq(unit);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
+    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) &&
+         !vc->sched_unit->is_running )
         runq_insert(svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -1651,8 +1652,9 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * vCPUs with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc->sched_unit)) )
+        if ( vc->sched_unit->is_running ||
+             (balance_step == BALANCE_SOFT_AFFINITY &&
+              !has_soft_affinity(vc->sched_unit)) )
             continue;
 
         affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
@@ -1860,7 +1862,7 @@ csched_schedule(
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->runstate.state_entry_time;
+    runtime = now - current->sched_unit->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index aea05edbb2..ef29a3d874 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1283,7 +1283,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 
     ASSERT(&svc->rqd->runq == runq);
     ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->is_running);
+    ASSERT(!svc->vcpu->sched_unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->is_running);
-    return now - svc->vcpu->runstate.state_entry_time >
+    ASSERT(svc->vcpu->sched_unit->is_running);
+    return now - svc->vcpu->sched_unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -2931,7 +2931,7 @@ csched2_dom_cntl(
                 {
                     svc = csched2_unit(v->sched_unit);
                     lock = unit_schedule_lock(svc->vcpu->sched_unit);
-                    if ( v->is_running )
+                    if ( v->sched_unit->is_running )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
@@ -3204,8 +3204,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->is_running )
-            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+        if ( snext->vcpu->sched_unit->is_running )
+            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3302,7 +3302,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * no point forcing it to do so until rate limiting expires.
      */
     if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
+         (now - scurr->vcpu->sched_unit->state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3313,7 +3313,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             } d;
             d.dom = scurr->vcpu->domain->domain_id;
             d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3561,7 +3561,7 @@ csched2_schedule(
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->is_running);
+            ASSERT(!snext->vcpu->sched_unit->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index f781e46f9f..5b1f6459cc 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -907,7 +907,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     {
         replq_insert(ops, svc);
 
-        if ( !vc->is_running )
+        if ( !unit->is_running )
             runq_insert(ops, svc);
     }
     unit_schedule_unlock_irq(lock, unit);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index f4aff72105..53a3e55f0b 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -353,6 +353,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     {
         get_sched_res(v->processor)->curr = unit;
         v->is_running = 1;
+        unit->is_running = 1;
+        unit->state_entry_time = NOW();
     }
     else
     {
@@ -673,7 +675,8 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( v->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( v->sched_unit->is_running ||
+         !test_bit(_VPF_migrating, &v->pause_flags) )
         return;
 
     old_cpu = new_cpu = v->processor;
@@ -727,7 +730,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( v->is_running ||
+    if ( v->sched_unit->is_running ||
          !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
     {
         sched_spin_unlock_double(old_lock, new_lock, flags);
@@ -755,7 +758,7 @@ void vcpu_force_reschedule(struct vcpu *v)
 {
     spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
 
-    if ( v->is_running )
+    if ( v->sched_unit->is_running )
         vcpu_migrate_start(v);
 
     unit_schedule_unlock_irq(lock, v->sched_unit);
@@ -1582,8 +1585,10 @@ static void schedule(void)
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!next->is_running);
+    ASSERT(!next->sched_unit->is_running);
     next->is_running = 1;
+    next->sched_unit->is_running = 1;
+    next->sched_unit->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
@@ -1605,6 +1610,8 @@ void context_saved(struct vcpu *prev)
     smp_wmb();
 
     prev->is_running = 0;
+    prev->sched_unit->is_running = 0;
+    prev->sched_unit->state_entry_time = NOW();
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c76b81ebef..13bab258d5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -270,7 +270,11 @@ struct sched_unit {
 
     /* Last time when unit has been scheduled out. */
     uint64_t               last_run_time;
+    /* Last time unit got (de-)scheduled. */
+    uint64_t               state_entry_time;
 
+    /* Currently running on a CPU? */
+    bool                   is_running;
     /* Item needs affinity restored. */
     bool                   affinity_broken;
     /* Does soft affinity actually play a role (given hard affinity)? */
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 20/60] xen/sched: make null scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch the null scheduler completely from vcpu to sched_unit usage.
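
The core of the conversion, condensed from the hunks below: the per-pCPU
slot and the scheduler-private unit structure now reference a sched_unit
directly, and assignment goes through the unit helpers:

    struct null_pcpu {
        struct sched_unit *unit;      /* was: struct vcpu *vcpu */
    };

    struct null_unit {
        struct list_head waitq_elem;
        struct sched_unit *unit;      /* was: struct vcpu *vcpu */
    };

    /* unit_assign() (formerly vcpu_assign()): */
    per_cpu(npc, cpu).unit = unit;
    sched_set_res(unit, get_sched_res(cpu));
    cpumask_clear_cpu(cpu, &prv->cpus_free);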

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_null.c | 304 ++++++++++++++++++++++++------------------------
 1 file changed, 149 insertions(+), 155 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index e490b791b8..53a20d73aa 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -18,10 +18,10 @@
 
 /*
  * The 'null' scheduler always choose to run, on each pCPU, either nothing
- * (i.e., the pCPU stays idle) or always the same vCPU.
+ * (i.e., the pCPU stays idle) or always the same Item.
  *
  * It is aimed at supporting static scenarios, where there always are
- * less vCPUs than pCPUs (and the vCPUs don't need to move among pCPUs
+ * less Items than pCPUs (and the Items don't need to move among pCPUs
  * for any reason) with the least possible overhead.
  *
  * Typical usecase are embedded applications, but also HPC, especially
@@ -38,8 +38,8 @@
  * null tracing events. Check include/public/trace.h for more details.
  */
 #define TRC_SNULL_PICKED_CPU    TRC_SCHED_CLASS_EVT(SNULL, 1)
-#define TRC_SNULL_VCPU_ASSIGN   TRC_SCHED_CLASS_EVT(SNULL, 2)
-#define TRC_SNULL_VCPU_DEASSIGN TRC_SCHED_CLASS_EVT(SNULL, 3)
+#define TRC_SNULL_UNIT_ASSIGN   TRC_SCHED_CLASS_EVT(SNULL, 2)
+#define TRC_SNULL_UNIT_DEASSIGN TRC_SCHED_CLASS_EVT(SNULL, 3)
 #define TRC_SNULL_MIGRATE       TRC_SCHED_CLASS_EVT(SNULL, 4)
 #define TRC_SNULL_SCHEDULE      TRC_SCHED_CLASS_EVT(SNULL, 5)
 #define TRC_SNULL_TASKLET       TRC_SCHED_CLASS_EVT(SNULL, 6)
@@ -48,13 +48,13 @@
  * Locking:
  * - Scheduler-lock (a.k.a. runqueue lock):
  *  + is per-pCPU;
- *  + serializes assignment and deassignment of vCPUs to a pCPU.
+ *  + serializes assignment and deassignment of Items to a pCPU.
  * - Private data lock (a.k.a. private scheduler lock):
  *  + is scheduler-wide;
  *  + serializes accesses to the list of domains in this scheduler.
  * - Waitqueue lock:
  *  + is scheduler-wide;
- *  + serialize accesses to the list of vCPUs waiting to be assigned
+ *  + serialize accesses to the list of Items waiting to be assigned
  *    to pCPUs.
  *
  * Ordering is: private lock, runqueue lock, waitqueue lock. Or, OTOH,
@@ -78,25 +78,25 @@
 struct null_private {
     spinlock_t lock;        /* scheduler lock; nests inside cpupool_lock */
     struct list_head ndom;  /* Domains of this scheduler                 */
-    struct list_head waitq; /* vCPUs not assigned to any pCPU            */
+    struct list_head waitq; /* Items not assigned to any pCPU            */
     spinlock_t waitq_lock;  /* serializes waitq; nests inside runq locks */
-    cpumask_t cpus_free;    /* CPUs without a vCPU associated to them    */
+    cpumask_t cpus_free;    /* CPUs without a Item associated to them    */
 };
 
 /*
  * Physical CPU
  */
 struct null_pcpu {
-    struct vcpu *vcpu;
+    struct sched_unit *unit;
 };
 DEFINE_PER_CPU(struct null_pcpu, npc);
 
 /*
- * Virtual CPU
+ * Schedule Item
  */
 struct null_unit {
     struct list_head waitq_elem;
-    struct vcpu *vcpu;
+    struct sched_unit *unit;
 };
 
 /*
@@ -120,13 +120,13 @@ static inline struct null_unit *null_unit(const struct sched_unit *unit)
     return unit->priv;
 }
 
-static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
+static inline bool unit_check_affinity(struct sched_unit *unit,
+                                       unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v->sched_unit, balance_step,
-                             cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(unit, balance_step, cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                cpupool_domain_cpumask(v->domain));
+                cpupool_domain_cpumask(unit->domain));
 
     return cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu));
 }
@@ -161,9 +161,9 @@ static void null_deinit(struct scheduler *ops)
 
 static void init_pdata(struct null_private *prv, unsigned int cpu)
 {
-    /* Mark the pCPU as free, and with no vCPU assigned */
+    /* Mark the pCPU as free, and with no unit assigned */
     cpumask_set_cpu(cpu, &prv->cpus_free);
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).unit = NULL;
 }
 
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
@@ -191,13 +191,12 @@ static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     ASSERT(!pcpu);
 
     cpumask_clear_cpu(cpu, &prv->cpus_free);
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).unit = NULL;
 }
 
 static void *null_alloc_vdata(const struct scheduler *ops,
                               struct sched_unit *unit, void *dd)
 {
-    struct vcpu *v = unit->vcpu;
     struct null_unit *nvc;
 
     nvc = xzalloc(struct null_unit);
@@ -205,7 +204,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
         return NULL;
 
     INIT_LIST_HEAD(&nvc->waitq_elem);
-    nvc->vcpu = v;
+    nvc->unit = unit;
 
     SCHED_STAT_CRANK(unit_alloc);
 
@@ -257,15 +256,15 @@ static void null_free_domdata(const struct scheduler *ops, void *data)
 }
 
 /*
- * vCPU to pCPU assignment and placement. This _only_ happens:
+ * unit to pCPU assignment and placement. This _only_ happens:
  *  - on insert,
  *  - on migrate.
  *
- * Insert occurs when a vCPU joins this scheduler for the first time
+ * Insert occurs when a unit joins this scheduler for the first time
  * (e.g., when the domain it's part of is moved to the scheduler's
  * cpupool).
  *
- * Migration may be necessary if a pCPU (with a vCPU assigned to it)
+ * Migration may be necessary if a pCPU (with a unit assigned to it)
  * is removed from the scheduler's cpupool.
  *
  * So this is not part of any hot path.
@@ -274,9 +273,8 @@ static struct sched_resource *
 pick_res(struct null_private *prv, struct sched_unit *unit)
 {
     unsigned int bs;
-    struct vcpu *v = unit->vcpu;
-    unsigned int cpu = v->processor, new_cpu;
-    cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
+    unsigned int cpu = sched_unit_cpu(unit), new_cpu;
+    cpumask_t *cpus = cpupool_domain_cpumask(unit->domain);
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
@@ -291,11 +289,12 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
         /*
          * If our processor is free, or we are assigned to it, and it is also
          * still valid and part of our affinity, just go for it.
-         * (Note that we may call vcpu_check_affinity(), but we deliberately
+         * (Note that we may call unit_check_affinity(), but we deliberately
          * don't, so we get to keep in the scratch cpumask what we have just
          * put in it.)
          */
-        if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
+        if ( likely((per_cpu(npc, cpu).unit == NULL ||
+                     per_cpu(npc, cpu).unit == unit)
                     && cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
         {
             new_cpu = cpu;
@@ -313,13 +312,13 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
 
     /*
      * If we didn't find any free pCPU, just pick any valid pcpu, even if
-     * it has another vCPU assigned. This will happen during shutdown and
+     * it has another unit assigned. This will happen during shutdown and
      * suspend/resume, but it may also happen during "normal operation", if
      * all the pCPUs are busy.
      *
      * In fact, there must always be something sane in v->processor, or
      * unit_schedule_lock() and friends won't work. This is not a problem,
-     * as we will actually assign the vCPU to the pCPU we return from here,
+     * as we will actually assign the unit to the pCPU we return from here,
      * only if the pCPU is free.
      */
     cpumask_and(cpumask_scratch_cpu(cpu), cpus, unit->cpu_hard_affinity);
@@ -329,11 +328,11 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t unit, dom;
             uint32_t new_cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.new_cpu = new_cpu;
         __trace_var(TRC_SNULL_PICKED_CPU, 1, sizeof(d), &d);
     }
@@ -341,47 +340,47 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
     return get_sched_res(new_cpu);
 }
 
-static void vcpu_assign(struct null_private *prv, struct vcpu *v,
+static void unit_assign(struct null_private *prv, struct sched_unit *unit,
                         unsigned int cpu)
 {
-    per_cpu(npc, cpu).vcpu = v;
-    v->processor = cpu;
-    v->sched_unit->res = get_sched_res(cpu);
+    per_cpu(npc, cpu).unit = unit;
+    sched_set_res(unit, get_sched_res(cpu));
     cpumask_clear_cpu(cpu, &prv->cpus_free);
 
-    dprintk(XENLOG_G_INFO, "%d <-- %pv\n", cpu, v);
+    dprintk(XENLOG_G_INFO, "%d <-- %pdv%d\n", cpu, unit->domain, unit->unit_id);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t unit, dom;
             uint32_t cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.cpu = cpu;
-        __trace_var(TRC_SNULL_VCPU_ASSIGN, 1, sizeof(d), &d);
+        __trace_var(TRC_SNULL_UNIT_ASSIGN, 1, sizeof(d), &d);
     }
 }
 
-static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
+static void unit_deassign(struct null_private *prv, struct sched_unit *unit,
                           unsigned int cpu)
 {
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).unit = NULL;
     cpumask_set_cpu(cpu, &prv->cpus_free);
 
-    dprintk(XENLOG_G_INFO, "%d <-- NULL (%pv)\n", cpu, v);
+    dprintk(XENLOG_G_INFO, "%d <-- NULL (%pdv%d)\n", cpu, unit->domain,
+            unit->unit_id);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t unit, dom;
             uint32_t cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.cpu = cpu;
-        __trace_var(TRC_SNULL_VCPU_DEASSIGN, 1, sizeof(d), &d);
+        __trace_var(TRC_SNULL_UNIT_DEASSIGN, 1, sizeof(d), &d);
     }
 }
 
@@ -394,9 +393,9 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
     struct null_private *prv = null_priv(new_ops);
     struct null_unit *nvc = vdata;
 
-    ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
+    ASSERT(nvc && is_idle_unit(nvc->unit));
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -413,35 +412,34 @@ static spinlock_t *null_switch_sched(struct scheduler *new_ops,
 static void null_unit_insert(const struct scheduler *ops,
                              struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_unit *nvc = null_unit(unit);
     unsigned int cpu;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_unit(unit));
 
     lock = unit_schedule_lock_irq(unit);
  retry:
 
-    unit->res = pick_res(prv, unit);
-    cpu = v->processor = unit->res->processor;
+    sched_set_res(unit, pick_res(prv, unit));
+    cpu = sched_unit_cpu(unit);
 
     spin_unlock(lock);
 
     lock = unit_schedule_lock(unit);
 
     cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+                cpupool_domain_cpumask(unit->domain));
 
-    /* If the pCPU is free, we assign v to it */
-    if ( likely(per_cpu(npc, cpu).vcpu == NULL) )
+    /* If the pCPU is free, we assign unit to it */
+    if ( likely(per_cpu(npc, cpu).unit == NULL) )
     {
         /*
          * Insert is followed by vcpu_wake(), so there's no need to poke
          * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
          */
-        vcpu_assign(prv, v, cpu);
+        unit_assign(prv, unit, cpu);
     }
     else if ( cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
     {
@@ -460,7 +458,8 @@ static void null_unit_insert(const struct scheduler *ops,
          */
         spin_lock(&prv->waitq_lock);
         list_add_tail(&nvc->waitq_elem, &prv->waitq);
-        dprintk(XENLOG_G_WARNING, "WARNING: %pv not assigned to any CPU!\n", v);
+        dprintk(XENLOG_G_WARNING, "WARNING: %pdv%d not assigned to any CPU!\n",
+                unit->domain, unit->unit_id);
         spin_unlock(&prv->waitq_lock);
     }
     spin_unlock_irq(lock);
@@ -468,35 +467,34 @@ static void null_unit_insert(const struct scheduler *ops,
     SCHED_STAT_CRANK(unit_insert);
 }
 
-static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
+static void _unit_remove(struct null_private *prv, struct sched_unit *unit)
 {
     unsigned int bs;
-    unsigned int cpu = v->processor;
+    unsigned int cpu = sched_unit_cpu(unit);
     struct null_unit *wvc;
 
-    ASSERT(list_empty(&null_unit(v->sched_unit)->waitq_elem));
+    ASSERT(list_empty(&null_unit(unit)->waitq_elem));
 
-    vcpu_deassign(prv, v, cpu);
+    unit_deassign(prv, unit, cpu);
 
     spin_lock(&prv->waitq_lock);
 
     /*
-     * If v is assigned to a pCPU, let's see if there is someone waiting,
-     * suitable to be assigned to it (prioritizing vcpus that have
+     * If unit is assigned to a pCPU, let's see if there is someone waiting,
+     * suitable to be assigned to it (prioritizing units that have
      * soft-affinity with cpu).
      */
     for_each_affinity_balance_step( bs )
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY &&
-                 !has_soft_affinity(wvc->vcpu->sched_unit) )
+            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->unit) )
                 continue;
 
-            if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
+            if ( unit_check_affinity(wvc->unit, cpu, bs) )
             {
                 list_del_init(&wvc->waitq_elem);
-                vcpu_assign(prv, wvc->vcpu, cpu);
+                unit_assign(prv, wvc->unit, cpu);
                 cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                 spin_unlock(&prv->waitq_lock);
                 return;
@@ -509,16 +507,15 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 static void null_unit_remove(const struct scheduler *ops,
                              struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_unit *nvc = null_unit(unit);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_unit(unit));
 
     lock = unit_schedule_lock_irq(unit);
 
-    /* If v is in waitqueue, just get it out of there and bail */
+    /* If unit is in waitqueue, just get it out of there and bail */
     if ( unlikely(!list_empty(&nvc->waitq_elem)) )
     {
         spin_lock(&prv->waitq_lock);
@@ -528,10 +525,10 @@ static void null_unit_remove(const struct scheduler *ops,
         goto out;
     }
 
-    ASSERT(per_cpu(npc, v->processor).vcpu == v);
-    ASSERT(!cpumask_test_cpu(v->processor, &prv->cpus_free));
+    ASSERT(per_cpu(npc, sched_unit_cpu(unit)).unit == unit);
+    ASSERT(!cpumask_test_cpu(sched_unit_cpu(unit), &prv->cpus_free));
 
-    _vcpu_remove(prv, v);
+    _unit_remove(prv, unit);
 
  out:
     unit_schedule_unlock_irq(lock, unit);
@@ -542,11 +539,9 @@ static void null_unit_remove(const struct scheduler *ops,
 static void null_unit_wake(const struct scheduler *ops,
                            struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
+    ASSERT(!is_idle_unit(unit));
 
-    ASSERT(!is_idle_vcpu(v));
-
-    if ( unlikely(curr_on_cpu(v->processor) == unit) )
+    if ( unlikely(curr_on_cpu(sched_unit_cpu(unit)) == unit) )
     {
         SCHED_STAT_CRANK(unit_wake_running);
         return;
@@ -559,25 +554,23 @@ static void null_unit_wake(const struct scheduler *ops,
         return;
     }
 
-    if ( likely(vcpu_runnable(v)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
-    /* Note that we get here only for vCPUs assigned to a pCPU */
-    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+    /* Note that we get here only for units assigned to a pCPU */
+    cpu_raise_softirq(sched_unit_cpu(unit), SCHEDULE_SOFTIRQ);
 }
 
 static void null_unit_sleep(const struct scheduler *ops,
                             struct sched_unit *unit)
 {
-    struct vcpu *v = unit->vcpu;
-
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_unit(unit));
 
-    /* If v is not assigned to a pCPU, or is not running, no need to bother */
-    if ( curr_on_cpu(v->processor) == unit )
-        cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+    /* If unit isn't assigned to a pCPU, or isn't running, no need to bother */
+    if ( curr_on_cpu(sched_unit_cpu(unit)) == unit )
+        cpu_raise_softirq(sched_unit_cpu(unit), SCHEDULE_SOFTIRQ);
 
     SCHED_STAT_CRANK(unit_sleep);
 }
@@ -585,37 +578,36 @@ static void null_unit_sleep(const struct scheduler *ops,
 static struct sched_resource *
 null_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
-    ASSERT(!is_idle_vcpu(unit->vcpu));
+    ASSERT(!is_idle_unit(unit));
     return pick_res(null_priv(ops), unit);
 }
 
 static void null_unit_migrate(const struct scheduler *ops,
                               struct sched_unit *unit, unsigned int new_cpu)
 {
-    struct vcpu *v = unit->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_unit *nvc = null_unit(unit);
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_unit(unit));
 
-    if ( v->processor == new_cpu )
+    if ( sched_unit_cpu(unit) == new_cpu )
         return;
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t unit, dom;
             uint16_t cpu, new_cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
-        d.cpu = v->processor;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
+        d.cpu = sched_unit_cpu(unit);
         d.new_cpu = new_cpu;
         __trace_var(TRC_SNULL_MIGRATE, 1, sizeof(d), &d);
     }
 
     /*
-     * v is either assigned to a pCPU, or in the waitqueue.
+     * unit is either assigned to a pCPU, or in the waitqueue.
      *
      * In the former case, the pCPU to which it was assigned would
      * become free, and we, therefore, should check whether there is
@@ -625,7 +617,7 @@ static void null_unit_migrate(const struct scheduler *ops,
      */
     if ( likely(list_empty(&nvc->waitq_elem)) )
     {
-        _vcpu_remove(prv, v);
+        _unit_remove(prv, unit);
         SCHED_STAT_CRANK(migrate_running);
     }
     else
@@ -634,32 +626,34 @@ static void null_unit_migrate(const struct scheduler *ops,
     SCHED_STAT_CRANK(migrated);
 
     /*
-     * Let's now consider new_cpu, which is where v is being sent. It can be
-     * either free, or have a vCPU already assigned to it.
+     * Let's now consider new_cpu, which is where unit is being sent. It can be
+     * either free, or have a unit already assigned to it.
      *
-     * In the former case, we should assign v to it, and try to get it to run,
+     * In the former case, we should assign unit to it, and try to get it to
+     * run,
      * if possible, according to affinity.
      *
-     * In latter, all we can do is to park v in the waitqueue.
+     * In the latter, all we can do is park unit in the waitqueue.
      */
-    if ( per_cpu(npc, new_cpu).vcpu == NULL &&
-         vcpu_check_affinity(v, new_cpu, BALANCE_HARD_AFFINITY) )
+    if ( per_cpu(npc, new_cpu).unit == NULL &&
+         unit_check_affinity(unit, new_cpu, BALANCE_HARD_AFFINITY) )
     {
-        /* v might have been in the waitqueue, so remove it */
+        /* unit might have been in the waitqueue, so remove it */
         spin_lock(&prv->waitq_lock);
         list_del_init(&nvc->waitq_elem);
         spin_unlock(&prv->waitq_lock);
 
-        vcpu_assign(prv, v, new_cpu);
+        unit_assign(prv, unit, new_cpu);
     }
     else
     {
-        /* Put v in the waitqueue, if it wasn't there already */
+        /* Put unit in the waitqueue, if it wasn't there already */
         spin_lock(&prv->waitq_lock);
         if ( list_empty(&nvc->waitq_elem) )
         {
             list_add_tail(&nvc->waitq_elem, &prv->waitq);
-            dprintk(XENLOG_G_WARNING, "WARNING: %pv not assigned to any CPU!\n", v);
+            dprintk(XENLOG_G_WARNING,
+                    "WARNING: %pdv%d not assigned to any CPU!\n", unit->domain,
+                    unit->unit_id);
         }
         spin_unlock(&prv->waitq_lock);
     }
@@ -672,35 +666,34 @@ static void null_unit_migrate(const struct scheduler *ops,
      * at least. In case of suspend, any temporary inconsistency caused
      * by this, will be fixed-up during resume.
      */
-    v->processor = new_cpu;
-    unit->res = get_sched_res(new_cpu);
+    sched_set_res(unit, get_sched_res(new_cpu));
 }
 
 #ifndef NDEBUG
-static inline void null_vcpu_check(struct vcpu *v)
+static inline void null_unit_check(struct sched_unit *unit)
 {
-    struct null_unit * const nvc = null_unit(v->sched_unit);
-    struct null_dom * const ndom = v->domain->sched_priv;
+    struct null_unit * const nvc = null_unit(unit);
+    struct null_dom * const ndom = unit->domain->sched_priv;
 
-    BUG_ON(nvc->vcpu != v);
+    BUG_ON(nvc->unit != unit);
 
     if ( ndom )
-        BUG_ON(is_idle_vcpu(v));
+        BUG_ON(is_idle_unit(unit));
     else
-        BUG_ON(!is_idle_vcpu(v));
+        BUG_ON(!is_idle_unit(unit));
 
     SCHED_STAT_CRANK(unit_check);
 }
-#define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
+#define NULL_UNIT_CHECK(unit)  (null_unit_check(unit))
 #else
-#define NULL_VCPU_CHECK(v)
+#define NULL_UNIT_CHECK(unit)
 #endif
 
 
 /*
  * The most simple scheduling function of all times! We either return:
- *  - the vCPU assigned to the pCPU, if there's one and it can run;
- *  - the idle vCPU, otherwise.
+ *  - the unit assigned to the pCPU, if there's one and it can run;
+ *  - the idle unit, otherwise.
  */
 static struct task_slice null_schedule(const struct scheduler *ops,
                                        s_time_t now,
@@ -713,24 +706,24 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
-    NULL_VCPU_CHECK(current);
+    NULL_UNIT_CHECK(current->sched_unit);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
             uint16_t tasklet, cpu;
-            int16_t vcpu, dom;
+            int16_t unit, dom;
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        if ( per_cpu(npc, cpu).vcpu == NULL )
+        if ( per_cpu(npc, cpu).unit == NULL )
         {
-            d.vcpu = d.dom = -1;
+            d.unit = d.dom = -1;
         }
         else
         {
-            d.vcpu = per_cpu(npc, cpu).vcpu->vcpu_id;
-            d.dom = per_cpu(npc, cpu).vcpu->domain->domain_id;
+            d.unit = per_cpu(npc, cpu).unit->unit_id;
+            d.dom = per_cpu(npc, cpu).unit->domain->domain_id;
         }
         __trace_var(TRC_SNULL_SCHEDULE, 1, sizeof(d), &d);
     }
@@ -738,16 +731,16 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = idle_vcpu[cpu]->sched_unit;
+        ret.task = sched_idle_unit(cpu);
     }
     else
-        ret.task = per_cpu(npc, cpu).vcpu->sched_unit;
+        ret.task = per_cpu(npc, cpu).unit;
     ret.migrated = 0;
     ret.time = -1;
 
     /*
      * We may be new in the cpupool, or just coming back online. In which
-     * case, there may be vCPUs in the waitqueue that we can assign to us
+     * case, there may be units in the waitqueue that we can assign to us
      * and run.
      */
     if ( unlikely(ret.task == NULL) )
@@ -758,10 +751,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             goto unlock;
 
         /*
-         * We scan the waitqueue twice, for prioritizing vcpus that have
+         * We scan the waitqueue twice, for prioritizing units that have
          * soft-affinity with cpu. This may look like something expensive to
-         * do here in null_schedule(), but it's actually fine, beceuse we do
-         * it only in cases where a pcpu has no vcpu associated (e.g., as
+         * do here in null_schedule(), but it's actually fine, because we do
+         * it only in cases where a pcpu has no unit associated (e.g., as
          * said above, the cpu has just joined a cpupool).
          */
         for_each_affinity_balance_step( bs )
@@ -769,14 +762,14 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu->sched_unit) )
+                     !has_soft_affinity(wvc->unit) )
                     continue;
 
-                if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
+                if ( unit_check_affinity(wvc->unit, cpu, bs) )
                 {
-                    vcpu_assign(prv, wvc->vcpu, cpu);
+                    unit_assign(prv, wvc->unit, cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->vcpu->sched_unit;
+                    ret.task = wvc->unit;
                     goto unlock;
                 }
             }
@@ -786,17 +779,17 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     }
 
     if ( unlikely(ret.task == NULL || !unit_runnable(ret.task)) )
-        ret.task = idle_vcpu[cpu]->sched_unit;
+        ret.task = sched_idle_unit(cpu);
 
-    NULL_VCPU_CHECK(ret.task->vcpu);
+    NULL_UNIT_CHECK(ret.task);
     return ret;
 }
 
-static inline void dump_vcpu(struct null_private *prv, struct null_unit *nvc)
+static inline void dump_unit(struct null_private *prv, struct null_unit *nvc)
 {
-    printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
-            nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
-                                nvc->vcpu->processor : -1);
+    printk("[%i.%i] pcpu=%d", nvc->unit->domain->domain_id,
+            nvc->unit->unit_id, list_empty(&nvc->waitq_elem) ?
+                                sched_unit_cpu(nvc->unit) : -1);
 }
 
 static void null_dump_pcpu(const struct scheduler *ops, int cpu)
@@ -812,16 +805,17 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
            cpu,
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
-    if ( per_cpu(npc, cpu).vcpu != NULL )
-        printk(", vcpu=%pv", per_cpu(npc, cpu).vcpu);
+    if ( per_cpu(npc, cpu).unit != NULL )
+        printk(", unit=%pdv%d", per_cpu(npc, cpu).unit->domain,
+               per_cpu(npc, cpu).unit->unit_id);
     printk("\n");
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current unit (nothing to say if that's the idle unit) */
     nvc = null_unit(curr_on_cpu(cpu));
-    if ( nvc && !is_idle_vcpu(nvc->vcpu) )
+    if ( nvc && !is_idle_unit(nvc->unit) )
     {
         printk("\trun: ");
-        dump_vcpu(prv, nvc);
+        dump_unit(prv, nvc);
         printk("\n");
     }
 
@@ -844,23 +838,23 @@ static void null_dump(const struct scheduler *ops)
     list_for_each( iter, &prv->ndom )
     {
         struct null_dom *ndom;
-        struct vcpu *v;
+        struct sched_unit *unit;
 
         ndom = list_entry(iter, struct null_dom, ndom_elem);
 
         printk("\tDomain: %d\n", ndom->dom->domain_id);
-        for_each_vcpu( ndom->dom, v )
+        for_each_sched_unit( ndom->dom, unit )
         {
-            struct null_unit * const nvc = null_unit(v->sched_unit);
+            struct null_unit * const nvc = null_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(nvc->vcpu->sched_unit);
+            lock = unit_schedule_lock(unit);
 
             printk("\t%3d: ", ++loop);
-            dump_vcpu(prv, nvc);
+            dump_unit(prv, nvc);
             printk("\n");
 
-            unit_schedule_unlock(lock, nvc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, unit);
         }
     }
 
@@ -875,7 +869,7 @@ static void null_dump(const struct scheduler *ops)
             printk(", ");
         if ( loop % 24 == 0 )
             printk("\n\t");
-        printk("%pv", nvc->vcpu);
+        printk("%pdv%d", nvc->unit->domain, nvc->unit->unit_id);
     }
     printk("\n");
     spin_unlock(&prv->waitq_lock);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 21/60] xen/sched: make rt scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Meng Xu, Dario Faggioli

Switch rt scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_rt.c | 356 ++++++++++++++++++++++++--------------------------
 1 file changed, 174 insertions(+), 182 deletions(-)
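
Editor's note: the conversion is mechanically the same as in the null
scheduler patch above; a hedged sketch of the recurring substitutions
(illustrative only, using the wrappers introduced earlier in the series):

    static void conversion_sketch(struct sched_unit *unit)
    {
        unsigned int cpu = sched_unit_cpu(unit);      /* was v->processor */

        if ( !is_idle_unit(unit) )                    /* was !is_idle_vcpu(v) */
            sched_set_res(unit, get_sched_res(cpu));  /* was v->processor = cpu */
    }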

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 5b1f6459cc..57883bd0a0 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -36,7 +36,7 @@
  *
  * Migration compensation and resist like credit2 to better use cache;
  * Lock Holder Problem, using yield?
- * Self switch problem: VCPUs of the same domain may preempt each other;
+ * Self switch problem: UNITs of the same domain may preempt each other;
  */
 
 /*
@@ -44,30 +44,30 @@
  *
  * This scheduler follows the Preemptive Global Earliest Deadline First (EDF)
  * theory in real-time field.
- * At any scheduling point, the VCPU with earlier deadline has higher priority.
- * The scheduler always picks highest priority VCPU to run on a feasible PCPU.
- * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
- * has a lower-priority VCPU running on it.)
+ * At any scheduling point, the UNIT with the earlier deadline has higher
+ * priority. The scheduler always picks the highest-priority UNIT to run on a
+ * feasible PCPU. A PCPU is feasible if the UNIT can run on this PCPU and
+ * (the PCPU is idle or has a lower-priority UNIT running on it).
  *
- * Each VCPU has a dedicated period, budget and a extratime flag
- * The deadline of a VCPU is at the end of each period;
- * A VCPU has its budget replenished at the beginning of each period;
- * While scheduled, a VCPU burns its budget.
- * The VCPU needs to finish its budget before its deadline in each period;
- * The VCPU discards its unused budget at the end of each period.
- * When a VCPU runs out of budget in a period, if its extratime flag is set,
- * the VCPU increases its priority_level by 1 and refills its budget; otherwise,
+ * Each UNIT has a dedicated period, budget and an extratime flag.
+ * The deadline of a UNIT is at the end of each period;
+ * A UNIT has its budget replenished at the beginning of each period;
+ * While scheduled, a UNIT burns its budget.
+ * The UNIT needs to finish its budget before its deadline in each period;
+ * The UNIT discards its unused budget at the end of each period.
+ * When a UNIT runs out of budget in a period, if its extratime flag is set,
+ * the UNIT increases its priority_level by 1 and refills its budget; otherwise,
  * it has to wait until next period.
  *
- * Each VCPU is implemented as a deferable server.
- * When a VCPU has a task running on it, its budget is continuously burned;
- * When a VCPU has no task but with budget left, its budget is preserved.
+ * Each UNIT is implemented as a deferrable server.
+ * When a UNIT has a task running on it, its budget is continuously burned;
+ * When a UNIT has no task but still has budget left, its budget is preserved.
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget,
+ * The runqueue holds all runnable UNITs with budget,
  * sorted by priority_level and deadline;
- * The depletedqueue holds all VCPUs without budget, unsorted;
+ * The depletedqueue holds all UNITs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
  */
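
Editor's sketch of the budget/deadline bookkeeping described above; the
actual logic lives in rt_update_deadline() and burn_budget() (not shown
here), so this is a simplified illustration using the rt_unit fields
defined further down in this file:

    static void replenish_sketch(struct rt_unit *svc, s_time_t now)
    {
        while ( now >= svc->cur_deadline )
        {
            svc->cur_deadline += svc->period; /* deadline ends each period */
            svc->cur_budget = svc->budget;    /* fresh budget, old one lost */
            svc->priority_level = 0;          /* back to real-time priority */
        }
        if ( svc->cur_budget == 0 && has_extratime(svc) )
        {
            svc->priority_level++;            /* demote within EDF ordering */
            svc->cur_budget = svc->budget;    /* and refill immediately */
        }
    }
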
@@ -82,7 +82,7 @@
  * in schedule.c
  *
  * The functions involes RunQ and needs to grab locks are:
- *    vcpu_insert, vcpu_remove, context_saved, runq_insert
+ *    unit_insert, unit_remove, context_saved, runq_insert
  */
 
 
@@ -95,7 +95,7 @@
 
 /*
  * Max period: max delta of time type, because period is added to the time
- * a vcpu activates, so this must not overflow.
+ * a unit activates, so this must not overflow.
  * Min period: 10 us, considering the scheduling overhead (when period is
  * too low, scheduling is invoked too frequently, causing high overhead).
  */
@@ -121,12 +121,12 @@
  * Flags
  */
 /*
- * RTDS_scheduled: Is this vcpu either running on, or context-switching off,
+ * RTDS_scheduled: Is this unit either running on, or context-switching off,
  * a phyiscal cpu?
  * + Accessed only with global lock held.
  * + Set when chosen as next in rt_schedule().
  * + Cleared after context switch has been saved in rt_context_saved()
- * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
+ * + Checked in unit_wake to see if we can add to the Runqueue, or if we should
  *   set RTDS_delayed_runq_add
  * + Checked to be false in runq_insert.
  */
@@ -146,15 +146,15 @@
 /*
  * RTDS_depleted: Does this vcp run out of budget?
  * This flag is
- * + set in burn_budget() if a vcpu has zero budget left;
+ * + set in burn_budget() if a unit has zero budget left;
  * + cleared and checked in the repenishment handler,
- *   for the vcpus that are being replenished.
+ *   for the units that are being replenished.
  */
 #define __RTDS_depleted     3
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
- * RTDS_extratime: Can the vcpu run in the time that is
+ * RTDS_extratime: Can the unit run in the time that is
  * not part of any real-time reservation, and would therefore
  * be otherwise left idle?
  */
@@ -183,11 +183,11 @@ struct rt_private {
     spinlock_t lock;            /* the global coarse-grained lock */
     struct list_head sdom;      /* list of availalbe domains, used for dump */
 
-    struct list_head runq;      /* ordered list of runnable vcpus */
-    struct list_head depletedq; /* unordered list of depleted vcpus */
+    struct list_head runq;      /* ordered list of runnable units */
+    struct list_head depletedq; /* unordered list of depleted units */
 
     struct timer repl_timer;    /* replenishment timer */
-    struct list_head replq;     /* ordered list of vcpus that need replenishment */
+    struct list_head replq;     /* ordered list of units that need replenishment */
 
     cpumask_t tickled;          /* cpus been tickled */
 };
@@ -199,18 +199,18 @@ struct rt_unit {
     struct list_head q_elem;     /* on the runq/depletedq list */
     struct list_head replq_elem; /* on the replenishment events list */
 
-    /* VCPU parameters, in nanoseconds */
+    /* UNIT parameters, in nanoseconds */
     s_time_t period;
     s_time_t budget;
 
-    /* VCPU current information in nanosecond */
+    /* UNIT current information in nanosecond */
     s_time_t cur_budget;         /* current budget */
     s_time_t last_start;         /* last start time */
     s_time_t cur_deadline;       /* current deadline for EDF */
 
     /* Up-pointers */
     struct rt_dom *sdom;
-    struct vcpu *vcpu;
+    struct sched_unit *unit;
 
     unsigned priority_level;
 
@@ -263,7 +263,7 @@ static inline bool has_extratime(const struct rt_unit *svc)
  * and the replenishment events queue.
  */
 static int
-vcpu_on_q(const struct rt_unit *svc)
+unit_on_q(const struct rt_unit *svc)
 {
    return !list_empty(&svc->q_elem);
 }
@@ -281,7 +281,7 @@ replq_elem(struct list_head *elem)
 }
 
 static int
-vcpu_on_replq(const struct rt_unit *svc)
+unit_on_replq(const struct rt_unit *svc)
 {
     return !list_empty(&svc->replq_elem);
 }
@@ -291,7 +291,7 @@ vcpu_on_replq(const struct rt_unit *svc)
  * Otherwise, return value < 0
  */
 static s_time_t
-compare_vcpu_priority(const struct rt_unit *v1, const struct rt_unit *v2)
+compare_unit_priority(const struct rt_unit *v1, const struct rt_unit *v2)
 {
     int prio = v2->priority_level - v1->priority_level;
 
@@ -302,15 +302,15 @@ compare_vcpu_priority(const struct rt_unit *v1, const struct rt_unit *v2)
 }
 
 /*
- * Debug related code, dump vcpu/cpu information
+ * Debug related code, dump unit/cpu information
  */
 static void
-rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
+rt_dump_unit(const struct scheduler *ops, const struct rt_unit *svc)
 {
     cpumask_t *cpupool_mask, *mask;
 
     ASSERT(svc != NULL);
-    /* idle vcpu */
+    /* idle unit */
     if( svc->sdom == NULL )
     {
         printk("\n");
@@ -321,20 +321,20 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
      * We can't just use 'cpumask_scratch' because the dumping can
      * happen from a pCPU outside of this scheduler's cpupool, and
      * hence it's not right to use its pCPU's scratch mask.
-     * On the other hand, it is safe to use svc->vcpu->processor's
+     * On the other hand, it is safe to use sched_unit_cpu(svc->unit)'s
      * own scratch space, since we hold the runqueue lock.
      */
-    mask = cpumask_scratch_cpu(svc->vcpu->processor);
+    mask = cpumask_scratch_cpu(sched_unit_cpu(svc->unit));
 
-    cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_unit->cpu_hard_affinity);
+    cpupool_mask = cpupool_domain_cpumask(svc->unit->domain);
+    cpumask_and(mask, cpupool_mask, svc->unit->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
            " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%*pbl\n",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
-            svc->vcpu->processor,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
+            sched_unit_cpu(svc->unit),
             svc->period,
             svc->budget,
             svc->cur_budget,
@@ -342,8 +342,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
             svc->last_start,
             svc->priority_level,
             has_extratime(svc),
-            vcpu_on_q(svc),
-            vcpu_runnable(svc->vcpu),
+            unit_on_q(svc),
+            unit_runnable(svc->unit),
             svc->flags,
             nr_cpu_ids, cpumask_bits(mask));
 }
@@ -357,11 +357,11 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
 
     spin_lock_irqsave(&prv->lock, flags);
     printk("CPU[%02d]\n", cpu);
-    /* current VCPU (nothing to say if that's the idle vcpu). */
+    /* current UNIT (nothing to say if that's the idle unit). */
     svc = rt_unit(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
-        rt_dump_vcpu(ops, svc);
+        rt_dump_unit(ops, svc);
     }
     spin_unlock_irqrestore(&prv->lock, flags);
 }
@@ -388,35 +388,35 @@ rt_dump(const struct scheduler *ops)
     list_for_each ( iter, runq )
     {
         svc = q_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_unit(ops, svc);
     }
 
     printk("Global DepletedQueue info:\n");
     list_for_each ( iter, depletedq )
     {
         svc = q_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_unit(ops, svc);
     }
 
     printk("Global Replenishment Events info:\n");
     list_for_each ( iter, replq )
     {
         svc = replq_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_unit(ops, svc);
     }
 
     printk("Domain info:\n");
     list_for_each ( iter, &prv->sdom )
     {
-        struct vcpu *v;
+        struct sched_unit *unit;
 
         sdom = list_entry(iter, struct rt_dom, sdom_elem);
         printk("\tdomain: %d\n", sdom->dom->domain_id);
 
-        for_each_vcpu ( sdom->dom, v )
+        for_each_sched_unit ( sdom->dom, unit )
         {
-            svc = rt_unit(v->sched_unit);
-            rt_dump_vcpu(ops, svc);
+            svc = rt_unit(unit);
+            rt_dump_unit(ops, svc);
         }
     }
 
@@ -458,12 +458,12 @@ rt_update_deadline(s_time_t now, struct rt_unit *svc)
     /* TRACE */
     {
         struct __packed {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned priority_level;
             uint64_t cur_deadline, cur_budget;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.priority_level = svc->priority_level;
         d.cur_deadline = (uint64_t) svc->cur_deadline;
         d.cur_budget = (uint64_t) svc->cur_budget;
@@ -476,15 +476,15 @@ rt_update_deadline(s_time_t now, struct rt_unit *svc)
 }
 
 /*
- * Helpers for removing and inserting a vcpu in a queue
- * that is being kept ordered by the vcpus' deadlines (as EDF
+ * Helpers for removing and inserting a unit in a queue
+ * that is being kept ordered by the units' deadlines (as EDF
  * mandates).
  *
- * For callers' convenience, the vcpu removing helper returns
- * true if the vcpu removed was the one at the front of the
+ * For callers' convenience, the unit removing helper returns
+ * true if the unit removed was the one at the front of the
  * queue; similarly, the inserting helper returns true if the
  * inserted ended at the front of the queue (i.e., in both
- * cases, if the vcpu with the earliest deadline is what we
+ * cases, if the unit with the earliest deadline is what we
  * are dealing with).
  */
 static inline bool
@@ -510,7 +510,7 @@ deadline_queue_insert(struct rt_unit * (*qelem)(struct list_head *),
     list_for_each ( iter, queue )
     {
         struct rt_unit * iter_svc = (*qelem)(iter);
-        if ( compare_vcpu_priority(svc, iter_svc) > 0 )
+        if ( compare_unit_priority(svc, iter_svc) > 0 )
             break;
         pos++;
     }
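
The front-of-queue contract of these queue helpers can be reduced to the
following standalone sketch (struct node and ordered_insert are simplified
stand-ins, not the Xen list API):

    #include <stdbool.h>
    #include <stdint.h>

    struct node { struct node *next; int64_t deadline; };

    /* Walk an ordered singly-linked list, stop before the first element
     * with a later deadline (ties go behind existing entries), and report
     * whether the new element became the head of the queue. */
    static bool ordered_insert(struct node **head, struct node *n)
    {
        struct node **pp = head;
        unsigned int pos = 0;

        while ( *pp && (*pp)->deadline <= n->deadline )
        {
            pp = &(*pp)->next;
            pos++;
        }
        n->next = *pp;
        *pp = n;
        return pos == 0; /* head changed: caller may reprogram the timer */
    }
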
@@ -525,7 +525,7 @@ deadline_queue_insert(struct rt_unit * (*qelem)(struct list_head *),
 static inline void
 q_remove(struct rt_unit *svc)
 {
-    ASSERT( vcpu_on_q(svc) );
+    ASSERT( unit_on_q(svc) );
     list_del_init(&svc->q_elem);
 }
 
@@ -535,14 +535,14 @@ replq_remove(const struct scheduler *ops, struct rt_unit *svc)
     struct rt_private *prv = rt_priv(ops);
     struct list_head *replq = rt_replq(ops);
 
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( unit_on_replq(svc) );
 
     if ( deadline_queue_remove(replq, &svc->replq_elem) )
     {
         /*
          * The replenishment timer needs to be set to fire when a
-         * replenishment for the vcpu at the front of the replenishment
-         * queue is due. If it is such vcpu that we just removed, we may
+         * replenishment for the unit at the front of the replenishment
+         * queue is due. If it is such unit that we just removed, we may
          * need to reprogram the timer.
          */
         if ( !list_empty(replq) )
@@ -557,7 +557,7 @@ replq_remove(const struct scheduler *ops, struct rt_unit *svc)
 
 /*
  * Insert svc with budget in RunQ according to EDF:
- * vcpus with smaller deadlines go first.
+ * units with smaller deadlines go first.
  * Insert svc without budget in DepletedQ unsorted;
  */
 static void
@@ -567,8 +567,8 @@ runq_insert(const struct scheduler *ops, struct rt_unit *svc)
     struct list_head *runq = rt_runq(ops);
 
     ASSERT( spin_is_locked(&prv->lock) );
-    ASSERT( !vcpu_on_q(svc) );
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( !unit_on_q(svc) );
+    ASSERT( unit_on_replq(svc) );
 
     /* add svc to runq if svc still has budget or its extratime is set */
     if ( svc->cur_budget > 0 ||
@@ -584,7 +584,7 @@ replq_insert(const struct scheduler *ops, struct rt_unit *svc)
     struct list_head *replq = rt_replq(ops);
     struct rt_private *prv = rt_priv(ops);
 
-    ASSERT( !vcpu_on_replq(svc) );
+    ASSERT( !unit_on_replq(svc) );
 
     /*
      * The timer may be re-programmed if svc is inserted
@@ -607,12 +607,12 @@ replq_reinsert(const struct scheduler *ops, struct rt_unit *svc)
     struct rt_unit *rearm_svc = svc;
     bool_t rearm = 0;
 
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( unit_on_replq(svc) );
 
     /*
      * If svc was at the front of the replenishment queue, we certainly
      * need to re-program the timer, and we want to use the deadline of
-     * the vcpu which is now at the front of the queue (which may still
+     * the unit which is now at the front of the queue (which may still
      * be svc or not).
      *
      * We may also need to re-program, if svc has been put at the front
@@ -632,24 +632,23 @@ replq_reinsert(const struct scheduler *ops, struct rt_unit *svc)
 }
 
 /*
- * Pick a valid resource for the vcpu vc
- * Valid resource of a vcpu is intesection of vcpu's affinity
+ * Pick a valid resource for the unit
+ * Valid resource of a unit is the intersection of the unit's affinity
  * and available resources
  */
 static struct sched_resource *
 rt_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     cpumask_t cpus;
     cpumask_t *online;
     int cpu;
 
-    online = cpupool_domain_cpumask(vc->domain);
+    online = cpupool_domain_cpumask(unit->domain);
     cpumask_and(&cpus, online, unit->cpu_hard_affinity);
 
-    cpu = cpumask_test_cpu(vc->processor, &cpus)
-            ? vc->processor
-            : cpumask_cycle(vc->processor, &cpus);
+    cpu = cpumask_test_cpu(sched_unit_cpu(unit), &cpus)
+            ? sched_unit_cpu(unit)
+            : cpumask_cycle(sched_unit_cpu(unit), &cpus);
     ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
 
     return get_sched_res(cpu);
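
The pick rule above boils down to: stay on the current cpu when the
intersection of affinity and online cpus allows it, otherwise cycle to the
next allowed cpu. A standalone sketch with a plain 64-bit mask instead of
the Xen cpumask API:

    #include <stdint.h>

    static int pick_cpu(uint64_t allowed, int cur)
    {
        int cpu;

        if ( allowed & (UINT64_C(1) << cur) )
            return cur; /* keep cache warmth: stay where we are */

        for ( cpu = (cur + 1) % 64; cpu != cur; cpu = (cpu + 1) % 64 )
            if ( allowed & (UINT64_C(1) << cpu) )
                return cpu; /* first allowed cpu after cur */

        return -1; /* empty intersection: the real code asserts instead */
    }
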
@@ -738,7 +737,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct rt_unit *svc = vdata;
     struct sched_resource *sd = get_sched_res(cpu);
 
-    ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(!pdata && svc && is_idle_unit(svc->unit));
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -762,7 +761,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
         dprintk(XENLOG_DEBUG, "RTDS: timer initialized on cpu %u\n", cpu);
     }
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     return &prv->lock;
 }
@@ -842,10 +841,9 @@ rt_free_domdata(const struct scheduler *ops, void *data)
 static void *
 rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-UNIT info */
     svc = xzalloc(struct rt_unit);
     if ( svc == NULL )
         return NULL;
@@ -854,13 +852,13 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd)
     INIT_LIST_HEAD(&svc->replq_elem);
     svc->flags = 0U;
     svc->sdom = dd;
-    svc->vcpu = vc;
+    svc->unit = unit;
     svc->last_start = 0;
 
     __set_bit(__RTDS_extratime, &svc->flags);
     svc->priority_level = 0;
     svc->period = RTDS_DEFAULT_PERIOD;
-    if ( !is_idle_vcpu(vc) )
+    if ( !is_idle_unit(unit) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
     SCHED_STAT_CRANK(unit_alloc);
@@ -880,22 +878,20 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
  * It is called in sched_move_domain() and sched_init_vcpu
  * in schedule.c.
  * When move a domain to a new cpupool.
- * It inserts vcpus of moving domain to the scheduler's RunQ in
+ * It inserts units of the moving domain into the scheduler's RunQ in
  * dest. cpupool.
  */
 static void
 rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit *svc = rt_unit(unit);
     s_time_t now;
     spinlock_t *lock;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
-    /* This is safe because vc isn't yet being scheduled */
-    unit->res = rt_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    /* This is safe because unit isn't yet being scheduled */
+    sched_set_res(unit, rt_res_pick(ops, unit));
 
     lock = unit_schedule_lock_irq(unit);
 
@@ -903,7 +899,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     if ( now >= svc->cur_deadline )
         rt_update_deadline(now, svc);
 
-    if ( !vcpu_on_q(svc) && vcpu_runnable(vc) )
+    if ( !unit_on_q(svc) && unit_runnable(unit) )
     {
         replq_insert(ops, svc);
 
@@ -930,10 +926,10 @@ rt_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     BUG_ON( sdom == NULL );
 
     lock = unit_schedule_lock_irq(unit);
-    if ( vcpu_on_q(svc) )
+    if ( unit_on_q(svc) )
         q_remove(svc);
 
-    if ( vcpu_on_replq(svc) )
+    if ( unit_on_replq(svc) )
         replq_remove(ops,svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -947,8 +943,8 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
 {
     s_time_t delta;
 
-    /* don't burn budget for idle VCPU */
-    if ( is_idle_vcpu(svc->vcpu) )
+    /* don't burn budget for idle UNIT */
+    if ( is_idle_unit(svc->unit) )
         return;
 
     /* burn at nanoseconds level */
@@ -985,14 +981,14 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
     /* TRACE */
     {
         struct __packed {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             uint64_t cur_budget;
             int delta;
             unsigned priority_level;
             bool has_extratime;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.cur_budget = (uint64_t) svc->cur_budget;
         d.delta = delta;
         d.priority_level = svc->priority_level;
@@ -1022,9 +1018,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         iter_svc = q_elem(iter);
 
         /* mask cpu_hard_affinity & cpupool & mask */
-        online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online,
-                    iter_svc->vcpu->sched_unit->cpu_hard_affinity);
+        online = cpupool_domain_cpumask(iter_svc->unit->domain);
+        cpumask_and(&cpu_common, online, iter_svc->unit->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1040,11 +1035,11 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         if( svc != NULL )
         {
             struct __packed {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 uint64_t cur_deadline, cur_budget;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             d.cur_deadline = (uint64_t) svc->cur_deadline;
             d.cur_budget = (uint64_t) svc->cur_budget;
             trace_var(TRC_RTDS_RUNQ_PICK, 1,
@@ -1068,6 +1063,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     struct rt_unit *const scurr = rt_unit(current->sched_unit);
     struct rt_unit *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
+    struct sched_unit *currunit = current->sched_unit;
 
     /* TRACE */
     {
@@ -1077,7 +1073,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
         d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(currunit);
         trace_var(TRC_RTDS_SCHEDULE, 1,
                   sizeof(d),
                   (unsigned char *)&d);
@@ -1086,72 +1082,70 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     /* clear ticked bit now that we've been scheduled */
     cpumask_clear_cpu(cpu, &prv->tickled);
 
-    /* burn_budget would return for IDLE VCPU */
+    /* burn_budget would return for IDLE UNIT */
     burn_budget(ops, scurr, now);
 
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_unit(idle_vcpu[cpu]->sched_unit);
+        snext = rt_unit(sched_idle_unit(cpu));
     }
     else
     {
         snext = runq_pick(ops, cpumask_of(cpu));
         if ( snext == NULL )
-            snext = rt_unit(idle_vcpu[cpu]->sched_unit);
+            snext = rt_unit(sched_idle_unit(cpu));
 
         /* if scurr has higher priority and budget, still pick scurr */
-        if ( !is_idle_vcpu(current) &&
-             vcpu_runnable(current) &&
+        if ( !is_idle_unit(currunit) &&
+             unit_runnable(currunit) &&
              scurr->cur_budget > 0 &&
-             ( is_idle_vcpu(snext->vcpu) ||
-               compare_vcpu_priority(scurr, snext) > 0 ) )
+             ( is_idle_unit(snext->unit) ||
+               compare_unit_priority(scurr, snext) > 0 ) )
             snext = scurr;
     }
 
     if ( snext != scurr &&
-         !is_idle_vcpu(current) &&
-         vcpu_runnable(current) )
+         !is_idle_unit(currunit) &&
+         unit_runnable(currunit) )
         __set_bit(__RTDS_delayed_runq_add, &scurr->flags);
 
     snext->last_start = now;
-    ret.time =  -1; /* if an idle vcpu is picked */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    ret.time =  -1; /* if an idle unit is picked */
+    if ( !is_idle_unit(snext->unit) )
     {
         if ( snext != scurr )
         {
             q_remove(snext);
             __set_bit(__RTDS_scheduled, &snext->flags);
         }
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_unit_cpu(snext->unit) != cpu )
         {
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_unit->res = get_sched_res(cpu);
+            sched_set_res(snext->unit, get_sched_res(cpu));
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
     }
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
     return ret;
 }
 
 /*
- * Remove VCPU from RunQ
+ * Remove UNIT from RunQ
  * The lock is already grabbed in schedule.c, no need to lock here
  */
 static void
 rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit * const svc = rt_unit(unit);
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
     SCHED_STAT_CRANK(unit_sleep);
 
-    if ( curr_on_cpu(vc->processor) == unit )
-        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
-    else if ( vcpu_on_q(svc) )
+    if ( curr_on_cpu(sched_unit_cpu(unit)) == unit )
+        cpu_raise_softirq(sched_unit_cpu(unit), SCHEDULE_SOFTIRQ);
+    else if ( unit_on_q(svc) )
     {
         q_remove(svc);
         replq_remove(ops, svc);
@@ -1161,20 +1155,20 @@ rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 /*
- * Pick a cpu where to run a vcpu,
- * possibly kicking out the vcpu running there
+ * Pick a cpu where to run a unit,
+ * possibly kicking out the unit running there
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
  * 1) if there are any idle CPUs, kick one.
       For cache benefit, we check new->cpu as first
  * 2) now all pcpus are busy;
- *    among all the running vcpus, pick lowest priority one
+ *    among all the running units, pick lowest priority one
  *    if snext has higher priority, kick it.
  *
  * TODO:
- * 1) what if these two vcpus belongs to the same domain?
- *    replace a vcpu belonging to the same domain introduces more overhead
+ * 1) what if these two units belong to the same domain?
+ *    replacing a unit belonging to the same domain introduces more overhead
  *
  * lock is grabbed before calling this function
  */
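
Condensed, that kick logic amounts to the sketch below (plain arrays stand
in for the cpumask walk; pick_cpu_to_tickle is a hypothetical helper, and
larger prio values mean higher priority, i.e. an earlier deadline):

    #include <stdint.h>

    static int pick_cpu_to_tickle(int ncpus, const int idle[],
                                  const int64_t prio[], int64_t new_prio)
    {
        int cpu, worst = -1;

        /* 1) any idle cpu is kicked right away */
        for ( cpu = 0; cpu < ncpus; cpu++ )
            if ( idle[cpu] )
                return cpu;

        /* 2) all cpus busy: find the lowest-priority running unit... */
        for ( cpu = 0; cpu < ncpus; cpu++ )
            if ( worst < 0 || prio[cpu] < prio[worst] )
                worst = cpu;

        /* ...and kick it only if the new unit outranks it */
        return (worst >= 0 && new_prio > prio[worst]) ? worst : -1;
    }
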
@@ -1182,18 +1176,18 @@ static void
 runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_unit *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_unit *latest_deadline_unit = NULL; /* lowest priority */
     struct rt_unit *iter_svc;
-    struct vcpu *iter_vc;
+    struct sched_unit *iter_unit;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
     cpumask_t *online;
 
-    if ( new == NULL || is_idle_vcpu(new->vcpu) )
+    if ( new == NULL || is_idle_unit(new->unit) )
         return;
 
-    online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
+    online = cpupool_domain_cpumask(new->unit->domain);
+    cpumask_and(&not_tickled, online, new->unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
@@ -1201,31 +1195,31 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
      *    For cache benefit,we first search new->cpu.
      *    The same loop also find the one with lowest priority.
      */
-    cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
+    cpu = cpumask_test_or_cycle(sched_unit_cpu(new->unit), &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu)->vcpu;
-        if ( is_idle_vcpu(iter_vc) )
+        iter_unit = curr_on_cpu(cpu);
+        if ( is_idle_unit(iter_unit) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_unit(iter_vc->sched_unit);
-        if ( latest_deadline_vcpu == NULL ||
-             compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
-            latest_deadline_vcpu = iter_svc;
+        iter_svc = rt_unit(iter_unit);
+        if ( latest_deadline_unit == NULL ||
+             compare_unit_priority(iter_svc, latest_deadline_unit) < 0 )
+            latest_deadline_unit = iter_svc;
 
         cpumask_clear_cpu(cpu, &not_tickled);
         cpu = cpumask_cycle(cpu, &not_tickled);
     }
 
-    /* 2) candicate has higher priority, kick out lowest priority vcpu */
-    if ( latest_deadline_vcpu != NULL &&
-         compare_vcpu_priority(latest_deadline_vcpu, new) < 0 )
+    /* 2) candidate has higher priority, kick out lowest priority unit */
+    if ( latest_deadline_unit != NULL &&
+         compare_unit_priority(latest_deadline_unit, new) < 0 )
     {
         SCHED_STAT_CRANK(tickled_busy_cpu);
-        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
+        cpu_to_tickle = sched_unit_cpu(latest_deadline_unit->unit);
         goto out;
     }
 
@@ -1251,35 +1245,34 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 }
 
 /*
- * Should always wake up runnable vcpu, put it back to RunQ.
+ * Should always wake up a runnable unit, put it back to RunQ.
  * Check priority to raise interrupt
  * The lock is already grabbed in schedule.c, no need to lock here
- * TODO: what if these two vcpus belongs to the same domain?
+ * TODO: what if these two units belong to the same domain?
  */
 static void
 rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit * const svc = rt_unit(unit);
     s_time_t now;
     bool_t missed;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
+    if ( unlikely(curr_on_cpu(sched_unit_cpu(unit)) == unit) )
     {
         SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
 
     /* on RunQ/DepletedQ, just update info is ok */
-    if ( unlikely(vcpu_on_q(svc)) )
+    if ( unlikely(unit_on_q(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
@@ -1295,16 +1288,16 @@ rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
         rt_update_deadline(now, svc);
 
     /*
-     * If context hasn't been saved for this vcpu yet, we can't put it on
+     * If context hasn't been saved for this unit yet, we can't put it on
      * the run-queue/depleted-queue. Instead, we set the appropriate flag,
-     * the vcpu will be put back on queue after the context has been saved
+     * the unit will be put back on queue after the context has been saved
      * (in rt_context_save()).
      */
     if ( unlikely(svc->flags & RTDS_scheduled) )
     {
         __set_bit(__RTDS_delayed_runq_add, &svc->flags);
         /*
-         * The vcpu is waking up already, and we didn't even had the time to
+         * The unit is waking up already, and we didn't even have the time to
          * remove its next replenishment event from the replenishment queue
          * when it blocked! No big deal. If we did not miss the deadline in
          * the meantime, let's just leave it there. If we did, let's remove it
@@ -1325,22 +1318,21 @@ rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
 /*
  * scurr has finished context switch, insert it back to the RunQ,
- * and then pick the highest priority vcpu from runq to run
+ * and then pick the highest priority unit from runq to run
  */
 static void
 rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit *svc = rt_unit(unit);
     spinlock_t *lock = unit_schedule_lock_irq(unit);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
-    /* not insert idle vcpu to runq */
-    if ( is_idle_vcpu(vc) )
+    /* do not insert the idle unit into the runq */
+    if ( is_idle_unit(unit) )
         goto out;
 
     if ( __test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
-         likely(vcpu_runnable(vc)) )
+         likely(unit_runnable(unit)) )
     {
         runq_insert(ops, svc);
         runq_tickle(ops, svc);
@@ -1353,7 +1345,7 @@ out:
 }
 
 /*
- * set/get each vcpu info of each domain
+ * set/get each unit info of each domain
  */
 static int
 rt_dom_cntl(
@@ -1363,7 +1355,7 @@ rt_dom_cntl(
 {
     struct rt_private *prv = rt_priv(ops);
     struct rt_unit *svc;
-    struct vcpu *v;
+    struct sched_unit *unit;
     unsigned long flags;
     int rc = 0;
     struct xen_domctl_schedparam_vcpu local_sched;
@@ -1384,9 +1376,9 @@ rt_dom_cntl(
             break;
         }
         spin_lock_irqsave(&prv->lock, flags);
-        for_each_vcpu ( d, v )
+        for_each_sched_unit ( d, unit )
         {
-            svc = rt_unit(v->sched_unit);
+            svc = rt_unit(unit);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1454,7 +1446,7 @@ rt_dom_cntl(
                 break;
         }
         if ( !rc )
-            /* notify upper caller how many vcpus have been processed. */
+            /* notify upper caller how many units have been processed. */
             op->u.v.nr_vcpus = index;
         break;
     }
@@ -1463,7 +1455,7 @@ rt_dom_cntl(
 }
 
 /*
- * The replenishment timer handler picks vcpus
+ * The replenishment timer handler picks units
  * from the replq and does the actual replenishment.
  */
 static void repl_timer_handler(void *data){
@@ -1481,7 +1473,7 @@ static void repl_timer_handler(void *data){
     now = NOW();
 
     /*
-     * Do the replenishment and move replenished vcpus
+     * Do the replenishment and move replenished units
      * to the temporary list to tickle.
      * If svc is on run queue, we need to put it at
      * the correct place since its deadline changes.
@@ -1497,7 +1489,7 @@ static void repl_timer_handler(void *data){
         rt_update_deadline(now, svc);
         list_add(&svc->replq_elem, &tmp_replq);
 
-        if ( vcpu_on_q(svc) )
+        if ( unit_on_q(svc) )
         {
             q_remove(svc);
             runq_insert(ops, svc);
@@ -1505,26 +1497,26 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * Iterate through the list of updated vcpus.
-     * If an updated vcpu is running, tickle the head of the
+     * Iterate through the list of updated units.
+     * If an updated unit is running, tickle the head of the
      * runqueue if it has a higher priority.
-     * If an updated vcpu was depleted and on the runqueue, tickle it.
-     * Finally, reinsert the vcpus back to replenishement events list.
+     * If an updated unit was depleted and on the runqueue, tickle it.
+     * Finally, reinsert the units back into the replenishment events list.
      */
     list_for_each_safe ( iter, tmp, &tmp_replq )
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_unit &&
+        if ( curr_on_cpu(sched_unit_cpu(svc->unit)) == svc->unit &&
              !list_empty(runq) )
         {
             struct rt_unit *next_on_runq = q_elem(runq->next);
 
-            if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
+            if ( compare_unit_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
         }
         else if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&
-                  vcpu_on_q(svc) )
+                  unit_on_q(svc) )
             runq_tickle(ops, svc);
 
         list_del(&svc->replq_elem);
@@ -1532,7 +1524,7 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * If there are vcpus left in the replenishment event list,
+     * If there are units left in the replenishment event list,
      * set the next replenishment to happen at the deadline of
      * the one in the front.
      */
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

+    else if ( unit_on_q(svc) )
     {
         q_remove(svc);
         replq_remove(ops, svc);
@@ -1161,20 +1155,20 @@ rt_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 }
 
 /*
- * Pick a cpu where to run a vcpu,
- * possibly kicking out the vcpu running there
+ * Pick a cpu where to run a unit,
+ * possibly kicking out the unit running there
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
  * 1) if there are any idle CPUs, kick one.
  *    For cache benefit, we check new->cpu first.
  * 2) now all pcpus are busy;
- *    among all the running vcpus, pick lowest priority one
+ *    among all the running units, pick the lowest-priority one
  *    if snext has higher priority, kick it.
  *
  * TODO:
- * 1) what if these two vcpus belongs to the same domain?
- *    replace a vcpu belonging to the same domain introduces more overhead
+ * 1) what if these two units belong to the same domain?
+ *    replacing a unit belonging to the same domain introduces more overhead
  *
  * lock is grabbed before calling this function
  */
@@ -1182,18 +1176,18 @@ static void
 runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_unit *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_unit *latest_deadline_unit = NULL; /* lowest priority */
     struct rt_unit *iter_svc;
-    struct vcpu *iter_vc;
+    struct sched_unit *iter_unit;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
     cpumask_t *online;
 
-    if ( new == NULL || is_idle_vcpu(new->vcpu) )
+    if ( new == NULL || is_idle_unit(new->unit) )
         return;
 
-    online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
+    online = cpupool_domain_cpumask(new->unit->domain);
+    cpumask_and(&not_tickled, online, new->unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
@@ -1201,31 +1195,31 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
      *    For cache benefit, we first search new->cpu.
      *    The same loop also finds the one with the lowest priority.
      */
-    cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
+    cpu = cpumask_test_or_cycle(sched_unit_cpu(new->unit), &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu)->vcpu;
-        if ( is_idle_vcpu(iter_vc) )
+        iter_unit = curr_on_cpu(cpu);
+        if ( is_idle_unit(iter_unit) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_unit(iter_vc->sched_unit);
-        if ( latest_deadline_vcpu == NULL ||
-             compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
-            latest_deadline_vcpu = iter_svc;
+        iter_svc = rt_unit(iter_unit);
+        if ( latest_deadline_unit == NULL ||
+             compare_unit_priority(iter_svc, latest_deadline_unit) < 0 )
+            latest_deadline_unit = iter_svc;
 
         cpumask_clear_cpu(cpu, &not_tickled);
         cpu = cpumask_cycle(cpu, &not_tickled);
     }
 
-    /* 2) candicate has higher priority, kick out lowest priority vcpu */
-    if ( latest_deadline_vcpu != NULL &&
-         compare_vcpu_priority(latest_deadline_vcpu, new) < 0 )
+    /* 2) candidate has higher priority, kick out lowest-priority unit */
+    if ( latest_deadline_unit != NULL &&
+         compare_unit_priority(latest_deadline_unit, new) < 0 )
     {
         SCHED_STAT_CRANK(tickled_busy_cpu);
-        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
+        cpu_to_tickle = sched_unit_cpu(latest_deadline_unit->unit);
         goto out;
     }
 
@@ -1251,35 +1245,34 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
 }
 
 /*
- * Should always wake up runnable vcpu, put it back to RunQ.
+ * Should always wake up runnable unit, put it back to RunQ.
  * Check priority to raise interrupt
  * The lock is already grabbed in schedule.c, no need to lock here
- * TODO: what if these two vcpus belongs to the same domain?
+ * TODO: what if these two units belong to the same domain?
  */
 static void
 rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit * const svc = rt_unit(unit);
     s_time_t now;
     bool_t missed;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
+    if ( unlikely(curr_on_cpu(sched_unit_cpu(unit)) == unit) )
     {
         SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
 
     /* on RunQ/DepletedQ, just update info is ok */
-    if ( unlikely(vcpu_on_q(svc)) )
+    if ( unlikely(unit_on_q(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
@@ -1295,16 +1288,16 @@ rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
         rt_update_deadline(now, svc);
 
     /*
-     * If context hasn't been saved for this vcpu yet, we can't put it on
+     * If context hasn't been saved for this unit yet, we can't put it on
      * the run-queue/depleted-queue. Instead, we set the appropriate flag,
-     * the vcpu will be put back on queue after the context has been saved
+     * the unit will be put back on queue after the context has been saved
      * (in rt_context_save()).
      */
     if ( unlikely(svc->flags & RTDS_scheduled) )
     {
         __set_bit(__RTDS_delayed_runq_add, &svc->flags);
         /*
-         * The vcpu is waking up already, and we didn't even had the time to
+         * The unit is waking up already, and we didn't even have the time to
          * remove its next replenishment event from the replenishment queue
          * when it blocked! No big deal. If we did not miss the deadline in
          * the meantime, let's just leave it there. If we did, let's remove it
@@ -1325,22 +1318,21 @@ rt_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
 /*
  * scurr has finished context switch, insert it back to the RunQ,
- * and then pick the highest priority vcpu from runq to run
+ * and then pick the highest priority unit from runq to run
  */
 static void
 rt_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct rt_unit *svc = rt_unit(unit);
     spinlock_t *lock = unit_schedule_lock_irq(unit);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
-    /* not insert idle vcpu to runq */
-    if ( is_idle_vcpu(vc) )
+    /* do not insert the idle unit into the runq */
+    if ( is_idle_unit(unit) )
         goto out;
 
     if ( __test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
-         likely(vcpu_runnable(vc)) )
+         likely(unit_runnable(unit)) )
     {
         runq_insert(ops, svc);
         runq_tickle(ops, svc);
@@ -1353,7 +1345,7 @@ out:
 }
 
 /*
- * set/get each vcpu info of each domain
+ * set/get each unit info of each domain
  */
 static int
 rt_dom_cntl(
@@ -1363,7 +1355,7 @@ rt_dom_cntl(
 {
     struct rt_private *prv = rt_priv(ops);
     struct rt_unit *svc;
-    struct vcpu *v;
+    struct sched_unit *unit;
     unsigned long flags;
     int rc = 0;
     struct xen_domctl_schedparam_vcpu local_sched;
@@ -1384,9 +1376,9 @@ rt_dom_cntl(
             break;
         }
         spin_lock_irqsave(&prv->lock, flags);
-        for_each_vcpu ( d, v )
+        for_each_sched_unit ( d, unit )
         {
-            svc = rt_unit(v->sched_unit);
+            svc = rt_unit(unit);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1454,7 +1446,7 @@ rt_dom_cntl(
                 break;
         }
         if ( !rc )
-            /* notify upper caller how many vcpus have been processed. */
+            /* notify upper caller how many units have been processed. */
             op->u.v.nr_vcpus = index;
         break;
     }
@@ -1463,7 +1455,7 @@ rt_dom_cntl(
 }
 
 /*
- * The replenishment timer handler picks vcpus
+ * The replenishment timer handler picks units
  * from the replq and does the actual replenishment.
  */
 static void repl_timer_handler(void *data){
@@ -1481,7 +1473,7 @@ static void repl_timer_handler(void *data){
     now = NOW();
 
     /*
-     * Do the replenishment and move replenished vcpus
+     * Do the replenishment and move replenished units
      * to the temporary list to tickle.
      * If svc is on run queue, we need to put it at
      * the correct place since its deadline changes.
@@ -1497,7 +1489,7 @@ static void repl_timer_handler(void *data){
         rt_update_deadline(now, svc);
         list_add(&svc->replq_elem, &tmp_replq);
 
-        if ( vcpu_on_q(svc) )
+        if ( unit_on_q(svc) )
         {
             q_remove(svc);
             runq_insert(ops, svc);
@@ -1505,26 +1497,26 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * Iterate through the list of updated vcpus.
-     * If an updated vcpu is running, tickle the head of the
+     * Iterate through the list of updated units.
+     * If an updated unit is running, tickle the head of the
      * runqueue if it has a higher priority.
-     * If an updated vcpu was depleted and on the runqueue, tickle it.
-     * Finally, reinsert the vcpus back to replenishement events list.
+     * If an updated unit was depleted and on the runqueue, tickle it.
+     * Finally, reinsert the units back into the replenishment events list.
      */
     list_for_each_safe ( iter, tmp, &tmp_replq )
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_unit &&
+        if ( curr_on_cpu(sched_unit_cpu(svc->unit)) == svc->unit &&
              !list_empty(runq) )
         {
             struct rt_unit *next_on_runq = q_elem(runq->next);
 
-            if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
+            if ( compare_unit_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
         }
         else if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&
-                  vcpu_on_q(svc) )
+                  unit_on_q(svc) )
             runq_tickle(ops, svc);
 
         list_del(&svc->replq_elem);
@@ -1532,7 +1524,7 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * If there are vcpus left in the replenishment event list,
+     * If there are units left in the replenishment event list,
      * set the next replenishment to happen at the deadline of
      * the one in the front.
      */
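
For readers following the conversion above: the unit accessors it leans
on are, at this stage of the series, thin wrappers around the vcpu
fields they replace. A rough sketch of their assumed shapes (C, purely
illustrative, not a verbatim quote of the earlier patches):

static inline unsigned int sched_unit_cpu(struct sched_unit *unit)
{
    /* The processor of a unit now lives in its sched_resource. */
    return unit->res->processor;
}

static inline void sched_set_res(struct sched_unit *unit,
                                 struct sched_resource *res)
{
    /* Keep the still-existing per-vcpu field in sync. */
    unit->vcpu->processor = res->processor;
    unit->res = res;
}

static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
{
    return idle_vcpu[cpu]->sched_unit;
}

Under that assumption the rt_schedule() migration hunk above is a pure
transformation: the two removed per-vcpu assignments collapse into the
single sched_set_res() call.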
-- 
2.16.4


* [PATCH 22/60] xen/sched: make credit scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch the credit scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
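
A note on the predicates used throughout this conversion: at this stage
of the series each sched_unit still wraps exactly one vcpu, so the
unit-level helpers simply defer to it. A minimal sketch, assuming that
single-vcpu layout (illustrative, not a quote of the real definitions):

static inline bool unit_runnable(const struct sched_unit *unit)
{
    /* One vcpu per unit for now; core scheduling widens this later. */
    return vcpu_runnable(unit->vcpu);
}

static inline void sched_set_pause_flags_atomic(struct sched_unit *unit,
                                                unsigned int bit)
{
    /* Atomic flavour, for updates racing with other flag writers. */
    set_bit(bit, &unit->vcpu->pause_flags);
}

This is why hunks such as the _VPF_migrating ones below are pure
renames: the bit still lands in the vcpu's pause_flags, merely reached
through the unit.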
 xen/common/sched_credit.c | 506 +++++++++++++++++++++++-----------------------
 1 file changed, 252 insertions(+), 254 deletions(-)
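
One arithmetic detail the accounting hunks below rely on: per-unit fair
shares are computed with a round-up integer division, open-coded as
( x + (d - 1) ) / d. A generic helper of the same shape, purely for
illustration:

#define DIV_ROUND_UP(x, d) (((x) + (d) - 1) / (d))

/* E.g. a 301-credit fair share over 4 active units yields
 * DIV_ROUND_UP(301, 4) == 76, so truncation never short-changes
 * a domain's units. */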

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b700cc07ce..0bc6a87d30 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -70,10 +70,10 @@
  * inconsistent set of locks. Therefore atomic-safe bit operations must
  * be used for accessing it.
  */
-#define CSCHED_FLAG_VCPU_PARKED    0x0  /* VCPU over capped credits */
-#define CSCHED_FLAG_VCPU_YIELD     0x1  /* VCPU yielding */
-#define CSCHED_FLAG_VCPU_MIGRATING 0x2  /* VCPU may have moved to a new pcpu */
-#define CSCHED_FLAG_VCPU_PINNED    0x4  /* VCPU can run only on 1 pcpu */
+#define CSCHED_FLAG_UNIT_PARKED    0x0  /* UNIT over capped credits */
+#define CSCHED_FLAG_UNIT_YIELD     0x1  /* UNIT yielding */
+#define CSCHED_FLAG_UNIT_MIGRATING 0x2  /* UNIT may have moved to a new pcpu */
+#define CSCHED_FLAG_UNIT_PINNED    0x4  /* UNIT can run only on 1 pcpu */
 
 
 /*
@@ -91,7 +91,7 @@
 /*
  * CSCHED_STATS
  *
- * Manage very basic per-vCPU counters and stats.
+ * Manage very basic per-unit counters and stats.
  *
  * Useful for debugging live systems. The stats are displayed
  * with runq dumps ('r' on the Xen console).
@@ -100,23 +100,23 @@
 
 #define CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)                      \
+#define SCHED_UNIT_STATS_RESET(_V)                      \
     do                                                  \
     {                                                   \
         memset(&(_V)->stats, 0, sizeof((_V)->stats));   \
     } while ( 0 )
 
-#define SCHED_VCPU_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
+#define SCHED_UNIT_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
 
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
+#define SCHED_UNIT_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
 
 #else /* !SCHED_STATS */
 
 #undef CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)         do {} while ( 0 )
-#define SCHED_VCPU_STAT_CRANK(_V, _X)      do {} while ( 0 )
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
+#define SCHED_UNIT_STATS_RESET(_V)         do {} while ( 0 )
+#define SCHED_UNIT_STAT_CRANK(_V, _X)      do {} while ( 0 )
+#define SCHED_UNIT_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
 
 #endif /* SCHED_STATS */
 
@@ -128,7 +128,7 @@
 #define TRC_CSCHED_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED, 1)
 #define TRC_CSCHED_ACCOUNT_START TRC_SCHED_CLASS_EVT(CSCHED, 2)
 #define TRC_CSCHED_ACCOUNT_STOP  TRC_SCHED_CLASS_EVT(CSCHED, 3)
-#define TRC_CSCHED_STOLEN_VCPU   TRC_SCHED_CLASS_EVT(CSCHED, 4)
+#define TRC_CSCHED_STOLEN_UNIT   TRC_SCHED_CLASS_EVT(CSCHED, 4)
 #define TRC_CSCHED_PICKED_CPU    TRC_SCHED_CLASS_EVT(CSCHED, 5)
 #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
 #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
@@ -158,15 +158,15 @@ struct csched_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Virtual UNIT
  */
 struct csched_unit {
     struct list_head runq_elem;
-    struct list_head active_vcpu_elem;
+    struct list_head active_unit_elem;
 
     /* Up-pointers */
     struct csched_dom *sdom;
-    struct vcpu *vcpu;
+    struct sched_unit *unit;
 
     s_time_t start_time;   /* When we were scheduled (used for credit) */
     unsigned flags;
@@ -192,10 +192,10 @@ struct csched_unit {
  * Domain
  */
 struct csched_dom {
-    struct list_head active_vcpu;
+    struct list_head active_unit;
     struct list_head active_sdom_elem;
     struct domain *dom;
-    uint16_t active_vcpu_count;
+    uint16_t active_unit_count;
     uint16_t weight;
     uint16_t cap;
 };
@@ -215,7 +215,7 @@ struct csched_private {
 
     /* Period of master and tick in milliseconds */
     unsigned int tick_period_us, ticks_per_tslice;
-    s_time_t ratelimit, tslice, vcpu_migr_delay;
+    s_time_t ratelimit, tslice, unit_migr_delay;
 
     struct list_head active_sdom;
     uint32_t weight;
@@ -231,7 +231,7 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_unit *svc)
+__unit_on_runq(struct csched_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -242,7 +242,7 @@ __runq_elem(struct list_head *elem)
     return list_entry(elem, struct csched_unit, runq_elem);
 }
 
-/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
+/* Is the first element of cpu's runq (if any) cpu's idle unit? */
 static inline bool_t is_runq_idle(unsigned int cpu)
 {
     /*
@@ -251,7 +251,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
-           is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
+           is_idle_unit(__runq_elem(RUNQ(cpu)->next)->unit);
 }
 
 static inline void
@@ -273,11 +273,11 @@ dec_nr_runnable(unsigned int cpu)
 static inline void
 __runq_insert(struct csched_unit *svc)
 {
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
     const struct list_head * const runq = RUNQ(cpu);
     struct list_head *iter;
 
-    BUG_ON( __vcpu_on_runq(svc) );
+    BUG_ON( __unit_on_runq(svc) );
 
     list_for_each( iter, runq )
     {
@@ -286,10 +286,10 @@ __runq_insert(struct csched_unit *svc)
             break;
     }
 
-    /* If the vcpu yielded, try to put it behind one lower-priority
-     * runnable vcpu if we can.  The next runq_sort will bring it forward
+    /* If the unit yielded, try to put it behind one lower-priority
+     * runnable unit if we can.  The next runq_sort will bring it forward
      * within 30ms if the queue is too long. */
-    if ( test_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags)
+    if ( test_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags)
          && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
     {
         iter=iter->next;
@@ -305,20 +305,20 @@ static inline void
 runq_insert(struct csched_unit *svc)
 {
     __runq_insert(svc);
-    inc_nr_runnable(svc->vcpu->processor);
+    inc_nr_runnable(sched_unit_cpu(svc->unit));
 }
 
 static inline void
 __runq_remove(struct csched_unit *svc)
 {
-    BUG_ON( !__vcpu_on_runq(svc) );
+    BUG_ON( !__unit_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
 runq_remove(struct csched_unit *svc)
 {
-    dec_nr_runnable(svc->vcpu->processor);
+    dec_nr_runnable(sched_unit_cpu(svc->unit));
     __runq_remove(svc);
 }
 
@@ -329,7 +329,7 @@ static void burn_credits(struct csched_unit *svc, s_time_t now)
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(sched_unit_cpu(svc->unit))) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -349,8 +349,8 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
 static inline void __runq_tickle(struct csched_unit *new)
 {
-    unsigned int cpu = new->vcpu->processor;
-    struct sched_unit *unit = new->vcpu->sched_unit;
+    unsigned int cpu = sched_unit_cpu(new->unit);
+    struct sched_unit *unit = new->unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -364,16 +364,16 @@ static inline void __runq_tickle(struct csched_unit *new)
     idlers_empty = cpumask_empty(&idle_mask);
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when a unit has hard-affinity with only one
+     * cpu, and there is no other unit that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the units in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
      */
-    if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
+    if ( unlikely(test_bit(CSCHED_FLAG_UNIT_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
         ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
@@ -384,7 +384,7 @@ static inline void __runq_tickle(struct csched_unit *new)
 
     /*
      * If the pcpu is idle, or there are no idlers and the new
-     * vcpu is a higher priority than the old vcpu, run it here.
+     * unit has a higher priority than the old unit, run it here.
      *
      * If there are idle cpus, first try to find one suitable to run
      * new, so we can avoid preempting cur.  If we cannot find a
@@ -403,7 +403,7 @@ static inline void __runq_tickle(struct csched_unit *new)
     else if ( !idlers_empty )
     {
         /*
-         * Soft and hard affinity balancing loop. For vcpus without
+         * Soft and hard affinity balancing loop. For units without
          * a useful soft affinity, consider hard affinity only.
          */
         for_each_affinity_balance_step( balance_step )
@@ -446,10 +446,10 @@ static inline void __runq_tickle(struct csched_unit *new)
             {
                 if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) )
                 {
-                    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
-                    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+                    SCHED_UNIT_STAT_CRANK(cur, kicked_away);
+                    SCHED_UNIT_STAT_CRANK(cur, migrate_r);
                     SCHED_STAT_CRANK(migrate_kicked_away);
-                    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+                    sched_set_pause_flags_atomic(cur->unit, _VPF_migrating);
                 }
                 /* Tickle cpu anyway, to let new preempt cur. */
                 SCHED_STAT_CRANK(tickled_busy_cpu);
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
+    BUG_ON(!is_idle_unit(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -639,9 +639,9 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_unit *svc = vdata;
 
-    ASSERT(svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(svc && is_idle_unit(svc->unit));
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -658,33 +658,33 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
 #ifndef NDEBUG
 static inline void
-__csched_vcpu_check(struct vcpu *vc)
+__csched_unit_check(struct sched_unit *unit)
 {
-    struct csched_unit * const svc = CSCHED_UNIT(vc->sched_unit);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != CSCHED_DOM(vc->domain) );
+    BUG_ON( svc->unit != unit );
+    BUG_ON( sdom != CSCHED_DOM(unit->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_unit(unit) );
+        BUG_ON( sdom->dom != unit->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_unit(unit) );
     }
 
     SCHED_STAT_CRANK(unit_check);
 }
-#define CSCHED_VCPU_CHECK(_vc)  (__csched_vcpu_check(_vc))
+#define CSCHED_UNIT_CHECK(unit)  (__csched_unit_check(unit))
 #else
-#define CSCHED_VCPU_CHECK(_vc)
+#define CSCHED_UNIT_CHECK(unit)
 #endif
 
 /*
- * Delay, in microseconds, between migrations of a VCPU between PCPUs.
- * This prevents rapid fluttering of a VCPU between CPUs, and reduces the
+ * Delay, in microseconds, between migrations of a UNIT between PCPUs.
+ * This prevents rapid fluttering of a UNIT between CPUs, and reduces the
  * implicit overheads such as cache-warming. 1ms (1000) has been measured
  * as a good value.
  */
@@ -692,10 +692,11 @@ static unsigned int vcpu_migration_delay_us;
 integer_param("vcpu_migration_delay", vcpu_migration_delay_us);
 
 static inline bool
-__csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
+__csched_unit_is_cache_hot(const struct csched_private *prv,
+                           struct sched_unit *unit)
 {
-    bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay;
+    bool hot = prv->unit_migr_delay &&
+               (NOW() - unit->last_run_time) < prv->unit_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(unit_hot);
@@ -704,36 +705,38 @@ __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 }
 
 static inline int
-__csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
+__csched_unit_is_migrateable(const struct csched_private *prv,
+                             struct sched_unit *unit,
                              int dest_cpu, cpumask_t *mask)
 {
     /*
      * Don't pick up work that's hot on peer PCPU, or that can't (or
      * would prefer not to) run on cpu.
      *
-     * The caller is supposed to have already checked that vc is also
+     * The caller is supposed to have already checked that unit is also
      * not running.
      */
-    ASSERT(!vc->sched_unit->is_running);
+    ASSERT(!unit->is_running);
 
-    return !__csched_vcpu_is_cache_hot(prv, vc) &&
+    return !__csched_unit_is_cache_hot(prv, unit) &&
            cpumask_test_cpu(dest_cpu, mask);
 }
 
 static int
-_csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
+_csched_cpu_pick(const struct scheduler *ops, struct sched_unit *unit,
+                 bool_t commit)
 {
-    /* We must always use vc->procssor's scratch space */
-    cpumask_t *cpus = cpumask_scratch_cpu(vc->processor);
+    int cpu = sched_unit_cpu(unit);
+    /* We must always use cpu's scratch space */
+    cpumask_t *cpus = cpumask_scratch_cpu(cpu);
     cpumask_t idlers;
-    cpumask_t *online = cpupool_domain_cpumask(vc->domain);
+    cpumask_t *online = cpupool_domain_cpumask(unit->domain);
     struct csched_pcpu *spc = NULL;
-    int cpu = vc->processor;
     int balance_step;
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc->sched_unit, balance_step, cpus);
+        affinity_balance_cpumask(unit, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -752,12 +755,13 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(unit) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer the unit's current processor */
-        cpu = cpumask_test_cpu(vc->processor, cpus)
-                ? vc->processor : cpumask_cycle(vc->processor, cpus);
+        cpu = cpumask_test_cpu(sched_unit_cpu(unit), cpus)
+                ? sched_unit_cpu(unit)
+                : cpumask_cycle(sched_unit_cpu(unit), cpus);
         ASSERT(cpumask_test_cpu(cpu, cpus));
 
         /*
@@ -769,15 +773,15 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * We give preference to the idle execution vehicle with the most
          * idling neighbours in its grouping. This distributes work across
          * distinct cores first and guarantees we don't do something stupid
-         * like run two VCPUs on co-hyperthreads while there are idle cores
+         * like run two UNITs on co-hyperthreads while there are idle cores
          * or sockets.
          *
          * Notice that, when computing the "idleness" of cpu, we may want to
-         * discount vc. That is, iff vc is the currently running and the only
-         * runnable vcpu on cpu, we add cpu to the idlers.
+         * discount unit. That is, iff unit is the currently running and the
+         * only runnable unit on cpu, we add cpu to the idlers.
          */
         cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-        if ( vc->processor == cpu && is_runq_idle(cpu) )
+        if ( sched_unit_cpu(unit) == cpu && is_runq_idle(cpu) )
             __cpumask_set_cpu(cpu, &idlers);
         cpumask_and(cpus, &idlers, cpus);
 
@@ -787,7 +791,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * CPU, as we just &&-ed it with idlers). In fact, if we are on SMT, and
          * cpu points to a busy thread with an idle sibling, both the threads
          * will be considered the same, from the "idleness" calculation point
-         * of view", preventing vcpu from being moved to the thread that is
+         * of view", preventing unit from being moved to the thread that is
          * actually idle.
          *
          * Notice that cpumask_test_cpu() is quicker than cpumask_empty(), so
@@ -853,7 +857,8 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     if ( commit && spc )
        spc->idle_bias = cpu;
 
-    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, unit->domain->domain_id, unit->unit_id,
+             cpu);
 
     return cpu;
 }
@@ -861,7 +866,6 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc = CSCHED_UNIT(unit);
 
     /*
@@ -871,26 +875,26 @@ csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      * csched_unit_wake() (still called from vcpu_migrate()) we won't
      * get boosted, which we don't deserve as we are "only" migrating.
      */
-    set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
-    return get_sched_res(_csched_cpu_pick(ops, vc, 1));
+    set_bit(CSCHED_FLAG_UNIT_MIGRATING, &svc->flags);
+    return get_sched_res(_csched_cpu_pick(ops, unit, 1));
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
+__csched_unit_acct_start(struct csched_private *prv, struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
 
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_unit_elem) )
     {
-        SCHED_VCPU_STAT_CRANK(svc, state_active);
+        SCHED_UNIT_STAT_CRANK(svc, state_active);
         SCHED_STAT_CRANK(acct_unit_active);
 
-        sdom->active_vcpu_count++;
-        list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
-        /* Make weight per-vcpu */
+        sdom->active_unit_count++;
+        list_add(&svc->active_unit_elem, &sdom->active_unit);
+        /* Make weight per-unit */
         prv->weight += sdom->weight;
         if ( list_empty(&sdom->active_sdom_elem) )
         {
@@ -899,56 +903,56 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->unit->unit_id, sdom->active_unit_count);
 
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static inline void
-__csched_vcpu_acct_stop_locked(struct csched_private *prv,
+__csched_unit_acct_stop_locked(struct csched_private *prv,
     struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( list_empty(&svc->active_vcpu_elem) );
+    BUG_ON( list_empty(&svc->active_unit_elem) );
 
-    SCHED_VCPU_STAT_CRANK(svc, state_idle);
+    SCHED_UNIT_STAT_CRANK(svc, state_idle);
     SCHED_STAT_CRANK(acct_unit_idle);
 
     BUG_ON( prv->weight < sdom->weight );
-    sdom->active_vcpu_count--;
-    list_del_init(&svc->active_vcpu_elem);
+    sdom->active_unit_count--;
+    list_del_init(&svc->active_unit_elem);
     prv->weight -= sdom->weight;
-    if ( list_empty(&sdom->active_vcpu) )
+    if ( list_empty(&sdom->active_unit) )
     {
         list_del_init(&sdom->active_sdom_elem);
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->unit->unit_id, sdom->active_unit_count);
 }
 
 static void
-csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
+csched_unit_acct(struct csched_private *prv, unsigned int cpu)
 {
     struct sched_unit *currunit = current->sched_unit;
     struct csched_unit * const svc = CSCHED_UNIT(currunit);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
-    ASSERT( current->processor == cpu );
+    ASSERT( sched_unit_cpu(currunit) == cpu );
     ASSERT( svc->sdom != NULL );
-    ASSERT( !is_idle_vcpu(svc->vcpu) );
+    ASSERT( !is_idle_unit(svc->unit) );
 
     /*
-     * If this VCPU's priority was boosted when it last awoke, reset it.
-     * If the VCPU is found here, then it's consuming a non-negligeable
+     * If this UNIT's priority was boosted when it last awoke, reset it.
+     * If the UNIT is found here, then it's consuming a non-negligible
      * amount of CPU resources and should no longer be boosted.
      */
     if ( svc->pri == CSCHED_PRI_TS_BOOST )
     {
         svc->pri = CSCHED_PRI_TS_UNDER;
         TRACE_2D(TRC_CSCHED_BOOST_END, svc->sdom->dom->domain_id,
-                 svc->vcpu->vcpu_id);
+                 svc->unit->unit_id);
     }
 
     /*
@@ -957,12 +961,12 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
     burn_credits(svc, NOW());
 
     /*
-     * Put this VCPU and domain back on the active list if it was
+     * Put this UNIT and domain back on the active list if it was
      * idling.
      */
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_unit_elem) )
     {
-        __csched_vcpu_acct_start(prv, svc);
+        __csched_unit_acct_start(prv, svc);
     }
     else
     {
@@ -975,15 +979,15 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
          * migrating it to run elsewhere (see multi-core and multi-thread
          * support in csched_res_pick()).
          */
-        new_cpu = _csched_cpu_pick(ops, current, 0);
+        new_cpu = _csched_cpu_pick(ops, currunit, 0);
 
         unit_schedule_unlock_irqrestore(lock, flags, currunit);
 
         if ( new_cpu != cpu )
         {
-            SCHED_VCPU_STAT_CRANK(svc, migrate_r);
+            SCHED_UNIT_STAT_CRANK(svc, migrate_r);
             SCHED_STAT_CRANK(migrate_running);
-            set_bit(_VPF_migrating, &current->pause_flags);
+            sched_set_pause_flags_atomic(currunit, _VPF_migrating);
             /*
              * As we are about to tickle cpu, we should clear its bit in
              * idlers. But, if we are here, it means there is someone running
@@ -1000,21 +1004,20 @@ static void *
 csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                    void *dd)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-UNIT info */
     svc = xzalloc(struct csched_unit);
     if ( svc == NULL )
         return NULL;
 
     INIT_LIST_HEAD(&svc->runq_elem);
-    INIT_LIST_HEAD(&svc->active_vcpu_elem);
+    INIT_LIST_HEAD(&svc->active_unit_elem);
     svc->sdom = dd;
-    svc->vcpu = vc;
-    svc->pri = is_idle_domain(vc->domain) ?
+    svc->unit = unit;
+    svc->pri = is_idle_unit(unit) ?
         CSCHED_PRI_IDLE : CSCHED_PRI_TS_UNDER;
-    SCHED_VCPU_STATS_RESET(svc);
+    SCHED_UNIT_STATS_RESET(svc);
     SCHED_STAT_CRANK(unit_alloc);
     return svc;
 }
@@ -1022,24 +1025,21 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
 static void
 csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc = unit->priv;
     spinlock_t *lock;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
     /* csched_res_pick() looks in the unit's current runq, so we need the lock. */
     lock = unit_schedule_lock_irq(unit);
 
-    unit->res = csched_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    sched_set_res(unit, csched_res_pick(ops, unit));
 
     spin_unlock_irq(lock);
 
     lock = unit_schedule_lock_irq(unit);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) &&
-         !vc->sched_unit->is_running )
+    if ( !__unit_on_runq(svc) && unit_runnable(unit) && !unit->is_running )
         runq_insert(svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -1066,18 +1066,18 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 
     SCHED_STAT_CRANK(unit_remove);
 
-    ASSERT(!__vcpu_on_runq(svc));
+    ASSERT(!__unit_on_runq(svc));
 
-    if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+    if ( test_and_clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
     {
         SCHED_STAT_CRANK(unit_unpark);
-        vcpu_unpause(svc->vcpu);
+        vcpu_unpause(svc->unit->vcpu);
     }
 
     spin_lock_irq(&prv->lock);
 
-    if ( !list_empty(&svc->active_vcpu_elem) )
-        __csched_vcpu_acct_stop_locked(prv, svc);
+    if ( !list_empty(&svc->active_unit_elem) )
+        __csched_unit_acct_stop_locked(prv, svc);
 
     spin_unlock_irq(&prv->lock);
 
@@ -1087,86 +1087,85 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit * const svc = CSCHED_UNIT(unit);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_unit_cpu(unit);
 
     SCHED_STAT_CRANK(unit_sleep);
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
     if ( curr_on_cpu(cpu) == unit )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
-         * But, we are here because vc is going to sleep while running on cpu,
+         * But, we are here because unit is going to sleep while running on cpu,
          * so the bit must be zero already.
          */
         ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers));
         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
     }
-    else if ( __vcpu_on_runq(svc) )
+    else if ( __unit_on_runq(svc) )
         runq_remove(svc);
 }
 
 static void
 csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit * const svc = CSCHED_UNIT(unit);
     bool_t migrating;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
+    if ( unlikely(curr_on_cpu(sched_unit_cpu(unit)) == unit) )
     {
         SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
-    if ( unlikely(__vcpu_on_runq(svc)) )
+    if ( unlikely(__unit_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /*
-     * We temporarly boost the priority of awaking VCPUs!
+     * We temporarily boost the priority of awaking UNITs!
      *
-     * If this VCPU consumes a non negligeable amount of CPU, it
+     * If this UNIT consumes a non negligible amount of CPU, it
      * will eventually find itself in the credit accounting code
      * path where its priority will be reset to normal.
      *
-     * If on the other hand the VCPU consumes little CPU and is
+     * If on the other hand the UNIT consumes little CPU and is
      * blocking and awoken a lot (doing I/O for example), its
      * priority will remain boosted, optimizing its wake-to-run
      * latencies.
      *
-     * This allows wake-to-run latency sensitive VCPUs to preempt
-     * more CPU resource intensive VCPUs without impacting overall 
+     * This allows wake-to-run latency sensitive UNITs to preempt
+     * more CPU resource intensive UNITs without impacting overall
      * system fairness.
      *
      * There are two cases, when we don't want to boost:
-     *  - VCPUs that are waking up after a migration, rather than
+     *  - UNITs that are waking up after a migration, rather than
      *    after having blocked;
-     *  - VCPUs of capped domains unpausing after earning credits
+     *  - UNITs of capped domains unpausing after earning credits
      *    they had overspent.
      */
-    migrating = test_and_clear_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
+    migrating = test_and_clear_bit(CSCHED_FLAG_UNIT_MIGRATING, &svc->flags);
 
     if ( !migrating && svc->pri == CSCHED_PRI_TS_UNDER &&
-         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+         !test_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
     {
-        TRACE_2D(TRC_CSCHED_BOOST_START, vc->domain->domain_id, vc->vcpu_id);
+        TRACE_2D(TRC_CSCHED_BOOST_START, unit->domain->domain_id,
+                 unit->unit_id);
         SCHED_STAT_CRANK(unit_boost);
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
 
-    /* Put the VCPU on the runq and tickle CPUs */
+    /* Put the UNIT on the runq and tickle CPUs */
     runq_insert(svc);
     __runq_tickle(svc);
 }
@@ -1177,7 +1176,7 @@ csched_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
     struct csched_unit * const svc = CSCHED_UNIT(unit);
 
     /* Let the scheduler know that this unit is trying to yield */
-    set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
+    set_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags);
 }
 
 static int
@@ -1206,8 +1205,8 @@ csched_dom_cntl(
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
             {
-                prv->weight -= sdom->weight * sdom->active_vcpu_count;
-                prv->weight += op->u.credit.weight * sdom->active_vcpu_count;
+                prv->weight -= sdom->weight * sdom->active_unit_count;
+                prv->weight += op->u.credit.weight * sdom->active_unit_count;
             }
             sdom->weight = op->u.credit.weight;
         }
@@ -1236,9 +1235,9 @@ csched_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
 
     /* Are we becoming exclusively pinned? */
     if ( cpumask_weight(hard) == 1 )
-        set_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        set_bit(CSCHED_FLAG_UNIT_PINNED, &svc->flags);
     else
-        clear_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        clear_bit(CSCHED_FLAG_UNIT_PINNED, &svc->flags);
 }
 
 static inline void
@@ -1281,14 +1280,14 @@ csched_sys_cntl(const struct scheduler *ops,
         else if ( prv->ratelimit && !params->ratelimit_us )
             printk(XENLOG_INFO "Disabling context switch rate limiting\n");
         prv->ratelimit = MICROSECS(params->ratelimit_us);
-        prv->vcpu_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
+        prv->unit_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
         spin_unlock_irqrestore(&prv->lock, flags);
 
         /* FALLTHRU */
     case XEN_SYSCTL_SCHEDOP_getinfo:
         params->tslice_ms = prv->tslice / MILLISECS(1);
         params->ratelimit_us = prv->ratelimit / MICROSECS(1);
-        params->vcpu_migr_delay_us = prv->vcpu_migr_delay / MICROSECS(1);
+        params->vcpu_migr_delay_us = prv->unit_migr_delay / MICROSECS(1);
         rc = 0;
         break;
     }
@@ -1306,7 +1305,7 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom)
         return ERR_PTR(-ENOMEM);
 
     /* Initialize credit and weight */
-    INIT_LIST_HEAD(&sdom->active_vcpu);
+    INIT_LIST_HEAD(&sdom->active_unit);
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
     sdom->dom = dom;
     sdom->weight = CSCHED_DEFAULT_WEIGHT;
@@ -1323,7 +1322,7 @@ csched_free_domdata(const struct scheduler *ops, void *data)
 /*
  * This is a O(n) optimized sort of the runq.
  *
- * Time-share VCPUs can only be one of two priorities, UNDER or OVER. We walk
+ * Time-share UNITs can only be one of two priorities, UNDER or OVER. We walk
  * through the runq and move up any UNDERs that are preceded by OVERS. We
  * remember the last UNDER to make the move up operation O(1).
  */
@@ -1376,7 +1375,7 @@ csched_acct(void* dummy)
 {
     struct csched_private *prv = dummy;
     unsigned long flags;
-    struct list_head *iter_vcpu, *next_vcpu;
+    struct list_head *iter_unit, *next_unit;
     struct list_head *iter_sdom, *next_sdom;
     struct csched_unit *svc;
     struct csched_dom *sdom;
@@ -1423,26 +1422,26 @@ csched_acct(void* dummy)
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
         BUG_ON( is_idle_domain(sdom->dom) );
-        BUG_ON( sdom->active_vcpu_count == 0 );
+        BUG_ON( sdom->active_unit_count == 0 );
         BUG_ON( sdom->weight == 0 );
-        BUG_ON( (sdom->weight * sdom->active_vcpu_count) > weight_left );
+        BUG_ON( (sdom->weight * sdom->active_unit_count) > weight_left );
 
-        weight_left -= ( sdom->weight * sdom->active_vcpu_count );
+        weight_left -= ( sdom->weight * sdom->active_unit_count );
 
         /*
          * A domain's fair share is computed using its weight in competition
          * with that of all other active domains.
          *
-         * At most, a domain can use credits to run all its active VCPUs
+         * At most, a domain can use credits to run all its active UNITs
          * for one full accounting period. We allow a domain to earn more
          * only when the system-wide credit balance is negative.
          */
-        credit_peak = sdom->active_vcpu_count * prv->credits_per_tslice;
+        credit_peak = sdom->active_unit_count * prv->credits_per_tslice;
         if ( prv->credit_balance < 0 )
         {
             credit_peak += ( ( -prv->credit_balance
                                * sdom->weight
-                               * sdom->active_vcpu_count) +
+                               * sdom->active_unit_count) +
                              (weight_total - 1)
                            ) / weight_total;
         }
@@ -1453,14 +1452,14 @@ csched_acct(void* dummy)
             if ( credit_cap < credit_peak )
                 credit_peak = credit_cap;
 
-            /* FIXME -- set cap per-vcpu as well...? */
-            credit_cap = ( credit_cap + ( sdom->active_vcpu_count - 1 )
-                         ) / sdom->active_vcpu_count;
+            /* FIXME -- set cap per-unit as well...? */
+            credit_cap = ( credit_cap + ( sdom->active_unit_count - 1 )
+                         ) / sdom->active_unit_count;
         }
 
         credit_fair = ( ( credit_total
                           * sdom->weight
-                          * sdom->active_vcpu_count )
+                          * sdom->active_unit_count )
                         + (weight_total - 1)
                       ) / weight_total;
 
@@ -1494,14 +1493,14 @@ csched_acct(void* dummy)
             credit_fair = credit_peak;
         }
 
-        /* Compute fair share per VCPU */
-        credit_fair = ( credit_fair + ( sdom->active_vcpu_count - 1 )
-                      ) / sdom->active_vcpu_count;
+        /* Compute fair share per UNIT */
+        credit_fair = ( credit_fair + ( sdom->active_unit_count - 1 )
+                      ) / sdom->active_unit_count;
 
 
-        list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
+        list_for_each_safe( iter_unit, next_unit, &sdom->active_unit )
         {
-            svc = list_entry(iter_vcpu, struct csched_unit, active_vcpu_elem);
+            svc = list_entry(iter_unit, struct csched_unit, active_unit_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1509,20 +1508,20 @@ csched_acct(void* dummy)
             credit = atomic_read(&svc->credit);
 
             /*
-             * Recompute priority or, if VCPU is idling, remove it from
+             * Recompute priority or, if UNIT is idling, remove it from
              * the active list.
              */
             if ( credit < 0 )
             {
                 svc->pri = CSCHED_PRI_TS_OVER;
 
-                /* Park running VCPUs of capped-out domains */
+                /* Park running UNITs of capped-out domains */
                 if ( sdom->cap != 0U &&
                      credit < -credit_cap &&
-                     !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                     !test_and_set_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(unit_park);
-                    vcpu_pause_nosync(svc->vcpu);
+                    vcpu_pause_nosync(svc->unit->vcpu);
                 }
 
                 /* Lower bound on credits */
@@ -1538,22 +1537,22 @@ csched_acct(void* dummy)
                 svc->pri = CSCHED_PRI_TS_UNDER;
 
                 /* Unpark any capped domains whose credits go positive */
-                if ( test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                if ( test_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
                 {
                     /*
                      * It's important to unset the flag AFTER the unpause()
-                     * call to make sure the VCPU's priority is not boosted
+                     * call to make sure the UNIT's priority is not boosted
                      * if it is woken up here.
                      */
                     SCHED_STAT_CRANK(unit_unpark);
-                    vcpu_unpause(svc->vcpu);
-                    clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags);
+                    vcpu_unpause(svc->unit->vcpu);
+                    clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags);
                 }
 
-                /* Upper bound on credits means VCPU stops earning */
+                /* Upper bound on credits means UNIT stops earning */
                 if ( credit > prv->credits_per_tslice )
                 {
-                    __csched_vcpu_acct_stop_locked(prv, svc);
+                    __csched_unit_acct_stop_locked(prv, svc);
                     /* Divide credits in half, so that when it starts
                      * accounting again, it starts a little bit "ahead" */
                     credit /= 2;
@@ -1561,8 +1560,8 @@ csched_acct(void* dummy)
                 }
             }
 
-            SCHED_VCPU_STAT_SET(svc, credit_last, credit);
-            SCHED_VCPU_STAT_SET(svc, credit_incr, credit_fair);
+            SCHED_UNIT_STAT_SET(svc, credit_last, credit);
+            SCHED_UNIT_STAT_SET(svc, credit_incr, credit_fair);
             credit_balance += credit;
         }
     }
@@ -1588,10 +1587,10 @@ csched_tick(void *_cpu)
     spc->tick++;
 
     /*
-     * Accounting for running VCPU
+     * Accounting for running UNIT
      */
-    if ( !is_idle_vcpu(current) )
-        csched_vcpu_acct(prv, cpu);
+    if ( !is_idle_unit(current->sched_unit) )
+        csched_unit_acct(prv, cpu);
 
     /*
      * Check if runq needs to be sorted
@@ -1612,7 +1611,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
     struct csched_unit *speer;
     struct list_head *iter;
-    struct vcpu *vc;
+    struct sched_unit *unit;
 
     ASSERT(peer_pcpu != NULL);
 
@@ -1620,7 +1619,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
+    if ( unlikely(is_idle_unit(curr_on_cpu(peer_cpu))) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1628,46 +1627,44 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
         speer = __runq_elem(iter);
 
         /*
-         * If next available VCPU here is not of strictly higher
+         * If next available UNIT here is not of strictly higher
          * priority than ours, this PCPU is useless to us.
          */
         if ( speer->pri <= pri )
             break;
 
-        /* Is this VCPU runnable on our PCPU? */
-        vc = speer->vcpu;
-        BUG_ON( is_idle_vcpu(vc) );
+        /* Is this UNIT runnable on our PCPU? */
+        unit = speer->unit;
+        BUG_ON( is_idle_unit(unit) );
 
         /*
-         * If the vcpu is still in peer_cpu's scheduling tail, or if it
+         * If the unit is still in peer_cpu's scheduling tail, or if it
          * has no useful soft affinity, skip it.
          *
          * In fact, what we want is to check if we have any "soft-affine
          * work" to steal, before starting to look at "hard-affine work".
          *
-         * Notice that, if not even one vCPU on this runq has a useful
+         * Notice that, if not even one unit on this runq has a useful
          * soft affinity, we could have avoided considering this runq for
          * a soft balancing step in the first place. This, for instance,
          * can be implemented by taking note of on what runq there are
-         * vCPUs with useful soft affinities in some sort of bitmap
+         * units with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->sched_unit->is_running ||
-             (balance_step == BALANCE_SOFT_AFFINITY &&
-              !has_soft_affinity(vc->sched_unit)) )
+        if ( unit->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
+                                  !has_soft_affinity(unit)) )
             continue;
 
-        affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
-        if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
+        affinity_balance_cpumask(unit, balance_step, cpumask_scratch);
+        if ( __csched_unit_is_migrateable(prv, unit, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
-            TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
-                     vc->domain->domain_id, vc->vcpu_id);
-            SCHED_VCPU_STAT_CRANK(speer, migrate_q);
+            TRACE_3D(TRC_CSCHED_STOLEN_UNIT, peer_cpu,
+                     unit->domain->domain_id, unit->unit_id);
+            SCHED_UNIT_STAT_CRANK(speer, migrate_q);
             SCHED_STAT_CRANK(migrate_queued);
-            WARN_ON(vc->is_urgent);
             runq_remove(speer);
-            sched_set_res(vc->sched_unit, get_sched_res(cpu));
+            sched_set_res(unit, get_sched_res(cpu));
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
@@ -1693,7 +1690,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int peer_cpu, first_cpu, peer_node, bstep;
     int node = cpu_to_node(cpu);
 
-    BUG_ON( cpu != snext->vcpu->processor );
+    BUG_ON( cpu != sched_unit_cpu(snext->unit) );
     online = cpupool_online_cpumask(c);
 
     /*
@@ -1722,7 +1719,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
          * it is more likely that we find some affine work on our same
-         * node, not to mention that migrating vcpus within the same node
+         * node, not to mention that migrating units within the same node
          * could well be expected to be cheaper than across-nodes (memory
          * stays local, there might be some node-wide cache[s], etc.).
          */
@@ -1743,7 +1740,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 spinlock_t *lock;
 
                 /*
-                 * If there is only one runnable vCPU on peer_cpu, it means
+                 * If there is only one runnable unit on peer_cpu, it means
                  * there's no one to be stolen in its runqueue, so skip it.
                  *
                  * Checking this without holding the lock is racy... But that's
@@ -1756,13 +1753,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  *   And we can avoid that by re-checking nr_runnable after
                  *   having grabbed the lock, if we want;
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
-                 *   have runnable vCPUs in its runqueue, but that's not a
+                 *   have runnable units in its runqueue, but that's not a
                  *   problem because:
                  *   + if racing with csched_unit_insert() or csched_unit_wake(),
-                 *     __runq_tickle() will be called afterwords, so the vCPU
+                 *     __runq_tickle() will be called afterwards, so the unit
                  *     won't get stuck in the runqueue for too long;
-                 *   + if racing with csched_runq_steal(), it may be that a
-                 *     vCPU that we could have picked up, stays in a runqueue
+                 *   + if racing with csched_runq_steal(), it may be that a
+                 *     unit that we could have picked up stays in a runqueue
                  *     until someone else tries to steal it again. But this is
                  *     no worse than what can happen already (without this
                  *     optimization), if the pCPU would schedule right after we
@@ -1797,7 +1794,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                     csched_runq_steal(peer_cpu, cpu, snext->pri, bstep) : NULL;
                 pcpu_schedule_unlock(lock, peer_cpu);
 
-                /* As soon as one vcpu is found, balancing ends */
+                /* As soon as one unit is found, balancing ends */
                 if ( speer != NULL )
                 {
                     *stolen = 1;
@@ -1836,14 +1833,15 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_unit * const scurr = CSCHED_UNIT(current->sched_unit);
+    struct sched_unit *unit = current->sched_unit;
+    struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_unit *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED_VCPU_CHECK(current);
+    CSCHED_UNIT_CHECK(unit);
 
     /*
      * Here in Credit1 code, we usually just call TRACE_nD() helpers, and
@@ -1857,30 +1855,30 @@ csched_schedule(
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(unit);
         __trace_var(TRC_CSCHED_SCHEDULE, 1, sizeof(d),
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->sched_unit->state_entry_time;
+    runtime = now - unit->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
-    if ( !is_idle_vcpu(scurr->vcpu) )
+    if ( !is_idle_unit(unit) )
     {
-        /* Update credits of a non-idle VCPU. */
+        /* Update credits of a non-idle UNIT. */
         burn_credits(scurr, now);
         scurr->start_time -= now;
     }
     else
     {
-        /* Re-instate a boosted idle VCPU as normal-idle. */
+        /* Re-instate a boosted idle UNIT as normal-idle. */
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
     /* Choices, choices:
-     * - If we have a tasklet, we need to run the idle vcpu no matter what.
-     * - If sched rate limiting is in effect, and the current vcpu has
+     * - If we have a tasklet, we need to run the idle unit no matter what.
+     * - If sched rate limiting is in effect, and the current unit has
      *   run for less than that amount of time, continue the current one,
      *   but with a shorter timeslice and return it immediately
      * - Otherwise, choose the one with the highest priority (which may
@@ -1898,11 +1896,11 @@ csched_schedule(
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !test_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags)
+    if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && vcpu_runnable(current)
-         && !is_idle_vcpu(current)
+         && unit_runnable(unit)
+         && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
         snext = scurr;
@@ -1920,11 +1918,11 @@ csched_schedule(
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
+            d.dom = unit->domain->domain_id;
+            d.unit = unit->unit_id;
             d.runtime = runtime;
             __trace_var(TRC_CSCHED_RATELIMIT, 1, sizeof(d),
                         (unsigned char *)&d);
@@ -1936,13 +1934,13 @@ csched_schedule(
     tslice = prv->tslice;
 
     /*
-     * Select next runnable local VCPU (ie top of local runq)
+     * Select next runnable local UNIT (ie top of local runq)
      */
-    if ( vcpu_runnable(current) )
+    if ( unit_runnable(unit) )
         __runq_insert(scurr);
     else
     {
-        BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
+        BUG_ON( is_idle_unit(unit) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
         dec_nr_runnable(cpu);
     }
@@ -1950,23 +1948,23 @@ csched_schedule(
     snext = __runq_elem(runq->next);
     ret.migrated = 0;
 
-    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
+    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(idle_vcpu[cpu]->sched_unit);
+        snext = CSCHED_UNIT(sched_idle_unit(cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
     /*
      * Clear YIELD flag before scheduling out
      */
-    clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags);
+    clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
     /*
      * SMP Load balance:
      *
-     * If the next highest priority local runnable VCPU has already eaten
+     * If the next highest priority local runnable UNIT has already eaten
      * through its credits, look on other PCPUs to see if we have more
      * urgent work... If not, csched_load_balance() will return snext, but
      * already removed from the runq.
@@ -1990,32 +1988,32 @@ csched_schedule(
         cpumask_clear_cpu(cpu, prv->idlers);
     }
 
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_unit(snext->unit) )
         snext->start_time += now;
 
 out:
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_vcpu(snext->vcpu) ?
+    ret.time = (is_idle_unit(snext->unit) ?
                 -1 : tslice);
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
-    CSCHED_VCPU_CHECK(ret.task->vcpu);
+    CSCHED_UNIT_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched_dump_vcpu(struct csched_unit *svc)
+csched_dump_unit(struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
     printk("[%i.%i] pri=%i flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
             svc->pri,
             svc->flags,
-            svc->vcpu->processor);
+            sched_unit_cpu(svc->unit));
 
     if ( sdom )
     {
@@ -2049,7 +2047,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
 
     /*
      * We need both locks:
-     * - csched_dump_vcpu() wants to access domains' scheduling
+     * - csched_dump_unit() wants to access domains' scheduling
      *   parameters, which are protected by the private scheduler lock;
      * - we scan through the runqueue, so we need the proper runqueue
      *   lock (the one of the runqueue of this cpu).
@@ -2065,12 +2063,12 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu). */
+    /* current UNIT (nothing to say if that's the idle unit). */
     svc = CSCHED_UNIT(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
         printk("\trun: ");
-        csched_dump_vcpu(svc);
+        csched_dump_unit(svc);
     }
 
     loop = 0;
@@ -2080,7 +2078,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
         if ( svc )
         {
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_unit(svc);
         }
     }
 
@@ -2122,29 +2120,29 @@ csched_dump(const struct scheduler *ops)
            prv->ratelimit / MICROSECS(1),
            CSCHED_CREDITS_PER_MSEC,
            prv->ticks_per_tslice,
-           prv->vcpu_migr_delay/ MICROSECS(1));
+           prv->unit_migr_delay/ MICROSECS(1));
 
     printk("idlers: %*pb\n", nr_cpu_ids, cpumask_bits(prv->idlers));
 
-    printk("active vcpus:\n");
+    printk("active units:\n");
     loop = 0;
     list_for_each( iter_sdom, &prv->active_sdom )
     {
         struct csched_dom *sdom;
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
-        list_for_each( iter_svc, &sdom->active_vcpu )
+        list_for_each( iter_svc, &sdom->active_unit )
         {
             struct csched_unit *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_unit, active_vcpu_elem);
-            lock = unit_schedule_lock(svc->vcpu->sched_unit);
+            svc = list_entry(iter_svc, struct csched_unit, active_unit_elem);
+            lock = unit_schedule_lock(svc->unit);
 
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_unit(svc);
 
-            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, svc->unit);
         }
     }
 
@@ -2218,7 +2216,7 @@ csched_init(struct scheduler *ops)
     else
         prv->ratelimit = MICROSECS(sched_ratelimit_us);
 
-    prv->vcpu_migr_delay = MICROSECS(vcpu_migration_delay_us);
+    prv->unit_migr_delay = MICROSECS(vcpu_migration_delay_us);
 
     return 0;
 }
-- 
2.16.4
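
The accounting loop above keeps the credit-cap semantics intact; only the
flag names and the unit indirection are new. As a minimal sketch of the
park/unpark mechanism (illustrative only: locking and the surrounding loop
are omitted, and svc->unit->vcpu standing for the unit's single vcpu is an
assumption carried over from this stage of the series):

    /* Park a unit that has overspent its cap; unpark once credits recover. */
    if ( credit < -credit_cap &&
         !test_and_set_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
        vcpu_pause_nosync(svc->unit->vcpu);
    else if ( credit > 0 && test_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
    {
        /* Unpause before clearing the flag, so a wakeup here is not
         * mistaken for a normal wakeup and boosted. */
        vcpu_unpause(svc->unit->vcpu);
        clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags);
    }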



* [Xen-devel] [PATCH 22/60] xen/sched: make credit scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch credit scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c | 506 +++++++++++++++++++++++-----------------------
 1 file changed, 252 insertions(+), 254 deletions(-)
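
Throughout the diff below, direct accesses to struct vcpu fields are
replaced by sched_unit-level helpers introduced earlier in the series. As a
rough orientation (an illustrative sketch; the bodies here are assumptions
inferred from how the helpers are used in this patch, not a quote of the
real headers), they behave approximately like:

    /* Illustrative approximations of helpers defined earlier in the series. */
    static inline unsigned int sched_unit_cpu(const struct sched_unit *unit)
    {
        /* The processor of the sched_resource the unit is assigned to. */
        return unit->res->processor;
    }

    static inline bool is_idle_unit(const struct sched_unit *unit)
    {
        /* A unit is idle exactly when its vcpu is an idle vcpu. */
        return is_idle_vcpu(unit->vcpu);
    }

    static inline bool unit_runnable(const struct sched_unit *unit)
    {
        /* At this stage of the series a unit still wraps a single vcpu. */
        return vcpu_runnable(unit->vcpu);
    }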

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b700cc07ce..0bc6a87d30 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -70,10 +70,10 @@
  * inconsistent set of locks. Therefore atomic-safe bit operations must
  * be used for accessing it.
  */
-#define CSCHED_FLAG_VCPU_PARKED    0x0  /* VCPU over capped credits */
-#define CSCHED_FLAG_VCPU_YIELD     0x1  /* VCPU yielding */
-#define CSCHED_FLAG_VCPU_MIGRATING 0x2  /* VCPU may have moved to a new pcpu */
-#define CSCHED_FLAG_VCPU_PINNED    0x4  /* VCPU can run only on 1 pcpu */
+#define CSCHED_FLAG_UNIT_PARKED    0x0  /* UNIT over capped credits */
+#define CSCHED_FLAG_UNIT_YIELD     0x1  /* UNIT yielding */
+#define CSCHED_FLAG_UNIT_MIGRATING 0x2  /* UNIT may have moved to a new pcpu */
+#define CSCHED_FLAG_UNIT_PINNED    0x4  /* UNIT can run only on 1 pcpu */
 
 
 /*
@@ -91,7 +91,7 @@
 /*
  * CSCHED_STATS
  *
- * Manage very basic per-vCPU counters and stats.
+ * Manage very basic per-unit counters and stats.
  *
  * Useful for debugging live systems. The stats are displayed
  * with runq dumps ('r' on the Xen console).
@@ -100,23 +100,23 @@
 
 #define CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)                      \
+#define SCHED_UNIT_STATS_RESET(_V)                      \
     do                                                  \
     {                                                   \
         memset(&(_V)->stats, 0, sizeof((_V)->stats));   \
     } while ( 0 )
 
-#define SCHED_VCPU_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
+#define SCHED_UNIT_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
 
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
+#define SCHED_UNIT_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
 
 #else /* !SCHED_STATS */
 
 #undef CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)         do {} while ( 0 )
-#define SCHED_VCPU_STAT_CRANK(_V, _X)      do {} while ( 0 )
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
+#define SCHED_UNIT_STATS_RESET(_V)         do {} while ( 0 )
+#define SCHED_UNIT_STAT_CRANK(_V, _X)      do {} while ( 0 )
+#define SCHED_UNIT_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
 
 #endif /* SCHED_STATS */
 
@@ -128,7 +128,7 @@
 #define TRC_CSCHED_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED, 1)
 #define TRC_CSCHED_ACCOUNT_START TRC_SCHED_CLASS_EVT(CSCHED, 2)
 #define TRC_CSCHED_ACCOUNT_STOP  TRC_SCHED_CLASS_EVT(CSCHED, 3)
-#define TRC_CSCHED_STOLEN_VCPU   TRC_SCHED_CLASS_EVT(CSCHED, 4)
+#define TRC_CSCHED_STOLEN_UNIT   TRC_SCHED_CLASS_EVT(CSCHED, 4)
 #define TRC_CSCHED_PICKED_CPU    TRC_SCHED_CLASS_EVT(CSCHED, 5)
 #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
 #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
@@ -158,15 +158,15 @@ struct csched_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Virtual UNIT
  */
 struct csched_unit {
     struct list_head runq_elem;
-    struct list_head active_vcpu_elem;
+    struct list_head active_unit_elem;
 
     /* Up-pointers */
     struct csched_dom *sdom;
-    struct vcpu *vcpu;
+    struct sched_unit *unit;
 
     s_time_t start_time;   /* When we were scheduled (used for credit) */
     unsigned flags;
@@ -192,10 +192,10 @@ struct csched_unit {
  * Domain
  */
 struct csched_dom {
-    struct list_head active_vcpu;
+    struct list_head active_unit;
     struct list_head active_sdom_elem;
     struct domain *dom;
-    uint16_t active_vcpu_count;
+    uint16_t active_unit_count;
     uint16_t weight;
     uint16_t cap;
 };
@@ -215,7 +215,7 @@ struct csched_private {
 
     /* Period of master and tick in milliseconds */
     unsigned int tick_period_us, ticks_per_tslice;
-    s_time_t ratelimit, tslice, vcpu_migr_delay;
+    s_time_t ratelimit, tslice, unit_migr_delay;
 
     struct list_head active_sdom;
     uint32_t weight;
@@ -231,7 +231,7 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_unit *svc)
+__unit_on_runq(struct csched_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -242,7 +242,7 @@ __runq_elem(struct list_head *elem)
     return list_entry(elem, struct csched_unit, runq_elem);
 }
 
-/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
+/* Is the first element of cpu's runq (if any) cpu's idle unit? */
 static inline bool_t is_runq_idle(unsigned int cpu)
 {
     /*
@@ -251,7 +251,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
-           is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
+           is_idle_unit(__runq_elem(RUNQ(cpu)->next)->unit);
 }
 
 static inline void
@@ -273,11 +273,11 @@ dec_nr_runnable(unsigned int cpu)
 static inline void
 __runq_insert(struct csched_unit *svc)
 {
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
     const struct list_head * const runq = RUNQ(cpu);
     struct list_head *iter;
 
-    BUG_ON( __vcpu_on_runq(svc) );
+    BUG_ON( __unit_on_runq(svc) );
 
     list_for_each( iter, runq )
     {
@@ -286,10 +286,10 @@ __runq_insert(struct csched_unit *svc)
             break;
     }
 
-    /* If the vcpu yielded, try to put it behind one lower-priority
-     * runnable vcpu if we can.  The next runq_sort will bring it forward
+    /* If the unit yielded, try to put it behind one lower-priority
+     * runnable unit if we can.  The next runq_sort will bring it forward
      * within 30ms if the queue is too long. */
-    if ( test_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags)
+    if ( test_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags)
          && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
     {
         iter=iter->next;
@@ -305,20 +305,20 @@ static inline void
 runq_insert(struct csched_unit *svc)
 {
     __runq_insert(svc);
-    inc_nr_runnable(svc->vcpu->processor);
+    inc_nr_runnable(sched_unit_cpu(svc->unit));
 }
 
 static inline void
 __runq_remove(struct csched_unit *svc)
 {
-    BUG_ON( !__vcpu_on_runq(svc) );
+    BUG_ON( !__unit_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
 runq_remove(struct csched_unit *svc)
 {
-    dec_nr_runnable(svc->vcpu->processor);
+    dec_nr_runnable(sched_unit_cpu(svc->unit));
     __runq_remove(svc);
 }
 
@@ -329,7 +329,7 @@ static void burn_credits(struct csched_unit *svc, s_time_t now)
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_UNIT(curr_on_cpu(sched_unit_cpu(svc->unit))) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -349,8 +349,8 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
 static inline void __runq_tickle(struct csched_unit *new)
 {
-    unsigned int cpu = new->vcpu->processor;
-    struct sched_unit *unit = new->vcpu->sched_unit;
+    unsigned int cpu = sched_unit_cpu(new->unit);
+    struct sched_unit *unit = new->unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -364,16 +364,16 @@ static inline void __runq_tickle(struct csched_unit *new)
     idlers_empty = cpumask_empty(&idle_mask);
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when a unit has hard-affinity with only one
+     * cpu, and there is no other unit that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, it is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the units in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
      */
-    if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
+    if ( unlikely(test_bit(CSCHED_FLAG_UNIT_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
         ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
@@ -384,7 +384,7 @@ static inline void __runq_tickle(struct csched_unit *new)
 
     /*
      * If the pcpu is idle, or there are no idlers and the new
-     * vcpu is a higher priority than the old vcpu, run it here.
+     * unit is a higher priority than the old unit, run it here.
      *
      * If there are idle cpus, first try to find one suitable to run
      * new, so we can avoid preempting cur.  If we cannot find a
@@ -403,7 +403,7 @@ static inline void __runq_tickle(struct csched_unit *new)
     else if ( !idlers_empty )
     {
         /*
-         * Soft and hard affinity balancing loop. For vcpus without
+         * Soft and hard affinity balancing loop. For units without
          * a useful soft affinity, consider hard affinity only.
          */
         for_each_affinity_balance_step( balance_step )
@@ -446,10 +446,10 @@ static inline void __runq_tickle(struct csched_unit *new)
             {
                 if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) )
                 {
-                    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
-                    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+                    SCHED_UNIT_STAT_CRANK(cur, kicked_away);
+                    SCHED_UNIT_STAT_CRANK(cur, migrate_r);
                     SCHED_STAT_CRANK(migrate_kicked_away);
-                    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+                    sched_set_pause_flags_atomic(cur->unit, _VPF_migrating);
                 }
                 /* Tickle cpu anyway, to let new preempt cur. */
                 SCHED_STAT_CRANK(tickled_busy_cpu);
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
+    BUG_ON(!is_idle_unit(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -639,9 +639,9 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_unit *svc = vdata;
 
-    ASSERT(svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(svc && is_idle_unit(svc->unit));
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -658,33 +658,33 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
 #ifndef NDEBUG
 static inline void
-__csched_vcpu_check(struct vcpu *vc)
+__csched_unit_check(struct sched_unit *unit)
 {
-    struct csched_unit * const svc = CSCHED_UNIT(vc->sched_unit);
+    struct csched_unit * const svc = CSCHED_UNIT(unit);
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != CSCHED_DOM(vc->domain) );
+    BUG_ON( svc->unit != unit );
+    BUG_ON( sdom != CSCHED_DOM(unit->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_unit(unit) );
+        BUG_ON( sdom->dom != unit->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_unit(unit) );
     }
 
     SCHED_STAT_CRANK(unit_check);
 }
-#define CSCHED_VCPU_CHECK(_vc)  (__csched_vcpu_check(_vc))
+#define CSCHED_UNIT_CHECK(unit)  (__csched_unit_check(unit))
 #else
-#define CSCHED_VCPU_CHECK(_vc)
+#define CSCHED_UNIT_CHECK(unit)
 #endif
 
 /*
- * Delay, in microseconds, between migrations of a VCPU between PCPUs.
- * This prevents rapid fluttering of a VCPU between CPUs, and reduces the
+ * Delay, in microseconds, between migrations of a UNIT between PCPUs.
+ * This prevents rapid fluttering of a UNIT between CPUs, and reduces the
  * implicit overheads such as cache-warming. 1ms (1000) has been measured
  * as a good value.
  */
@@ -692,10 +692,11 @@ static unsigned int vcpu_migration_delay_us;
 integer_param("vcpu_migration_delay", vcpu_migration_delay_us);
 
 static inline bool
-__csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
+__csched_unit_is_cache_hot(const struct csched_private *prv,
+                           struct sched_unit *unit)
 {
-    bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay;
+    bool hot = prv->unit_migr_delay &&
+               (NOW() - unit->last_run_time) < prv->unit_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(unit_hot);
@@ -704,36 +705,38 @@ __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 }
 
 static inline int
-__csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
+__csched_unit_is_migrateable(const struct csched_private *prv,
+                             struct sched_unit *unit,
                              int dest_cpu, cpumask_t *mask)
 {
     /*
      * Don't pick up work that's hot on peer PCPU, or that can't (or
      * would prefer not to) run on cpu.
      *
-     * The caller is supposed to have already checked that vc is also
+     * The caller is supposed to have already checked that unit is also
      * not running.
      */
-    ASSERT(!vc->sched_unit->is_running);
+    ASSERT(!unit->is_running);
 
-    return !__csched_vcpu_is_cache_hot(prv, vc) &&
+    return !__csched_unit_is_cache_hot(prv, unit) &&
            cpumask_test_cpu(dest_cpu, mask);
 }
 
 static int
-_csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
+_csched_cpu_pick(const struct scheduler *ops, struct sched_unit *unit,
+                 bool_t commit)
 {
-    /* We must always use vc->procssor's scratch space */
-    cpumask_t *cpus = cpumask_scratch_cpu(vc->processor);
+    int cpu = sched_unit_cpu(unit);
+    /* We must always use cpu's scratch space */
+    cpumask_t *cpus = cpumask_scratch_cpu(cpu);
     cpumask_t idlers;
-    cpumask_t *online = cpupool_domain_cpumask(vc->domain);
+    cpumask_t *online = cpupool_domain_cpumask(unit->domain);
     struct csched_pcpu *spc = NULL;
-    int cpu = vc->processor;
     int balance_step;
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc->sched_unit, balance_step, cpus);
+        affinity_balance_cpumask(unit, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -752,12 +755,13 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(unit) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer vc's current processor */
-        cpu = cpumask_test_cpu(vc->processor, cpus)
-                ? vc->processor : cpumask_cycle(vc->processor, cpus);
+        cpu = cpumask_test_cpu(sched_unit_cpu(unit), cpus)
+                ? sched_unit_cpu(unit)
+                : cpumask_cycle(sched_unit_cpu(unit), cpus);
         ASSERT(cpumask_test_cpu(cpu, cpus));
 
         /*
@@ -769,15 +773,15 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * We give preference to the idle execution vehicle with the most
          * idling neighbours in its grouping. This distributes work across
          * distinct cores first and guarantees we don't do something stupid
-         * like run two VCPUs on co-hyperthreads while there are idle cores
+         * like run two UNITs on co-hyperthreads while there are idle cores
          * or sockets.
          *
          * Notice that, when computing the "idleness" of cpu, we may want to
-         * discount vc. That is, iff vc is the currently running and the only
-         * runnable vcpu on cpu, we add cpu to the idlers.
+         * discount unit. That is, iff unit is the currently running and the
+         * only runnable unit on cpu, we add cpu to the idlers.
          */
         cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-        if ( vc->processor == cpu && is_runq_idle(cpu) )
+        if ( sched_unit_cpu(unit) == cpu && is_runq_idle(cpu) )
             __cpumask_set_cpu(cpu, &idlers);
         cpumask_and(cpus, &idlers, cpus);
 
@@ -787,7 +791,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * CPU, as we just &&-ed it with idlers). In fact, if we are on SMT, and
          * cpu points to a busy thread with an idle sibling, both the threads
          * will be considered the same, from the "idleness" calculation point
-         * of view", preventing vcpu from being moved to the thread that is
+         * of view", preventing unit from being moved to the thread that is
          * actually idle.
          *
          * Notice that cpumask_test_cpu() is quicker than cpumask_empty(), so
@@ -853,7 +857,8 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     if ( commit && spc )
        spc->idle_bias = cpu;
 
-    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, unit->domain->domain_id, unit->unit_id,
+             cpu);
 
     return cpu;
 }
@@ -861,7 +866,6 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc = CSCHED_UNIT(unit);
 
     /*
@@ -871,26 +875,26 @@ csched_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      * csched_unit_wake() (still called from vcpu_migrate()) we won't
      * get boosted, which we don't deserve as we are "only" migrating.
      */
-    set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
-    return get_sched_res(_csched_cpu_pick(ops, vc, 1));
+    set_bit(CSCHED_FLAG_UNIT_MIGRATING, &svc->flags);
+    return get_sched_res(_csched_cpu_pick(ops, unit, 1));
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
+__csched_unit_acct_start(struct csched_private *prv, struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
 
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_unit_elem) )
     {
-        SCHED_VCPU_STAT_CRANK(svc, state_active);
+        SCHED_UNIT_STAT_CRANK(svc, state_active);
         SCHED_STAT_CRANK(acct_unit_active);
 
-        sdom->active_vcpu_count++;
-        list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
-        /* Make weight per-vcpu */
+        sdom->active_unit_count++;
+        list_add(&svc->active_unit_elem, &sdom->active_unit);
+        /* Make weight per-unit */
         prv->weight += sdom->weight;
         if ( list_empty(&sdom->active_sdom_elem) )
         {
@@ -899,56 +903,56 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_unit *svc)
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->unit->unit_id, sdom->active_unit_count);
 
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static inline void
-__csched_vcpu_acct_stop_locked(struct csched_private *prv,
+__csched_unit_acct_stop_locked(struct csched_private *prv,
     struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( list_empty(&svc->active_vcpu_elem) );
+    BUG_ON( list_empty(&svc->active_unit_elem) );
 
-    SCHED_VCPU_STAT_CRANK(svc, state_idle);
+    SCHED_UNIT_STAT_CRANK(svc, state_idle);
     SCHED_STAT_CRANK(acct_unit_idle);
 
     BUG_ON( prv->weight < sdom->weight );
-    sdom->active_vcpu_count--;
-    list_del_init(&svc->active_vcpu_elem);
+    sdom->active_unit_count--;
+    list_del_init(&svc->active_unit_elem);
     prv->weight -= sdom->weight;
-    if ( list_empty(&sdom->active_vcpu) )
+    if ( list_empty(&sdom->active_unit) )
     {
         list_del_init(&sdom->active_sdom_elem);
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->unit->unit_id, sdom->active_unit_count);
 }
 
 static void
-csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
+csched_unit_acct(struct csched_private *prv, unsigned int cpu)
 {
     struct sched_unit *currunit = current->sched_unit;
     struct csched_unit * const svc = CSCHED_UNIT(currunit);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
-    ASSERT( current->processor == cpu );
+    ASSERT( sched_unit_cpu(currunit) == cpu );
     ASSERT( svc->sdom != NULL );
-    ASSERT( !is_idle_vcpu(svc->vcpu) );
+    ASSERT( !is_idle_unit(svc->unit) );
 
     /*
-     * If this VCPU's priority was boosted when it last awoke, reset it.
-     * If the VCPU is found here, then it's consuming a non-negligeable
+     * If this UNIT's priority was boosted when it last awoke, reset it.
+     * If the UNIT is found here, then it's consuming a non-negligeable
      * amount of CPU resources and should no longer be boosted.
      */
     if ( svc->pri == CSCHED_PRI_TS_BOOST )
     {
         svc->pri = CSCHED_PRI_TS_UNDER;
         TRACE_2D(TRC_CSCHED_BOOST_END, svc->sdom->dom->domain_id,
-                 svc->vcpu->vcpu_id);
+                 svc->unit->unit_id);
     }
 
     /*
@@ -957,12 +961,12 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
     burn_credits(svc, NOW());
 
     /*
-     * Put this VCPU and domain back on the active list if it was
+     * Put this UNIT and domain back on the active list if it was
      * idling.
      */
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_unit_elem) )
     {
-        __csched_vcpu_acct_start(prv, svc);
+        __csched_unit_acct_start(prv, svc);
     }
     else
     {
@@ -975,15 +979,15 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
          * migrating it to run elsewhere (see multi-core and multi-thread
          * support in csched_res_pick()).
          */
-        new_cpu = _csched_cpu_pick(ops, current, 0);
+        new_cpu = _csched_cpu_pick(ops, currunit, 0);
 
         unit_schedule_unlock_irqrestore(lock, flags, currunit);
 
         if ( new_cpu != cpu )
         {
-            SCHED_VCPU_STAT_CRANK(svc, migrate_r);
+            SCHED_UNIT_STAT_CRANK(svc, migrate_r);
             SCHED_STAT_CRANK(migrate_running);
-            set_bit(_VPF_migrating, &current->pause_flags);
+            sched_set_pause_flags_atomic(currunit, _VPF_migrating);
             /*
              * As we are about to tickle cpu, we should clear its bit in
              * idlers. But, if we are here, it means there is someone running
@@ -1000,21 +1004,20 @@ static void *
 csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                    void *dd)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-UNIT info */
     svc = xzalloc(struct csched_unit);
     if ( svc == NULL )
         return NULL;
 
     INIT_LIST_HEAD(&svc->runq_elem);
-    INIT_LIST_HEAD(&svc->active_vcpu_elem);
+    INIT_LIST_HEAD(&svc->active_unit_elem);
     svc->sdom = dd;
-    svc->vcpu = vc;
-    svc->pri = is_idle_domain(vc->domain) ?
+    svc->unit = unit;
+    svc->pri = is_idle_unit(unit) ?
         CSCHED_PRI_IDLE : CSCHED_PRI_TS_UNDER;
-    SCHED_VCPU_STATS_RESET(svc);
+    SCHED_UNIT_STATS_RESET(svc);
     SCHED_STAT_CRANK(unit_alloc);
     return svc;
 }
@@ -1022,24 +1025,21 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
 static void
 csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit *svc = unit->priv;
     spinlock_t *lock;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
     /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
     lock = unit_schedule_lock_irq(unit);
 
-    unit->res = csched_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    sched_set_res(unit, csched_res_pick(ops, unit));
 
     spin_unlock_irq(lock);
 
     lock = unit_schedule_lock_irq(unit);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) &&
-         !vc->sched_unit->is_running )
+    if ( !__unit_on_runq(svc) && unit_runnable(unit) && !unit->is_running )
         runq_insert(svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -1066,18 +1066,18 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 
     SCHED_STAT_CRANK(unit_remove);
 
-    ASSERT(!__vcpu_on_runq(svc));
+    ASSERT(!__unit_on_runq(svc));
 
-    if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+    if ( test_and_clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
     {
         SCHED_STAT_CRANK(unit_unpark);
-        vcpu_unpause(svc->vcpu);
+        vcpu_unpause(svc->unit->vcpu);
     }
 
     spin_lock_irq(&prv->lock);
 
-    if ( !list_empty(&svc->active_vcpu_elem) )
-        __csched_vcpu_acct_stop_locked(prv, svc);
+    if ( !list_empty(&svc->active_unit_elem) )
+        __csched_unit_acct_stop_locked(prv, svc);
 
     spin_unlock_irq(&prv->lock);
 
@@ -1087,86 +1087,85 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit * const svc = CSCHED_UNIT(unit);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_unit_cpu(unit);
 
     SCHED_STAT_CRANK(unit_sleep);
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
     if ( curr_on_cpu(cpu) == unit )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
-         * But, we are here because vc is going to sleep while running on cpu,
+         * But, we are here because unit is going to sleep while running on cpu,
          * so the bit must be zero already.
          */
         ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers));
         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
     }
-    else if ( __vcpu_on_runq(svc) )
+    else if ( __unit_on_runq(svc) )
         runq_remove(svc);
 }
 
 static void
 csched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched_unit * const svc = CSCHED_UNIT(unit);
     bool_t migrating;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_unit(unit) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == unit) )
+    if ( unlikely(curr_on_cpu(sched_unit_cpu(unit)) == unit) )
     {
         SCHED_STAT_CRANK(unit_wake_running);
         return;
     }
-    if ( unlikely(__vcpu_on_runq(svc)) )
+    if ( unlikely(__unit_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
     /*
-     * We temporarly boost the priority of awaking VCPUs!
+     * We temporarily boost the priority of awaking UNITs!
      *
-     * If this VCPU consumes a non negligeable amount of CPU, it
+     * If this UNIT consumes a non-negligible amount of CPU, it
      * will eventually find itself in the credit accounting code
      * path where its priority will be reset to normal.
      *
-     * If on the other hand the VCPU consumes little CPU and is
+     * If on the other hand the UNIT consumes little CPU and is
      * blocking and awoken a lot (doing I/O for example), its
      * priority will remain boosted, optimizing its wake-to-run
      * latencies.
      *
-     * This allows wake-to-run latency sensitive VCPUs to preempt
-     * more CPU resource intensive VCPUs without impacting overall 
+     * This allows wake-to-run latency sensitive UNITs to preempt
+     * more CPU resource intensive UNITs without impacting overall
      * system fairness.
      *
      * There are two cases, when we don't want to boost:
-     *  - VCPUs that are waking up after a migration, rather than
+     *  - UNITs that are waking up after a migration, rather than
      *    after having blocked;
-     *  - VCPUs of capped domains unpausing after earning credits
+     *  - UNITs of capped domains unpausing after earning credits
      *    they had overspent.
      */
-    migrating = test_and_clear_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
+    migrating = test_and_clear_bit(CSCHED_FLAG_UNIT_MIGRATING, &svc->flags);
 
     if ( !migrating && svc->pri == CSCHED_PRI_TS_UNDER &&
-         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+         !test_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
     {
-        TRACE_2D(TRC_CSCHED_BOOST_START, vc->domain->domain_id, vc->vcpu_id);
+        TRACE_2D(TRC_CSCHED_BOOST_START, unit->domain->domain_id,
+                 unit->unit_id);
         SCHED_STAT_CRANK(unit_boost);
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
 
-    /* Put the VCPU on the runq and tickle CPUs */
+    /* Put the UNIT on the runq and tickle CPUs */
     runq_insert(svc);
     __runq_tickle(svc);
 }
@@ -1177,7 +1176,7 @@ csched_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
     struct csched_unit * const svc = CSCHED_UNIT(unit);
 
     /* Let the scheduler know that this vcpu is trying to yield */
-    set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
+    set_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags);
 }
 
 static int
@@ -1206,8 +1205,8 @@ csched_dom_cntl(
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
             {
-                prv->weight -= sdom->weight * sdom->active_vcpu_count;
-                prv->weight += op->u.credit.weight * sdom->active_vcpu_count;
+                prv->weight -= sdom->weight * sdom->active_unit_count;
+                prv->weight += op->u.credit.weight * sdom->active_unit_count;
             }
             sdom->weight = op->u.credit.weight;
         }
@@ -1236,9 +1235,9 @@ csched_aff_cntl(const struct scheduler *ops, struct sched_unit *unit,
 
     /* Are we becoming exclusively pinned? */
     if ( cpumask_weight(hard) == 1 )
-        set_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        set_bit(CSCHED_FLAG_UNIT_PINNED, &svc->flags);
     else
-        clear_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        clear_bit(CSCHED_FLAG_UNIT_PINNED, &svc->flags);
 }
 
 static inline void
@@ -1281,14 +1280,14 @@ csched_sys_cntl(const struct scheduler *ops,
         else if ( prv->ratelimit && !params->ratelimit_us )
             printk(XENLOG_INFO "Disabling context switch rate limiting\n");
         prv->ratelimit = MICROSECS(params->ratelimit_us);
-        prv->vcpu_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
+        prv->unit_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
         spin_unlock_irqrestore(&prv->lock, flags);
 
         /* FALLTHRU */
     case XEN_SYSCTL_SCHEDOP_getinfo:
         params->tslice_ms = prv->tslice / MILLISECS(1);
         params->ratelimit_us = prv->ratelimit / MICROSECS(1);
-        params->vcpu_migr_delay_us = prv->vcpu_migr_delay / MICROSECS(1);
+        params->vcpu_migr_delay_us = prv->unit_migr_delay / MICROSECS(1);
         rc = 0;
         break;
     }
@@ -1306,7 +1305,7 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom)
         return ERR_PTR(-ENOMEM);
 
     /* Initialize credit and weight */
-    INIT_LIST_HEAD(&sdom->active_vcpu);
+    INIT_LIST_HEAD(&sdom->active_unit);
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
     sdom->dom = dom;
     sdom->weight = CSCHED_DEFAULT_WEIGHT;
@@ -1323,7 +1322,7 @@ csched_free_domdata(const struct scheduler *ops, void *data)
 /*
  * This is an O(n) optimized sort of the runq.
  *
- * Time-share VCPUs can only be one of two priorities, UNDER or OVER. We walk
+ * Time-share UNITs can only be one of two priorities, UNDER or OVER. We walk
  * through the runq and move up any UNDERs that are preceded by OVERS. We
  * remember the last UNDER to make the move up operation O(1).
  */
@@ -1376,7 +1375,7 @@ csched_acct(void* dummy)
 {
     struct csched_private *prv = dummy;
     unsigned long flags;
-    struct list_head *iter_vcpu, *next_vcpu;
+    struct list_head *iter_unit, *next_unit;
     struct list_head *iter_sdom, *next_sdom;
     struct csched_unit *svc;
     struct csched_dom *sdom;
@@ -1423,26 +1422,26 @@ csched_acct(void* dummy)
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
         BUG_ON( is_idle_domain(sdom->dom) );
-        BUG_ON( sdom->active_vcpu_count == 0 );
+        BUG_ON( sdom->active_unit_count == 0 );
         BUG_ON( sdom->weight == 0 );
-        BUG_ON( (sdom->weight * sdom->active_vcpu_count) > weight_left );
+        BUG_ON( (sdom->weight * sdom->active_unit_count) > weight_left );
 
-        weight_left -= ( sdom->weight * sdom->active_vcpu_count );
+        weight_left -= ( sdom->weight * sdom->active_unit_count );
 
         /*
          * A domain's fair share is computed using its weight in competition
          * with that of all other active domains.
          *
-         * At most, a domain can use credits to run all its active VCPUs
+         * At most, a domain can use credits to run all its active UNITs
          * for one full accounting period. We allow a domain to earn more
          * only when the system-wide credit balance is negative.
          */
-        credit_peak = sdom->active_vcpu_count * prv->credits_per_tslice;
+        credit_peak = sdom->active_unit_count * prv->credits_per_tslice;
         if ( prv->credit_balance < 0 )
         {
             credit_peak += ( ( -prv->credit_balance
                                * sdom->weight
-                               * sdom->active_vcpu_count) +
+                               * sdom->active_unit_count) +
                              (weight_total - 1)
                            ) / weight_total;
         }
@@ -1453,14 +1452,14 @@ csched_acct(void* dummy)
             if ( credit_cap < credit_peak )
                 credit_peak = credit_cap;
 
-            /* FIXME -- set cap per-vcpu as well...? */
-            credit_cap = ( credit_cap + ( sdom->active_vcpu_count - 1 )
-                         ) / sdom->active_vcpu_count;
+            /* FIXME -- set cap per-unit as well...? */
+            credit_cap = ( credit_cap + ( sdom->active_unit_count - 1 )
+                         ) / sdom->active_unit_count;
         }
 
         credit_fair = ( ( credit_total
                           * sdom->weight
-                          * sdom->active_vcpu_count )
+                          * sdom->active_unit_count )
                         + (weight_total - 1)
                       ) / weight_total;
 
@@ -1494,14 +1493,14 @@ csched_acct(void* dummy)
             credit_fair = credit_peak;
         }
 
-        /* Compute fair share per VCPU */
-        credit_fair = ( credit_fair + ( sdom->active_vcpu_count - 1 )
-                      ) / sdom->active_vcpu_count;
+        /* Compute fair share per UNIT */
+        credit_fair = ( credit_fair + ( sdom->active_unit_count - 1 )
+                      ) / sdom->active_unit_count;
 
 
-        list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
+        list_for_each_safe( iter_unit, next_unit, &sdom->active_unit )
         {
-            svc = list_entry(iter_vcpu, struct csched_unit, active_vcpu_elem);
+            svc = list_entry(iter_unit, struct csched_unit, active_unit_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1509,20 +1508,20 @@ csched_acct(void* dummy)
             credit = atomic_read(&svc->credit);
 
             /*
-             * Recompute priority or, if VCPU is idling, remove it from
+             * Recompute priority or, if UNIT is idling, remove it from
              * the active list.
              */
             if ( credit < 0 )
             {
                 svc->pri = CSCHED_PRI_TS_OVER;
 
-                /* Park running VCPUs of capped-out domains */
+                /* Park running UNITs of capped-out domains */
                 if ( sdom->cap != 0U &&
                      credit < -credit_cap &&
-                     !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                     !test_and_set_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(unit_park);
-                    vcpu_pause_nosync(svc->vcpu);
+                    vcpu_pause_nosync(svc->unit->vcpu);
                 }
 
                 /* Lower bound on credits */
@@ -1538,22 +1537,22 @@ csched_acct(void* dummy)
                 svc->pri = CSCHED_PRI_TS_UNDER;
 
                 /* Unpark any capped domains whose credits go positive */
-                if ( test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                if ( test_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
                 {
                     /*
                      * It's important to unset the flag AFTER the unpause()
-                     * call to make sure the VCPU's priority is not boosted
+                     * call to make sure the UNIT's priority is not boosted
                      * if it is woken up here.
                      */
                     SCHED_STAT_CRANK(unit_unpark);
-                    vcpu_unpause(svc->vcpu);
-                    clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags);
+                    vcpu_unpause(svc->unit->vcpu);
+                    clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags);
                 }
 
-                /* Upper bound on credits means VCPU stops earning */
+                /* Upper bound on credits means UNIT stops earning */
                 if ( credit > prv->credits_per_tslice )
                 {
-                    __csched_vcpu_acct_stop_locked(prv, svc);
+                    __csched_unit_acct_stop_locked(prv, svc);
                     /* Divide credits in half, so that when it starts
                      * accounting again, it starts a little bit "ahead" */
                     credit /= 2;
@@ -1561,8 +1560,8 @@ csched_acct(void* dummy)
                 }
             }
 
-            SCHED_VCPU_STAT_SET(svc, credit_last, credit);
-            SCHED_VCPU_STAT_SET(svc, credit_incr, credit_fair);
+            SCHED_UNIT_STAT_SET(svc, credit_last, credit);
+            SCHED_UNIT_STAT_SET(svc, credit_incr, credit_fair);
             credit_balance += credit;
         }
     }
@@ -1588,10 +1587,10 @@ csched_tick(void *_cpu)
     spc->tick++;
 
     /*
-     * Accounting for running VCPU
+     * Accounting for running UNIT
      */
-    if ( !is_idle_vcpu(current) )
-        csched_vcpu_acct(prv, cpu);
+    if ( !is_idle_unit(current->sched_unit) )
+        csched_unit_acct(prv, cpu);
 
     /*
      * Check if runq needs to be sorted
@@ -1612,7 +1611,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
     struct csched_unit *speer;
     struct list_head *iter;
-    struct vcpu *vc;
+    struct sched_unit *unit;
 
     ASSERT(peer_pcpu != NULL);
 
@@ -1620,7 +1619,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
+    if ( unlikely(is_idle_unit(curr_on_cpu(peer_cpu))) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1628,46 +1627,44 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
         speer = __runq_elem(iter);
 
         /*
-         * If next available VCPU here is not of strictly higher
+         * If next available UNIT here is not of strictly higher
          * priority than ours, this PCPU is useless to us.
          */
         if ( speer->pri <= pri )
             break;
 
-        /* Is this VCPU runnable on our PCPU? */
-        vc = speer->vcpu;
-        BUG_ON( is_idle_vcpu(vc) );
+        /* Is this UNIT runnable on our PCPU? */
+        unit = speer->unit;
+        BUG_ON( is_idle_unit(unit) );
 
         /*
-         * If the vcpu is still in peer_cpu's scheduling tail, or if it
+         * If the unit is still in peer_cpu's scheduling tail, or if it
          * has no useful soft affinity, skip it.
          *
          * In fact, what we want is to check if we have any "soft-affine
          * work" to steal, before starting to look at "hard-affine work".
          *
-         * Notice that, if not even one vCPU on this runq has a useful
+         * Notice that, if not even one unit on this runq has a useful
          * soft affinity, we could have avoided considering this runq for
          * a soft balancing step in the first place. This, for instance,
          * can be implemented by taking note of on what runq there are
-         * vCPUs with useful soft affinities in some sort of bitmap
+         * units with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->sched_unit->is_running ||
-             (balance_step == BALANCE_SOFT_AFFINITY &&
-              !has_soft_affinity(vc->sched_unit)) )
+        if ( unit->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
+                                  !has_soft_affinity(unit)) )
             continue;
 
-        affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
-        if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
+        affinity_balance_cpumask(unit, balance_step, cpumask_scratch);
+        if ( __csched_unit_is_migrateable(prv, unit, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
-            TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
-                     vc->domain->domain_id, vc->vcpu_id);
-            SCHED_VCPU_STAT_CRANK(speer, migrate_q);
+            TRACE_3D(TRC_CSCHED_STOLEN_UNIT, peer_cpu,
+                     unit->domain->domain_id, unit->unit_id);
+            SCHED_UNIT_STAT_CRANK(speer, migrate_q);
             SCHED_STAT_CRANK(migrate_queued);
-            WARN_ON(vc->is_urgent);
             runq_remove(speer);
-            sched_set_res(vc->sched_unit, get_sched_res(cpu));
+            sched_set_res(unit, get_sched_res(cpu));
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
@@ -1693,7 +1690,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int peer_cpu, first_cpu, peer_node, bstep;
     int node = cpu_to_node(cpu);
 
-    BUG_ON( cpu != snext->vcpu->processor );
+    BUG_ON( cpu != sched_unit_cpu(snext->unit) );
     online = cpupool_online_cpumask(c);
 
     /*
@@ -1722,7 +1719,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
          * it is more likely that we find some affine work on our same
-         * node, not to mention that migrating vcpus within the same node
+         * node, not to mention that migrating units within the same node
          * could well be expected to be cheaper than across-nodes (memory
          * stays local, there might be some node-wide cache[s], etc.).
          */
@@ -1743,7 +1740,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 spinlock_t *lock;
 
                 /*
-                 * If there is only one runnable vCPU on peer_cpu, it means
+                 * If there is only one runnable unit on peer_cpu, it means
                  * there's no one to be stolen in its runqueue, so skip it.
                  *
                  * Checking this without holding the lock is racy... But that's
@@ -1756,13 +1753,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  *   And we can avoid that by re-checking nr_runnable after
                  *   having grabbed the lock, if we want;
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
-                 *   have runnable vCPUs in its runqueue, but that's not a
+                 *   have runnable units in its runqueue, but that's not a
                  *   problem because:
                  *   + if racing with csched_unit_insert() or csched_unit_wake(),
-                 *     __runq_tickle() will be called afterwords, so the vCPU
+                 *     __runq_tickle() will be called afterwards, so the unit
                  *     won't get stuck in the runqueue for too long;
-                 *   + if racing with csched_runq_steal(), it may be that a
-                 *     vCPU that we could have picked up, stays in a runqueue
+                 *   + if racing with csched_runq_steal(), it may be that a
+                 *     unit that we could have picked up stays in a runqueue
                  *     until someone else tries to steal it again. But this is
                  *     no worse than what can happen already (without this
                  *     optimization), if the pCPU would schedule right after we
@@ -1797,7 +1794,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                     csched_runq_steal(peer_cpu, cpu, snext->pri, bstep) : NULL;
                 pcpu_schedule_unlock(lock, peer_cpu);
 
-                /* As soon as one vcpu is found, balancing ends */
+                /* As soon as one unit is found, balancing ends */
                 if ( speer != NULL )
                 {
                     *stolen = 1;
@@ -1836,14 +1833,15 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_unit * const scurr = CSCHED_UNIT(current->sched_unit);
+    struct sched_unit *unit = current->sched_unit;
+    struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_unit *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED_VCPU_CHECK(current);
+    CSCHED_UNIT_CHECK(unit);
 
     /*
      * Here in Credit1 code, we usually just call TRACE_nD() helpers, and
@@ -1857,30 +1855,30 @@ csched_schedule(
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(unit);
         __trace_var(TRC_CSCHED_SCHEDULE, 1, sizeof(d),
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->sched_unit->state_entry_time;
+    runtime = now - unit->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
-    if ( !is_idle_vcpu(scurr->vcpu) )
+    if ( !is_idle_unit(unit) )
     {
-        /* Update credits of a non-idle VCPU. */
+        /* Update credits of a non-idle UNIT. */
         burn_credits(scurr, now);
         scurr->start_time -= now;
     }
     else
     {
-        /* Re-instate a boosted idle VCPU as normal-idle. */
+        /* Re-instate a boosted idle UNIT as normal-idle. */
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
     /* Choices, choices:
-     * - If we have a tasklet, we need to run the idle vcpu no matter what.
-     * - If sched rate limiting is in effect, and the current vcpu has
+     * - If we have a tasklet, we need to run the idle unit no matter what.
+     * - If sched rate limiting is in effect, and the current unit has
      *   run for less than that amount of time, continue the current one,
      *   but with a shorter timeslice and return it immediately
      * - Otherwise, choose the one with the highest priority (which may
@@ -1898,11 +1896,11 @@ csched_schedule(
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !test_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags)
+    if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && vcpu_runnable(current)
-         && !is_idle_vcpu(current)
+         && unit_runnable(unit)
+         && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
         snext = scurr;
@@ -1920,11 +1918,11 @@ csched_schedule(
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
+            d.dom = unit->domain->domain_id;
+            d.unit = unit->unit_id;
             d.runtime = runtime;
             __trace_var(TRC_CSCHED_RATELIMIT, 1, sizeof(d),
                         (unsigned char *)&d);
@@ -1936,13 +1934,13 @@ csched_schedule(
     tslice = prv->tslice;
 
     /*
-     * Select next runnable local VCPU (ie top of local runq)
+     * Select next runnable local UNIT (ie top of local runq)
      */
-    if ( vcpu_runnable(current) )
+    if ( unit_runnable(unit) )
         __runq_insert(scurr);
     else
     {
-        BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
+        BUG_ON( is_idle_unit(unit) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
         dec_nr_runnable(cpu);
     }
@@ -1950,23 +1948,23 @@ csched_schedule(
     snext = __runq_elem(runq->next);
     ret.migrated = 0;
 
-    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
+    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(idle_vcpu[cpu]->sched_unit);
+        snext = CSCHED_UNIT(sched_idle_unit(cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
     /*
      * Clear YIELD flag before scheduling out
      */
-    clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags);
+    clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
     /*
      * SMP Load balance:
      *
-     * If the next highest priority local runnable VCPU has already eaten
+     * If the next highest priority local runnable UNIT has already eaten
      * through its credits, look on other PCPUs to see if we have more
      * urgent work... If not, csched_load_balance() will return snext, but
      * already removed from the runq.
@@ -1990,32 +1988,32 @@ csched_schedule(
         cpumask_clear_cpu(cpu, prv->idlers);
     }
 
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_unit(snext->unit) )
         snext->start_time += now;
 
 out:
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_vcpu(snext->vcpu) ?
+    ret.time = (is_idle_unit(snext->unit) ?
                 -1 : tslice);
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
-    CSCHED_VCPU_CHECK(ret.task->vcpu);
+    CSCHED_UNIT_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched_dump_vcpu(struct csched_unit *svc)
+csched_dump_unit(struct csched_unit *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
     printk("[%i.%i] pri=%i flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
             svc->pri,
             svc->flags,
-            svc->vcpu->processor);
+            sched_unit_cpu(svc->unit));
 
     if ( sdom )
     {
@@ -2049,7 +2047,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
 
     /*
      * We need both locks:
-     * - csched_dump_vcpu() wants to access domains' scheduling
+     * - csched_dump_unit() wants to access domains' scheduling
      *   parameters, which are protected by the private scheduler lock;
      * - we scan through the runqueue, so we need the proper runqueue
      *   lock (the one of the runqueue of this cpu).
@@ -2065,12 +2063,12 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu). */
+    /* current UNIT (nothing to say if that's the idle unit). */
     svc = CSCHED_UNIT(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
         printk("\trun: ");
-        csched_dump_vcpu(svc);
+        csched_dump_unit(svc);
     }
 
     loop = 0;
@@ -2080,7 +2078,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
         if ( svc )
         {
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_unit(svc);
         }
     }
 
@@ -2122,29 +2120,29 @@ csched_dump(const struct scheduler *ops)
            prv->ratelimit / MICROSECS(1),
            CSCHED_CREDITS_PER_MSEC,
            prv->ticks_per_tslice,
-           prv->vcpu_migr_delay/ MICROSECS(1));
+           prv->unit_migr_delay / MICROSECS(1));
 
     printk("idlers: %*pb\n", nr_cpu_ids, cpumask_bits(prv->idlers));
 
-    printk("active vcpus:\n");
+    printk("active units:\n");
     loop = 0;
     list_for_each( iter_sdom, &prv->active_sdom )
     {
         struct csched_dom *sdom;
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
-        list_for_each( iter_svc, &sdom->active_vcpu )
+        list_for_each( iter_svc, &sdom->active_unit )
         {
             struct csched_unit *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_unit, active_vcpu_elem);
-            lock = unit_schedule_lock(svc->vcpu->sched_unit);
+            svc = list_entry(iter_svc, struct csched_unit, active_unit_elem);
+            lock = unit_schedule_lock(svc->unit);
 
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_unit(svc);
 
-            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, svc->unit);
         }
     }
 
@@ -2218,7 +2216,7 @@ csched_init(struct scheduler *ops)
     else
         prv->ratelimit = MICROSECS(sched_ratelimit_us);
 
-    prv->vcpu_migr_delay = MICROSECS(vcpu_migration_delay_us);
+    prv->unit_migr_delay = MICROSECS(vcpu_migration_delay_us);
 
     return 0;
 }
-- 
2.16.4




* [PATCH 23/60] xen/sched: make credit2 scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch the credit2 scheduler completely from vcpu to sched_unit usage.

As we are touching lots of lines anyway, also remove some trailing white
space.
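
For reference, the wrappers this conversion leans on look roughly like the
sketch below. This is a simplified illustration only; the field names used
here (unit->vcpu, res->processor) are assumptions for the example, not
necessarily the literal definitions introduced earlier in the series.

    /* Illustrative sketch, not the series' literal code. */
    struct sched_unit {
        struct vcpu           *vcpu;    /* first vcpu of this unit         */
        struct sched_resource *res;     /* resource the unit is running on */
        struct domain         *domain;  /* owning domain                   */
        unsigned int           unit_id; /* unit instance number            */
    };

    static inline bool is_idle_unit(const struct sched_unit *unit)
    {
        return is_idle_vcpu(unit->vcpu);
    }

    static inline unsigned int sched_unit_cpu(const struct sched_unit *unit)
    {
        return unit->res->processor;    /* assumed field name */
    }

With wrappers of this shape, each former v->processor or is_idle_vcpu(v)
access becomes one call through the unit, which is what keeps the
conversion below largely mechanical.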

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit2.c | 820 ++++++++++++++++++++++-----------------------
 1 file changed, 403 insertions(+), 417 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef29a3d874..0f57135e81 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -45,7 +45,7 @@
 #define TRC_CSCHED2_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(CSCHED2, 8)
 #define TRC_CSCHED2_UPDATE_LOAD      TRC_SCHED_CLASS_EVT(CSCHED2, 9)
 #define TRC_CSCHED2_RUNQ_ASSIGN      TRC_SCHED_CLASS_EVT(CSCHED2, 10)
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
+#define TRC_CSCHED2_UPDATE_UNIT_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
 #define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 12)
 #define TRC_CSCHED2_TICKLE_NEW       TRC_SCHED_CLASS_EVT(CSCHED2, 13)
 #define TRC_CSCHED2_RUNQ_MAX_WEIGHT  TRC_SCHED_CLASS_EVT(CSCHED2, 14)
@@ -74,13 +74,13 @@
  * Design:
  *
  * VMs "burn" credits based on their weight; higher weight means
- * credits burn more slowly.  The highest weight vcpu burns credits at
+ * credits burn more slowly.  The highest weight unit burns credits at
  * a rate of 1 credit per nanosecond.  Others burn proportionally
  * more.
  *
- * vcpus are inserted into the runqueue by credit order.
+ * units are inserted into the runqueue by credit order.
  *
- * Credits are "reset" when the next vcpu in the runqueue is less than
+ * Credits are "reset" when the next unit in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
  */
@@ -95,33 +95,33 @@
  *   be given a cap of 25%; a domain that must not use more than 1+1/2 of
  *   physical CPU time, will be given a cap of 150%;
  *
- * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and
- *   a 40% cap, that one vCPU will use 40% of one pCPU. If a somain has 4
- *   vCPUs, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
- *   split among the v vCPUs. How much each of the vCPUs will actually get,
+ * - caps are per-domain (not per-unit). If a domain has only 1 unit, and
+ *   a 40% cap, that one unit will use 40% of one pCPU. If a domain has 4
+ *   units, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
+ *   split among the 4 units. How much each of the units will actually get,
  *   during any given interval of time, is unspecified (as it depends on
  *   various aspects: workload, system load, etc.). For instance, it is
- *   possible that, during a given time interval, 2 vCPUs use 100% each,
+ *   possible that, during a given time interval, 2 units use 100% each,
  *   and the other two use nothing; while during another time interval,
- *   two vCPUs use 80%, one uses 10% and the other 30%; or that each use
+ *   two units use 80%, one uses 10% and the other 30%; or that each use
  *   50% (and so on and so forth).
  *
  * For implementing this, we use the following approach:
  *
  * - each domain is given a 'budget', and each domain has a timer, which
  *   replenishes the domain's budget periodically. The budget is the amount
- *   of time the vCPUs of the domain can use every 'period';
+ *   of time the units of the domain can use every 'period';
  *
  * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
  *   (but each domain has its own timer; so the all are periodic by the same
  *   period, but replenishment of the budgets of the various domains, at
  *   periods boundaries, are not synchronous);
  *
- * - when vCPUs run, they consume budget. When they don't run, they don't
- *   consume budget. If there is no budget left for the domain, no vCPU of
- *   that domain can run. If a vCPU tries to run and finds that there is no
+ * - when units run, they consume budget. When they don't run, they don't
+ *   consume budget. If there is no budget left for the domain, no unit of
+ *   that domain can run. If a unit tries to run and finds that there is no
  *   budget, it blocks.
- *   At whatever time a vCPU wants to run, it must check the domain's budget,
+ *   At whatever time a unit wants to run, it must check the domain's budget,
  *   and if there is some, it can use it.
  *
  * - budget is replenished to the top of the capacity for the domain once
@@ -129,39 +129,39 @@
  *   though, the budget after a replenishment will always be at most equal
  *   to the total capacity of the domain ('tot_budget');
  *
- * - when a budget replenishment occurs, if there are vCPUs that had been
+ * - when a budget replenishment occurs, if there are units that had been
  *   blocked because of lack of budget, they'll be unblocked, and they will
  *   (potentially) be able to run again.
  *
  * Finally, some even more implementation related detail:
  *
- * - budget is stored in a domain-wide pool. vCPUs of the domain that want
+ * - budget is stored in a domain-wide pool. Units of the domain that want
  *   to run go to such pool, and grab some. When they do so, the amount
  *   they grabbed is _immediately_ removed from the pool. This happens in
- *   vcpu_grab_budget();
+ *   unit_grab_budget();
  *
- * - when vCPUs stop running, if they've not consumed all the budget they
+ * - when units stop running, if they've not consumed all the budget they
  *   took, the leftover is put back in the pool. This happens in
- *   vcpu_return_budget();
+ *   unit_return_budget();
  *
- * - the above means that a vCPU can find out that there is no budget and
+ * - the above means that a unit can find out that there is no budget and
  *   block, not only if the cap has actually been reached (for this period),
- *   but also if some other vCPUs, in order to run, have grabbed a certain
+ *   but also if some other units, in order to run, have grabbed a certain
  *   quota of budget, no matter whether they've already used it all or not.
- *   A vCPU blocking because (any form of) lack of budget is said to be
- *   "parked", and such blocking happens in park_vcpu();
+ *   A unit blocking because of (any form of) lack of budget is said to be
+ *   "parked", and such blocking happens in park_unit();
  *
- * - when a vCPU stops running, and puts back some budget in the domain pool,
+ * - when a unit stops running and puts back some budget in the domain pool,
  *   we need to check whether there is someone who has been parked and
- *   can be unparked. This happens in unpark_parked_vcpus(), called from
+ *   can be unparked. This happens in unpark_parked_units(), called from
  *   csched2_context_saved();
  *
  * - of course, unparking happens also as a consequence of the domain's budget
  *   being replenished by the periodic timer. This also occurs by means of
  *   calling csched2_context_saved() (but from replenish_domain_budget());
  *
- * - parked vCPUs of a domain are kept in a (per-domain) list, called
- *   'parked_vcpus'). Manipulation of the list and of the domain-wide budget
+ * - parked units of a domain are kept in a (per-domain) list, called
+ *   'parked_units'. Manipulation of the list and of the domain-wide budget
  *   pool, must occur only when holding the 'budget_lock'.
  */
 
@@ -174,9 +174,9 @@
  *     pcpu_schedule_lock() / unit_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
- *  + serializes runqueue operations (removing and inserting vcpus);
+ *  + serializes runqueue operations (removing and inserting units);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_unit for the vcpu in the
+ *  + protects unit parameters in csched2_unit for the unit in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -190,8 +190,8 @@
  *  + it is per-domain;
  *  + protects, in domains that have a utilization cap;
  *   * manipulation of the total budget of the domain (as it is shared
- *     among all vCPUs of the domain),
- *   * manipulation of the list of vCPUs that are blocked waiting for
+ *     among all units of the domain),
+ *   * manipulation of the list of units that are blocked waiting for
  *     some budget to be available.
  *
  * - Type:
@@ -228,9 +228,9 @@
  */
 #define CSCHED2_CREDIT_INIT          MILLISECS(10)
 /*
- * Amount of credit the idle vcpus have. It never changes, as idle
- * vcpus does not consume credits, and it must be lower than whatever
- * amount of credit 'regular' vcpu would end up with.
+ * Amount of credit the idle units have. It never changes, as idle
+ * units do not consume credits, and it must be lower than whatever
+ * amount of credit a 'regular' unit would end up with.
  */
 #define CSCHED2_IDLE_CREDIT          (-(1U<<30))
 /*
@@ -243,9 +243,9 @@
  * MIN_TIMER.
  */
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
-/* How much to "compensate" a vcpu for L2 migration. */
+/* How much to "compensate" a unit for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
-/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+/* How tolerant we should be when peeking at runtime of units on other cpus */
 #define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
@@ -258,7 +258,7 @@
  * Flags
  */
 /*
- * CSFLAG_scheduled: Is this vcpu either running on, or context-switching off,
+ * CSFLAG_scheduled: Is this unit either running on, or context-switching off,
  * a physical cpu?
  * + Accessed only with runqueue lock held
  * + Set when chosen as next in csched2_schedule().
@@ -280,21 +280,21 @@
 #define __CSFLAG_delayed_runq_add 2
 #define CSFLAG_delayed_runq_add (1U<<__CSFLAG_delayed_runq_add)
 /*
- * CSFLAG_runq_migrate_request: This vcpu is being migrated as a result of a
+ * CSFLAG_runq_migrate_request: This unit is being migrated as a result of a
  * credit2-initiated runq migrate request; migrate it to the runqueue indicated
- * in the svc struct. 
+ * in the svc struct.
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1U<<__CSFLAG_runq_migrate_request)
 /*
- * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * CSFLAG_unit_yield: this unit was running, and has called vcpu_yield(). The
  * scheduler is invoked to see if we can give the cpu to someone else, and
- * get back to the yielding vcpu in a while.
+ * get back to the yielding unit in a while.
  */
-#define __CSFLAG_vcpu_yield 4
-#define CSFLAG_vcpu_yield (1U<<__CSFLAG_vcpu_yield)
+#define __CSFLAG_unit_yield 4
+#define CSFLAG_unit_yield (1U<<__CSFLAG_unit_yield)
 /*
- * CSFLAGS_pinned: this vcpu is currently 'pinned', i.e., has its hard
+ * CSFLAG_pinned: this unit is currently 'pinned', i.e., has its hard
  * affinity set to one and only 1 cpu (and, hence, can only run there).
  */
 #define __CSFLAG_pinned 5
@@ -306,7 +306,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 /*
  * Load tracking and load balancing
  *
- * Load history of runqueues and vcpus is accounted for by using an
+ * Load history of runqueues and units is accounted for by using an
  * exponential weighted moving average algorithm. However, instead of using
  * fractions, we shift everything to the left by the number of bits we want to
  * use for representing the fractional part (Q-format).
@@ -326,7 +326,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  * where W is the length of the window, P the multiplier for transitioning into
  * Q-format fixed point arithmetic and load is the instantaneous load of a
- * runqueue, which basically is the number of runnable vcpus there are on the
+ * runqueue, which basically is the number of runnable units there are on the
  * runqueue (for the meaning of the other terms, look at the doc comment to
  *  update_runq_load()).
  *
@@ -338,7 +338,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  * The maximum possible value for the average load, which we want to store in
  * s_time_t type variables (i.e., we have 63 bits available) is load*P. This
  * means that, with P 18 bits wide, load can occupy 45 bits. This in turn
- * means we can have 2^45 vcpus in each runqueue, before overflow occurs!
+ * means we can have 2^45 units in each runqueue, before overflow occurs!
  *
  * However, it can happen that, at step j+1, if:
  *
@@ -354,13 +354,13 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  *  2^(63 - 30 - 18) = 2^15 = 32768
  *
- * So 32768 is the maximum number of vcpus the we can have in a runqueue,
+ * So 32768 is the maximum number of units that we can have in a runqueue,
  * at any given time, and still not have problems with the load tracking
  * calculations... and this is more than fine.
  *
  * As a matter of fact, since we are using microseconds granularity, we have
  * W=2^20. So, still with 18 fractional bits and a 1 second long window, there
- * may be 2^25 = 33554432 vcpus in a runq before we have to start thinking
+ * may be 2^25 = 33554432 units in a runq before we have to start thinking
  * about overflow.
  */
 
@@ -468,7 +468,7 @@ struct csched2_runqueue_data {
     struct list_head runq;     /* Ordered list of runnable vms               */
     int id;                    /* ID of this runqueue (-1 if invalid)        */
 
-    int load;                  /* Instantaneous load (num of non-idle vcpus) */
+    int load;                  /* Instantaneous load (num of non-idle units) */
     s_time_t load_last_update; /* Last time average was updated              */
     s_time_t avgload;          /* Decaying queue load                        */
     s_time_t b_avgload;        /* Decaying queue load modified by balancing  */
@@ -478,8 +478,8 @@ struct csched2_runqueue_data {
         tickled,               /* Have been asked to go through schedule     */
         idle;                  /* Currently idle pcpus                       */
 
-    struct list_head svc;      /* List of all vcpus assigned to the runqueue */
-    unsigned int max_weight;   /* Max weight of the vcpus in this runqueue   */
+    struct list_head svc;      /* List of all units assigned to the runqueue */
+    unsigned int max_weight;   /* Max weight of the units in this runqueue   */
     unsigned int pick_bias;    /* Last picked pcpu. Start from it next time  */
 };
 
@@ -509,20 +509,20 @@ struct csched2_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Schedule Unit
  */
 struct csched2_unit {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
-    struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
+    struct sched_unit *unit;           /* Up-pointer, to schedule unit        */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
 
     int credit;                        /* Current amount of credit            */
-    unsigned int weight;               /* Weight of this vcpu                 */
+    unsigned int weight;               /* Weight of this unit                 */
     unsigned int residual;             /* Reminder of div(max_weight/weight)  */
     unsigned flags;                    /* Status flags (16 bits would be ok,  */
     s_time_t budget;                   /* Current budget (if domains has cap) */
                                        /* but clear_bit() does not like that) */
-    s_time_t budget_quota;             /* Budget to which vCPU is entitled    */
+    s_time_t budget_quota;             /* Budget to which unit is entitled    */
 
     s_time_t start_time;               /* Time we were scheduled (for credit) */
 
@@ -531,7 +531,7 @@ struct csched2_unit {
     s_time_t avgload;                  /* Decaying queue load                 */
 
     struct list_head runq_elem;        /* On the runqueue (rqd->runq)         */
-    struct list_head parked_elem;      /* On the parked_vcpus list            */
+    struct list_head parked_elem;      /* On the parked_units list            */
     struct list_head rqd_elem;         /* On csched2_runqueue_data's svc list */
     struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
     int tickled_cpu;                   /* Cpu that will pick us (-1 if none)  */
@@ -549,12 +549,12 @@ struct csched2_dom {
 
     struct timer repl_timer;    /* Timer for periodic replenishment of budget */
     s_time_t next_repl;         /* Time at which next replenishment occurs    */
-    struct list_head parked_vcpus; /* List of CPUs waiting for budget         */
+    struct list_head parked_units; /* List of units waiting for budget        */
 
     struct list_head sdom_elem; /* On csched2_runqueue_data's sdom list       */
     uint16_t weight;            /* User specified weight                      */
     uint16_t cap;               /* User specified cap                         */
-    uint16_t nr_vcpus;          /* Number of vcpus of this domain             */
+    uint16_t nr_units;          /* Number of units of this domain             */
 };
 
 /*
@@ -593,7 +593,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
     return &csched2_priv(ops)->rqd[c2r(cpu)];
 }
 
-/* Does the domain of this vCPU have a cap? */
+/* Does the domain of this unit have a cap? */
 static inline bool has_cap(const struct csched2_unit *svc)
 {
     return svc->budget != STIME_MAX;
@@ -611,24 +611,24 @@ static inline bool has_cap(const struct csched2_unit *svc)
  *    smt_idle mask.
  *
  * Once we have such a mask, it is easy to implement a policy that, either:
- *  - uses fully idle cores first: it is enough to try to schedule the vcpus
+ *  - uses fully idle cores first: it is enough to try to schedule the units
  *    on pcpus from smt_idle mask first. This is what happens if
  *    sched_smt_power_savings was not set at boot (default), and it maximizes
  *    true parallelism, and hence performance;
- *  - uses already busy cores first: it is enough to try to schedule the vcpus
+ *  - uses already busy cores first: it is enough to try to schedule the units
  *    on pcpus that are idle, but are not in smt_idle. This is what happens if
  *    sched_smt_power_savings is set at boot, and it allows as many cores as
  *    possible to stay in low power states, minimizing power consumption.
  *
  * This logic is entirely implemented in runq_tickle(), and that is enough.
- * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
+ * In fact, in this scheduler, placement of a unit on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_unit_wake(), which calls
+ *  - when a unit wakes up, it calls csched2_unit_wake(), which calls
  *    runq_tickle();
  *  - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *    csched2_unit_migrate() (which calls migrate()) and csched2_unit_wake().
  *    csched2_res_pick() looks for the least loaded runq and returns just any
- *    of its processors. Then, csched2_unit_migrate() just moves the vcpu to
+ *    of its processors. Then, csched2_unit_migrate() just moves the unit to
  *    the chosen runq, and it is again runq_tickle(), called by
  *    csched2_unit_wake() that actually decides what pcpu to use within the
  *    chosen runq;
@@ -643,7 +643,7 @@ static inline bool has_cap(const struct csched2_unit *svc)
  *
  * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
  * records pcpus that at are merely idle (i.e., at the moment do not
- * have a vcpu running on them).  But you have to manually filter out
+ * have a unit running on them).  But you have to manually filter out
  * which pcpus have been tickled in order to find cores that are not
  * going to be busy soon.  Filtering out tickled cpus pairwise is a
  * lot of extra pain; so for rqd->smt_idle, we explicitly make so that
@@ -690,24 +690,24 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;
     unsigned int bs;
 
     SCHED_STAT_CRANK(need_fallback_cpu);
 
     for_each_affinity_balance_step( bs )
     {
-        int cpu = v->processor;
+        int cpu = sched_unit_cpu(unit);
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    cpupool_domain_cpumask(v->domain));
+                    cpupool_domain_cpumask(unit->domain));
 
         /*
-         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
+         * This is cases 1 or 3 (depending on bs): if the processor is (still)
          * in our affinity, go for it, for cache betterness.
          */
         if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
@@ -729,7 +729,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
          * We may well pick any valid pcpu from our soft-affinity, outside
          * of our current runqueue, but we decide not to. In fact, changing
          * runqueue is slow, affects load distribution, and is a source of
-         * overhead for the vcpus running on the other runqueue (we need the
+         * overhead for the units running on the other runqueue (we need the
          * lock). So, better do that as a consequence of a well informed
          * decision (or if we really don't have any other chance, as we will,
          * at step 5, if we get to there).
@@ -761,7 +761,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
      * We can't be here.  But if that somehow happens (in non-debug builds),
      * at least return something which is both online and in our hard-affinity.
      */
-    return cpumask_any(cpumask_scratch_cpu(v->processor));
+    return cpumask_any(cpumask_scratch_cpu(sched_unit_cpu(unit)));
 }
 
 /*
@@ -790,7 +790,7 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_unit *svc)
+static inline int unit_on_runq(struct csched2_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -948,17 +948,17 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 
     update_max_weight(svc->rqd, svc->weight, 0);
 
-    /* Expected new load based on adding this vcpu */
+    /* Expected new load based on adding this unit */
     rqd->b_avgload += svc->avgload;
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.rqi=rqd->id;
         __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
                     sizeof(d),
@@ -968,13 +968,13 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 }
 
 static void
-runq_assign(const struct scheduler *ops, struct vcpu *vc)
+runq_assign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
-    _runq_assign(svc, c2rqd(ops, vc->processor));
+    _runq_assign(svc, c2rqd(ops, sched_unit_cpu(unit)));
 }
 
 static void
@@ -982,24 +982,24 @@ _runq_deassign(struct csched2_unit *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
-    ASSERT(!vcpu_on_runq(svc));
+    ASSERT(!unit_on_runq(svc));
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_del_init(&svc->rqd_elem);
     update_max_weight(rqd, 0, svc->weight);
 
-    /* Expected new load based on removing this vcpu */
+    /* Expected new load based on removing this unit */
     rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
 
     svc->rqd = NULL;
 }
 
 static void
-runq_deassign(const struct scheduler *ops, struct vcpu *vc)
+runq_deassign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;
 
-    ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+    ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
     _runq_deassign(svc);
 }
@@ -1202,15 +1202,15 @@ update_svc_load(const struct scheduler *ops,
                 struct csched2_unit *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    s_time_t delta, vcpu_load;
+    s_time_t delta, unit_load;
     unsigned int P, W;
 
     if ( change == -1 )
-        vcpu_load = 1;
+        unit_load = 1;
     else if ( change == 1 )
-        vcpu_load = 0;
+        unit_load = 0;
     else
-        vcpu_load = vcpu_runnable(svc->vcpu);
+        unit_load = unit_runnable(svc->unit);
 
     W = prv->load_window_shift;
     P = prv->load_precision_shift;
@@ -1218,7 +1218,7 @@ update_svc_load(const struct scheduler *ops,
 
     if ( svc->load_last_update + (1ULL << W) < now )
     {
-        svc->avgload = vcpu_load << P;
+        svc->avgload = unit_load << P;
     }
     else
     {
@@ -1231,7 +1231,7 @@ update_svc_load(const struct scheduler *ops,
         }
 
         svc->avgload = svc->avgload +
-                       ((delta * (vcpu_load << P)) >> W) -
+                       ((delta * (unit_load << P)) >> W) -
                        ((delta * svc->avgload) >> W);
     }
     svc->load_last_update = now;
@@ -1243,14 +1243,14 @@ update_svc_load(const struct scheduler *ops,
     {
         struct {
             uint64_t v_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned shift;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.v_avgload = svc->avgload;
         d.shift = P;
-        __trace_var(TRC_CSCHED2_UPDATE_VCPU_LOAD, 1,
+        __trace_var(TRC_CSCHED2_UPDATE_UNIT_LOAD, 1,
                     sizeof(d),
                     (unsigned char *)&d);
     }
@@ -1272,18 +1272,18 @@ static void
 runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 {
     struct list_head *iter;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    ASSERT(!vcpu_on_runq(svc));
-    ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
+    ASSERT(!unit_on_runq(svc));
+    ASSERT(c2r(cpu) == c2r(sched_unit_cpu(svc->unit)));
 
     ASSERT(&svc->rqd->runq == runq);
-    ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->sched_unit->is_running);
+    ASSERT(!is_idle_unit(svc->unit));
+    ASSERT(!svc->unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1300,11 +1300,11 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned pos;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.pos = pos;
         __trace_var(TRC_CSCHED2_RUNQ_POS, 1,
                     sizeof(d),
@@ -1314,7 +1314,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 
 static inline void runq_remove(struct csched2_unit *svc)
 {
-    ASSERT(vcpu_on_runq(svc));
+    ASSERT(unit_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
 
@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->sched_unit->is_running);
-    return now - svc->vcpu->sched_unit->state_entry_time >
+    ASSERT(svc->unit->is_running);
+    return now - svc->unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -1369,17 +1369,17 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 
     /*
      * We are dealing with cpus that are marked non-idle (i.e., that are not
-     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * in rqd->idle). However, some of them may be running their idle unit,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+    if ( unlikely(is_idle_unit(cur->unit) ||
          !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
 
     score = new->credit - cur->credit;
-    if ( new->vcpu->processor != cpu )
+    if ( sched_unit_cpu(new->unit) != cpu )
         score -= CSCHED2_MIGRATE_RESIST;
 
     /*
@@ -1390,21 +1390,21 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, score;
         } d;
-        d.dom = cur->vcpu->domain->domain_id;
-        d.vcpu = cur->vcpu->vcpu_id;
+        d.dom = cur->unit->domain->domain_id;
+        d.unit = cur->unit->unit_id;
         d.credit = cur->credit;
         d.score = score;
         __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
@@ -1416,14 +1416,14 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 }
 
 /*
- * Check what processor it is best to 'wake', for picking up a vcpu that has
+ * Check what processor it is best to 'wake', for picking up a unit that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
- *  2. if there aren't idle processor, check the one were the vcpu was
+ *  2. if there aren't idle processors, check the one where the unit was
  *     running before to see if we can preempt what's running there now
  *     (and hence doing just one migration);
- *  3. last stand: check all processors and see if the vcpu is in right
- *     of preempting any of the other vcpus running on them (this requires
+ *  3. last stand: check all processors and see if the unit is in right
+ *     rights to preempt any of the other units running on them (this requires
  *     two migrations, and that's indeed why it is left as the last stand).
  *
  * Note that when we say 'idle processors' what we really mean is (pretty
@@ -1436,10 +1436,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
-    struct sched_unit *unit = new->vcpu->sched_unit;
-    unsigned int bs, cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->unit;
+    unsigned int bs, cpu = sched_unit_cpu(unit);
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
+    cpumask_t *online = cpupool_domain_cpumask(unit->domain);
     cpumask_t mask;
 
     ASSERT(new->rqd == rqd);
@@ -1447,13 +1447,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned processor;
             int credit;
         } d;
-        d.dom = new->vcpu->domain->domain_id;
-        d.vcpu = new->vcpu->vcpu_id;
-        d.processor = new->vcpu->processor;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
+        d.processor = cpu;
         d.credit = new->credit;
         __trace_var(TRC_CSCHED2_TICKLE_NEW, 1,
                     sizeof(d),
@@ -1461,11 +1461,11 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     }
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when a unit has hard-affinity with only one
+     * cpu, and there is no other unit that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the units in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
@@ -1532,7 +1532,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     /*
      * Note that, if we are here, it means we have done the hard-affinity
      * balancing step of the loop, and hence what we have in cpumask_scratch
-     * is what we put there for last, i.e., new's vcpu_hard_affinity & online
+     * is what we put there for last, i.e., new's unit_hard_affinity & online
      * which is exactly what we need for the next part of the function.
      */
 
@@ -1543,7 +1543,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
      *
      * For deciding which cpu to tickle, we use tickle_score(), which will
      * factor in both new's soft-affinity, and the soft-affinity of the
-     * vcpu running on each cpu that we consider.
+     * unit running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
@@ -1588,7 +1588,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
+    ASSERT(!is_idle_unit(curr_on_cpu(ipid)));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
@@ -1623,16 +1623,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 
     /*
      * Under normal circumstances, snext->credit should never be less
-     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, a
-     * vcpu with low credits may be allowed to run long enough that
+     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, a
+     * unit with low credits may be allowed to run long enough that
      * its credits are actually less than -CSCHED2_CREDIT_INIT.
-     * (Instances have been observed, for example, where a vcpu with
+     * (Instances have been observed, for example, where a unit with
      * 200us of credit was allowed to run for 11ms, giving it -10.8ms
      * of credit.  Thus it was still negative even after the reset.)
      *
      * If this is the case for snext, we simply want to keep moving
      * everyone up until it is in the black again.  This is fair because
-     * none of the other vcpus want to run at the moment.
+     * none of the other units want to run at the moment.
      *
      * Rather than looping, however, we just calculate a multiplier,
      * avoiding an integer division and multiplication in the common
@@ -1649,16 +1649,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         int start_credit;
 
         svc = list_entry(iter, struct csched2_unit, rqd_elem);
-        svc_cpu = svc->vcpu->processor;
+        svc_cpu = sched_unit_cpu(svc->unit);
 
-        ASSERT(!is_idle_vcpu(svc->vcpu));
+        ASSERT(!is_idle_unit(svc->unit));
         ASSERT(svc->rqd == rqd);
 
         /*
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far gets accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
+        if ( svc->unit == curr_on_cpu(svc_cpu) )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1689,12 +1689,12 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 int credit_start, credit_end;
                 unsigned multiplier;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             d.credit_start = start_credit;
             d.credit_end = svc->credit;
             d.multiplier = m;
@@ -1714,9 +1714,9 @@ void burn_credits(struct csched2_runqueue_data *rqd,
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_unit(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_unit(curr_on_cpu(sched_unit_cpu(svc->unit))));
 
-    if ( unlikely(is_idle_vcpu(svc->vcpu)) )
+    if ( unlikely(is_idle_unit(svc->unit)) )
     {
         ASSERT(svc->credit == CSCHED2_IDLE_CREDIT);
         return;
@@ -1745,12 +1745,12 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, budget;
             int delta;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.credit = svc->credit;
         d.budget = has_cap(svc) ?  svc->budget : INT_MIN;
         d.delta = delta;
@@ -1764,39 +1764,39 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_unit *svc)
+static void park_unit(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;
 
     ASSERT(spin_is_locked(&svc->sdom->budget_lock));
 
     /*
-     * It was impossible to find budget for this vCPU, so it has to be
+     * It was impossible to find budget for this unit, so it has to be
      * "parked". This implies it is not runnable, so we mark it as such in
-     * its pause_flags. If the vCPU is currently scheduled (which means we
+     * its pause_flags. If the unit is currently scheduled (which means we
      * are here after being called from within csched_schedule()), flagging
      * is enough, as we'll choose someone else, and then context_saved()
      * will take care of updating the load properly.
      *
-     * If, OTOH, the vCPU is sitting in the runqueue (which means we are here
+     * If, OTOH, the unit is sitting in the runqueue (which means we are here
      * after being called from within runq_candidate()), we must go all the
      * way down to taking it out of there, and updating the load accordingly.
      *
-     * In both cases, we also add it to the list of parked vCPUs of the domain.
+     * In both cases, we also add it to the list of parked units of the domain.
      */
-    __set_bit(_VPF_parked, &v->pause_flags);
-    if ( vcpu_on_runq(svc) )
+    sched_set_pause_flags(unit, _VPF_parked);
+    if ( unit_on_runq(svc) )
     {
         runq_remove(svc);
         update_load(svc->sdom->dom->cpupool->sched, svc->rqd, svc, -1, NOW());
     }
-    list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
+    list_add(&svc->parked_elem, &svc->sdom->parked_units);
 }
 
-static bool vcpu_grab_budget(struct csched2_unit *svc)
+static bool unit_grab_budget(struct csched2_unit *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
@@ -1808,9 +1808,9 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
 
     /*
      * Here, svc->budget is <= 0 (as, if it was > 0, we'd have taken the if
-     * above!). That basically means the vCPU has overrun a bit --because of
+     * above!). That basically means the unit has overrun a bit --because of
      * various reasons-- and we want to take that into account. With the +=,
-     * we are actually subtracting the amount of budget the vCPU has
+     * we are actually subtracting the amount of budget the unit has
      * overconsumed, from the total domain budget.
      */
     sdom->budget += svc->budget;
@@ -1831,7 +1831,7 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
     else
     {
         svc->budget = 0;
-        park_vcpu(svc);
+        park_unit(svc);
     }
 
     spin_unlock(&sdom->budget_lock);
@@ -1840,10 +1840,10 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
+unit_return_budget(struct csched2_unit *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(list_empty(parked));
@@ -1852,7 +1852,7 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     spin_lock(&sdom->budget_lock);
 
     /*
-     * The vCPU is stopping running (e.g., because it's blocking, or it has
+     * The unit is stopping running (e.g., because it's blocking, or it has
      * been preempted). If it hasn't consumed all the budget it got when
      * starting to run, put that remaining amount back in the domain's budget
      * pool.
@@ -1861,58 +1861,58 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     svc->budget = 0;
 
     /*
-     * Making budget available again to the domain means that parked vCPUs
-     * may be unparked and run. They are, if any, in the domain's parked_vcpus
+     * Making budget available again to the domain means that parked units
+     * may be unparked and run. They are, if any, in the domain's parked_units
      * list, so we want to go through that and unpark them (so they can try
      * to get some budget).
      *
      * Touching the list requires the budget_lock, which we hold. Let's
      * therefore put everyone in that list in another, temporary list, which
-     * then the caller will traverse, unparking the vCPUs it finds there.
+     * then the caller will traverse, unparking the units it finds there.
      *
      * In fact, we can't do the actual unparking here, because that requires
-     * taking the runqueue lock of the vCPUs being unparked, and we can't
+     * taking the runqueue lock of the units being unparked, and we can't
      * take any runqueue locks while we hold a budget_lock.
      */
     if ( sdom->budget > 0 )
-        list_splice_init(&sdom->parked_vcpus, parked);
+        list_splice_init(&sdom->parked_units, parked);
 
     spin_unlock(&sdom->budget_lock);
 }
 
 static void
-unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
+unpark_parked_units(const struct scheduler *ops, struct list_head *units)
 {
     struct csched2_unit *svc, *tmp;
     spinlock_t *lock;
 
-    list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
+    list_for_each_entry_safe ( svc, tmp, units, parked_elem )
     {
         unsigned long flags;
         s_time_t now;
 
-        lock = unit_schedule_lock_irqsave(svc->vcpu->sched_unit, &flags);
+        lock = unit_schedule_lock_irqsave(svc->unit, &flags);
 
-        __clear_bit(_VPF_parked, &svc->vcpu->pause_flags);
+        sched_clear_pause_flags(svc->unit, _VPF_parked);
         if ( unlikely(svc->flags & CSFLAG_scheduled) )
         {
             /*
              * We end here if a budget replenishment arrived between
              * csched2_schedule() (and, in particular, after a call to
-             * vcpu_grab_budget() that returned false), and
+             * unit_grab_budget() that returned false), and
              * context_saved(). By setting __CSFLAG_delayed_runq_add,
-             * we tell context_saved() to put the vCPU back in the
+             * we tell context_saved() to put the unit back in the
              * runqueue, from where it will compete with the others
              * for the newly replenished budget.
              */
             ASSERT( svc->rqd != NULL );
-            ASSERT( c2rqd(ops, svc->vcpu->processor) == svc->rqd );
+            ASSERT( c2rqd(ops, sched_unit_cpu(svc->unit)) == svc->rqd );
             __set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
         }
-        else if ( vcpu_runnable(svc->vcpu) )
+        else if ( unit_runnable(svc->unit) )
         {
             /*
-             * The vCPU should go back to the runqueue, and compete for
+             * The unit should go back to the runqueue, and compete for
              * the newly replenished budget, but only if it is actually
              * runnable (and was therefore offline only because of the
              * lack of budget).
@@ -1924,7 +1924,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         }
         list_del_init(&svc->parked_elem);
 
-        unit_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_unit);
+        unit_schedule_unlock_irqrestore(lock, flags, svc->unit);
     }
 }
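
The locking protocol spelled out in the comments above boils down to a
two-phase pattern; condensed here for illustration (this is the shape
of the code in unit_return_budget() and replenish_domain_budget(), not
new logic):

    LIST_HEAD(parked);

    /* Phase 1: detach the parked list while holding only budget_lock. */
    spin_lock(&sdom->budget_lock);
    list_splice_init(&sdom->parked_units, &parked);
    spin_unlock(&sdom->budget_lock);

    /* Phase 2: with budget_lock dropped, unpark_parked_units() can
     * safely take the per-unit runqueue locks. */
    unpark_parked_units(ops, &parked);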
 
@@ -1954,7 +1954,7 @@ static void replenish_domain_budget(void* data)
      *
      * Even in cases of overrun or delay, however, we expect that in 99% of
      * cases, doing just one replenishment will be good enough for being able
-     * to unpark the vCPUs that are waiting for some budget.
+     * to unpark the units that are waiting for some budget.
      */
     do_replenish(sdom);
 
@@ -1974,7 +1974,7 @@ static void replenish_domain_budget(void* data)
     }
     /*
      * 2) if we overrun by more than tot_budget, then budget+tot_budget is
-     * still < 0, which means that we can't unpark the vCPUs. Let's bail,
+     * still < 0, which means that we can't unpark the units. Let's bail,
      * and wait for future replenishments.
      */
     if ( unlikely(sdom->budget <= 0) )
@@ -1988,14 +1988,14 @@ static void replenish_domain_budget(void* data)
 
     /*
      * As above, let's prepare the temporary list, out of the domain's
-     * parked_vcpus list, now that we hold the budget_lock. Then, drop such
+     * parked_units list, now that we hold the budget_lock. Then, drop such
      * lock, and pass the list to the unparking function.
      */
-    list_splice_init(&sdom->parked_vcpus, &parked);
+    list_splice_init(&sdom->parked_units, &parked);
 
     spin_unlock_irqrestore(&sdom->budget_lock, flags);
 
-    unpark_parked_vcpus(sdom->dom->cpupool->sched, &parked);
+    unpark_parked_units(sdom->dom->cpupool->sched, &parked);
 
  out:
     set_timer(&sdom->repl_timer, sdom->next_repl);
@@ -2003,37 +2003,36 @@ static void replenish_domain_budget(void* data)
 
 #ifndef NDEBUG
 static inline void
-csched2_vcpu_check(struct vcpu *vc)
+csched2_unit_check(struct sched_unit *unit)
 {
-    struct csched2_unit * const svc = csched2_unit(vc->sched_unit);
+    struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != csched2_dom(vc->domain) );
+    BUG_ON( svc->unit != unit );
+    BUG_ON( sdom != csched2_dom(unit->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_unit(unit) );
+        BUG_ON( sdom->dom != unit->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_unit(unit) );
     }
     SCHED_STAT_CRANK(unit_check);
 }
-#define CSCHED2_VCPU_CHECK(_vc)  (csched2_vcpu_check(_vc))
+#define CSCHED2_UNIT_CHECK(unit)  (csched2_unit_check(unit))
 #else
-#define CSCHED2_VCPU_CHECK(_vc)
+#define CSCHED2_UNIT_CHECK(unit)
 #endif
 
 static void *
 csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                     void *dd)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-UNIT info */
     svc = xzalloc(struct csched2_unit);
     if ( svc == NULL )
         return NULL;
@@ -2042,10 +2041,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
     INIT_LIST_HEAD(&svc->runq_elem);
 
     svc->sdom = dd;
-    svc->vcpu = vc;
+    svc->unit = unit;
     svc->flags = 0U;
 
-    if ( ! is_idle_vcpu(vc) )
+    if ( ! is_idle_unit(unit) )
     {
         ASSERT(svc->sdom != NULL);
         svc->credit = CSCHED2_CREDIT_INIT;
@@ -2074,19 +2073,18 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
 static void
 csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     SCHED_STAT_CRANK(unit_sleep);
 
-    if ( curr_on_cpu(vc->processor) == unit )
+    if ( curr_on_cpu(sched_unit_cpu(unit)) == unit )
     {
-        tickle_cpu(vc->processor, svc->rqd);
+        tickle_cpu(sched_unit_cpu(unit), svc->rqd);
     }
-    else if ( vcpu_on_runq(svc) )
+    else if ( unit_on_runq(svc) )
     {
-        ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+        ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
         update_load(ops, svc->rqd, svc, -1, NOW());
         runq_remove(svc);
     }
@@ -2097,14 +2095,13 @@ csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_unit_cpu(unit);
     s_time_t now;
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
 
     if ( unlikely(curr_on_cpu(cpu) == unit) )
     {
@@ -2112,18 +2109,18 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
         goto out;
     }
 
-    if ( unlikely(vcpu_on_runq(svc)) )
+    if ( unlikely(unit_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         goto out;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
-    /* If the context hasn't been saved for this vcpu yet, we can't put it on
+    /* If the context hasn't been saved for this unit yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
      * after the context has been saved. */
     if ( unlikely(svc->flags & CSFLAG_scheduled) )
@@ -2134,15 +2131,15 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     /* Add into the new runqueue if necessary */
     if ( svc->rqd == NULL )
-        runq_assign(ops, vc);
+        runq_assign(ops, unit);
     else
-        ASSERT(c2rqd(ops, vc->processor) == svc->rqd );
+        ASSERT(c2rqd(ops, sched_unit_cpu(unit)) == svc->rqd );
 
     now = NOW();
 
     update_load(ops, svc->rqd, svc, 1, now);
-        
-    /* Put the VCPU on the runq */
+
+    /* Put the UNIT on the runq */
     runq_insert(ops, svc);
     runq_tickle(ops, svc, now);
 
@@ -2155,49 +2152,48 @@ csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_unit * const svc = csched2_unit(unit);
 
-    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+    __set_bit(__CSFLAG_unit_yield, &svc->flags);
 }
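
The yield flag set above is consumed on the next pass through
runq_candidate(); condensed from the code further down (illustration
only): first

    yield = __test_and_clear_bit(__CSFLAG_unit_yield, &scurr->flags);

and then, in the runqueue scan, a yielding unit forfeits both rate
limiting and its credit advantage for this one decision:

    if ( (yield || svc->credit > snext->credit) &&
         (!has_cap(svc) || unit_grab_budget(svc)) )
        snext = svc;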
 
 static void
 csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock = unit_schedule_lock_irq(unit);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
 
-    BUG_ON( !is_idle_vcpu(vc) && svc->rqd != c2rqd(ops, vc->processor));
-    ASSERT(is_idle_vcpu(vc) || svc->rqd == c2rqd(ops, vc->processor));
+    BUG_ON( !is_idle_unit(unit) && svc->rqd != c2rqd(ops, sched_unit_cpu(unit)));
+    ASSERT(is_idle_unit(unit) || svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
-    /* This vcpu is now eligible to be put on the runqueue again */
+    /* This unit is now eligible to be put on the runqueue again */
     __clear_bit(__CSFLAG_scheduled, &svc->flags);
 
     if ( unlikely(has_cap(svc) && svc->budget > 0) )
-        vcpu_return_budget(svc, &were_parked);
+        unit_return_budget(svc, &were_parked);
 
     /* If someone wants it on the runqueue, put it there. */
     /*
      * NB: We can get rid of CSFLAG_scheduled by checking for
-     * vc->is_running and vcpu_on_runq(svc) here.  However,
+     * vc->is_running and unit_on_runq(svc) here.  However,
      * since we're accessing the flags cacheline anyway,
      * it seems a bit pointless; especially as we have plenty of
      * bits free.
      */
     if ( __test_and_clear_bit(__CSFLAG_delayed_runq_add, &svc->flags)
-         && likely(vcpu_runnable(vc)) )
+         && likely(unit_runnable(unit)) )
     {
-        ASSERT(!vcpu_on_runq(svc));
+        ASSERT(!unit_on_runq(svc));
 
         runq_insert(ops, svc);
         runq_tickle(ops, svc, now);
     }
-    else if ( !is_idle_vcpu(vc) )
+    else if ( !is_idle_unit(unit) )
         update_load(ops, svc->rqd, svc, -1, now);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    unpark_parked_vcpus(ops, &were_parked);
+    unpark_parked_units(ops, &were_parked);
 }
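
The CSFLAG_scheduled / delayed_runq_add handshake used here and in
csched2_unit_wake() can be summarized as follows (a condensation of
the code above, not new logic):

    /* In wake/unpark, while the unit's context is not yet saved: */
    if ( unlikely(svc->flags & CSFLAG_scheduled) )
        __set_bit(__CSFLAG_delayed_runq_add, &svc->flags);

    /* Later, in csched2_context_saved(), once requeueing is safe: */
    if ( __test_and_clear_bit(__CSFLAG_delayed_runq_add, &svc->flags)
         && likely(unit_runnable(unit)) )
        runq_insert(ops, svc);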
 
 #define MAX_LOAD (STIME_MAX)
@@ -2205,9 +2201,8 @@ static struct sched_resource *
 csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct vcpu *vc = unit->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
-    unsigned int new_cpu, cpu = vc->processor;
+    unsigned int new_cpu, cpu = sched_unit_cpu(unit);
     struct csched2_unit *svc = csched2_unit(unit);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
@@ -2245,7 +2240,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     }
 
     cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                cpupool_domain_cpumask(vc->domain));
+                cpupool_domain_cpumask(unit->domain));
 
     /*
      * First check to see if we're here because someone else suggested a place
@@ -2356,7 +2351,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * We have soft affinity, and we have a candidate runq, so go for it.
          *
          * Note that, to obtain the soft-affinity mask, we "just" put what we
-         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * have in cpumask_scratch in && with unit->cpu_soft_affinity. This is
          * ok because:
          * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
@@ -2379,7 +2374,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * any suitable runq. But we did find one when considering hard
          * affinity, so go for it.
          *
-         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpumask_scratch already has unit->cpu_hard_affinity &
          * cpupool_domain_cpumask() in it, so it's enough that we filter
          * with the cpus of the runq.
          */
@@ -2410,11 +2405,11 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     {
         struct {
             uint64_t b_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rq_id:16, new_cpu:16;
         } d;
-        d.dom = vc->domain->domain_id;
-        d.vcpu = vc->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.rq_id = min_rqi;
         d.b_avgload = min_avgload;
         d.new_cpu = new_cpu;
@@ -2433,10 +2428,10 @@ typedef struct {
     struct csched2_unit * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
-    struct csched2_runqueue_data *orqd;                  
+    struct csched2_runqueue_data *orqd;
 } balance_state_t;
 
-static void consider(balance_state_t *st, 
+static void consider(balance_state_t *st,
                      struct csched2_unit *push_svc,
                      struct csched2_unit *pull_svc)
 {
@@ -2475,17 +2470,17 @@ static void migrate(const struct scheduler *ops,
                     struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
-    int cpu = svc->vcpu->processor;
-    struct sched_unit *unit = svc->vcpu->sched_unit;
+    struct sched_unit *unit = svc->unit;
+    int cpu = sched_unit_cpu(unit);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rqi:16, trqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.rqi = svc->rqd->id;
         d.trqi = trqd->id;
         __trace_var(TRC_CSCHED2_MIGRATE, 1,
@@ -2497,7 +2492,7 @@ static void migrate(const struct scheduler *ops,
     {
         /* It's running; mark it to migrate. */
         svc->migrate_rqd = trqd;
-        __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
+        sched_set_pause_flags(unit, _VPF_migrating);
         __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         SCHED_STAT_CRANK(migrate_requested);
         tickle_cpu(cpu, svc->rqd);
@@ -2506,7 +2501,7 @@ static void migrate(const struct scheduler *ops,
     {
         int on_runq = 0;
         /* It's not running; just move it */
-        if ( vcpu_on_runq(svc) )
+        if ( unit_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
@@ -2515,14 +2510,14 @@ static void migrate(const struct scheduler *ops,
         _runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                    cpupool_domain_cpumask(svc->vcpu->domain));
+                    cpupool_domain_cpumask(unit->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
-        svc->vcpu->processor = cpumask_cycle(trqd->pick_bias,
-                                             cpumask_scratch_cpu(cpu));
-        svc->vcpu->sched_unit->res = get_sched_res(svc->vcpu->processor);
-        trqd->pick_bias = svc->vcpu->processor;
-        ASSERT(svc->vcpu->processor < nr_cpu_ids);
+        sched_set_res(unit,
+                      get_sched_res(cpumask_cycle(trqd->pick_bias,
+                                                  cpumask_scratch_cpu(cpu))));
+        trqd->pick_bias = sched_unit_cpu(unit);
+        ASSERT(sched_unit_cpu(unit) < nr_cpu_ids);
 
         _runq_assign(svc, trqd);
         if ( on_runq )
@@ -2542,14 +2537,14 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_unit *svc,
+static bool unit_is_migrateable(struct csched2_unit *svc,
                                   struct csched2_runqueue_data *rqd)
 {
-    struct vcpu *v = svc->vcpu;
-    int cpu = svc->vcpu->processor;
+    struct sched_unit *unit = svc->unit;
+    int cpu = sched_unit_cpu(unit);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+                cpupool_domain_cpumask(unit->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
            cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
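
For reference, migrate() just above distinguishes two cases; condensed
for illustration (tracing and load accounting omitted):

    if ( unlikely(svc->flags & CSFLAG_scheduled) )
    {
        /* Running right now: flag the unit and kick its pCPU; the
         * actual move happens at the next context switch. */
        svc->migrate_rqd = trqd;
        sched_set_pause_flags(unit, _VPF_migrating);
        __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
        tickle_cpu(cpu, svc->rqd);
    }
    else
    {
        /* Not running: detach from the old runqueue, pick a pCPU of
         * the target runqueue, and re-attach there immediately. */
        _runq_deassign(svc);
        sched_set_res(unit, get_sched_res(cpumask_cycle(trqd->pick_bias,
                                                        cpumask_scratch_cpu(cpu))));
        _runq_assign(svc, trqd);
    }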
@@ -2586,7 +2581,7 @@ retry:
     for_each_cpu(i, &prv->active_queues)
     {
         s_time_t delta;
-        
+
         st.orqd = prv->rqd + i;
 
         if ( st.orqd == st.lrqd
@@ -2594,7 +2589,7 @@ retry:
             continue;
 
         update_runq_load(ops, st.orqd, 0, now);
-    
+
         delta = st.lrqd->b_avgload - st.orqd->b_avgload;
         if ( delta < 0 )
             delta = -delta;
@@ -2617,7 +2612,7 @@ retry:
         s_time_t load_max;
         int cpus_max;
 
-        
+
         load_max = st.lrqd->b_avgload;
         if ( st.orqd->b_avgload > load_max )
             load_max = st.orqd->b_avgload;
@@ -2656,7 +2651,7 @@ retry:
                                            opt_overload_balance_tolerance)) )
                 goto out;
     }
-             
+
     /* Try to grab the other runqueue lock; if it's been taken in the
      * meantime, try the process over again.  This can't deadlock
      * because if it doesn't get any other rqd locks, it will simply
@@ -2696,17 +2691,17 @@ retry:
 
         update_svc_load(ops, push_svc, 0, now);
 
-        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
+        if ( !unit_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
             struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
-            
+
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
-        
-            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+            if ( !unit_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -2721,8 +2716,8 @@ retry:
     list_for_each( pull_iter, &st.orqd->svc )
     {
         struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
-        
-        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+        if ( !unit_is_migrateable(pull_svc, st.lrqd) )
             continue;
 
         /* Consider pull only */
@@ -2745,8 +2740,7 @@ static void
 csched2_unit_migrate(
     const struct scheduler *ops, struct sched_unit *unit, unsigned int new_cpu)
 {
-    struct vcpu *vc = unit->vcpu;
-    struct domain *d = vc->domain;
+    struct domain *d = unit->domain;
     struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
@@ -2758,25 +2752,24 @@ csched2_unit_migrate(
      * cpupool.
      *
      * And since there indeed is the chance that it is not part of it, all
-     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * we must do is remove _and_ unassign the unit from any runqueue, as
      * well as updating v->processor with the target, so that the suspend
      * process can continue.
      *
      * It will then be during resume that a new, meaningful, value for
      * v->processor will be chosen, and during actual domain unpause that
-     * the vCPU will be assigned to and added to the proper runqueue.
+     * the unit will be assigned to and added to the proper runqueue.
      */
     if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
     {
         ASSERT(system_state == SYS_STATE_suspend);
-        if ( vcpu_on_runq(svc) )
+        if ( unit_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
         }
         _runq_deassign(svc);
-        vc->processor = new_cpu;
-        unit->res = get_sched_res(new_cpu);
+        sched_set_res(unit, get_sched_res(new_cpu));
         return;
     }
 
@@ -2790,17 +2783,14 @@ csched2_unit_migrate(
      * Do the actual movement toward new_cpu, and update vc->processor.
      * If we are changing runqueue, migrate() takes care of everything.
      * If we are not changing runqueue, we need to update vc->processor
-     * here. In fact, if, for instance, we are here because the vcpu's
+     * here. In fact, if, for instance, we are here because the unit's
      * hard affinity changed, we don't want to risk leaving vc->processor
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, now);
     else
-    {
-        vc->processor = new_cpu;
-        unit->res = get_sched_res(new_cpu);
-    }
+        sched_set_res(unit, get_sched_res(new_cpu));
 }
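
sched_set_res(), used twice above in place of the former open-coded
pair of assignments, is introduced earlier in the series. Its assumed
shape, for reference (a sketch, not the actual definition; at this
point of the series a unit still wraps a single vcpu, while with core
scheduling active every vcpu of the unit would be updated alike):

    static inline void sched_set_res(struct sched_unit *unit,
                                     struct sched_resource *res)
    {
        unit->vcpu->processor = res->processor;
        unit->res = res;
    }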
 
 static int
@@ -2812,18 +2802,18 @@ csched2_dom_cntl(
     struct csched2_dom * const sdom = csched2_dom(d);
     struct csched2_private *prv = csched2_priv(ops);
     unsigned long flags;
-    struct vcpu *v;
+    struct sched_unit *unit;
     int rc = 0;
 
     /*
      * Locking:
      *  - we must take the private lock for accessing the weights of the
-     *    vcpus of d, and/or the cap;
+     *    units of d, and/or the cap;
      *  - in the putinfo case, we also need the runqueue lock(s), for
      *    updating the max weight of the runqueue(s).
      *    If changing the cap, we also need the budget_lock, for updating
      *    the value of the domain budget pool (and the runqueue lock,
-     *    for adjusting the parameters and rescheduling any vCPU that is
+     *    for adjusting the parameters and rescheduling any unit that is
      *    running at the time of the change).
      */
     switch ( op->cmd )
@@ -2845,18 +2835,18 @@ csched2_dom_cntl(
 
             sdom->weight = op->u.credit2.weight;
 
-            /* Update weights for vcpus, and max_weight for runqueues on which they reside */
-            for_each_vcpu ( d, v )
+            /* Update weights for units, and max_weight for runqueues on which they reside */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
-                ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
+                ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
         }
         /* Cap */
@@ -2865,8 +2855,8 @@ csched2_dom_cntl(
             struct csched2_unit *svc;
             spinlock_t *lock;
 
-            /* Cap is only valid if it's below 100 * nr_of_vCPUS */
-            if ( op->u.credit2.cap > 100 * sdom->nr_vcpus )
+            /* Cap is only valid if it's at most 100 * nr_of_units */
+            if ( op->u.credit2.cap > 100 * sdom->nr_units )
             {
                 rc = -EINVAL;
                 write_unlock_irqrestore(&prv->lock, flags);
@@ -2879,23 +2869,23 @@ csched2_dom_cntl(
             spin_unlock(&sdom->budget_lock);
 
             /*
-             * When trying to get some budget and run, each vCPU will grab
-             * from the pool 1/N (with N = nr of vCPUs of the domain) of
-             * the total budget. Roughly speaking, this means each vCPU will
+             * When trying to get some budget and run, each unit will grab
+             * from the pool 1/N (with N = nr of units of the domain) of
+             * the total budget. Roughly speaking, this means each unit will
              * have at least one chance to run during every period.
              */
-            for_each_vcpu ( d, v )
+            for_each_sched_unit ( d, unit )
             {
-                svc = csched2_unit(v->sched_unit);
-                lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                svc = csched2_unit(unit);
+                lock = unit_schedule_lock(unit);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
                  * which then won't happen because, in csched2_runtime(),
                  * CSCHED2_MIN_TIMER is what would be used anyway.
                  */
-                svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
+                svc->budget_quota = max(sdom->tot_budget / sdom->nr_units,
                                         CSCHED2_MIN_TIMER);
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
 
             if ( sdom->cap == 0 )
@@ -2905,7 +2895,7 @@ csched2_dom_cntl(
                  * and queue its first replenishment event.
                  *
                  * Since cap is currently disabled for this domain, we
-                 * know no vCPU is messing with the domain's budget, and
+                 * know no unit is messing with the domain's budget, and
                  * the replenishment timer is still off.
                  * For these reasons, it is safe to do the following without
                  * taking the budget_lock.
@@ -2915,42 +2905,42 @@ csched2_dom_cntl(
                 set_timer(&sdom->repl_timer, sdom->next_repl);
 
                 /*
-                 * Now, let's enable budget accounting for all the vCPUs.
+                 * Now, let's enable budget accounting for all the units.
                  * For making sure that they will start to honour the domain's
                  * cap, we set their budget to 0.
                  * This way, as soon as they will try to run, they will have
                  * to get some budget.
                  *
-                 * For the vCPUs that are already running, we trigger the
+                 * For the units that are already running, we trigger the
                  * scheduler on their pCPU. When, as a consequence of this,
                  * csched2_schedule() will run, it will figure out there is
-                 * no budget, and the vCPU will try to get some (and be parked,
+                 * no budget, and the unit will try to get some (and be parked,
                  * if there's none, and we'll switch to someone else).
                  */
-                for_each_vcpu ( d, v )
+                for_each_sched_unit ( d, unit )
                 {
-                    svc = csched2_unit(v->sched_unit);
-                    lock = unit_schedule_lock(svc->vcpu->sched_unit);
-                    if ( v->sched_unit->is_running )
+                    svc = csched2_unit(unit);
+                    lock = unit_schedule_lock(unit);
+                    if ( unit->is_running )
                     {
-                        unsigned int cpu = v->processor;
+                        unsigned int cpu = sched_unit_cpu(unit);
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
+                        ASSERT(curr_on_cpu(cpu) == unit);
 
                         /*
-                         * We are triggering a reschedule on the vCPU's
+                         * We are triggering a reschedule on the unit's
                          * pCPU. That will run burn_credits() and, since
-                         * the vCPU is capped now, it would charge all the
+                         * the unit is capped now, it would charge all the
                          * execution time of this last round as budget as
-                         * well. That will make the vCPU budget go negative,
+                         * well. That will make the unit budget go negative,
                          * potentially by a large amount, and it's unfair.
                          *
                         * To avoid that, call burn_credits() here, to do the
                         * accounting of this current running instance now,
                         * with budgeting still disabled. This does not
                          * prevent some small amount of budget being charged
-                         * to the vCPU (i.e., the amount of time it runs from
+                         * to the unit (i.e., the amount of time it runs from
                          * now, to when scheduling happens). The budget will
                          * also go below 0, but a lot less than how it would
                          * if we don't do this.
@@ -2961,7 +2951,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                    unit_schedule_unlock(lock, unit);
                 }
             }
 
@@ -2973,30 +2963,30 @@ csched2_dom_cntl(
 
             stop_timer(&sdom->repl_timer);
 
-            /* Disable budget accounting for all the vCPUs. */
-            for_each_vcpu ( d, v )
+            /* Disable budget accounting for all the units. */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
             sdom->cap = 0;
             /*
              * We are disabling the cap for this domain, which may have
-             * vCPUs waiting for a replenishment, so we unpark them all.
+             * units waiting for a replenishment, so we unpark them all.
              * Note that, since we have already disabled budget accounting
-             * for all the vCPUs of the domain, no currently running vCPU
-             * will be added to the parked vCPUs list any longer.
+             * for all the units of the domain, no currently running unit
+             * will be added to the parked units list any longer.
              */
             spin_lock(&sdom->budget_lock);
-            list_splice_init(&sdom->parked_vcpus, &parked);
+            list_splice_init(&sdom->parked_units, &parked);
             spin_unlock(&sdom->budget_lock);
 
-            unpark_parked_vcpus(ops, &parked);
+            unpark_parked_units(ops, &parked);
         }
         write_unlock_irqrestore(&prv->lock, flags);
         break;
@@ -3073,12 +3063,12 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     sdom->dom = dom;
     sdom->weight = CSCHED2_DEFAULT_WEIGHT;
     sdom->cap = 0U;
-    sdom->nr_vcpus = 0;
+    sdom->nr_units = 0;
 
     init_timer(&sdom->repl_timer, replenish_domain_budget, sdom,
                cpumask_any(cpupool_domain_cpumask(dom)));
     spin_lock_init(&sdom->budget_lock);
-    INIT_LIST_HEAD(&sdom->parked_vcpus);
+    INIT_LIST_HEAD(&sdom->parked_units);
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3112,34 +3102,32 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
     lock = unit_schedule_lock_irq(unit);
 
-    unit->res = csched2_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    sched_set_res(unit, csched2_res_pick(ops, unit));
 
     spin_unlock_irq(lock);
 
     lock = unit_schedule_lock_irq(unit);
 
-    /* Add vcpu to runqueue of initial processor */
-    runq_assign(ops, vc);
+    /* Add unit to runqueue of initial processor */
+    runq_assign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    sdom->nr_vcpus++;
+    sdom->nr_units++;
 
     SCHED_STAT_CRANK(unit_insert);
 
-    CSCHED2_VCPU_CHECK(vc);
+    CSCHED2_UNIT_CHECK(unit);
 }
 
 static void
@@ -3153,11 +3141,10 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 static void
 csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     SCHED_STAT_CRANK(unit_remove);
@@ -3165,14 +3152,14 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     /* Remove from runqueue */
     lock = unit_schedule_lock_irq(unit);
 
-    runq_deassign(ops, vc);
+    runq_deassign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    svc->sdom->nr_vcpus--;
+    svc->sdom->nr_units--;
 }
 
-/* How long should we let this vcpu run for? */
+/* How long should we let this unit run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
                 struct csched2_unit *snext, s_time_t now)
@@ -3187,7 +3174,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      * If we're idle, just stay so. Others (or external events)
      * will poke us when necessary.
      */
-    if ( is_idle_vcpu(snext->vcpu) )
+    if ( is_idle_unit(snext->unit) )
         return -1;
 
     /* General algorithm:
@@ -3204,8 +3191,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->sched_unit->is_running )
-            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
+        if ( snext->unit->is_running )
+            ratelimit_min = snext->unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3222,7 +3209,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     {
         struct csched2_unit *swait = runq_elem(runq->next);
 
-        if ( ! is_idle_vcpu(swait->vcpu)
+        if ( ! is_idle_unit(swait->unit)
              && swait->credit > 0 )
         {
             rt_credit = snext->credit - swait->credit;
@@ -3236,7 +3223,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      *
      * FIXME: See if we can eliminate this conversion if we know time
      * will be outside (MIN,MAX).  Probably requires pre-calculating
-     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * credit values of MIN,MAX per unit, since each unit burns credit
      * at a different rate.
      */
     if ( rt_credit > 0 )
@@ -3284,36 +3271,35 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
     *skipped = 0;
 
-    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    if ( unlikely(is_idle_unit(scurr->unit)) )
     {
         snext = scurr;
         goto check_runq;
     }
 
-    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    yield = __test_and_clear_bit(__CSFLAG_unit_yield, &scurr->flags);
 
     /*
-     * Return the current vcpu if it has executed for less than ratelimit.
-     * Adjuststment for the selected vcpu's credit and decision
+     * Return the current unit if it has executed for less than ratelimit.
+     * Adjustment for the selected unit's credit and decision
      * for how long it will run will be taken in csched2_runtime.
      *
      * Note that, if scurr is yielding, we don't let rate limiting kick in.
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->sched_unit->state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+         (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
+            d.dom = scurr->unit->domain->domain_id;
+            d.unit = scurr->unit->unit_id;
+            d.runtime = now - scurr->unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3322,13 +3308,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
+    if ( has_soft_affinity(scurr->unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
-            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+            cpumask_t *online = cpupool_domain_cpumask(scurr->unit->domain);
 
             /* Ok, is any of the pcpus in scurr's soft-affinity idle? */
             cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
@@ -3356,10 +3342,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
+    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
@@ -3369,24 +3355,24 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1,
                         sizeof(d),
                         (unsigned char *)&d);
         }
 
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
+        /* Only consider units that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
         }
 
         /*
-         * If a vcpu is meant to be picked up by another processor, and such
+         * If a unit is meant to be picked up by another processor, and such
          * processor has not scheduled yet, leave it in the runqueue for that
          * processor.
          */
         if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
@@ -3401,7 +3387,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher.
          */
-        if ( svc->vcpu->processor != cpu
+        if ( sched_unit_cpu(svc->unit) != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
@@ -3416,7 +3402,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || vcpu_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -3426,12 +3412,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned tickled_cpu, skipped;
             int credit;
         } d;
-        d.dom = snext->vcpu->domain->domain_id;
-        d.vcpu = snext->vcpu->vcpu_id;
+        d.dom = snext->unit->domain->domain_id;
+        d.unit = snext->unit->unit_id;
         d.credit = snext->credit;
         d.tickled_cpu = snext->tickled_cpu;
         d.skipped = *skipped;
@@ -3463,14 +3449,15 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_unit * const scurr = csched2_unit(current->sched_unit);
+    struct sched_unit *currunit = current->sched_unit;
+    struct csched2_unit * const scurr = csched2_unit(currunit);
     struct csched2_unit *snext = NULL;
-    unsigned int skipped_vcpus = 0;
+    unsigned int skipped_units = 0;
     struct task_slice ret;
     bool tickled;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED2_VCPU_CHECK(current);
+    CSCHED2_UNIT_CHECK(currunit);
 
     BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
 
@@ -3479,7 +3466,7 @@ csched2_schedule(
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
+    BUG_ON(!is_idle_unit(currunit) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
     tickled = cpumask_test_cpu(cpu, &rqd->tickled);
@@ -3499,7 +3486,7 @@ csched2_schedule(
         d.cpu = cpu;
         d.rq_id = c2r(cpu);
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(currunit);
         d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
@@ -3513,55 +3500,55 @@ csched2_schedule(
     /*
     *  Below 0 means that we are capped and we have overrun our budget.
      *  Let's try to get some more but, if we fail (e.g., because of the
-     *  other running vcpus), we will be parked.
+     *  other running units), we will be parked.
      */
     if ( unlikely(scurr->budget <= 0) )
-        vcpu_grab_budget(scurr);
+        unit_grab_budget(scurr);
 
     /*
-     * Select next runnable local VCPU (ie top of local runq).
+     * Select next runnable local UNIT (i.e. top of local runq).
      *
-     * If the current vcpu is runnable, and has higher credit than
+     * If the current unit is runnable, and has higher credit than
      * the next guy on the queue (or there is no one else), we want to
      * run him again.
      *
-     * If there's tasklet work to do, we want to chose the idle vcpu
+     * If there's tasklet work to do, we want to choose the idle unit
      * for this processor, and mark the current for delayed runqueue
      * add.
      *
-     * If the current vcpu is runnable, and there's another runnable
+     * If the current unit is runnable, and there's another runnable
      * candidate, we want to mark current for delayed runqueue add,
      * and remove the next guy from the queue.
      *
-     * If the current vcpu is not runnable, we want to chose the idle
-     * vcpu for this processor.
+     * If the current unit is not runnable, we want to choose the idle
+     * unit for this processor.
      */
     if ( tasklet_work_scheduled )
     {
-        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        __clear_bit(__CSFLAG_unit_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
+        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_units);
 
-    /* If switching from a non-idle runnable vcpu, put it
+    /* If switching from a non-idle runnable unit, put it
      * back on the runqueue. */
     if ( snext != scurr
-         && !is_idle_vcpu(scurr->vcpu)
-         && vcpu_runnable(current) )
+         && !is_idle_unit(currunit)
+         && unit_runnable(currunit) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_unit(snext->unit) )
     {
         /* If switching, remove this from the runqueue and mark it scheduled */
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->sched_unit->is_running);
+            ASSERT(!snext->unit->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
@@ -3576,19 +3563,19 @@ csched2_schedule(
 
         /*
          * The reset condition is "has a scheduler epoch come to an end?".
-         * The way this is enforced is checking whether the vcpu at the top
+         * The way this is enforced is checking whether the unit at the top
          * of the runqueue has negative credits. This means the epochs have
         * variable length, as in one epoch expires when:
-         *  1) the vcpu at the top of the runqueue has executed for
+         *  1) the unit at the top of the runqueue has executed for
          *     around 10 ms (with default parameters);
-         *  2) no other vcpu with higher credits wants to run.
+         *  2) no other unit with higher credits wants to run.
          *
          * Here, where we want to check for reset, we need to make sure the
-         * proper vcpu is being used. In fact, runqueue_candidate() may have
-         * not returned the first vcpu in the runqueue, for various reasons
+         * proper unit is being used. In fact, runqueue_candidate() may have
+         * not returned the first unit in the runqueue, for various reasons
          * (e.g., affinity). Only trigger a reset when it does.
          */
-        if ( skipped_vcpus == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
+        if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
             reset_credit(ops, cpu, now, snext);
             balance_load(ops, cpu, now);
@@ -3598,11 +3585,10 @@ csched2_schedule(
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_unit_cpu(snext->unit) != cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_unit->res = get_sched_res(cpu);
+            sched_set_res(snext->unit, get_sched_res(cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3636,20 +3622,20 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
-    CSCHED2_VCPU_CHECK(ret.task->vcpu);
+    CSCHED2_UNIT_CHECK(ret.task);
     return ret;
 }
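
The decision order implemented by csched2_schedule() above, condensed
for illustration (tracing, accounting and corner cases omitted):

    if ( tasklet_work_scheduled )
        snext = csched2_unit(sched_idle_unit(cpu));   /* idle the pCPU */
    else
        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_units);

    /* An epoch ends when the best, unskipped candidate has run out of
     * credits; then everybody's credits get reset. */
    if ( !is_idle_unit(snext->unit) &&
         skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
        reset_credit(ops, cpu, now, snext);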
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_unit *svc)
+csched2_dump_unit(struct csched2_private *prv, struct csched2_unit *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
             svc->flags,
-            svc->vcpu->processor);
+            sched_unit_cpu(svc->unit));
 
     printk(" credit=%" PRIi32" [w=%u]", svc->credit, svc->weight);
 
@@ -3674,12 +3660,12 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current UNIT (nothing to say if that's the idle unit) */
     svc = csched2_unit(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
         printk("\trun: ");
-        csched2_dump_vcpu(prv, svc);
+        csched2_dump_unit(prv, svc);
     }
 }
 
@@ -3736,7 +3722,7 @@ csched2_dump(const struct scheduler *ops)
     list_for_each( iter_sdom, &prv->sdom )
     {
         struct csched2_dom *sdom;
-        struct vcpu *v;
+        struct sched_unit *unit;
 
         sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
 
@@ -3744,19 +3730,19 @@ csched2_dump(const struct scheduler *ops)
                sdom->dom->domain_id,
                sdom->weight,
                sdom->cap,
-               sdom->nr_vcpus);
+               sdom->nr_units);
 
-        for_each_vcpu( sdom->dom, v )
+        for_each_sched_unit ( sdom->dom, unit )
         {
-            struct csched2_unit * const svc = csched2_unit(v->sched_unit);
+            struct csched2_unit * const svc = csched2_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(svc->vcpu->sched_unit);
+            lock = unit_schedule_lock(unit);
 
             printk("\t%3d: ", ++loop);
-            csched2_dump_vcpu(prv, svc);
+            csched2_dump_unit(prv, svc);
 
-            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, unit);
         }
     }
 
@@ -3782,7 +3768,7 @@ csched2_dump(const struct scheduler *ops)
             if ( svc )
             {
                 printk("\t%3d: ", loop++);
-                csched2_dump_vcpu(prv, svc);
+                csched2_dump_unit(prv, svc);
             }
         }
         spin_unlock(&rqd->lock);
@@ -3882,7 +3868,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct sched_resource *sd = get_sched_res(cpu);
     unsigned rqi;
 
-    ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(pdata && svc && is_idle_unit(svc->unit));
 
     /*
      * We own one runqueue lock already (from schedule_cpu_switch()). This
@@ -3895,7 +3881,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
@@ -3937,7 +3923,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
      */
     ASSERT(spc && spc->runq_id != -1);
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
-    
+
     /* Find the old runqueue and remove this cpu from it */
     rqd = prv->rqd + spc->runq_id;
 
-- 
2.16.4




* [Xen-devel] [PATCH 23/60] xen/sched: make credit2 scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch the credit2 scheduler completely from vcpu to sched_unit usage.

As we are touching lots of lines anyway, also remove some trailing
white space.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit2.c | 820 ++++++++++++++++++++++-----------------------
 1 file changed, 403 insertions(+), 417 deletions(-)
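
Most of the churn follows a small set of mechanical substitutions; the
recurring ones are (summary for review convenience, not exhaustive):

    vc->processor               ->  sched_unit_cpu(unit)
    is_idle_vcpu(vc)            ->  is_idle_unit(unit)
    vcpu_runnable(vc)           ->  unit_runnable(unit)
    vcpu_on_runq(svc)           ->  unit_on_runq(svc)
    for_each_vcpu ( d, v )      ->  for_each_sched_unit ( d, unit )
    idle_vcpu[cpu]->sched_unit  ->  sched_idle_unit(cpu)
    __set_bit(_VPF_parked, &v->pause_flags)
                                ->  sched_set_pause_flags(unit, _VPF_parked)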

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef29a3d874..0f57135e81 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -45,7 +45,7 @@
 #define TRC_CSCHED2_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(CSCHED2, 8)
 #define TRC_CSCHED2_UPDATE_LOAD      TRC_SCHED_CLASS_EVT(CSCHED2, 9)
 #define TRC_CSCHED2_RUNQ_ASSIGN      TRC_SCHED_CLASS_EVT(CSCHED2, 10)
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
+#define TRC_CSCHED2_UPDATE_UNIT_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
 #define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 12)
 #define TRC_CSCHED2_TICKLE_NEW       TRC_SCHED_CLASS_EVT(CSCHED2, 13)
 #define TRC_CSCHED2_RUNQ_MAX_WEIGHT  TRC_SCHED_CLASS_EVT(CSCHED2, 14)
@@ -74,13 +74,13 @@
  * Design:
  *
  * VMs "burn" credits based on their weight; higher weight means
- * credits burn more slowly.  The highest weight vcpu burns credits at
+ * credits burn more slowly.  The highest weight unit burns credits at
  * a rate of 1 credit per nanosecond.  Others burn proportionally
  * more.
  *
- * vcpus are inserted into the runqueue by credit order.
+ * units are inserted into the runqueue by credit order.
  *
- * Credits are "reset" when the next vcpu in the runqueue is less than
+ * Credits are "reset" when the next unit in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
  */
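
To make the weight rule above concrete, this is the scaling it
implies, as a standalone sketch (names are stand-ins, not the
scheduler's actual helpers, which scale via the runqueue's maximum
weight):

    /* Credits burned while running for delta_ns, normalized so that
     * the unit holding the maximum weight burns 1 credit per ns. */
    static int64_t credits_burned(int64_t delta_ns, unsigned int weight,
                                  unsigned int max_weight)
    {
        /* Lower weight => proportionally faster burn. */
        return delta_ns * max_weight / weight;
    }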
@@ -95,33 +95,33 @@
  *   be given a cap of 25%; a domain that must not use more than 1+1/2 of
  *   physical CPU time will be given a cap of 150%;
  *
- * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and
- *   a 40% cap, that one vCPU will use 40% of one pCPU. If a somain has 4
- *   vCPUs, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
- *   split among the v vCPUs. How much each of the vCPUs will actually get,
+ * - caps are per-domain (not per-unit). If a domain has only 1 unit, and
+ *   a 40% cap, that one unit will use 40% of one pCPU. If a somain has 4
+ *   units, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
+ *   split among the v units. How much each of the units will actually get,
  *   during any given interval of time, is unspecified (as it depends on
  *   various aspects: workload, system load, etc.). For instance, it is
- *   possible that, during a given time interval, 2 vCPUs use 100% each,
+ *   possible that, during a given time interval, 2 units use 100% each,
  *   and the other two use nothing; while during another time interval,
- *   two vCPUs use 80%, one uses 10% and the other 30%; or that each use
+ *   two units use 80%, one uses 10% and the other 30%; or that each use
  *   50% (and so on and so forth).
  *
  * For implementing this, we use the following approach:
  *
  * - each domain is given a 'budget', and each domain has a timer, which
  *   replenishes the domain's budget periodically. The budget is the amount
- *   of time the vCPUs of the domain can use every 'period';
+ *   of time the units of the domain can use every 'period';
  *
  * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
  *   (but each domain has its own timer; so they are all periodic by the same
  *   period, but replenishment of the budgets of the various domains, at
  *   periods boundaries, are not synchronous);
  *
- * - when vCPUs run, they consume budget. When they don't run, they don't
- *   consume budget. If there is no budget left for the domain, no vCPU of
- *   that domain can run. If a vCPU tries to run and finds that there is no
+ * - when units run, they consume budget. When they don't run, they don't
+ *   consume budget. If there is no budget left for the domain, no unit of
+ *   that domain can run. If a unit tries to run and finds that there is no
  *   budget, it blocks.
- *   At whatever time a vCPU wants to run, it must check the domain's budget,
+ *   At whatever time a unit wants to run, it must check the domain's budget,
  *   and if there is some, it can use it.
  *
  * - budget is replenished to the top of the capacity for the domain once
@@ -129,39 +129,39 @@
  *   though, the budget after a replenishment will always be at most equal
 *   to the total capacity of the domain ('tot_budget');
  *
- * - when a budget replenishment occurs, if there are vCPUs that had been
+ * - when a budget replenishment occurs, if there are units that had been
  *   blocked because of lack of budget, they'll be unblocked, and they will
  *   (potentially) be able to run again.
  *
  * Finally, some even more implementation related detail:
  *
- * - budget is stored in a domain-wide pool. vCPUs of the domain that want
+ * - budget is stored in a domain-wide pool. Units of the domain that want
 *   to run go to such pool, and grab some. When they do so, the amount
  *   they grabbed is _immediately_ removed from the pool. This happens in
- *   vcpu_grab_budget();
+ *   unit_grab_budget();
  *
- * - when vCPUs stop running, if they've not consumed all the budget they
+ * - when units stop running, if they've not consumed all the budget they
  *   took, the leftover is put back in the pool. This happens in
- *   vcpu_return_budget();
+ *   unit_return_budget();
  *
- * - the above means that a vCPU can find out that there is no budget and
+ * - the above means that a unit can find out that there is no budget and
  *   block, not only if the cap has actually been reached (for this period),
- *   but also if some other vCPUs, in order to run, have grabbed a certain
+ *   but also if some other units, in order to run, have grabbed a certain
  *   quota of budget, no matter whether they've already used it all or not.
- *   A vCPU blocking because (any form of) lack of budget is said to be
- *   "parked", and such blocking happens in park_vcpu();
+ *   A unit blocking because of (any form of) lack of budget is said to be
+ *   "parked", and such blocking happens in park_unit();
  *
- * - when a vCPU stops running, and puts back some budget in the domain pool,
+ * - when a unit stops running and puts back some budget in the domain pool,
 *   we need to check whether there is someone that has been parked and that
- *   can be unparked. This happens in unpark_parked_vcpus(), called from
+ *   can be unparked. This happens in unpark_parked_units(), called from
  *   csched2_context_saved();
  *
  * - of course, unparking happens also as a consequence of the domain's budget
  *   being replenished by the periodic timer. This also occurs by means of
  *   calling csched2_context_saved() (but from replenish_domain_budget());
  *
- * - parked vCPUs of a domain are kept in a (per-domain) list, called
- *   'parked_vcpus'). Manipulation of the list and of the domain-wide budget
+ * - parked units of a domain are kept in a (per-domain) list, called
+ *   'parked_units'). Manipulation of the list and of the domain-wide budget
  *   pool, must occur only when holding the 'budget_lock'.
  */
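
(Illustrative only.) The cap-to-budget arithmetic described above boils down
to: a cap of C% entitles the domain to C/100 of pCPU-time per replenishment
period. A minimal sketch, assuming a hypothetical 10ms period (the real value
is CSCHED2_BDGT_REPL_PERIOD):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical period of 10 ms, in microseconds (a stand-in value). */
    #define BUDGET_PERIOD_US 10000ULL

    /* A cap of C% entitles the domain to C/100 pCPU-time per period. */
    static uint64_t tot_budget(unsigned int cap_percent)
    {
        return BUDGET_PERIOD_US * cap_percent / 100;
    }

    int main(void)
    {
        printf("cap  25%% -> %llu us/period\n",
               (unsigned long long)tot_budget(25));
        printf("cap 150%% -> %llu us/period\n",
               (unsigned long long)tot_budget(150));
        return 0;
    }
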
 
@@ -174,9 +174,9 @@
  *     pcpu_schedule_lock() / unit_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
- *  + serializes runqueue operations (removing and inserting vcpus);
+ *  + serializes runqueue operations (removing and inserting units);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_unit for the vcpu in the
+ *  + protects unit parameters in csched2_unit for the unit in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -190,8 +190,8 @@
  *  + it is per-domain;
 *  + protects, in domains that have a utilization cap;
  *   * manipulation of the total budget of the domain (as it is shared
- *     among all vCPUs of the domain),
- *   * manipulation of the list of vCPUs that are blocked waiting for
+ *     among all units of the domain),
+ *   * manipulation of the list of units that are blocked waiting for
  *     some budget to be available.
  *
  * - Type:
@@ -228,9 +228,9 @@
  */
 #define CSCHED2_CREDIT_INIT          MILLISECS(10)
 /*
- * Amount of credit the idle vcpus have. It never changes, as idle
- * vcpus does not consume credits, and it must be lower than whatever
- * amount of credit 'regular' vcpu would end up with.
+ * Amount of credit the idle units have. It never changes, as idle
+ * units do not consume credits, and it must be lower than whatever
+ * amount of credit 'regular' units would end up with.
  */
 #define CSCHED2_IDLE_CREDIT          (-(1U<<30))
 /*
@@ -243,9 +243,9 @@
  * MIN_TIMER.
  */
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
-/* How much to "compensate" a vcpu for L2 migration. */
+/* How much to "compensate" an unit for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
-/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+/* How tolerant we should be when peeking at runtime of units on other cpus */
 #define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
@@ -258,7 +258,7 @@
  * Flags
  */
 /*
- * CSFLAG_scheduled: Is this vcpu either running on, or context-switching off,
+ * CSFLAG_scheduled: Is this unit either running on, or context-switching off,
  * a physical cpu?
  * + Accessed only with runqueue lock held
  * + Set when chosen as next in csched2_schedule().
@@ -280,21 +280,21 @@
 #define __CSFLAG_delayed_runq_add 2
 #define CSFLAG_delayed_runq_add (1U<<__CSFLAG_delayed_runq_add)
 /*
- * CSFLAG_runq_migrate_request: This vcpu is being migrated as a result of a
+ * CSFLAG_runq_migrate_request: This unit is being migrated as a result of a
  * credit2-initiated runq migrate request; migrate it to the runqueue indicated
- * in the svc struct. 
+ * in the svc struct.
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1U<<__CSFLAG_runq_migrate_request)
 /*
- * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * CSFLAG_unit_yield: this unit was running, and has called vcpu_yield(). The
  * scheduler is invoked to see if we can give the cpu to someone else, and
- * get back to the yielding vcpu in a while.
+ * get back to the yielding unit in a while.
  */
-#define __CSFLAG_vcpu_yield 4
-#define CSFLAG_vcpu_yield (1U<<__CSFLAG_vcpu_yield)
+#define __CSFLAG_unit_yield 4
+#define CSFLAG_unit_yield (1U<<__CSFLAG_unit_yield)
 /*
- * CSFLAGS_pinned: this vcpu is currently 'pinned', i.e., has its hard
+ * CSFLAGS_pinned: this unit is currently 'pinned', i.e., has its hard
  * affinity set to one and only 1 cpu (and, hence, can only run there).
  */
 #define __CSFLAG_pinned 5
@@ -306,7 +306,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 /*
  * Load tracking and load balancing
  *
- * Load history of runqueues and vcpus is accounted for by using an
+ * Load history of runqueues and units is accounted for by using an
  * exponential weighted moving average algorithm. However, instead of using
 * fractions, we shift everything to the left by the number of bits we want to
  * use for representing the fractional part (Q-format).
@@ -326,7 +326,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
 * where W is the length of the window, P the multiplier for transitioning into
  * Q-format fixed point arithmetic and load is the instantaneous load of a
- * runqueue, which basically is the number of runnable vcpus there are on the
+ * runqueue, which basically is the number of runnable units there are on the
  * runqueue (for the meaning of the other terms, look at the doc comment to
  *  update_runq_load()).
  *
@@ -338,7 +338,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  * The maximum possible value for the average load, which we want to store in
  * s_time_t type variables (i.e., we have 63 bits available) is load*P. This
  * means that, with P 18 bits wide, load can occupy 45 bits. This in turn
- * means we can have 2^45 vcpus in each runqueue, before overflow occurs!
+ * means we can have 2^45 units in each runqueue, before overflow occurs!
  *
  * However, it can happen that, at step j+1, if:
  *
@@ -354,13 +354,13 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  *  2^(63 - 30 - 18) = 2^15 = 32768
  *
- * So 32768 is the maximum number of vcpus the we can have in a runqueue,
+ * So 32768 is the maximum number of units that we can have in a runqueue,
  * at any given time, and still not have problems with the load tracking
  * calculations... and this is more than fine.
  *
  * As a matter of fact, since we are using microseconds granularity, we have
  * W=2^20. So, still with 18 fractional bits and a 1 second long window, there
- * may be 2^25 = 33554432 vcpus in a runq before we have to start thinking
+ * may be 2^25 = 33554432 units in a runq before we have to start thinking
  * about overflow.
  */
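
(Illustrative only.) The Q-format EWMA update described above is the same
formula that update_runq_load()/update_svc_load() apply further down in this
patch; as a standalone model, with the P and W values from the comment:

    #include <stdint.h>
    #include <stdio.h>

    #define P 18  /* fractional (Q-format) bits, as in the comment above */
    #define W 20  /* window length: 2^20 us, i.e. about one second       */

    /* One EWMA step, shifted into Q-format exactly as described above. */
    static int64_t ewma_step(int64_t avgload, int64_t load, int64_t delta)
    {
        return avgload + ((delta * (load << P)) >> W)
                       - ((delta * avgload) >> W);
    }

    int main(void)
    {
        int64_t avg = 0;

        /* 3 runnable units for 100000 us, then an idle 100000 us. */
        avg = ewma_step(avg, 3, 100000);
        avg = ewma_step(avg, 0, 100000);
        printf("avgload (Q%d): %lld (~%.3f units)\n", P,
               (long long)avg, (double)avg / (1 << P));
        return 0;
    }
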
 
@@ -468,7 +468,7 @@ struct csched2_runqueue_data {
     struct list_head runq;     /* Ordered list of runnable vms               */
     int id;                    /* ID of this runqueue (-1 if invalid)        */
 
-    int load;                  /* Instantaneous load (num of non-idle vcpus) */
+    int load;                  /* Instantaneous load (num of non-idle units) */
     s_time_t load_last_update; /* Last time average was updated              */
     s_time_t avgload;          /* Decaying queue load                        */
     s_time_t b_avgload;        /* Decaying queue load modified by balancing  */
@@ -478,8 +478,8 @@ struct csched2_runqueue_data {
         tickled,               /* Have been asked to go through schedule     */
         idle;                  /* Currently idle pcpus                       */
 
-    struct list_head svc;      /* List of all vcpus assigned to the runqueue */
-    unsigned int max_weight;   /* Max weight of the vcpus in this runqueue   */
+    struct list_head svc;      /* List of all units assigned to the runqueue */
+    unsigned int max_weight;   /* Max weight of the units in this runqueue   */
     unsigned int pick_bias;    /* Last picked pcpu. Start from it next time  */
 };
 
@@ -509,20 +509,20 @@ struct csched2_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Schedule Unit
  */
 struct csched2_unit {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
-    struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
+    struct sched_unit *unit;           /* Up-pointer, to schedule unit        */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
 
     int credit;                        /* Current amount of credit            */
-    unsigned int weight;               /* Weight of this vcpu                 */
+    unsigned int weight;               /* Weight of this unit                 */
     unsigned int residual;             /* Reminder of div(max_weight/weight)  */
     unsigned flags;                    /* Status flags (16 bits would be ok,  */
     s_time_t budget;                   /* Current budget (if domain has cap)  */
                                        /* but clear_bit() does not like that) */
-    s_time_t budget_quota;             /* Budget to which vCPU is entitled    */
+    s_time_t budget_quota;             /* Budget to which unit is entitled    */
 
     s_time_t start_time;               /* Time we were scheduled (for credit) */
 
@@ -531,7 +531,7 @@ struct csched2_unit {
     s_time_t avgload;                  /* Decaying queue load                 */
 
     struct list_head runq_elem;        /* On the runqueue (rqd->runq)         */
-    struct list_head parked_elem;      /* On the parked_vcpus list            */
+    struct list_head parked_elem;      /* On the parked_units list            */
     struct list_head rqd_elem;         /* On csched2_runqueue_data's svc list */
     struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
     int tickled_cpu;                   /* Cpu that will pick us (-1 if none)  */
@@ -549,12 +549,12 @@ struct csched2_dom {
 
     struct timer repl_timer;    /* Timer for periodic replenishment of budget */
     s_time_t next_repl;         /* Time at which next replenishment occurs    */
-    struct list_head parked_vcpus; /* List of CPUs waiting for budget         */
+    struct list_head parked_units; /* List of units waiting for budget        */
 
     struct list_head sdom_elem; /* On csched2_runqueue_data's sdom list       */
     uint16_t weight;            /* User specified weight                      */
     uint16_t cap;               /* User specified cap                         */
-    uint16_t nr_vcpus;          /* Number of vcpus of this domain             */
+    uint16_t nr_units;          /* Number of units of this domain             */
 };
 
 /*
@@ -593,7 +593,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
     return &csched2_priv(ops)->rqd[c2r(cpu)];
 }
 
-/* Does the domain of this vCPU have a cap? */
+/* Does the domain of this unit have a cap? */
 static inline bool has_cap(const struct csched2_unit *svc)
 {
     return svc->budget != STIME_MAX;
@@ -611,24 +611,24 @@ static inline bool has_cap(const struct csched2_unit *svc)
  *    smt_idle mask.
  *
  * Once we have such a mask, it is easy to implement a policy that, either:
- *  - uses fully idle cores first: it is enough to try to schedule the vcpus
+ *  - uses fully idle cores first: it is enough to try to schedule the units
  *    on pcpus from smt_idle mask first. This is what happens if
  *    sched_smt_power_savings was not set at boot (default), and it maximizes
  *    true parallelism, and hence performance;
- *  - uses already busy cores first: it is enough to try to schedule the vcpus
+ *  - uses already busy cores first: it is enough to try to schedule the units
  *    on pcpus that are idle, but are not in smt_idle. This is what happens if
 *    sched_smt_power_savings is set at boot, and it allows as many cores as
  *    possible to stay in low power states, minimizing power consumption.
  *
  * This logic is entirely implemented in runq_tickle(), and that is enough.
- * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
+ * In fact, in this scheduler, placement of a unit on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_unit_wake(), which calls
+ *  - when a unit wakes up, it calls csched2_unit_wake(), which calls
  *    runq_tickle();
  *  - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *    csched2_unit_migrate() (which calls migrate()) and csched2_unit_wake().
 *    csched2_res_pick() looks for the least loaded runq and returns just any
- *    of its processors. Then, csched2_unit_migrate() just moves the vcpu to
+ *    of its processors. Then, csched2_unit_migrate() just moves the unit to
  *    the chosen runq, and it is again runq_tickle(), called by
  *    csched2_unit_wake() that actually decides what pcpu to use within the
  *    chosen runq;
@@ -643,7 +643,7 @@ static inline bool has_cap(const struct csched2_unit *svc)
  *
  * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
  * records pcpus that at are merely idle (i.e., at the moment do not
- * have a vcpu running on them).  But you have to manually filter out
+ * have a unit running on them).  But you have to manually filter out
  * which pcpus have been tickled in order to find cores that are not
  * going to be busy soon.  Filtering out tickled cpus pairwise is a
  * lot of extra pain; so for rqd->smt_idle, we explicitly make so that
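
(Illustrative only.) The two policies sketched in this comment -- fully idle
cores first by default, or already-busy cores first under
sched_smt_power_savings -- amount to choosing which mask to search first. A
standalone model, with plain bitmaps as stand-ins for cpumasks:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative only: a cpumask as a plain bitmap, one bit per pcpu. */
    typedef unsigned long cpumask_sketch_t;

    /*
     * Pick a pcpu for a unit: fully idle cores first (default), or
     * already-busy cores first if power saving is requested.
     */
    static int pick_cpu(cpumask_sketch_t idle, cpumask_sketch_t smt_idle,
                        bool power_savings)
    {
        cpumask_sketch_t first = power_savings ? (idle & ~smt_idle)
                                               : smt_idle;
        cpumask_sketch_t masks[2] = { first, idle };

        for ( int i = 0; i < 2; i++ )
            for ( int cpu = 0; cpu < 64; cpu++ )
                if ( masks[i] & (1UL << cpu) )
                    return cpu;
        return -1; /* nothing idle at all */
    }

    int main(void)
    {
        /* cpus 0-3: core {2,3} fully idle; 1 idle but its sibling 0 busy */
        cpumask_sketch_t idle = 0xeUL, smt_idle = 0xcUL;

        printf("performance pick: %d\n", pick_cpu(idle, smt_idle, false));
        printf("powersave   pick: %d\n", pick_cpu(idle, smt_idle, true));
        return 0;
    }
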
@@ -690,24 +690,24 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;
     unsigned int bs;
 
     SCHED_STAT_CRANK(need_fallback_cpu);
 
     for_each_affinity_balance_step( bs )
     {
-        int cpu = v->processor;
+        int cpu = sched_unit_cpu(unit);
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    cpupool_domain_cpumask(v->domain));
+                    cpupool_domain_cpumask(unit->domain));
 
         /*
-         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
+         * This is cases 1 or 3 (depending on bs): if the unit's cpu is (still)
          * in our affinity, go for it, for cache betterness.
          */
         if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
@@ -729,7 +729,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
          * We may well pick any valid pcpu from our soft-affinity, outside
          * of our current runqueue, but we decide not to. In fact, changing
          * runqueue is slow, affects load distribution, and is a source of
-         * overhead for the vcpus running on the other runqueue (we need the
+         * overhead for the units running on the other runqueue (we need the
          * lock). So, better do that as a consequence of a well informed
          * decision (or if we really don't have any other chance, as we will,
          * at step 5, if we get to there).
@@ -761,7 +761,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
      * We can't be here.  But if that somehow happens (in non-debug builds),
      * at least return something which is both online and in our hard-affinity.
      */
-    return cpumask_any(cpumask_scratch_cpu(v->processor));
+    return cpumask_any(cpumask_scratch_cpu(sched_unit_cpu(unit)));
 }
 
 /*
@@ -790,7 +790,7 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_unit *svc)
+static inline int unit_on_runq(struct csched2_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -948,17 +948,17 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 
     update_max_weight(svc->rqd, svc->weight, 0);
 
-    /* Expected new load based on adding this vcpu */
+    /* Expected new load based on adding this unit */
     rqd->b_avgload += svc->avgload;
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.rqi = rqd->id;
         __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
                     sizeof(d),
@@ -968,13 +968,13 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 }
 
 static void
-runq_assign(const struct scheduler *ops, struct vcpu *vc)
+runq_assign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
-    _runq_assign(svc, c2rqd(ops, vc->processor));
+    _runq_assign(svc, c2rqd(ops, sched_unit_cpu(unit)));
 }
 
 static void
@@ -982,24 +982,24 @@ _runq_deassign(struct csched2_unit *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
-    ASSERT(!vcpu_on_runq(svc));
+    ASSERT(!unit_on_runq(svc));
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_del_init(&svc->rqd_elem);
     update_max_weight(rqd, 0, svc->weight);
 
-    /* Expected new load based on removing this vcpu */
+    /* Expected new load based on removing this unit */
     rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
 
     svc->rqd = NULL;
 }
 
 static void
-runq_deassign(const struct scheduler *ops, struct vcpu *vc)
+runq_deassign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;
 
-    ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+    ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
     _runq_deassign(svc);
 }
@@ -1202,15 +1202,15 @@ update_svc_load(const struct scheduler *ops,
                 struct csched2_unit *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    s_time_t delta, vcpu_load;
+    s_time_t delta, unit_load;
     unsigned int P, W;
 
     if ( change == -1 )
-        vcpu_load = 1;
+        unit_load = 1;
     else if ( change == 1 )
-        vcpu_load = 0;
+        unit_load = 0;
     else
-        vcpu_load = vcpu_runnable(svc->vcpu);
+        unit_load = unit_runnable(svc->unit);
 
     W = prv->load_window_shift;
     P = prv->load_precision_shift;
@@ -1218,7 +1218,7 @@ update_svc_load(const struct scheduler *ops,
 
     if ( svc->load_last_update + (1ULL << W) < now )
     {
-        svc->avgload = vcpu_load << P;
+        svc->avgload = unit_load << P;
     }
     else
     {
@@ -1231,7 +1231,7 @@ update_svc_load(const struct scheduler *ops,
         }
 
         svc->avgload = svc->avgload +
-                       ((delta * (vcpu_load << P)) >> W) -
+                       ((delta * (unit_load << P)) >> W) -
                        ((delta * svc->avgload) >> W);
     }
     svc->load_last_update = now;
@@ -1243,14 +1243,14 @@ update_svc_load(const struct scheduler *ops,
     {
         struct {
             uint64_t v_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned shift;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.v_avgload = svc->avgload;
         d.shift = P;
-        __trace_var(TRC_CSCHED2_UPDATE_VCPU_LOAD, 1,
+        __trace_var(TRC_CSCHED2_UPDATE_UNIT_LOAD, 1,
                     sizeof(d),
                     (unsigned char *)&d);
     }
@@ -1272,18 +1272,18 @@ static void
 runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 {
     struct list_head *iter;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    ASSERT(!vcpu_on_runq(svc));
-    ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
+    ASSERT(!unit_on_runq(svc));
+    ASSERT(c2r(cpu) == c2r(sched_unit_cpu(svc->unit)));
 
     ASSERT(&svc->rqd->runq == runq);
-    ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->sched_unit->is_running);
+    ASSERT(!is_idle_unit(svc->unit));
+    ASSERT(!svc->unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1300,11 +1300,11 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned pos;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.pos = pos;
         __trace_var(TRC_CSCHED2_RUNQ_POS, 1,
                     sizeof(d),
@@ -1314,7 +1314,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 
 static inline void runq_remove(struct csched2_unit *svc)
 {
-    ASSERT(vcpu_on_runq(svc));
+    ASSERT(unit_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
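
A small aside on why unit_on_runq() can be a bare list_empty() test on the
element itself: runq_remove() uses list_del_init(), which re-points the
unlinked element at itself, so no separate "queued" flag is needed. A
standalone model of the idiom (hand-rolled stand-ins for the list macros):

    #include <stdbool.h>
    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };

    static void init_head(struct list_head *h) { h->next = h->prev = h; }

    static void list_add_sketch(struct list_head *n, struct list_head *h)
    {
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
    }

    /* Unlink AND re-point the element at itself: emptiness stays testable. */
    static void list_del_init_sketch(struct list_head *n)
    {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        init_head(n);
    }

    static bool list_empty_sketch(const struct list_head *h)
    {
        return h->next == h;
    }

    int main(void)
    {
        struct list_head runq, elem;

        init_head(&runq);
        init_head(&elem);

        list_add_sketch(&elem, &runq);
        printf("on runq: %d\n", !list_empty_sketch(&elem)); /* 1 */
        list_del_init_sketch(&elem);
        printf("on runq: %d\n", !list_empty_sketch(&elem)); /* 0 */
        return 0;
    }
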
 
@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->sched_unit->is_running);
-    return now - svc->vcpu->sched_unit->state_entry_time >
+    ASSERT(svc->unit->is_running);
+    return now - svc->unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -1369,17 +1369,17 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 
     /*
      * We are dealing with cpus that are marked non-idle (i.e., that are not
-     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * in rqd->idle). However, some of them may be running their idle unit,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+    if ( unlikely(is_idle_unit(cur->unit) ||
          !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
 
     score = new->credit - cur->credit;
-    if ( new->vcpu->processor != cpu )
+    if ( sched_unit_cpu(new->unit) != cpu )
         score -= CSCHED2_MIGRATE_RESIST;
 
     /*
@@ -1390,21 +1390,21 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, score;
         } d;
-        d.dom = cur->vcpu->domain->domain_id;
-        d.vcpu = cur->vcpu->vcpu_id;
+        d.dom = cur->unit->domain->domain_id;
+        d.unit = cur->unit->unit_id;
         d.credit = cur->credit;
         d.score = score;
         __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
@@ -1416,14 +1416,14 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 }
 
 /*
- * Check what processor it is best to 'wake', for picking up a vcpu that has
+ * Check what processor it is best to 'wake', for picking up a unit that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
- *  2. if there aren't idle processor, check the one were the vcpu was
+ *  2. if there aren't idle processors, check the one where the unit was
  *     running before to see if we can preempt what's running there now
  *     (and hence doing just one migration);
- *  3. last stand: check all processors and see if the vcpu is in right
- *     of preempting any of the other vcpus running on them (this requires
+ *  3. last stand: check all processors and see if the unit has the right
+ *     to preempt any of the other units running on them (this requires
  *     two migrations, and that's indeed why it is left as the last stand).
  *
  * Note that when we say 'idle processors' what we really mean is (pretty
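
(Illustrative only.) The three steps above, reduced to a toy decision
function; the score array stands in for tickle_score() (credit delta,
adjusted by migration resistance and soft affinity) and all names are
made up:

    #include <stdio.h>

    #define NCPU 4

    static int pick_tickle_target(const int idle[NCPU],
                                  const int score[NCPU], int prev_cpu)
    {
        int best = -1, best_score = 0;

        /* 1. any idle processor wins outright */
        for ( int c = 0; c < NCPU; c++ )
            if ( idle[c] )
                return c;

        /* 2. preempting where we ran before costs only one migration */
        if ( score[prev_cpu] > 0 )
            return prev_cpu;

        /* 3. last stand: best positive score anywhere (two migrations) */
        for ( int c = 0; c < NCPU; c++ )
            if ( score[c] > best_score )
            {
                best = c;
                best_score = score[c];
            }
        return best; /* -1: don't tickle anyone */
    }

    int main(void)
    {
        int idle[NCPU] = { 0, 0, 0, 0 };
        int score[NCPU] = { -5, 0, 7, 3 };

        printf("tickle cpu %d\n", pick_tickle_target(idle, score, 1));
        return 0;
    }
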
@@ -1436,10 +1436,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
-    struct sched_unit *unit = new->vcpu->sched_unit;
-    unsigned int bs, cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->unit;
+    unsigned int bs, cpu = sched_unit_cpu(unit);
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
+    cpumask_t *online = cpupool_domain_cpumask(unit->domain);
     cpumask_t mask;
 
     ASSERT(new->rqd == rqd);
@@ -1447,13 +1447,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned processor;
             int credit;
         } d;
-        d.dom = new->vcpu->domain->domain_id;
-        d.vcpu = new->vcpu->vcpu_id;
-        d.processor = new->vcpu->processor;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
+        d.processor = cpu;
         d.credit = new->credit;
         __trace_var(TRC_CSCHED2_TICKLE_NEW, 1,
                     sizeof(d),
@@ -1461,11 +1461,11 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     }
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when an unit has hard-affinity with only one
+     * cpu, and there is no other unit that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the units in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
@@ -1532,7 +1532,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     /*
      * Note that, if we are here, it means we have done the hard-affinity
      * balancing step of the loop, and hence what we have in cpumask_scratch
-     * is what we put there for last, i.e., new's vcpu_hard_affinity & online
+     * is what we put there for last, i.e., new's cpu_hard_affinity & online
      * which is exactly what we need for the next part of the function.
      */
 
@@ -1543,7 +1543,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
      *
      * For deciding which cpu to tickle, we use tickle_score(), which will
      * factor in both new's soft-affinity, and the soft-affinity of the
-     * vcpu running on each cpu that we consider.
+     * unit running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
@@ -1588,7 +1588,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
+    ASSERT(!is_idle_unit(curr_on_cpu(ipid)));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
@@ -1623,16 +1623,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 
     /*
      * Under normal circumstances, snext->credit should never be less
-     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, a
-     * vcpu with low credits may be allowed to run long enough that
+     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, a
+     * unit with low credits may be allowed to run long enough that
      * its credits are actually less than -CSCHED2_CREDIT_INIT.
-     * (Instances have been observed, for example, where a vcpu with
+     * (Instances have been observed, for example, where a unit with
      * 200us of credit was allowed to run for 11ms, giving it -10.8ms
      * of credit.  Thus it was still negative even after the reset.)
      *
      * If this is the case for snext, we simply want to keep moving
      * everyone up until it is in the black again.  This is fair because
-     * none of the other vcpus want to run at the moment.
+     * none of the other units want to run at the moment.
      *
      * Rather than looping, however, we just calculate a multiplier,
      * avoiding an integer division and multiplication in the common
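
(Illustrative only.) The multiplier trick referred to above: rather than
repeatedly adding a CREDIT_INIT-sized boost until snext is in the black,
compute how many boosts are needed in one go, with no division in the common
single-boost case. Constants and names are stand-ins:

    #include <stdio.h>

    #define CREDIT_INIT 10000000L  /* stand-in for CSCHED2_CREDIT_INIT */

    static unsigned int reset_multiplier(long snext_credit)
    {
        unsigned int m = 1;

        /* Common case: one boost suffices -- no division needed. */
        if ( snext_credit < -CREDIT_INIT )
            m += (-snext_credit) / CREDIT_INIT;
        return m;
    }

    int main(void)
    {
        printf("credit  -2000000 -> m = %u\n", reset_multiplier(-2000000L));
        printf("credit -25000000 -> m = %u\n", reset_multiplier(-25000000L));
        return 0;
    }
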
@@ -1649,16 +1649,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         int start_credit;
 
         svc = list_entry(iter, struct csched2_unit, rqd_elem);
-        svc_cpu = svc->vcpu->processor;
+        svc_cpu = sched_unit_cpu(svc->unit);
 
-        ASSERT(!is_idle_vcpu(svc->vcpu));
+        ASSERT(!is_idle_unit(svc->unit));
         ASSERT(svc->rqd == rqd);
 
         /*
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
+        if ( svc->unit == curr_on_cpu(svc_cpu) )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1689,12 +1689,12 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 int credit_start, credit_end;
                 unsigned multiplier;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             d.credit_start = start_credit;
             d.credit_end = svc->credit;
             d.multiplier = m;
@@ -1714,9 +1714,9 @@ void burn_credits(struct csched2_runqueue_data *rqd,
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_unit(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_unit(curr_on_cpu(sched_unit_cpu(svc->unit))));
 
-    if ( unlikely(is_idle_vcpu(svc->vcpu)) )
+    if ( unlikely(is_idle_unit(svc->unit)) )
     {
         ASSERT(svc->credit == CSCHED2_IDLE_CREDIT);
         return;
@@ -1745,12 +1745,12 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, budget;
             int delta;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.credit = svc->credit;
         d.budget = has_cap(svc) ?  svc->budget : INT_MIN;
         d.delta = delta;
@@ -1764,39 +1764,39 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_unit *svc)
+static void park_unit(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;
 
     ASSERT(spin_is_locked(&svc->sdom->budget_lock));
 
     /*
-     * It was impossible to find budget for this vCPU, so it has to be
+     * It was impossible to find budget for this unit, so it has to be
      * "parked". This implies it is not runnable, so we mark it as such in
-     * its pause_flags. If the vCPU is currently scheduled (which means we
+     * its pause_flags. If the unit is currently scheduled (which means we
      * are here after being called from within csched_schedule()), flagging
      * is enough, as we'll choose someone else, and then context_saved()
      * will take care of updating the load properly.
      *
-     * If, OTOH, the vCPU is sitting in the runqueue (which means we are here
+     * If, OTOH, the unit is sitting in the runqueue (which means we are here
      * after being called from within runq_candidate()), we must go all the
      * way down to taking it out of there, and updating the load accordingly.
      *
-     * In both cases, we also add it to the list of parked vCPUs of the domain.
+     * In both cases, we also add it to the list of parked units of the domain.
      */
-    __set_bit(_VPF_parked, &v->pause_flags);
-    if ( vcpu_on_runq(svc) )
+    sched_set_pause_flags(unit, _VPF_parked);
+    if ( unit_on_runq(svc) )
     {
         runq_remove(svc);
         update_load(svc->sdom->dom->cpupool->sched, svc->rqd, svc, -1, NOW());
     }
-    list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
+    list_add(&svc->parked_elem, &svc->sdom->parked_units);
 }
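
Note that park_unit() goes through sched_set_pause_flags() rather than poking
pause_flags directly. That helper is introduced earlier in the series; as a
rough model only (the field names here are guesses, not the real interface),
it applies the flag to each vcpu backing the unit, whether the unit holds one
vcpu or, with core scheduling, several:

    /* Hypothetical sketch -- field names are assumptions, not the real API. */
    struct vcpu_sketch {
        unsigned long pause_flags;
        struct vcpu_sketch *next_in_unit;
    };

    struct sched_unit_sketch {
        struct vcpu_sketch *vcpu_list;
    };

    static inline void set_pause_flag_all(struct sched_unit_sketch *unit,
                                          unsigned int bit)
    {
        for ( struct vcpu_sketch *v = unit->vcpu_list; v; v = v->next_in_unit )
            v->pause_flags |= 1UL << bit;  /* the real code uses __set_bit() */
    }
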
 
-static bool vcpu_grab_budget(struct csched2_unit *svc)
+static bool unit_grab_budget(struct csched2_unit *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
@@ -1808,9 +1808,9 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
 
     /*
      * Here, svc->budget is <= 0 (as, if it was > 0, we'd have taken the if
-     * above!). That basically means the vCPU has overrun a bit --because of
+     * above!). That basically means the unit has overrun a bit --because of
      * various reasons-- and we want to take that into account. With the +=,
-     * we are actually subtracting the amount of budget the vCPU has
+     * we are actually subtracting the amount of budget the unit has
      * overconsumed, from the total domain budget.
      */
     sdom->budget += svc->budget;
@@ -1831,7 +1831,7 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
     else
     {
         svc->budget = 0;
-        park_vcpu(svc);
+        park_unit(svc);
     }
 
     spin_unlock(&sdom->budget_lock);
@@ -1840,10 +1840,10 @@ static bool vcpu_grab_budget(struct csched2_unit *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
+unit_return_budget(struct csched2_unit *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
     ASSERT(list_empty(parked));
@@ -1852,7 +1852,7 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     spin_lock(&sdom->budget_lock);
 
     /*
-     * The vCPU is stopping running (e.g., because it's blocking, or it has
+     * The unit is no longer running (e.g., because it's blocking, or it has
      * been preempted). If it hasn't consumed all the budget it got when
      * starting to run, put that remaining amount back in the domain's budget
      * pool.
@@ -1861,58 +1861,58 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked)
     svc->budget = 0;
 
     /*
-     * Making budget available again to the domain means that parked vCPUs
-     * may be unparked and run. They are, if any, in the domain's parked_vcpus
+     * Making budget available again to the domain means that parked units
+     * may be unparked and run. They are, if any, in the domain's parked_units
      * list, so we want to go through that and unpark them (so they can try
      * to get some budget).
      *
      * Touching the list requires the budget_lock, which we hold. Let's
      * therefore put everyone in that list in another, temporary list, which
-     * then the caller will traverse, unparking the vCPUs it finds there.
+     * then the caller will traverse, unparking the units it finds there.
      *
      * In fact, we can't do the actual unparking here, because that requires
-     * taking the runqueue lock of the vCPUs being unparked, and we can't
+     * taking the runqueue lock of the units being unparked, and we can't
      * take any runqueue locks while we hold a budget_lock.
      */
     if ( sdom->budget > 0 )
-        list_splice_init(&sdom->parked_vcpus, parked);
+        list_splice_init(&sdom->parked_units, parked);
 
     spin_unlock(&sdom->budget_lock);
 }
 
 static void
-unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
+unpark_parked_units(const struct scheduler *ops, struct list_head *units)
 {
     struct csched2_unit *svc, *tmp;
     spinlock_t *lock;
 
-    list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
+    list_for_each_entry_safe ( svc, tmp, units, parked_elem )
     {
         unsigned long flags;
         s_time_t now;
 
-        lock = unit_schedule_lock_irqsave(svc->vcpu->sched_unit, &flags);
+        lock = unit_schedule_lock_irqsave(svc->unit, &flags);
 
-        __clear_bit(_VPF_parked, &svc->vcpu->pause_flags);
+        sched_clear_pause_flags(svc->unit, _VPF_parked);
         if ( unlikely(svc->flags & CSFLAG_scheduled) )
         {
             /*
              * We end here if a budget replenishment arrived between
              * csched2_schedule() (and, in particular, after a call to
-             * vcpu_grab_budget() that returned false), and
+             * unit_grab_budget() that returned false), and
              * context_saved(). By setting __CSFLAG_delayed_runq_add,
-             * we tell context_saved() to put the vCPU back in the
+             * we tell context_saved() to put the unit back in the
              * runqueue, from where it will compete with the others
              * for the newly replenished budget.
              */
             ASSERT( svc->rqd != NULL );
-            ASSERT( c2rqd(ops, svc->vcpu->processor) == svc->rqd );
+            ASSERT( c2rqd(ops, sched_unit_cpu(svc->unit)) == svc->rqd );
             __set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
         }
-        else if ( vcpu_runnable(svc->vcpu) )
+        else if ( unit_runnable(svc->unit) )
         {
             /*
-             * The vCPU should go back to the runqueue, and compete for
+             * The unit should go back to the runqueue, and compete for
              * the newly replenished budget, but only if it is actually
              * runnable (and was therefore offline only because of the
              * lack of budget).
@@ -1924,7 +1924,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         }
         list_del_init(&svc->parked_elem);
 
-        unit_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_unit);
+        unit_schedule_unlock_irqrestore(lock, flags, svc->unit);
     }
 }
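
(Illustrative only.) The budget_lock vs. runqueue-lock ordering described in
the comments above follows a common pattern: splice the shared list onto a
private one under the inner lock, drop that lock, then take the per-unit
locks. A standalone skeleton of the pattern, with pthread mutexes standing in
for Xen's spinlocks:

    #include <pthread.h>

    struct node { struct node *next; };

    static pthread_mutex_t budget_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct node *parked_list;          /* protected by budget_lock */

    static void unpark_one(struct node *n)    /* takes per-unit locks     */
    {
        (void)n; /* grab the unit's runqueue lock, clear the flag, requeue */
    }

    static void unpark_all(void)
    {
        struct node *local;

        pthread_mutex_lock(&budget_lock);
        local = parked_list;                  /* splice to a private list */
        parked_list = NULL;
        pthread_mutex_unlock(&budget_lock);   /* drop before rq locks     */

        for ( struct node *n = local; n; )
        {
            struct node *next = n->next;
            unpark_one(n);
            n = next;
        }
    }

    int main(void)
    {
        unpark_all(); /* empty list: a no-op, but exercises the locking */
        return 0;
    }
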
 
@@ -1954,7 +1954,7 @@ static void replenish_domain_budget(void* data)
      *
      * Even in cases of overrun or delay, however, we expect that in 99% of
      * cases, doing just one replenishment will be good enough for being able
-     * to unpark the vCPUs that are waiting for some budget.
+     * to unpark the units that are waiting for some budget.
      */
     do_replenish(sdom);
 
@@ -1974,7 +1974,7 @@ static void replenish_domain_budget(void* data)
     }
     /*
      * 2) if we overrun by more than tot_budget, then budget+tot_budget is
-     * still < 0, which means that we can't unpark the vCPUs. Let's bail,
+     * still < 0, which means that we can't unpark the units. Let's bail,
      * and wait for future replenishments.
      */
     if ( unlikely(sdom->budget <= 0) )
@@ -1988,14 +1988,14 @@ static void replenish_domain_budget(void* data)
 
     /*
      * As above, let's prepare the temporary list, out of the domain's
-     * parked_vcpus list, now that we hold the budget_lock. Then, drop such
+     * parked_units list, now that we hold the budget_lock. Then, drop such
      * lock, and pass the list to the unparking function.
      */
-    list_splice_init(&sdom->parked_vcpus, &parked);
+    list_splice_init(&sdom->parked_units, &parked);
 
     spin_unlock_irqrestore(&sdom->budget_lock, flags);
 
-    unpark_parked_vcpus(sdom->dom->cpupool->sched, &parked);
+    unpark_parked_units(sdom->dom->cpupool->sched, &parked);
 
  out:
     set_timer(&sdom->repl_timer, sdom->next_repl);
@@ -2003,37 +2003,36 @@ static void replenish_domain_budget(void* data)
 
 #ifndef NDEBUG
 static inline void
-csched2_vcpu_check(struct vcpu *vc)
+csched2_unit_check(struct sched_unit *unit)
 {
-    struct csched2_unit * const svc = csched2_unit(vc->sched_unit);
+    struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != csched2_dom(vc->domain) );
+    BUG_ON( svc->unit != unit );
+    BUG_ON( sdom != csched2_dom(unit->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_unit(unit) );
+        BUG_ON( sdom->dom != unit->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_unit(unit) );
     }
     SCHED_STAT_CRANK(unit_check);
 }
-#define CSCHED2_VCPU_CHECK(_vc)  (csched2_vcpu_check(_vc))
+#define CSCHED2_UNIT_CHECK(unit)  (csched2_unit_check(unit))
 #else
-#define CSCHED2_VCPU_CHECK(_vc)
+#define CSCHED2_UNIT_CHECK(unit)
 #endif
 
 static void *
 csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                     void *dd)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-unit info */
     svc = xzalloc(struct csched2_unit);
     if ( svc == NULL )
         return NULL;
@@ -2042,10 +2041,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
     INIT_LIST_HEAD(&svc->runq_elem);
 
     svc->sdom = dd;
-    svc->vcpu = vc;
+    svc->unit = unit;
     svc->flags = 0U;
 
-    if ( ! is_idle_vcpu(vc) )
+    if ( !is_idle_unit(unit) )
     {
         ASSERT(svc->sdom != NULL);
         svc->credit = CSCHED2_CREDIT_INIT;
@@ -2074,19 +2073,18 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
 static void
 csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     SCHED_STAT_CRANK(unit_sleep);
 
-    if ( curr_on_cpu(vc->processor) == unit )
+    if ( curr_on_cpu(sched_unit_cpu(unit)) == unit )
     {
-        tickle_cpu(vc->processor, svc->rqd);
+        tickle_cpu(sched_unit_cpu(unit), svc->rqd);
     }
-    else if ( vcpu_on_runq(svc) )
+    else if ( unit_on_runq(svc) )
     {
-        ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+        ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
         update_load(ops, svc->rqd, svc, -1, NOW());
         runq_remove(svc);
     }
@@ -2097,14 +2095,13 @@ csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 static void
 csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_unit_cpu(unit);
     s_time_t now;
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
 
     if ( unlikely(curr_on_cpu(cpu) == unit) )
     {
@@ -2112,18 +2109,18 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
         goto out;
     }
 
-    if ( unlikely(vcpu_on_runq(svc)) )
+    if ( unlikely(unit_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(unit_wake_onrunq);
         goto out;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(unit_runnable(unit)) )
         SCHED_STAT_CRANK(unit_wake_runnable);
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
-    /* If the context hasn't been saved for this vcpu yet, we can't put it on
+    /* If the context hasn't been saved for this unit yet, we can't put it on
     * another runqueue.  Instead, we set a flag so that it will be put on
     * the runqueue after the context has been saved. */
     if ( unlikely(svc->flags & CSFLAG_scheduled) )
@@ -2134,15 +2131,15 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     /* Add into the new runqueue if necessary */
     if ( svc->rqd == NULL )
-        runq_assign(ops, vc);
+        runq_assign(ops, unit);
     else
-        ASSERT(c2rqd(ops, vc->processor) == svc->rqd );
+        ASSERT(c2rqd(ops, sched_unit_cpu(unit)) == svc->rqd);
 
     now = NOW();
 
     update_load(ops, svc->rqd, svc, 1, now);
-        
-    /* Put the VCPU on the runq */
+
+    /* Put the unit on the runq */
     runq_insert(ops, svc);
     runq_tickle(ops, svc, now);
 
@@ -2155,49 +2152,48 @@ csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_unit * const svc = csched2_unit(unit);
 
-    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+    __set_bit(__CSFLAG_unit_yield, &svc->flags);
 }
 
 static void
 csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock = unit_schedule_lock_irq(unit);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
 
-    BUG_ON( !is_idle_vcpu(vc) && svc->rqd != c2rqd(ops, vc->processor));
-    ASSERT(is_idle_vcpu(vc) || svc->rqd == c2rqd(ops, vc->processor));
+    BUG_ON( !is_idle_unit(unit) && svc->rqd != c2rqd(ops, sched_unit_cpu(unit)));
+    ASSERT(is_idle_unit(unit) || svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
-    /* This vcpu is now eligible to be put on the runqueue again */
+    /* This unit is now eligible to be put on the runqueue again */
     __clear_bit(__CSFLAG_scheduled, &svc->flags);
 
     if ( unlikely(has_cap(svc) && svc->budget > 0) )
-        vcpu_return_budget(svc, &were_parked);
+        unit_return_budget(svc, &were_parked);
 
     /* If someone wants it on the runqueue, put it there. */
     /*
      * NB: We can get rid of CSFLAG_scheduled by checking for
-     * vc->is_running and vcpu_on_runq(svc) here.  However,
+     * unit->is_running and unit_on_runq(svc) here.  However,
      * since we're accessing the flags cacheline anyway,
      * it seems a bit pointless; especially as we have plenty of
      * bits free.
      */
     if ( __test_and_clear_bit(__CSFLAG_delayed_runq_add, &svc->flags)
-         && likely(vcpu_runnable(vc)) )
+         && likely(unit_runnable(unit)) )
     {
-        ASSERT(!vcpu_on_runq(svc));
+        ASSERT(!unit_on_runq(svc));
 
         runq_insert(ops, svc);
         runq_tickle(ops, svc, now);
     }
-    else if ( !is_idle_vcpu(vc) )
+    else if ( !is_idle_unit(unit) )
         update_load(ops, svc->rqd, svc, -1, now);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    unpark_parked_vcpus(ops, &were_parked);
+    unpark_parked_units(ops, &were_parked);
 }
 
 #define MAX_LOAD (STIME_MAX)
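
(Illustrative only.) The CSFLAG_scheduled / CSFLAG_delayed_runq_add pair used
above is a wake-vs-context-switch handshake: a waker that finds the unit still
being context-switched off only sets a flag, and csched2_context_saved()
completes the requeue. A standalone model, with booleans standing in for the
flag bits:

    #include <stdbool.h>
    #include <stdio.h>

    struct unit_sketch {
        bool scheduled;         /* CSFLAG_scheduled: still on a pcpu */
        bool delayed_runq_add;  /* CSFLAG_delayed_runq_add           */
        bool on_runq;
    };

    static void wake(struct unit_sketch *u)
    {
        if ( u->scheduled )          /* context not saved yet: just flag */
            u->delayed_runq_add = true;
        else
            u->on_runq = true;       /* safe to insert right away        */
    }

    static void context_saved(struct unit_sketch *u, bool runnable)
    {
        u->scheduled = false;
        if ( u->delayed_runq_add && runnable )
            u->on_runq = true;       /* complete the deferred insertion  */
        u->delayed_runq_add = false;
    }

    int main(void)
    {
        struct unit_sketch u = { .scheduled = true };

        wake(&u);                     /* arrives mid context-switch      */
        context_saved(&u, true);
        printf("on runq: %d\n", u.on_runq); /* 1 */
        return 0;
    }
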
@@ -2205,9 +2201,8 @@ static struct sched_resource *
 csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct vcpu *vc = unit->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
-    unsigned int new_cpu, cpu = vc->processor;
+    unsigned int new_cpu, cpu = sched_unit_cpu(unit);
     struct csched2_unit *svc = csched2_unit(unit);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
@@ -2245,7 +2240,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     }
 
     cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                cpupool_domain_cpumask(vc->domain));
+                cpupool_domain_cpumask(unit->domain));
 
     /*
      * First check to see if we're here because someone else suggested a place
@@ -2356,7 +2351,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * We have soft affinity, and we have a candidate runq, so go for it.
          *
          * Note that, to obtain the soft-affinity mask, we "just" put what we
-         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * have in cpumask_scratch in && with unit->cpu_soft_affinity. This is
          * ok because:
          * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
@@ -2379,7 +2374,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * any suitable runq. But we did find one when considering hard
          * affinity, so go for it.
          *
-         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpumask_scratch already has unit->cpu_hard_affinity &
          * cpupool_domain_cpumask() in it, so it's enough that we filter
          * with the cpus of the runq.
          */
@@ -2410,11 +2405,11 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     {
         struct {
             uint64_t b_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rq_id:16, new_cpu:16;
         } d;
-        d.dom = vc->domain->domain_id;
-        d.vcpu = vc->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.rq_id = min_rqi;
         d.b_avgload = min_avgload;
         d.new_cpu = new_cpu;
@@ -2433,10 +2428,10 @@ typedef struct {
     struct csched2_unit * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
-    struct csched2_runqueue_data *orqd;                  
+    struct csched2_runqueue_data *orqd;
 } balance_state_t;
 
-static void consider(balance_state_t *st, 
+static void consider(balance_state_t *st,
                      struct csched2_unit *push_svc,
                      struct csched2_unit *pull_svc)
 {
@@ -2475,17 +2470,17 @@ static void migrate(const struct scheduler *ops,
                     struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
-    int cpu = svc->vcpu->processor;
-    struct sched_unit *unit = svc->vcpu->sched_unit;
+    struct sched_unit *unit = svc->unit;
+    int cpu = sched_unit_cpu(unit);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rqi:16, trqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
         d.rqi = svc->rqd->id;
         d.trqi = trqd->id;
         __trace_var(TRC_CSCHED2_MIGRATE, 1,
@@ -2497,7 +2492,7 @@ static void migrate(const struct scheduler *ops,
     {
         /* It's running; mark it to migrate. */
         svc->migrate_rqd = trqd;
-        __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
+        sched_set_pause_flags(unit, _VPF_migrating);
         __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         SCHED_STAT_CRANK(migrate_requested);
         tickle_cpu(cpu, svc->rqd);
@@ -2506,7 +2501,7 @@ static void migrate(const struct scheduler *ops,
     {
         int on_runq = 0;
         /* It's not running; just move it */
-        if ( vcpu_on_runq(svc) )
+        if ( unit_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
@@ -2515,14 +2510,14 @@ static void migrate(const struct scheduler *ops,
         _runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                    cpupool_domain_cpumask(svc->vcpu->domain));
+                    cpupool_domain_cpumask(unit->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
-        svc->vcpu->processor = cpumask_cycle(trqd->pick_bias,
-                                             cpumask_scratch_cpu(cpu));
-        svc->vcpu->sched_unit->res = get_sched_res(svc->vcpu->processor);
-        trqd->pick_bias = svc->vcpu->processor;
-        ASSERT(svc->vcpu->processor < nr_cpu_ids);
+        sched_set_res(unit,
+                      get_sched_res(cpumask_cycle(trqd->pick_bias,
+                                                  cpumask_scratch_cpu(cpu))));
+        trqd->pick_bias = sched_unit_cpu(unit);
+        ASSERT(sched_unit_cpu(unit) < nr_cpu_ids);
 
         _runq_assign(svc, trqd);
         if ( on_runq )
@@ -2542,14 +2537,14 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_unit *svc,
+static bool unit_is_migrateable(struct csched2_unit *svc,
                                   struct csched2_runqueue_data *rqd)
 {
-    struct vcpu *v = svc->vcpu;
-    int cpu = svc->vcpu->processor;
+    struct sched_unit *unit = svc->unit;
+    int cpu = sched_unit_cpu(unit);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+                cpupool_domain_cpumask(unit->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
            cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
@@ -2586,7 +2581,7 @@ retry:
     for_each_cpu(i, &prv->active_queues)
     {
         s_time_t delta;
-        
+
         st.orqd = prv->rqd + i;
 
         if ( st.orqd == st.lrqd
@@ -2594,7 +2589,7 @@ retry:
             continue;
 
         update_runq_load(ops, st.orqd, 0, now);
-    
+
         delta = st.lrqd->b_avgload - st.orqd->b_avgload;
         if ( delta < 0 )
             delta = -delta;
@@ -2617,7 +2612,7 @@ retry:
         s_time_t load_max;
         int cpus_max;
 
-        
+
         load_max = st.lrqd->b_avgload;
         if ( st.orqd->b_avgload > load_max )
             load_max = st.orqd->b_avgload;
@@ -2656,7 +2651,7 @@ retry:
                                            opt_overload_balance_tolerance)) )
                 goto out;
     }
-             
+
     /* Try to grab the other runqueue lock; if it's been taken in the
      * meantime, try the process over again.  This can't deadlock
      * because if it doesn't get any other rqd locks, it will simply
@@ -2696,17 +2691,17 @@ retry:
 
         update_svc_load(ops, push_svc, 0, now);
 
-        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
+        if ( !unit_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
             struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
-            
+
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
-        
-            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+            if ( !unit_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -2721,8 +2716,8 @@ retry:
     list_for_each( pull_iter, &st.orqd->svc )
     {
         struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem);
-        
-        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+        if ( !unit_is_migrateable(pull_svc, st.lrqd) )
             continue;
 
         /* Consider pull only */
@@ -2745,8 +2740,7 @@ static void
 csched2_unit_migrate(
     const struct scheduler *ops, struct sched_unit *unit, unsigned int new_cpu)
 {
-    struct vcpu *vc = unit->vcpu;
-    struct domain *d = vc->domain;
+    struct domain *d = unit->domain;
     struct csched2_unit * const svc = csched2_unit(unit);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
@@ -2758,25 +2752,24 @@ csched2_unit_migrate(
      * cpupool.
      *
      * And since there indeed is the chance that it is not part of it, all
-     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * we must do is remove _and_ unassign the unit from any runqueue, as
      * well as updating v->processor with the target, so that the suspend
      * process can continue.
      *
      * It will then be during resume that a new, meaningful, value for
      * v->processor will be chosen, and during actual domain unpause that
-     * the vCPU will be assigned to and added to the proper runqueue.
+     * the unit will be assigned to and added to the proper runqueue.
      */
     if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
     {
         ASSERT(system_state == SYS_STATE_suspend);
-        if ( vcpu_on_runq(svc) )
+        if ( unit_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
         }
         _runq_deassign(svc);
-        vc->processor = new_cpu;
-        unit->res = get_sched_res(new_cpu);
+        sched_set_res(unit, get_sched_res(new_cpu));
         return;
     }
 
@@ -2790,17 +2783,14 @@ csched2_unit_migrate(
      * Do the actual movement toward new_cpu, and update vc->processor.
      * If we are changing runqueue, migrate() takes care of everything.
      * If we are not changing runqueue, we need to update vc->processor
-     * here. In fact, if, for instance, we are here because the vcpu's
+     * here. In fact, if, for instance, we are here because the unit's
      * hard affinity changed, we don't want to risk leaving vc->processor
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, now);
     else
-    {
-        vc->processor = new_cpu;
-        unit->res = get_sched_res(new_cpu);
-    }
+        sched_set_res(unit, get_sched_res(new_cpu));
 }
 
 static int
@@ -2812,18 +2802,18 @@ csched2_dom_cntl(
     struct csched2_dom * const sdom = csched2_dom(d);
     struct csched2_private *prv = csched2_priv(ops);
     unsigned long flags;
-    struct vcpu *v;
+    struct sched_unit *unit;
     int rc = 0;
 
     /*
      * Locking:
      *  - we must take the private lock for accessing the weights of the
-     *    vcpus of d, and/or the cap;
+     *    units of d, and/or the cap;
      *  - in the putinfo case, we also need the runqueue lock(s), for
     *    updating the max weight of the runqueue(s).
      *    If changing the cap, we also need the budget_lock, for updating
      *    the value of the domain budget pool (and the runqueue lock,
-     *    for adjusting the parameters and rescheduling any vCPU that is
+     *    for adjusting the parameters and rescheduling any unit that is
      *    running at the time of the change).
      */
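     /*
      * A simplified sketch of that nesting (not the exact code sequence;
      * error paths omitted). The runqueue and budget locks are siblings,
      * each taken as needed under the private lock:
      *
      *     write_lock_irqsave(&prv->lock, flags);     // private lock first
      *     lock = unit_schedule_lock(unit);           // per-unit runqueue
      *     unit_schedule_unlock(lock, unit);          // lock ...
      *     spin_lock(&sdom->budget_lock);             // ... and/or the
      *     spin_unlock(&sdom->budget_lock);           // budget pool lock
      *     write_unlock_irqrestore(&prv->lock, flags);
      */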
     switch ( op->cmd )
@@ -2845,18 +2835,18 @@ csched2_dom_cntl(
 
             sdom->weight = op->u.credit2.weight;
 
-            /* Update weights for vcpus, and max_weight for runqueues on which they reside */
-            for_each_vcpu ( d, v )
+            /* Update weights for units, and max_weight for runqueues on which they reside */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
-                ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
+                ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
         }
         /* Cap */
@@ -2865,8 +2855,8 @@ csched2_dom_cntl(
             struct csched2_unit *svc;
             spinlock_t *lock;
 
-            /* Cap is only valid if it's below 100 * nr_of_vCPUS */
-            if ( op->u.credit2.cap > 100 * sdom->nr_vcpus )
+            /* Cap is only valid if it's below 100 * nr_of_units */
+            if ( op->u.credit2.cap > 100 * sdom->nr_units )
             {
                 rc = -EINVAL;
                 write_unlock_irqrestore(&prv->lock, flags);
@@ -2879,23 +2869,23 @@ csched2_dom_cntl(
             spin_unlock(&sdom->budget_lock);
 
             /*
-             * When trying to get some budget and run, each vCPU will grab
-             * from the pool 1/N (with N = nr of vCPUs of the domain) of
-             * the total budget. Roughly speaking, this means each vCPU will
+             * When trying to get some budget and run, each unit will grab
+             * from the pool 1/N (with N = nr of units of the domain) of
+             * the total budget. Roughly speaking, this means each unit will
              * have at least one chance to run during every period.
              */
-            for_each_vcpu ( d, v )
+            for_each_sched_unit ( d, unit )
             {
-                svc = csched2_unit(v->sched_unit);
-                lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                svc = csched2_unit(unit);
+                lock = unit_schedule_lock(unit);
                 /*
                 * Too-small quotas would in theory cause a lot of overhead,
                 * but in practice that won't happen because, in
                 * csched2_runtime(), CSCHED2_MIN_TIMER is used as a floor anyway.
                  */
-                svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
+                svc->budget_quota = max(sdom->tot_budget / sdom->nr_units,
                                         CSCHED2_MIN_TIMER);
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
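             /*
              * A worked example with illustrative figures only (assuming a
              * 10 ms replenishment period): cap = 150, nr_units = 4 gives
              *   tot_budget   = 150% of 10 ms             = 15 ms
              *   budget_quota = max(15 ms / 4, MIN_TIMER) = 3.75 ms/unit
              * so each unit can expect roughly one 3.75 ms helping of
              * budget per period.
              */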
 
             if ( sdom->cap == 0 )
@@ -2905,7 +2895,7 @@ csched2_dom_cntl(
                  * and queue its first replenishment event.
                  *
                  * Since cap is currently disabled for this domain, we
-                 * know no vCPU is messing with the domain's budget, and
+                 * know no unit is messing with the domain's budget, and
                  * the replenishment timer is still off.
                  * For these reasons, it is safe to do the following without
                  * taking the budget_lock.
@@ -2915,42 +2905,42 @@ csched2_dom_cntl(
                 set_timer(&sdom->repl_timer, sdom->next_repl);
 
                 /*
-                 * Now, let's enable budget accounting for all the vCPUs.
+                 * Now, let's enable budget accounting for all the units.
+                 * To make sure that they will start to honour the domain's
                  * cap, we set their budget to 0.
                  * This way, as soon as they will try to run, they will have
                  * to get some budget.
                  *
-                 * For the vCPUs that are already running, we trigger the
+                 * For the units that are already running, we trigger the
                  * scheduler on their pCPU. When, as a consequence of this,
                  * csched2_schedule() will run, it will figure out there is
-                 * no budget, and the vCPU will try to get some (and be parked,
+                 * no budget, and the unit will try to get some (and be parked,
                  * if there's none, and we'll switch to someone else).
                  */
-                for_each_vcpu ( d, v )
+                for_each_sched_unit ( d, unit )
                 {
-                    svc = csched2_unit(v->sched_unit);
-                    lock = unit_schedule_lock(svc->vcpu->sched_unit);
-                    if ( v->sched_unit->is_running )
+                    svc = csched2_unit(unit);
+                    lock = unit_schedule_lock(unit);
+                    if ( unit->is_running )
                     {
-                        unsigned int cpu = v->processor;
+                        unsigned int cpu = sched_unit_cpu(unit);
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
+                        ASSERT(curr_on_cpu(cpu) == unit);
 
                         /*
-                         * We are triggering a reschedule on the vCPU's
+                         * We are triggering a reschedule on the unit's
                          * pCPU. That will run burn_credits() and, since
-                         * the vCPU is capped now, it would charge all the
+                         * the unit is capped now, it would charge all the
                          * execution time of this last round as budget as
-                         * well. That will make the vCPU budget go negative,
+                         * well. That will make the unit budget go negative,
                          * potentially by a large amount, and it's unfair.
                          *
                          * To avoid that, call burn_credit() here, to do the
                          * accounting of this current running instance now,
                         * with budgeting still disabled. This does not
                          * prevent some small amount of budget being charged
-                         * to the vCPU (i.e., the amount of time it runs from
+                         * to the unit (i.e., the amount of time it runs from
                          * now, to when scheduling happens). The budget will
                          * also go below 0, but a lot less than how it would
                          * if we don't do this.
@@ -2961,7 +2951,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                    unit_schedule_unlock(lock, unit);
                 }
             }
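             /*
              * To make the unfairness concrete (made-up numbers): had the
              * unit already run 5 ms when the cap was enabled, charging
              * that whole stretch against the fresh zero budget would be
              * unfair; the burn_credit() call above means only the short
              * gap between now and the actual reschedule is charged.
              */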
 
@@ -2973,30 +2963,30 @@ csched2_dom_cntl(
 
             stop_timer(&sdom->repl_timer);
 
-            /* Disable budget accounting for all the vCPUs. */
-            for_each_vcpu ( d, v )
+            /* Disable budget accounting for all the units. */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
             sdom->cap = 0;
             /*
              * We are disabling the cap for this domain, which may have
-             * vCPUs waiting for a replenishment, so we unpark them all.
+             * units waiting for a replenishment, so we unpark them all.
              * Note that, since we have already disabled budget accounting
-             * for all the vCPUs of the domain, no currently running vCPU
-             * will be added to the parked vCPUs list any longer.
+             * for all the units of the domain, no currently running unit
+             * will be added to the parked units list any longer.
              */
             spin_lock(&sdom->budget_lock);
-            list_splice_init(&sdom->parked_vcpus, &parked);
+            list_splice_init(&sdom->parked_units, &parked);
             spin_unlock(&sdom->budget_lock);
 
-            unpark_parked_vcpus(ops, &parked);
+            unpark_parked_units(ops, &parked);
         }
         write_unlock_irqrestore(&prv->lock, flags);
         break;
@@ -3073,12 +3063,12 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     sdom->dom = dom;
     sdom->weight = CSCHED2_DEFAULT_WEIGHT;
     sdom->cap = 0U;
-    sdom->nr_vcpus = 0;
+    sdom->nr_units = 0;
 
     init_timer(&sdom->repl_timer, replenish_domain_budget, sdom,
                cpumask_any(cpupool_domain_cpumask(dom)));
     spin_lock_init(&sdom->budget_lock);
-    INIT_LIST_HEAD(&sdom->parked_vcpus);
+    INIT_LIST_HEAD(&sdom->parked_units);
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3112,34 +3102,32 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
     lock = unit_schedule_lock_irq(unit);
 
-    unit->res = csched2_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    sched_set_res(unit, csched2_res_pick(ops, unit));
 
     spin_unlock_irq(lock);
 
     lock = unit_schedule_lock_irq(unit);
 
-    /* Add vcpu to runqueue of initial processor */
-    runq_assign(ops, vc);
+    /* Add unit to runqueue of initial processor */
+    runq_assign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    sdom->nr_vcpus++;
+    sdom->nr_units++;
 
     SCHED_STAT_CRANK(unit_insert);
 
-    CSCHED2_VCPU_CHECK(vc);
+    CSCHED2_UNIT_CHECK(unit);
 }
 
 static void
@@ -3153,11 +3141,10 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 static void
 csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     SCHED_STAT_CRANK(unit_remove);
@@ -3165,14 +3152,14 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     /* Remove from runqueue */
     lock = unit_schedule_lock_irq(unit);
 
-    runq_deassign(ops, vc);
+    runq_deassign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    svc->sdom->nr_vcpus--;
+    svc->sdom->nr_units--;
 }
 
-/* How long should we let this vcpu run for? */
+/* How long should we let this unit run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
                 struct csched2_unit *snext, s_time_t now)
@@ -3187,7 +3174,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      * If we're idle, just stay so. Others (or external events)
      * will poke us when necessary.
      */
-    if ( is_idle_vcpu(snext->vcpu) )
+    if ( is_idle_unit(snext->unit) )
         return -1;
 
     /* General algorithm:
@@ -3204,8 +3191,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->sched_unit->is_running )
-            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
+        if ( snext->unit->is_running )
+            ratelimit_min = snext->unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
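         /*
          * Example with made-up numbers: ratelimit_us = 1000 and a unit
          * that entered the running state 400 us ago gives
          *   ratelimit_min = state_entry_time + 1000 us - now = 600 us,
          * i.e. the unit is still guaranteed 600 us of runtime.
          */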
@@ -3222,7 +3209,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     {
         struct csched2_unit *swait = runq_elem(runq->next);
 
-        if ( ! is_idle_vcpu(swait->vcpu)
+        if ( ! is_idle_unit(swait->unit)
              && swait->credit > 0 )
         {
             rt_credit = snext->credit - swait->credit;
@@ -3236,7 +3223,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      *
      * FIXME: See if we can eliminate this conversion if we know time
      * will be outside (MIN,MAX).  Probably requires pre-calculating
-     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * credit values of MIN,MAX per unit, since each unit burns credit
      * at a different rate.
      */
     if ( rt_credit > 0 )
@@ -3284,36 +3271,35 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
     *skipped = 0;
 
-    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    if ( unlikely(is_idle_unit(scurr->unit)) )
     {
         snext = scurr;
         goto check_runq;
     }
 
-    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    yield = __test_and_clear_bit(__CSFLAG_unit_yield, &scurr->flags);
 
     /*
-     * Return the current vcpu if it has executed for less than ratelimit.
-     * Adjuststment for the selected vcpu's credit and decision
+     * Return the current unit if it has executed for less than ratelimit.
+     * The adjustment of the selected unit's credit, and the decision about
      * how long it will run, are made in csched2_runtime.
      *
      * Note that, if scurr is yielding, we don't let rate limiting kick in.
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->sched_unit->state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+         (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
+            d.dom = scurr->unit->domain->domain_id;
+            d.unit = scurr->unit->unit_id;
+            d.runtime = now - scurr->unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3322,13 +3308,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
+    if ( has_soft_affinity(scurr->unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
-            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+            cpumask_t *online = cpupool_domain_cpumask(scurr->unit->domain);
 
            /* Ok, is any of the pcpus in scurr's soft-affinity idle? */
             cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
@@ -3356,10 +3342,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
+    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
@@ -3369,24 +3355,24 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1,
                         sizeof(d),
                         (unsigned char *)&d);
         }
 
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
+        /* Only consider units that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
         }
 
         /*
-         * If a vcpu is meant to be picked up by another processor, and such
+         * If a unit is meant to be picked up by another processor, and that
          * processor has not scheduled yet, leave it in the runqueue for it.
          */
         if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
@@ -3401,7 +3387,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher.
          */
-        if ( svc->vcpu->processor != cpu
+        if ( sched_unit_cpu(svc->unit) != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
@@ -3416,7 +3402,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || vcpu_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -3426,12 +3412,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned tickled_cpu, skipped;
             int credit;
         } d;
-        d.dom = snext->vcpu->domain->domain_id;
-        d.vcpu = snext->vcpu->vcpu_id;
+        d.dom = snext->unit->domain->domain_id;
+        d.unit = snext->unit->unit_id;
         d.credit = snext->credit;
         d.tickled_cpu = snext->tickled_cpu;
         d.skipped = *skipped;
@@ -3463,14 +3449,15 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_unit * const scurr = csched2_unit(current->sched_unit);
+    struct sched_unit *currunit = current->sched_unit;
+    struct csched2_unit * const scurr = csched2_unit(currunit);
     struct csched2_unit *snext = NULL;
-    unsigned int skipped_vcpus = 0;
+    unsigned int skipped_units = 0;
     struct task_slice ret;
     bool tickled;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED2_VCPU_CHECK(current);
+    CSCHED2_UNIT_CHECK(currunit);
 
     BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
 
@@ -3479,7 +3466,7 @@ csched2_schedule(
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
+    BUG_ON(!is_idle_unit(currunit) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
     tickled = cpumask_test_cpu(cpu, &rqd->tickled);
@@ -3499,7 +3486,7 @@ csched2_schedule(
         d.cpu = cpu;
         d.rq_id = c2r(cpu);
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(currunit);
         d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
@@ -3513,55 +3500,55 @@ csched2_schedule(
     /*
     *  Below 0 means that we are capped and have overrun our budget.
      *  Let's try to get some more but, if we fail (e.g., because of the
-     *  other running vcpus), we will be parked.
+     *  other running units), we will be parked.
      */
     if ( unlikely(scurr->budget <= 0) )
-        vcpu_grab_budget(scurr);
+        unit_grab_budget(scurr);
 
     /*
-     * Select next runnable local VCPU (ie top of local runq).
+     * Select next runnable local UNIT (ie top of local runq).
      *
-     * If the current vcpu is runnable, and has higher credit than
+     * If the current unit is runnable, and has higher credit than
     * the next guy on the queue (or there is no one else), we want to
      * run him again.
      *
-     * If there's tasklet work to do, we want to chose the idle vcpu
+     * If there's tasklet work to do, we want to choose the idle unit
      * for this processor, and mark the current for delayed runqueue
      * add.
      *
-     * If the current vcpu is runnable, and there's another runnable
+     * If the current unit is runnable, and there's another runnable
      * candidate, we want to mark current for delayed runqueue add,
      * and remove the next guy from the queue.
      *
-     * If the current vcpu is not runnable, we want to chose the idle
-     * vcpu for this processor.
+     * If the current unit is not runnable, we want to choose the idle
+     * unit for this processor.
      */
     if ( tasklet_work_scheduled )
     {
-        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        __clear_bit(__CSFLAG_unit_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
+        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_units);
 
-    /* If switching from a non-idle runnable vcpu, put it
+    /* If switching from a non-idle runnable unit, put it
      * back on the runqueue. */
     if ( snext != scurr
-         && !is_idle_vcpu(scurr->vcpu)
-         && vcpu_runnable(current) )
+         && !is_idle_unit(currunit)
+         && unit_runnable(currunit) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_unit(snext->unit) )
     {
         /* If switching, remove this from the runqueue and mark it scheduled */
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->sched_unit->is_running);
+            ASSERT(!snext->unit->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
@@ -3576,19 +3563,19 @@ csched2_schedule(
 
         /*
          * The reset condition is "has a scheduler epoch come to an end?".
-         * The way this is enforced is checking whether the vcpu at the top
+         * The way this is enforced is checking whether the unit at the top
          * of the runqueue has negative credits. This means the epochs have
         * variable length, as in one epoch expires when:
-         *  1) the vcpu at the top of the runqueue has executed for
+         *  1) the unit at the top of the runqueue has executed for
          *     around 10 ms (with default parameters);
-         *  2) no other vcpu with higher credits wants to run.
+         *  2) no other unit with higher credits wants to run.
          *
          * Here, where we want to check for reset, we need to make sure the
-         * proper vcpu is being used. In fact, runqueue_candidate() may have
-         * not returned the first vcpu in the runqueue, for various reasons
+         * proper unit is being used. In fact, runqueue_candidate() may not
+         * have returned the first unit in the runqueue, for various reasons
          * (e.g., affinity). Only trigger a reset when it does.
          */
-        if ( skipped_vcpus == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
+        if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
             reset_credit(ops, cpu, now, snext);
             balance_load(ops, cpu, now);
@@ -3598,11 +3585,10 @@ csched2_schedule(
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_unit_cpu(snext->unit) != cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_unit->res = get_sched_res(cpu);
+            sched_set_res(snext->unit, get_sched_res(cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3636,20 +3622,20 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
-    CSCHED2_VCPU_CHECK(ret.task->vcpu);
+    CSCHED2_UNIT_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_unit *svc)
+csched2_dump_unit(struct csched2_private *prv, struct csched2_unit *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
             svc->flags,
-            svc->vcpu->processor);
+            sched_unit_cpu(svc->unit));
 
     printk(" credit=%" PRIi32" [w=%u]", svc->credit, svc->weight);
 
@@ -3674,12 +3660,12 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current UNIT (nothing to say if that's the idle unit) */
     svc = csched2_unit(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
         printk("\trun: ");
-        csched2_dump_vcpu(prv, svc);
+        csched2_dump_unit(prv, svc);
     }
 }
 
@@ -3736,7 +3722,7 @@ csched2_dump(const struct scheduler *ops)
     list_for_each( iter_sdom, &prv->sdom )
     {
         struct csched2_dom *sdom;
-        struct vcpu *v;
+        struct sched_unit *unit;
 
         sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
 
@@ -3744,19 +3730,19 @@ csched2_dump(const struct scheduler *ops)
                sdom->dom->domain_id,
                sdom->weight,
                sdom->cap,
-               sdom->nr_vcpus);
+               sdom->nr_units);
 
-        for_each_vcpu( sdom->dom, v )
+        for_each_sched_unit ( sdom->dom, unit )
         {
-            struct csched2_unit * const svc = csched2_unit(v->sched_unit);
+            struct csched2_unit * const svc = csched2_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(svc->vcpu->sched_unit);
+            lock = unit_schedule_lock(unit);
 
             printk("\t%3d: ", ++loop);
-            csched2_dump_vcpu(prv, svc);
+            csched2_dump_unit(prv, svc);
 
-            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, unit);
         }
     }
 
@@ -3782,7 +3768,7 @@ csched2_dump(const struct scheduler *ops)
             if ( svc )
             {
                 printk("\t%3d: ", loop++);
-                csched2_dump_vcpu(prv, svc);
+                csched2_dump_unit(prv, svc);
             }
         }
         spin_unlock(&rqd->lock);
@@ -3882,7 +3868,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct sched_resource *sd = get_sched_res(cpu);
     unsigned rqi;
 
-    ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(pdata && svc && is_idle_unit(svc->unit));
 
     /*
      * We own one runqueue lock already (from schedule_cpu_switch()). This
@@ -3895,7 +3881,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
@@ -3937,7 +3923,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
      */
     ASSERT(spc && spc->runq_id != -1);
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
-    
+
     /* Find the old runqueue and remove this cpu from it */
     rqd = prv->rqd + spc->runq_id;
 
-- 
2.16.4
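For readers following the conversion: the helpers this patch substitutes
for open-coded vcpu field accesses have roughly the following shape. This
is a simplified sketch, not the exact definitions (those are introduced by
the earlier "xen/sched: add scheduler helpers hiding vcpu" patch of this
series):

    /* Sketch only; the real helpers live in the scheduler headers. */
    static inline unsigned int sched_unit_cpu(struct sched_unit *unit)
    {
        return unit->res->processor;            /* cpu backing the unit */
    }

    static inline void sched_set_res(struct sched_unit *unit,
                                     struct sched_resource *res)
    {
        unit->vcpu->processor = res->processor; /* keep the vcpu view */
        unit->res = res;                        /* in sync with the unit */
    }

With these, call sites such as csched2_unit_migrate() above no longer
touch struct vcpu directly.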



* [PATCH 24/60] xen/sched: make arinc653 scheduler vcpu agnostic.
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, George Dunlap, Josh Whitehead, Robert VanVossen,
	Dario Faggioli

Switch arinc653 scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 208 +++++++++++++++++++++-----------------------
 1 file changed, 101 insertions(+), 107 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 0c75440bd0..213bc960ef 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -45,15 +45,15 @@
 #define DEFAULT_TIMESLICE MILLISECS(10)
 
 /**
- * Retrieve the idle VCPU for a given physical CPU
+ * Retrieve the idle UNIT for a given physical CPU
  */
-#define IDLETASK(cpu)  (idle_vcpu[cpu])
+#define IDLETASK(cpu)  (sched_idle_unit(cpu))
 
 /**
  * Return a pointer to the ARINC 653-specific scheduler data information
- * associated with the given VCPU (vc)
+ * associated with the given UNIT (unit)
  */
-#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_unit->priv)
+#define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
 
 /**
  * Return the global scheduler private data given the scheduler ops pointer
@@ -65,20 +65,20 @@
  **************************************************************************/
 
 /**
- * The arinc653_vcpu_t structure holds ARINC 653-scheduler-specific
- * information for all non-idle VCPUs
+ * The arinc653_unit_t structure holds ARINC 653-scheduler-specific
+ * information for all non-idle UNITs
  */
-typedef struct arinc653_vcpu_s
+typedef struct arinc653_unit_s
 {
-    /* vc points to Xen's struct vcpu so we can get to it from an
-     * arinc653_vcpu_t pointer. */
-    struct vcpu *       vc;
-    /* awake holds whether the VCPU has been woken with vcpu_wake() */
+    /* unit points to Xen's struct sched_unit so we can get to it from an
+     * arinc653_unit_t pointer. */
+    struct sched_unit * unit;
+    /* awake holds whether the UNIT has been woken with vcpu_wake() */
     bool_t              awake;
-    /* list holds the linked list information for the list this VCPU
+    /* list holds the linked list information for the list this UNIT
      * is stored in */
     struct list_head    list;
-} arinc653_vcpu_t;
+} arinc653_unit_t;
 
 /**
  * The sched_entry_t structure holds a single entry of the
@@ -89,14 +89,14 @@ typedef struct sched_entry_s
     /* dom_handle holds the handle ("UUID") for the domain that this
      * schedule entry refers to. */
     xen_domain_handle_t dom_handle;
-    /* vcpu_id holds the VCPU number for the VCPU that this schedule
+    /* unit_id holds the UNIT number for the UNIT that this schedule
      * entry refers to. */
-    int                 vcpu_id;
-    /* runtime holds the number of nanoseconds that the VCPU for this
+    int                 unit_id;
+    /* runtime holds the number of nanoseconds that the UNIT for this
      * schedule entry should be allowed to run per major frame. */
     s_time_t            runtime;
-    /* vc holds a pointer to the Xen VCPU structure */
-    struct vcpu *       vc;
+    /* unit holds a pointer to the Xen sched_unit structure */
+    struct sched_unit * unit;
 } sched_entry_t;
 
 /**
@@ -110,9 +110,9 @@ typedef struct a653sched_priv_s
     /**
      * This array holds the active ARINC 653 schedule.
      *
-     * When the system tries to start a new VCPU, this schedule is scanned
-     * to look for a matching (handle, VCPU #) pair. If both the handle (UUID)
-     * and VCPU number match, then the VCPU is allowed to run. Its run time
+     * When the system tries to start a new UNIT, this schedule is scanned
+     * to look for a matching (handle, UNIT #) pair. If both the handle (UUID)
+     * and UNIT number match, then the UNIT is allowed to run. Its run time
      * (per major frame) is given in the third entry of the schedule.
      */
     sched_entry_t schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
@@ -123,8 +123,8 @@ typedef struct a653sched_priv_s
      *
      * This is not necessarily the same as the number of domains in the
      * schedule. A domain could be listed multiple times within the schedule,
-     * or a domain with multiple VCPUs could have a different
-     * schedule entry for each VCPU.
+     * or a domain with multiple UNITs could have a different
+     * schedule entry for each UNIT.
      */
     unsigned int num_schedule_entries;
 
@@ -139,9 +139,9 @@ typedef struct a653sched_priv_s
     s_time_t next_major_frame;
 
     /**
-     * pointers to all Xen VCPU structures for iterating through
+     * pointers to all Xen UNIT structures for iterating through
      */
-    struct list_head vcpu_list;
+    struct list_head unit_list;
 } a653sched_priv_t;
 
 /**************************************************************************
@@ -167,50 +167,50 @@ static int dom_handle_cmp(const xen_domain_handle_t h1,
 }
 
 /**
- * This function searches the vcpu list to find a VCPU that matches
- * the domain handle and VCPU ID specified.
+ * This function searches the unit list to find a UNIT that matches
+ * the domain handle and UNIT ID specified.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param handle    Pointer to handle
- * @param vcpu_id   VCPU ID
+ * @param unit_id   UNIT ID
  *
  * @return          <ul>
- *                  <li> Pointer to the matching VCPU if one is found
+ *                  <li> Pointer to the matching UNIT if one is found
  *                  <li> NULL otherwise
  *                  </ul>
  */
-static struct vcpu *find_vcpu(
+static struct sched_unit *find_unit(
     const struct scheduler *ops,
     xen_domain_handle_t handle,
-    int vcpu_id)
+    int unit_id)
 {
-    arinc653_vcpu_t *avcpu;
+    arinc653_unit_t *aunit;
 
-    /* loop through the vcpu_list looking for the specified VCPU */
-    list_for_each_entry ( avcpu, &SCHED_PRIV(ops)->vcpu_list, list )
-        if ( (dom_handle_cmp(avcpu->vc->domain->handle, handle) == 0)
-             && (vcpu_id == avcpu->vc->vcpu_id) )
-            return avcpu->vc;
+    /* loop through the unit_list looking for the specified UNIT */
+    list_for_each_entry ( aunit, &SCHED_PRIV(ops)->unit_list, list )
+        if ( (dom_handle_cmp(aunit->unit->domain->handle, handle) == 0)
+             && (unit_id == aunit->unit->unit_id) )
+            return aunit->unit;
 
     return NULL;
 }
 
 /**
- * This function updates the pointer to the Xen VCPU structure for each entry
+ * This function updates the pointer to the Xen UNIT structure for each entry
  * in the ARINC 653 schedule.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @return          <None>
  */
-static void update_schedule_vcpus(const struct scheduler *ops)
+static void update_schedule_units(const struct scheduler *ops)
 {
     unsigned int i, n_entries = SCHED_PRIV(ops)->num_schedule_entries;
 
     for ( i = 0; i < n_entries; i++ )
-        SCHED_PRIV(ops)->schedule[i].vc =
-            find_vcpu(ops,
+        SCHED_PRIV(ops)->schedule[i].unit =
+            find_unit(ops,
                       SCHED_PRIV(ops)->schedule[i].dom_handle,
-                      SCHED_PRIV(ops)->schedule[i].vcpu_id);
+                      SCHED_PRIV(ops)->schedule[i].unit_id);
 }
 
 /**
@@ -268,12 +268,12 @@ arinc653_sched_set(
         memcpy(sched_priv->schedule[i].dom_handle,
                schedule->sched_entries[i].dom_handle,
                sizeof(sched_priv->schedule[i].dom_handle));
-        sched_priv->schedule[i].vcpu_id =
+        sched_priv->schedule[i].unit_id =
             schedule->sched_entries[i].vcpu_id;
         sched_priv->schedule[i].runtime =
             schedule->sched_entries[i].runtime;
     }
-    update_schedule_vcpus(ops);
+    update_schedule_units(ops);
 
     /*
      * The newly-installed schedule takes effect immediately. We do not even
@@ -319,7 +319,7 @@ arinc653_sched_get(
         memcpy(schedule->sched_entries[i].dom_handle,
                sched_priv->schedule[i].dom_handle,
                sizeof(sched_priv->schedule[i].dom_handle));
-        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].vcpu_id;
+        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].unit_id;
         schedule->sched_entries[i].runtime = sched_priv->schedule[i].runtime;
     }
 
@@ -355,7 +355,7 @@ a653sched_init(struct scheduler *ops)
 
     prv->next_major_frame = 0;
     spin_lock_init(&prv->lock);
-    INIT_LIST_HEAD(&prv->vcpu_list);
+    INIT_LIST_HEAD(&prv->unit_list);
 
     return 0;
 }
@@ -373,7 +373,7 @@ a653sched_deinit(struct scheduler *ops)
 }
 
 /**
- * This function allocates scheduler-specific data for a VCPU
+ * This function allocates scheduler-specific data for a UNIT
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param unit      Pointer to struct sched_unit
@@ -385,35 +385,34 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
                       void *dd)
 {
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    struct vcpu *vc = unit->vcpu;
-    arinc653_vcpu_t *svc;
+    arinc653_unit_t *svc;
     unsigned int entry;
     unsigned long flags;
 
     /*
      * Allocate memory for the ARINC 653-specific scheduler data information
-     * associated with the given VCPU (vc).
+     * associated with the given UNIT (unit).
      */
-    svc = xmalloc(arinc653_vcpu_t);
+    svc = xmalloc(arinc653_unit_t);
     if ( svc == NULL )
         return NULL;
 
     spin_lock_irqsave(&sched_priv->lock, flags);
 
-    /* 
-     * Add every one of dom0's vcpus to the schedule, as long as there are
+    /*
+     * Add every one of dom0's units to the schedule, as long as there are
      * slots available.
      */
-    if ( vc->domain->domain_id == 0 )
+    if ( unit->domain->domain_id == 0 )
     {
         entry = sched_priv->num_schedule_entries;
 
         if ( entry < ARINC653_MAX_DOMAINS_PER_SCHEDULE )
         {
             sched_priv->schedule[entry].dom_handle[0] = '\0';
-            sched_priv->schedule[entry].vcpu_id = vc->vcpu_id;
+            sched_priv->schedule[entry].unit_id = unit->unit_id;
             sched_priv->schedule[entry].runtime = DEFAULT_TIMESLICE;
-            sched_priv->schedule[entry].vc = vc;
+            sched_priv->schedule[entry].unit = unit;
 
             sched_priv->major_frame += DEFAULT_TIMESLICE;
             ++sched_priv->num_schedule_entries;
@@ -421,16 +420,16 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
     }
 
     /*
-     * Initialize our ARINC 653 scheduler-specific information for the VCPU.
-     * The VCPU starts "asleep." When Xen is ready for the VCPU to run, it
+     * Initialize our ARINC 653 scheduler-specific information for the UNIT.
+     * The UNIT starts "asleep." When Xen is ready for the UNIT to run, it
      * will call the vcpu_wake scheduler callback function and our scheduler
-     * will mark the VCPU awake.
+     * will mark the UNIT awake.
      */
-    svc->vc = vc;
+    svc->unit = unit;
     svc->awake = 0;
-    if ( !is_idle_vcpu(vc) )
-        list_add(&svc->list, &SCHED_PRIV(ops)->vcpu_list);
-    update_schedule_vcpus(ops);
+    if ( !is_idle_unit(unit) )
+        list_add(&svc->list, &SCHED_PRIV(ops)->unit_list);
+    update_schedule_units(ops);
 
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 
@@ -438,27 +437,27 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
 }
 
 /**
- * This function frees scheduler-specific VCPU data
+ * This function frees scheduler-specific UNIT data
  *
  * @param ops       Pointer to this instance of the scheduler structure
  */
 static void
 a653sched_free_vdata(const struct scheduler *ops, void *priv)
 {
-    arinc653_vcpu_t *av = priv;
+    arinc653_unit_t *av = priv;
 
     if (av == NULL)
         return;
 
-    if ( !is_idle_vcpu(av->vc) )
+    if ( !is_idle_unit(av->unit) )
         list_del(&av->list);
 
     xfree(av);
-    update_schedule_vcpus(ops);
+    update_schedule_units(ops);
 }
 
 /**
- * Xen scheduler callback function to sleep a VCPU
+ * Xen scheduler callback function to sleep a UNIT
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param unit      Pointer to struct sched_unit
@@ -466,21 +465,19 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
 static void
 a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
-
-    if ( AVCPU(vc) != NULL )
-        AVCPU(vc)->awake = 0;
+    if ( AUNIT(unit) != NULL )
+        AUNIT(unit)->awake = 0;
 
     /*
-     * If the VCPU being put to sleep is the same one that is currently
+     * If the UNIT being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( get_sched_res(vc->processor)->curr == unit )
-        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    if ( get_sched_res(sched_unit_cpu(unit))->curr == unit )
+        cpu_raise_softirq(sched_unit_cpu(unit), SCHEDULE_SOFTIRQ);
 }
 
 /**
- * Xen scheduler callback function to wake up a VCPU
+ * Xen scheduler callback function to wake up a UNIT
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param unit      Pointer to struct sched_unit
@@ -488,24 +485,22 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 static void
 a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
+    if ( AUNIT(unit) != NULL )
+        AUNIT(unit)->awake = 1;
 
-    if ( AVCPU(vc) != NULL )
-        AVCPU(vc)->awake = 1;
-
-    cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    cpu_raise_softirq(sched_unit_cpu(unit), SCHEDULE_SOFTIRQ);
 }
 
 /**
- * Xen scheduler callback function to select a VCPU to run.
+ * Xen scheduler callback function to select a UNIT to run.
  * This is the main scheduler routine.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param now       Current time
  *
- * @return          Address of the VCPU structure scheduled to be run next
- *                  Amount of time to execute the returned VCPU
- *                  Flag for whether the VCPU was migrated
+ * @return          Address of the UNIT structure scheduled to be run next
+ *                  Amount of time to execute the returned UNIT
+ *                  Flag for whether the UNIT was migrated
  */
 static struct task_slice
 a653sched_do_schedule(
@@ -514,7 +509,7 @@ a653sched_do_schedule(
     bool_t tasklet_work_scheduled)
 {
     struct task_slice ret;                      /* hold the chosen domain */
-    struct vcpu * new_task = NULL;
+    struct sched_unit *new_task = NULL;
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
@@ -559,14 +554,14 @@ a653sched_do_schedule(
      * sched_unit structure.
      */
     new_task = (sched_index < sched_priv->num_schedule_entries)
-        ? sched_priv->schedule[sched_index].vc
+        ? sched_priv->schedule[sched_index].unit
         : IDLETASK(cpu);
 
     /* Check to see if the new task can be run (awake & runnable). */
     if ( !((new_task != NULL)
-           && (AVCPU(new_task) != NULL)
-           && AVCPU(new_task)->awake
-           && vcpu_runnable(new_task)) )
+           && (AUNIT(new_task) != NULL)
+           && AUNIT(new_task)->awake
+           && unit_runnable(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
@@ -578,21 +573,21 @@ a653sched_do_schedule(
 
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 
-    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
+    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
     if ( tasklet_work_scheduled )
         new_task = IDLETASK(cpu);
 
     /* Running this task would result in a migration */
-    if ( !is_idle_vcpu(new_task)
-         && (new_task->processor != cpu) )
+    if ( !is_idle_unit(new_task)
+         && (sched_unit_cpu(new_task) != cpu) )
         new_task = IDLETASK(cpu);
 
     /*
      * Return the amount of time the next domain has to run and the address
-     * of the selected task's VCPU structure.
+     * of the selected task's UNIT structure.
      */
     ret.time = next_switch_time - now;
-    ret.task = new_task->sched_unit;
+    ret.task = new_task;
     ret.migrated = 0;
 
     BUG_ON(ret.time <= 0);
@@ -601,7 +596,7 @@ a653sched_do_schedule(
 }
 
 /**
- * Xen scheduler callback function to select a resource for the VCPU to run on
+ * Xen scheduler callback function to select a resource for the UNIT to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param unit      Pointer to struct sched_unit
@@ -611,21 +606,20 @@ a653sched_do_schedule(
 static struct sched_resource *
 a653sched_pick_resource(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     cpumask_t *online;
     unsigned int cpu;
 
-    /* 
-     * If present, prefer vc's current processor, else
-     * just find the first valid vcpu .
+    /*
+     * If present, prefer the unit's current processor, else
+     * just find the first valid cpu.
      */
-    online = cpupool_domain_cpumask(vc->domain);
+    online = cpupool_domain_cpumask(unit->domain);
 
     cpu = cpumask_first(online);
 
-    if ( cpumask_test_cpu(vc->processor, online)
+    if ( cpumask_test_cpu(sched_unit_cpu(unit), online)
          || (cpu >= nr_cpu_ids) )
-        cpu = vc->processor;
+        cpu = sched_unit_cpu(unit);
 
     return get_sched_res(cpu);
 }
@@ -636,18 +630,18 @@ a653sched_pick_resource(const struct scheduler *ops, struct sched_unit *unit)
  * @param new_ops   Pointer to this instance of the scheduler structure
  * @param cpu       The cpu that is changing scheduler
  * @param pdata     scheduler specific PCPU data (we don't have any)
- * @param vdata     scheduler specific VCPU data of the idle vcpu
+ * @param vdata     scheduler specific UNIT data of the idle unit
  */
 static spinlock_t *
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
     struct sched_resource *sd = get_sched_res(cpu);
-    arinc653_vcpu_t *svc = vdata;
+    arinc653_unit_t *svc = vdata;
 
-    ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
+    ASSERT(!pdata && svc && is_idle_unit(svc->unit));
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     return &sd->_lock;
 }
-- 
2.16.4
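For context: ARINC 653 is a static, table-driven scheduler, so the
do_schedule() path above is essentially a walk over the configured
schedule. A rough sketch of the frame logic (simplified, not the exact
function body; it assumes only the fields shown in the patch):

    /* Sketch only: advance through minor frames, wrapping at each
     * major frame boundary. */
    if ( now >= sched_priv->next_major_frame )
    {
        sched_index = 0;                             /* new major frame */
        sched_priv->next_major_frame = now + sched_priv->major_frame;
    }
    else
        while ( (sched_index < sched_priv->num_schedule_entries) &&
                (next_switch_time <= now) )
        {
            next_switch_time += sched_priv->schedule[sched_index].runtime;
            sched_index++;                           /* next minor frame */
        }

The unit named by schedule[sched_index] then runs until next_switch_time,
with the idle unit standing in whenever that unit is asleep or not
runnable, as the checks in a653sched_do_schedule() show.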



* [PATCH 25/60] xen: add sched_unit_pause_nosync() and sched_unit_unpause()
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

The credit scheduler calls vcpu_pause_nosync() and vcpu_unpause()
today. Add sched_unit_pause_nosync() and sched_unit_unpause() to
perform the same operations on scheduler units instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |  6 +++---
 xen/include/xen/sched-if.h | 10 ++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)
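
The point of the indirection: once a sched_unit may contain more than
one vcpu, only these wrappers have to grow a loop, while the credit
scheduler itself stays untouched. A sketch of what that could look
like (the per-unit vcpu iterator is assumed here; the real one only
arrives later in the series):

    static inline void sched_unit_pause_nosync(struct sched_unit *unit)
    {
        struct vcpu *v;

        /* Pause every vcpu of the unit, not just unit->vcpu. */
        for_each_sched_unit_vcpu ( unit, v )
            vcpu_pause_nosync(v);
    }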

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 0bc6a87d30..05df9e54ac 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1071,7 +1071,7 @@ csched_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     if ( test_and_clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
     {
         SCHED_STAT_CRANK(unit_unpark);
-        vcpu_unpause(svc->unit->vcpu);
+        sched_unit_unpause(svc->unit);
     }
 
     spin_lock_irq(&prv->lock);
@@ -1521,7 +1521,7 @@ csched_acct(void* dummy)
                      !test_and_set_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(unit_park);
-                    vcpu_pause_nosync(svc->unit->vcpu);
+                    sched_unit_pause_nosync(svc->unit);
                 }
 
                 /* Lower bound on credits */
@@ -1545,7 +1545,7 @@ csched_acct(void* dummy)
                      * if it is woken up here.
                      */
                     SCHED_STAT_CRANK(unit_unpark);
-                    vcpu_unpause(svc->unit->vcpu);
+                    sched_unit_unpause(svc->unit);
                     clear_bit(CSCHED_FLAG_UNIT_PARKED, &svc->flags);
                 }
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index c5bc0b689c..ca70ffb7e9 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -426,6 +426,16 @@ static inline int sched_adjust_cpupool(const struct scheduler *s,
     return s->adjust_global ? s->adjust_global(s, op) : 0;
 }
 
+static inline void sched_unit_pause_nosync(struct sched_unit *unit)
+{
+    vcpu_pause_nosync(unit->vcpu);
+}
+
+static inline void sched_unit_unpause(struct sched_unit *unit)
+{
+    vcpu_unpause(unit->vcpu);
+}
+
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
   __used_section(".data.schedulers") = &x;
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 26/60] xen: let vcpu_create() select processor
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Today there are two distinct scenarios for vcpu_create(): either
creation of idle-domain vcpus (vcpuid == processor), or creation of
"normal" domain vcpus (including dom0), where the caller selects the
initial processor in a round-robin fashion over the allowed
processors ("allowed" being determined by cpupool and affinities).

Instead of passing the initial processor to vcpu_create() and on to
sched_init_vcpu(), let sched_init_vcpu() do the processor selection
itself. To support dom0 vcpu creation, use the domain's node_affinity
as the basis for selecting the processors. User domains initially
have all nodes set, so their behavior is unchanged from today.

To be able to use const struct domain * make cpupool_domain_cpumask()
take a const domain pointer, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
RFC V2: add ASSERT(), modify error message (Andrew Cooper)
V1: constify pointers, avoid cpumask on stack (Jan Beulich)
---
 xen/arch/arm/domain_build.c      | 13 ++++++------
 xen/arch/x86/dom0_build.c        | 10 +++------
 xen/arch/x86/hvm/dom0_build.c    |  9 ++------
 xen/arch/x86/pv/dom0_build.c     | 10 ++-------
 xen/common/domain.c              |  5 ++---
 xen/common/domctl.c              | 10 ++-------
 xen/common/schedule.c            | 44 +++++++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/dom0_build.h |  3 +--
 xen/include/xen/domain.h         |  3 +--
 xen/include/xen/sched-if.h       |  2 +-
 xen/include/xen/sched.h          |  2 +-
 11 files changed, 62 insertions(+), 49 deletions(-)
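
The selection policy is easiest to see in isolation. Below is a
simplified, self-contained model of what sched_select_initial_cpu()
does for non-idle domains (plain C over an array, whereas the real
function works on cpumasks under the scheduler lock):

    /*
     * vcpu 0 gets the first allowed cpu; every later vcpu gets the
     * allowed cpu following its predecessor's, wrapping around.
     */
    static unsigned int select_initial_cpu(unsigned int vcpu_id,
                                           const unsigned int *allowed,
                                           unsigned int n_allowed,
                                           unsigned int prev_cpu)
    {
        unsigned int i;

        if ( vcpu_id == 0 )
            return allowed[0];

        for ( i = 0; i < n_allowed; i++ )
            if ( allowed[i] == prev_cpu )
                return allowed[(i + 1) % n_allowed];

        /* Predecessor's cpu no longer allowed: start over. */
        return allowed[0];
    }

With allowed = {2, 3, 6, 7}, vcpus 0..4 land on cpus 2, 3, 6, 7, 2 --
the same wrap-around behavior cpumask_cycle() provides on a cpumask.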

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d9836779d1..86a6e4bf7b 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -80,7 +80,7 @@ unsigned int __init dom0_max_vcpus(void)
 
 struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 {
-    return vcpu_create(dom0, 0, 0);
+    return vcpu_create(dom0, 0);
 }
 
 static unsigned int __init get_allocation_size(paddr_t size)
@@ -1923,7 +1923,7 @@ static void __init find_gnttab_region(struct domain *d,
 
 static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
 {
-    int i, cpu;
+    int i;
     struct vcpu *v = d->vcpu[0];
     struct cpu_user_regs *regs = &v->arch.cpu_info->guest_cpu_user_regs;
 
@@ -1986,12 +1986,11 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
     }
 #endif
 
-    for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
+    for ( i = 1; i < d->max_vcpus; i++ )
     {
-        cpu = cpumask_cycle(cpu, &cpu_online_map);
-        if ( vcpu_create(d, i, cpu) == NULL )
+        if ( vcpu_create(d, i) == NULL )
         {
-            printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
+            printk("Failed to allocate d0v%u\n", i);
             break;
         }
 
@@ -2026,7 +2025,7 @@ static int __init construct_domU(struct domain *d,
 
     kinfo.vpl011 = dt_property_read_bool(node, "vpl011");
 
-    if ( vcpu_create(d, 0, 0) == NULL )
+    if ( vcpu_create(d, 0) == NULL )
         return -ENOMEM;
     d->max_pages = ~0U;
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 9b063639c9..9c23cf6aab 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -198,12 +198,9 @@ custom_param("dom0_nodes", parse_dom0_nodes);
 
 static cpumask_t __initdata dom0_cpus;
 
-struct vcpu *__init dom0_setup_vcpu(struct domain *d,
-                                    unsigned int vcpu_id,
-                                    unsigned int prev_cpu)
+struct vcpu *__init dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id)
 {
-    unsigned int cpu = cpumask_cycle(prev_cpu, &dom0_cpus);
-    struct vcpu *v = vcpu_create(d, vcpu_id, cpu);
+    struct vcpu *v = vcpu_create(d, vcpu_id);
 
     if ( v )
     {
@@ -273,8 +270,7 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     dom0->node_affinity = dom0_nodes;
     dom0->auto_node_affinity = !dom0_nr_pxms;
 
-    return dom0_setup_vcpu(dom0, 0,
-                           cpumask_last(&dom0_cpus) /* so it wraps around to first pcpu */);
+    return dom0_setup_vcpu(dom0, 0);
 }
 
 #ifdef CONFIG_SHADOW_PAGING
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 8845399ae9..fd8d9609b1 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -614,7 +614,7 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
                                  paddr_t start_info)
 {
     struct vcpu *v = d->vcpu[0];
-    unsigned int cpu = v->processor, i;
+    unsigned int i;
     int rc;
     /*
      * This sets the vCPU state according to the state described in
@@ -636,12 +636,7 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
     };
 
     for ( i = 1; i < d->max_vcpus; i++ )
-    {
-        const struct vcpu *p = dom0_setup_vcpu(d, i, cpu);
-
-        if ( p )
-            cpu = p->processor;
-    }
+        dom0_setup_vcpu(d, i);
 
     domain_update_node_affinity(d);
 
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 903611fb0d..215c2e7d84 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -285,7 +285,7 @@ int __init dom0_construct_pv(struct domain *d,
                              module_t *initrd,
                              char *cmdline)
 {
-    int i, cpu, rc, compatible, order, machine;
+    int i, rc, compatible, order, machine;
     struct cpu_user_regs *regs;
     unsigned long pfn, mfn;
     unsigned long nr_pages;
@@ -694,14 +694,8 @@ int __init dom0_construct_pv(struct domain *d,
 
     printk("Dom%u has maximum %u VCPUs\n", d->domain_id, d->max_vcpus);
 
-    cpu = v->processor;
     for ( i = 1; i < d->max_vcpus; i++ )
-    {
-        const struct vcpu *p = dom0_setup_vcpu(d, i, cpu);
-
-        if ( p )
-            cpu = p->processor;
-    }
+        dom0_setup_vcpu(d, i);
 
     domain_update_node_affinity(d);
     d->arch.paging.mode = 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index c200e9024f..2d3427eb0f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -128,8 +128,7 @@ static void vcpu_destroy(struct vcpu *v)
     free_vcpu_struct(v);
 }
 
-struct vcpu *vcpu_create(
-    struct domain *d, unsigned int vcpu_id, unsigned int cpu_id)
+struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
     struct vcpu *v;
 
@@ -161,7 +160,7 @@ struct vcpu *vcpu_create(
         init_waitqueue_vcpu(v);
     }
 
-    if ( sched_init_vcpu(v, cpu_id) != 0 )
+    if ( sched_init_vcpu(v) != 0 )
         goto fail_wq;
 
     if ( arch_vcpu_create(v) != 0 )
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index bc986d131d..e48f17cb19 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -540,8 +540,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
     case XEN_DOMCTL_max_vcpus:
     {
-        unsigned int i, max = op->u.max_vcpus.max, cpu;
-        cpumask_t *online;
+        unsigned int i, max = op->u.max_vcpus.max;
 
         ret = -EINVAL;
         if ( (d == current->domain) || /* no domain_pause() */
@@ -552,18 +551,13 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         domain_pause(d);
 
         ret = -ENOMEM;
-        online = cpupool_domain_cpumask(d);
 
         for ( i = 0; i < max; i++ )
         {
             if ( d->vcpu[i] != NULL )
                 continue;
 
-            cpu = (i == 0) ?
-                cpumask_any(online) :
-                cpumask_cycle(d->vcpu[i-1]->processor, online);
-
-            if ( vcpu_create(d, i, cpu) == NULL )
+            if ( vcpu_create(d, i) == NULL )
                 goto maxvcpu_out;
         }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 53a3e55f0b..2c1ba32d83 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -314,14 +314,52 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     return NULL;
 }
 
-int sched_init_vcpu(struct vcpu *v, unsigned int processor)
+static unsigned int sched_select_initial_cpu(const struct vcpu *v)
+{
+    const struct domain *d = v->domain;
+    nodeid_t node;
+    spinlock_t *lock;
+    unsigned long flags;
+    unsigned int cpu_ret, cpu = smp_processor_id();
+    cpumask_t *cpus = cpumask_scratch_cpu(cpu);
+
+    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+    cpumask_clear(cpus);
+    for_each_node_mask ( node, d->node_affinity )
+        cpumask_or(cpus, cpus, &node_to_cpumask(node));
+    cpumask_and(cpus, cpus, cpupool_domain_cpumask(d));
+    if ( cpumask_empty(cpus) )
+        cpumask_copy(cpus, cpupool_domain_cpumask(d));
+
+    if ( v->vcpu_id == 0 )
+        cpu_ret = cpumask_first(cpus);
+    else
+    {
+        /* We can rely on previous vcpu being available. */
+        ASSERT(!is_idle_domain(d));
+
+        cpu_ret = cpumask_cycle(d->vcpu[v->vcpu_id - 1]->processor, cpus);
+    }
+
+    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
+
+    return cpu_ret;
+}
+
+int sched_init_vcpu(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct sched_unit *unit;
+    unsigned int processor;
 
     if ( (unit = sched_alloc_unit(v)) == NULL )
         return 1;
 
+    if ( is_idle_domain(d) )
+        processor = v->vcpu_id;
+    else
+        processor = sched_select_initial_cpu(v);
+
     sched_set_res(unit, get_sched_res(processor));
 
     /* Initialise the per-vcpu timers. */
@@ -1673,7 +1711,7 @@ static int cpu_schedule_up(unsigned int cpu)
         return 0;
 
     if ( idle_vcpu[cpu] == NULL )
-        vcpu_create(idle_vcpu[0]->domain, cpu, cpu);
+        vcpu_create(idle_vcpu[0]->domain, cpu);
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
@@ -1894,7 +1932,7 @@ void __init scheduler_init(void)
     BUG_ON(nr_cpu_ids > ARRAY_SIZE(idle_vcpu));
     idle_domain->vcpu = idle_vcpu;
     idle_domain->max_vcpus = nr_cpu_ids;
-    if ( vcpu_create(idle_domain, 0, 0) == NULL )
+    if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
diff --git a/xen/include/asm-x86/dom0_build.h b/xen/include/asm-x86/dom0_build.h
index 33a5483739..3eb4b036e1 100644
--- a/xen/include/asm-x86/dom0_build.h
+++ b/xen/include/asm-x86/dom0_build.h
@@ -11,8 +11,7 @@ extern unsigned int dom0_memflags;
 unsigned long dom0_compute_nr_pages(struct domain *d,
                                     struct elf_dom_parms *parms,
                                     unsigned long initrd_len);
-struct vcpu *dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id,
-                             unsigned int cpu);
+struct vcpu *dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id);
 int dom0_setup_permissions(struct domain *d);
 
 int dom0_construct_pv(struct domain *d, const module_t *image,
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index d1bfc82f57..a6e929685c 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -13,8 +13,7 @@ typedef union {
     struct compat_vcpu_guest_context *cmp;
 } vcpu_guest_context_u __attribute__((__transparent_union__));
 
-struct vcpu *vcpu_create(
-    struct domain *d, unsigned int vcpu_id, unsigned int cpu_id);
+struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id);
 
 unsigned int dom0_max_vcpus(void);
 struct vcpu *alloc_dom0_vcpu0(struct domain *dom0);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index ca70ffb7e9..2a8789b438 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -452,7 +452,7 @@ struct cpupool
 #define cpupool_online_cpumask(_pool) \
     (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
 
-static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
+static inline cpumask_t* cpupool_domain_cpumask(const struct domain *d)
 {
     /*
      * d->cpupool is NULL only for the idle domain, and no one should
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 13bab258d5..b14dff6fc4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -660,7 +660,7 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
-int  sched_init_vcpu(struct vcpu *v, unsigned int processor);
+int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
 void sched_destroy_domain(struct domain *d);
-- 
2.16.4
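
One subtlety worth noting in sched_select_initial_cpu() above: the
per-cpu scratch cpumask may only be used while holding that cpu's
schedule lock, which is presumably why the cpumask arithmetic is
bracketed by the lock even though no scheduler state is modified:

    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
    /* cpumask_scratch_cpu(cpu) is safe to use in here... */
    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);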


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 27/60] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Especially in the do_schedule() functions of the different
schedulers, using smp_processor_id() for the local cpu number is
correct only as long as a sched_unit spans just a single vcpu. As
soon as larger sched_units are used, most of those uses have to be
replaced by the cpu number of the local sched_resource instead.

Add a helper returning that sched_resource cpu and modify the
schedulers to use it correctly.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 21 +++++++++---------
 xen/common/sched_credit2.c  | 53 +++++++++++++++++++++++----------------------
 xen/common/sched_null.c     | 17 ++++++++-------
 xen/common/sched_rt.c       | 17 ++++++++-------
 xen/include/xen/sched-if.h  |  5 +++++
 6 files changed, 62 insertions(+), 53 deletions(-)
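
The helper itself (added to sched-if.h further down in this patch) is
presumably just a thin accessor along these lines -- if the hunk
below spells it differently, that version is authoritative:

    /* Cpu of the sched_resource the given cpu belongs to. */
    static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
    {
        return get_sched_res(cpu)->processor;
    }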

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 213bc960ef..e48f2b2eb9 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -513,7 +513,7 @@ a653sched_do_schedule(
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    const unsigned int cpu = smp_processor_id();
+    const unsigned int cpu = sched_get_resource_cpu(smp_processor_id());
     unsigned long flags;
 
     spin_lock_irqsave(&sched_priv->lock, flags);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 05df9e54ac..e67b9bb23e 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1690,7 +1690,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int peer_cpu, first_cpu, peer_node, bstep;
     int node = cpu_to_node(cpu);
 
-    BUG_ON( cpu != sched_unit_cpu(snext->unit) );
+    BUG_ON( sched_get_resource_cpu(cpu) != sched_unit_cpu(snext->unit) );
     online = cpupool_online_cpumask(c);
 
     /*
@@ -1831,8 +1831,9 @@ static struct task_slice
 csched_schedule(
     const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
-    struct list_head * const runq = RUNQ(cpu);
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
+    struct list_head * const runq = RUNQ(sched_cpu);
     struct sched_unit *unit = current->sched_unit;
     struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
@@ -1942,7 +1943,7 @@ csched_schedule(
     {
         BUG_ON( is_idle_unit(unit) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
-        dec_nr_runnable(cpu);
+        dec_nr_runnable(sched_cpu);
     }
 
     snext = __runq_elem(runq->next);
@@ -1952,7 +1953,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(cpu));
+        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -1972,7 +1973,7 @@ csched_schedule(
     if ( snext->pri > CSCHED_PRI_TS_OVER )
         __runq_remove(snext);
     else
-        snext = csched_load_balance(prv, cpu, snext, &ret.migrated);
+        snext = csched_load_balance(prv, sched_cpu, snext, &ret.migrated);
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
@@ -1980,12 +1981,12 @@ csched_schedule(
      */
* [Xen-devel] [PATCH 27/60] xen/sched: use sched_resource cpu instead of smp_processor_id in schedulers
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Especially in the do_schedule() functions of the different schedulers,
using smp_processor_id() for the local cpu number is correct only as
long as a sched_unit consists of a single vcpu. As soon as larger
sched_units are in use, most of those uses have to be replaced by the
cpu number of the local sched_resource.

Add a helper returning that sched_resource cpu and modify the
schedulers to use it appropriately.
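
As an illustration (not part of the patch itself), the resulting
pattern in a do_schedule() function is roughly:

    const unsigned int cpu = smp_processor_id();
    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);

    /* Scheduler state (runqueues, idler masks, ...) is per resource: */
    if ( !cpumask_test_cpu(sched_cpu, prv->idlers) )
        cpumask_set_cpu(sched_cpu, prv->idlers);

    /* Trace records keep logging the physical cpu: */
    d.cpu = cpu;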

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 21 +++++++++---------
 xen/common/sched_credit2.c  | 53 +++++++++++++++++++++++----------------------
 xen/common/sched_null.c     | 17 ++++++++-------
 xen/common/sched_rt.c       | 17 ++++++++-------
 xen/include/xen/sched-if.h  |  5 +++++
 6 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 213bc960ef..e48f2b2eb9 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -513,7 +513,7 @@ a653sched_do_schedule(
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    const unsigned int cpu = smp_processor_id();
+    const unsigned int cpu = sched_get_resource_cpu(smp_processor_id());
     unsigned long flags;
 
     spin_lock_irqsave(&sched_priv->lock, flags);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 05df9e54ac..e67b9bb23e 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1690,7 +1690,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int peer_cpu, first_cpu, peer_node, bstep;
     int node = cpu_to_node(cpu);
 
-    BUG_ON( cpu != sched_unit_cpu(snext->unit) );
+    BUG_ON( sched_get_resource_cpu(cpu) != sched_unit_cpu(snext->unit) );
     online = cpupool_online_cpumask(c);
 
     /*
@@ -1831,8 +1831,9 @@ static struct task_slice
 csched_schedule(
     const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
-    struct list_head * const runq = RUNQ(cpu);
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
+    struct list_head * const runq = RUNQ(sched_cpu);
     struct sched_unit *unit = current->sched_unit;
     struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
@@ -1942,7 +1943,7 @@ csched_schedule(
     {
         BUG_ON( is_idle_unit(unit) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
-        dec_nr_runnable(cpu);
+        dec_nr_runnable(sched_cpu);
     }
 
     snext = __runq_elem(runq->next);
@@ -1952,7 +1953,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(cpu));
+        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -1972,7 +1973,7 @@ csched_schedule(
     if ( snext->pri > CSCHED_PRI_TS_OVER )
         __runq_remove(snext);
     else
-        snext = csched_load_balance(prv, cpu, snext, &ret.migrated);
+        snext = csched_load_balance(prv, sched_cpu, snext, &ret.migrated);
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
@@ -1980,12 +1981,12 @@ csched_schedule(
      */
     if ( !tasklet_work_scheduled && snext->pri == CSCHED_PRI_IDLE )
     {
-        if ( !cpumask_test_cpu(cpu, prv->idlers) )
-            cpumask_set_cpu(cpu, prv->idlers);
+        if ( !cpumask_test_cpu(sched_cpu, prv->idlers) )
+            cpumask_set_cpu(sched_cpu, prv->idlers);
     }
-    else if ( cpumask_test_cpu(cpu, prv->idlers) )
+    else if ( cpumask_test_cpu(sched_cpu, prv->idlers) )
     {
-        cpumask_clear_cpu(cpu, prv->idlers);
+        cpumask_clear_cpu(sched_cpu, prv->idlers);
     }
 
     if ( !is_idle_unit(snext->unit) )
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 0f57135e81..cf502b3cbc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3447,7 +3447,8 @@ static struct task_slice
 csched2_schedule(
     const struct scheduler *ops, s_time_t now, bool tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct csched2_runqueue_data *rqd;
     struct sched_unit *currunit = current->sched_unit;
     struct csched2_unit * const scurr = csched2_unit(currunit);
@@ -3459,22 +3460,22 @@ csched2_schedule(
     SCHED_STAT_CRANK(schedule);
     CSCHED2_UNIT_CHECK(currunit);
 
-    BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
+    BUG_ON(!cpumask_test_cpu(sched_cpu, &csched2_priv(ops)->initialized));
 
-    rqd = c2rqd(ops, cpu);
-    BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
+    rqd = c2rqd(ops, sched_cpu);
+    BUG_ON(!cpumask_test_cpu(sched_cpu, &rqd->active));
 
-    ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
+    ASSERT(spin_is_locked(get_sched_res(sched_cpu)->schedule_lock));
 
     BUG_ON(!is_idle_unit(currunit) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
-    tickled = cpumask_test_cpu(cpu, &rqd->tickled);
+    tickled = cpumask_test_cpu(sched_cpu, &rqd->tickled);
     if ( tickled )
     {
-        __cpumask_clear_cpu(cpu, &rqd->tickled);
+        __cpumask_clear_cpu(sched_cpu, &rqd->tickled);
         cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
-        smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
+        smt_idle_mask_set(sched_cpu, cpumask_scratch, &rqd->smt_idle);
     }
 
     if ( unlikely(tb_init_done) )
@@ -3484,10 +3485,10 @@ csched2_schedule(
             unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
         } d;
         d.cpu = cpu;
-        d.rq_id = c2r(cpu);
+        d.rq_id = c2r(sched_cpu);
         d.tasklet = tasklet_work_scheduled;
         d.idle = is_idle_unit(currunit);
-        d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
+        d.smt_idle = cpumask_test_cpu(sched_cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
                     sizeof(d),
@@ -3527,10 +3528,10 @@ csched2_schedule(
     {
         __clear_bit(__CSFLAG_unit_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_unit(sched_idle_unit(cpu));
+        snext = csched2_unit(sched_idle_unit(sched_cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_units);
+        snext = runq_candidate(rqd, scurr, sched_cpu, now, &skipped_units);
 
     /* If switching from a non-idle runnable unit, put it
      * back on the runqueue. */
@@ -3555,10 +3556,10 @@ csched2_schedule(
         }
 
         /* Clear the idle mask if necessary */
-        if ( cpumask_test_cpu(cpu, &rqd->idle) )
+        if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
         {
-            __cpumask_clear_cpu(cpu, &rqd->idle);
-            smt_idle_mask_clear(cpu, &rqd->smt_idle);
+            __cpumask_clear_cpu(sched_cpu, &rqd->idle);
+            smt_idle_mask_clear(sched_cpu, &rqd->smt_idle);
         }
 
         /*
@@ -3577,18 +3578,18 @@ csched2_schedule(
          */
         if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
-            reset_credit(ops, cpu, now, snext);
-            balance_load(ops, cpu, now);
+            reset_credit(ops, sched_cpu, now, snext);
+            balance_load(ops, sched_cpu, now);
         }
 
         snext->start_time = now;
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( sched_unit_cpu(snext->unit) != cpu )
+        if ( sched_unit_cpu(snext->unit) != sched_cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            sched_set_res(snext->unit, get_sched_res(cpu));
+            sched_set_res(snext->unit, get_sched_res(sched_cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3601,17 +3602,17 @@ csched2_schedule(
          */
         if ( tasklet_work_scheduled )
         {
-            if ( cpumask_test_cpu(cpu, &rqd->idle) )
+            if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
             {
-                __cpumask_clear_cpu(cpu, &rqd->idle);
-                smt_idle_mask_clear(cpu, &rqd->smt_idle);
+                __cpumask_clear_cpu(sched_cpu, &rqd->idle);
+                smt_idle_mask_clear(sched_cpu, &rqd->smt_idle);
             }
         }
-        else if ( !cpumask_test_cpu(cpu, &rqd->idle) )
+        else if ( !cpumask_test_cpu(sched_cpu, &rqd->idle) )
         {
-            __cpumask_set_cpu(cpu, &rqd->idle);
+            __cpumask_set_cpu(sched_cpu, &rqd->idle);
             cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
-            smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
+            smt_idle_mask_set(sched_cpu, cpumask_scratch, &rqd->smt_idle);
         }
         /* Make sure avgload gets updated periodically even
          * if there's no activity */
@@ -3621,7 +3622,7 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, cpu, snext, now);
+    ret.time = csched2_runtime(ops, sched_cpu, snext, now);
     ret.task = snext->unit;
 
     CSCHED2_UNIT_CHECK(ret.task);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 53a20d73aa..6c350a342b 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -701,6 +701,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
 {
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct null_private *prv = null_priv(ops);
     struct null_unit *wvc;
     struct task_slice ret;
@@ -716,14 +717,14 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        if ( per_cpu(npc, cpu).unit == NULL )
+        if ( per_cpu(npc, sched_cpu).unit == NULL )
         {
             d.unit = d.dom = -1;
         }
         else
         {
-            d.unit = per_cpu(npc, cpu).unit->unit_id;
-            d.dom = per_cpu(npc, cpu).unit->domain->domain_id;
+            d.unit = per_cpu(npc, sched_cpu).unit->unit_id;
+            d.dom = per_cpu(npc, sched_cpu).unit->domain->domain_id;
         }
         __trace_var(TRC_SNULL_SCHEDULE, 1, sizeof(d), &d);
     }
@@ -731,10 +732,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = sched_idle_unit(cpu);
+        ret.task = sched_idle_unit(sched_cpu);
     }
     else
-        ret.task = per_cpu(npc, cpu).unit;
+        ret.task = per_cpu(npc, sched_cpu).unit;
     ret.migrated = 0;
     ret.time = -1;
 
@@ -765,9 +766,9 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                      !has_soft_affinity(wvc->unit) )
                     continue;
 
-                if ( unit_check_affinity(wvc->unit, cpu, bs) )
+                if ( unit_check_affinity(wvc->unit, sched_cpu, bs) )
                 {
-                    unit_assign(prv, wvc->unit, cpu);
+                    unit_assign(prv, wvc->unit, sched_cpu);
                     list_del_init(&wvc->waitq_elem);
                     ret.task = wvc->unit;
                     goto unlock;
@@ -779,7 +780,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     }
 
     if ( unlikely(ret.task == NULL || !unit_runnable(ret.task)) )
-        ret.task = sched_idle_unit(cpu);
+        ret.task = sched_idle_unit(sched_cpu);
 
     NULL_UNIT_CHECK(ret.task);
     return ret;
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 57883bd0a0..ddd1badd38 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1058,7 +1058,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 static struct task_slice
 rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct rt_private *prv = rt_priv(ops);
     struct rt_unit *const scurr = rt_unit(current->sched_unit);
     struct rt_unit *snext = NULL;
@@ -1072,7 +1073,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
+        d.tickled = cpumask_test_cpu(sched_cpu, &prv->tickled);
         d.idle = is_idle_unit(currunit);
         trace_var(TRC_RTDS_SCHEDULE, 1,
                   sizeof(d),
@@ -1080,7 +1081,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     }
 
     /* clear ticked bit now that we've been scheduled */
-    cpumask_clear_cpu(cpu, &prv->tickled);
+    cpumask_clear_cpu(sched_cpu, &prv->tickled);
 
     /* burn_budget would return for IDLE UNIT */
     burn_budget(ops, scurr, now);
@@ -1088,13 +1089,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_unit(sched_idle_unit(cpu));
+        snext = rt_unit(sched_idle_unit(sched_cpu));
     }
     else
     {
-        snext = runq_pick(ops, cpumask_of(cpu));
+        snext = runq_pick(ops, cpumask_of(sched_cpu));
         if ( snext == NULL )
-            snext = rt_unit(sched_idle_unit(cpu));
+            snext = rt_unit(sched_idle_unit(sched_cpu));
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_unit(currunit) &&
@@ -1119,9 +1120,9 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
             q_remove(snext);
             __set_bit(__RTDS_scheduled, &snext->flags);
         }
-        if ( sched_unit_cpu(snext->unit) != cpu )
+        if ( sched_unit_cpu(snext->unit) != sched_cpu )
         {
-            sched_set_res(snext->unit, get_sched_res(cpu));
+            sched_set_res(snext->unit, get_sched_res(sched_cpu));
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 2a8789b438..8b349069db 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -110,6 +110,11 @@ static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
     return idle_vcpu[cpu]->sched_unit;
 }
 
+static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
+{
+    return get_sched_res(cpu)->processor;
+}
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
-- 
2.16.4


* [PATCH 28/60] xen/sched: switch schedule() from vcpus to sched_units
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Use sched_units instead of vcpus in schedule(). This includes the
introduction of sched_unit_runstate_change() as a replacement for
vcpu_runstate_change() in schedule().
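
A condensed illustration of the call-site change in schedule() (prev
and next are now struct sched_unit pointers; full hunks below):

    sched_unit_runstate_change(prev, false, now); /* blocked/runnable/offline */
    sched_unit_runstate_change(next, true, now);  /* RUNSTATE_running */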

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 70 +++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 2c1ba32d83..11d1b73bfc 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -192,6 +192,20 @@ static inline void vcpu_runstate_change(
     v->runstate.state = new_state;
 }
 
+static inline void sched_unit_runstate_change(struct sched_unit *unit,
+    bool running, s_time_t new_entry_time)
+{
+    struct vcpu *v = unit->vcpu;
+
+    if ( running )
+        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+    else
+        vcpu_runstate_change(v,
+            ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+             (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
+            new_entry_time);
+}
+
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
     spinlock_t *lock = likely(v == current)
@@ -1531,7 +1545,7 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
  */
 static void schedule(void)
 {
-    struct vcpu          *prev = current, *next = NULL;
+    struct sched_unit    *prev = current->sched_unit, *next = NULL;
     s_time_t              now;
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
@@ -1575,9 +1589,9 @@ static void schedule(void)
     sched = this_cpu(scheduler);
     next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
 
-    next = next_slice.task->vcpu;
+    next = next_slice.task;
 
-    sd->curr = next->sched_unit;
+    sd->curr = next;
 
     if ( next_slice.time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + next_slice.time);
@@ -1586,60 +1600,56 @@ static void schedule(void)
     {
         pcpu_schedule_unlock_irq(lock, cpu);
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
-                 next->domain->domain_id, next->vcpu_id,
-                 now - prev->runstate.state_entry_time,
+                 next->domain->domain_id, next->unit_id,
+                 now - prev->state_entry_time,
                  next_slice.time);
-        trace_continue_running(next);
-        return continue_running(prev);
+        trace_continue_running(next->vcpu);
+        return continue_running(prev->vcpu);
     }
 
     TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id, prev->vcpu_id,
-             now - prev->runstate.state_entry_time);
+             prev->domain->domain_id, prev->unit_id,
+             now - prev->state_entry_time);
     TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id, next->vcpu_id,
-             (next->runstate.state == RUNSTATE_runnable) ?
-             (now - next->runstate.state_entry_time) : 0,
+             next->domain->domain_id, next->unit_id,
+             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+             (now - next->state_entry_time) : 0,
              next_slice.time);
 
-    ASSERT(prev->runstate.state == RUNSTATE_running);
+    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
     TRACE_4D(TRC_SCHED_SWITCH,
-             prev->domain->domain_id, prev->vcpu_id,
-             next->domain->domain_id, next->vcpu_id);
+             prev->domain->domain_id, prev->unit_id,
+             next->domain->domain_id, next->unit_id);
 
-    vcpu_runstate_change(
-        prev,
-        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-         (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
-        now);
-    prev->sched_unit->last_run_time = now;
+    sched_unit_runstate_change(prev, false, now);
+    prev->last_run_time = now;
 
-    ASSERT(next->runstate.state != RUNSTATE_running);
-    vcpu_runstate_change(next, RUNSTATE_running, now);
+    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    sched_unit_runstate_change(next, true, now);
 
     /*
      * NB. Don't add any trace records from here until the actual context
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!next->sched_unit->is_running);
+    ASSERT(!next->is_running);
+    next->vcpu->is_running = 1;
     next->is_running = 1;
-    next->sched_unit->is_running = 1;
-    next->sched_unit->state_entry_time = now;
+    next->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
     SCHED_STAT_CRANK(sched_ctx);
 
-    stop_timer(&prev->periodic_timer);
+    stop_timer(&prev->vcpu->periodic_timer);
 
     if ( next_slice.migrated )
-        sched_move_irqs(next);
+        sched_move_irqs(next->vcpu);
 
-    vcpu_periodic_timer_work(next);
+    vcpu_periodic_timer_work(next->vcpu);
 
-    context_switch(prev, next);
+    context_switch(prev->vcpu, next->vcpu);
 }
 
 void context_saved(struct vcpu *prev)
-- 
2.16.4


* [PATCH 29/60] xen/sched: switch sched_move_irqs() to take sched_unit as parameter
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

sched_move_irqs() should work on a sched_unit, as that is the entity
being moved between cpus.

Rename the current function to vcpu_move_irqs(), as the per-vcpu
variant is still needed in schedule().
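
The split, condensed from the hunks below:

    static void sched_move_irqs(struct sched_unit *unit)
    {
        struct vcpu *v;

        for_each_sched_unit_vcpu ( unit, v )
            vcpu_move_irqs(v); /* arch_move_irqs() + evtchn_move_pirqs() */
    }

Callers still operating on a vcpu now pass v->sched_unit.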

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 11d1b73bfc..266932fe25 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -416,12 +416,20 @@ int sched_init_vcpu(struct vcpu *v)
     return 0;
 }
 
-static void sched_move_irqs(struct vcpu *v)
+static void vcpu_move_irqs(struct vcpu *v)
 {
     arch_move_irqs(v);
     evtchn_move_pirqs(v);
 }
 
+static void sched_move_irqs(struct sched_unit *unit)
+{
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        vcpu_move_irqs(v);
+}
+
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
@@ -501,7 +509,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         v->sched_unit->priv = vcpu_priv[v->vcpu_id];
         if ( !d->is_dying )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_unit);
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
@@ -794,7 +802,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
     if ( old_cpu != new_cpu )
-        sched_move_irqs(v);
+        sched_move_irqs(v->sched_unit);
 
     /* Wake on new CPU. */
     vcpu_wake(v);
@@ -872,7 +880,7 @@ void restore_vcpu_affinity(struct domain *d)
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_unit);
     }
 
     domain_update_node_affinity(d);
@@ -1645,7 +1653,7 @@ static void schedule(void)
     stop_timer(&prev->vcpu->periodic_timer);
 
     if ( next_slice.migrated )
-        sched_move_irqs(next->vcpu);
+        vcpu_move_irqs(next->vcpu);
 
     vcpu_periodic_timer_work(next->vcpu);
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [Xen-devel] [PATCH 29/60] xen/sched: switch sched_move_irqs() to take sched_unit as parameter
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

sched_move_irqs() should work on a sched_unit as that is the unit
moved between cpus.

Rename the current function to vcpu_move_irqs() as it is still needed
in schedule().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 11d1b73bfc..266932fe25 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -416,12 +416,20 @@ int sched_init_vcpu(struct vcpu *v)
     return 0;
 }
 
-static void sched_move_irqs(struct vcpu *v)
+static void vcpu_move_irqs(struct vcpu *v)
 {
     arch_move_irqs(v);
     evtchn_move_pirqs(v);
 }
 
+static void sched_move_irqs(struct sched_unit *unit)
+{
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        vcpu_move_irqs(v);
+}
+
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
@@ -501,7 +509,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         v->sched_unit->priv = vcpu_priv[v->vcpu_id];
         if ( !d->is_dying )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_unit);
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
@@ -794,7 +802,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
     if ( old_cpu != new_cpu )
-        sched_move_irqs(v);
+        sched_move_irqs(v->sched_unit);
 
     /* Wake on new CPU. */
     vcpu_wake(v);
@@ -872,7 +880,7 @@ void restore_vcpu_affinity(struct domain *d)
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_unit);
     }
 
     domain_update_node_affinity(d);
@@ -1645,7 +1653,7 @@ static void schedule(void)
     stop_timer(&prev->vcpu->periodic_timer);
 
     if ( next_slice.migrated )
-        sched_move_irqs(next->vcpu);
+        vcpu_move_irqs(next->vcpu);
 
     vcpu_periodic_timer_work(next->vcpu);
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 30/60] xen: switch from for_each_vcpu() to for_each_sched_unit()
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli

Where appropriate, switch from for_each_vcpu() to for_each_sched_unit()
in order to prepare for core scheduling.
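
An example of the conversion pattern, taken from
domain_update_node_affinity() below:

    struct sched_unit *unit;

    for_each_sched_unit ( d, unit )
    {
        cpumask_or(dom_cpumask, dom_cpumask, unit->cpu_hard_affinity);
        cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
                   unit->cpu_soft_affinity);
    }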

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/domain.c   |   9 ++---
 xen/common/schedule.c | 109 ++++++++++++++++++++++++++------------------------
 2 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2d3427eb0f..f55ff06513 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -509,7 +509,7 @@ void domain_update_node_affinity(struct domain *d)
     cpumask_var_t dom_cpumask, dom_cpumask_soft;
     cpumask_t *dom_affinity;
     const cpumask_t *online;
-    struct vcpu *v;
+    struct sched_unit *unit;
     unsigned int cpu;
 
     /* Do we have vcpus already? If not, no need to update node-affinity. */
@@ -542,12 +542,11 @@ void domain_update_node_affinity(struct domain *d)
          * and the full mask of where it would prefer to run (the union of
          * the soft affinity of all its various vcpus). Let's build them.
          */
-        for_each_vcpu ( d, v )
+        for_each_sched_unit ( d, unit )
         {
-            cpumask_or(dom_cpumask, dom_cpumask,
-                       v->sched_unit->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask, unit->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->sched_unit->cpu_soft_affinity);
+                       unit->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 266932fe25..63c9c0a8fb 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -433,16 +433,17 @@ static void sched_move_irqs(struct sched_unit *unit)
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
+    struct sched_unit *unit;
     unsigned int new_p;
-    void **vcpu_priv;
+    void **unit_priv;
     void *domdata;
-    void *vcpudata;
+    void *unitdata;
     struct scheduler *old_ops;
     void *old_domdata;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        if ( v->sched_unit->affinity_broken )
+        if ( unit->affinity_broken )
             return -EBUSY;
     }
 
@@ -450,22 +451,21 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     if ( IS_ERR(domdata) )
         return PTR_ERR(domdata);
 
-    vcpu_priv = xzalloc_array(void *, d->max_vcpus);
-    if ( vcpu_priv == NULL )
+    unit_priv = xzalloc_array(void *, d->max_vcpus);
+    if ( unit_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
         return -ENOMEM;
     }
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, v->sched_unit,
-                                                  domdata);
-        if ( vcpu_priv[v->vcpu_id] == NULL )
+        unit_priv[unit->unit_id] = sched_alloc_vdata(c->sched, unit, domdata);
+        if ( unit_priv[unit->unit_id] == NULL )
         {
-            for_each_vcpu ( d, v )
-                xfree(vcpu_priv[v->vcpu_id]);
-            xfree(vcpu_priv);
+            for_each_sched_unit ( d, unit )
+                xfree(unit_priv[unit->unit_id]);
+            xfree(unit_priv);
             sched_free_domdata(c->sched, domdata);
             return -ENOMEM;
         }
@@ -476,30 +476,35 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     old_ops = dom_scheduler(d);
     old_domdata = d->sched_priv;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        sched_remove_unit(old_ops, v->sched_unit);
+        sched_remove_unit(old_ops, unit);
     }
 
     d->cpupool = c;
     d->sched_priv = domdata;
 
     new_p = cpumask_first(c->cpu_valid);
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
+        unsigned int unit_p = new_p;
 
-        vcpudata = v->sched_unit->priv;
+        unitdata = unit->priv;
 
-        migrate_timer(&v->periodic_timer, new_p);
-        migrate_timer(&v->singleshot_timer, new_p);
-        migrate_timer(&v->poll_timer, new_p);
+        for_each_sched_unit_vcpu ( unit, v )
+        {
+            migrate_timer(&v->periodic_timer, new_p);
+            migrate_timer(&v->singleshot_timer, new_p);
+            migrate_timer(&v->poll_timer, new_p);
+            new_p = cpumask_cycle(new_p, c->cpu_valid);
+        }
 
-        lock = unit_schedule_lock_irq(v->sched_unit);
+        lock = unit_schedule_lock_irq(unit);
 
-        sched_set_affinity(v, &cpumask_all, &cpumask_all);
+        sched_set_affinity(unit->vcpu, &cpumask_all, &cpumask_all);
 
-        sched_set_res(v->sched_unit, get_sched_res(new_p));
+        sched_set_res(unit, get_sched_res(unit_p));
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -507,15 +512,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        v->sched_unit->priv = vcpu_priv[v->vcpu_id];
+        unit->priv = unit_priv[unit->unit_id];
         if ( !d->is_dying )
-            sched_move_irqs(v->sched_unit);
-
-        new_p = cpumask_cycle(new_p, c->cpu_valid);
+            sched_move_irqs(unit);
 
-        sched_insert_unit(c->sched, v->sched_unit);
+        sched_insert_unit(c->sched, unit);
 
-        sched_free_vdata(old_ops, vcpudata);
+        sched_free_vdata(old_ops, unitdata);
     }
 
     domain_update_node_affinity(d);
@@ -524,7 +527,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     sched_free_domdata(old_ops, old_domdata);
 
-    xfree(vcpu_priv);
+    xfree(unit_priv);
 
     return 0;
 }
@@ -829,15 +832,14 @@ void vcpu_force_reschedule(struct vcpu *v)
 void restore_vcpu_affinity(struct domain *d)
 {
     unsigned int cpu = smp_processor_id();
-    struct vcpu *v;
+    struct sched_unit *unit;
 
     ASSERT(system_state == SYS_STATE_resume);
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
-        unsigned int old_cpu = v->processor;
-        struct sched_unit *unit = v->sched_unit;
+        unsigned int old_cpu = sched_unit_cpu(unit);
         struct sched_resource *res;
 
         ASSERT(!unit_runnable(unit));
@@ -856,7 +858,8 @@ void restore_vcpu_affinity(struct domain *d)
         {
             if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                sched_set_affinity(unit->vcpu, unit->cpu_hard_affinity_saved,
+                                   NULL);
                 unit->affinity_broken = 0;
                 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
@@ -864,8 +867,8 @@ void restore_vcpu_affinity(struct domain *d)
 
             if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
             {
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
-                sched_set_affinity(v, &cpumask_all, NULL);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", unit->vcpu);
+                sched_set_affinity(unit->vcpu, &cpumask_all, NULL);
                 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
@@ -875,12 +878,12 @@ void restore_vcpu_affinity(struct domain *d)
         sched_set_res(unit, res);
 
         lock = unit_schedule_lock_irq(unit);
-        res = sched_pick_resource(vcpu_scheduler(v), unit);
+        res = sched_pick_resource(vcpu_scheduler(unit->vcpu), unit);
         sched_set_res(unit, res);
         spin_unlock_irq(lock);
 
-        if ( old_cpu != v->processor )
-            sched_move_irqs(v->sched_unit);
+        if ( old_cpu != sched_unit_cpu(unit) )
+            sched_move_irqs(unit);
     }
 
     domain_update_node_affinity(d);
@@ -894,7 +897,6 @@ void restore_vcpu_affinity(struct domain *d)
 int cpu_disable_scheduler(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
     int ret = 0;
@@ -905,10 +907,11 @@ int cpu_disable_scheduler(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
     {
-        for_each_vcpu ( d, v )
+        struct sched_unit *unit;
+
+        for_each_sched_unit ( d, unit )
         {
             unsigned long flags;
-            struct sched_unit *unit = v->sched_unit;
             spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);
 
             cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
@@ -923,14 +926,14 @@ int cpu_disable_scheduler(unsigned int cpu)
                     break;
                 }
 
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", unit->vcpu);
 
-                sched_set_affinity(v, &cpumask_all, NULL);
+                sched_set_affinity(unit->vcpu, &cpumask_all, NULL);
             }
 
-            if ( v->processor != cpu )
+            if ( sched_unit_cpu(unit) != sched_get_resource_cpu(cpu) )
             {
-                /* The vcpu is not on this cpu, so we can move on. */
+                /* The unit is not on this cpu, so we can move on. */
                 unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
@@ -943,17 +946,17 @@ int cpu_disable_scheduler(unsigned int cpu)
              *  * the scheduler will always find a suitable solution, or
              *    things would have failed before getting in here.
              */
-            vcpu_migrate_start(v);
+            vcpu_migrate_start(unit->vcpu);
             unit_schedule_unlock_irqrestore(lock, flags, unit);
 
-            vcpu_migrate_finish(v);
+            vcpu_migrate_finish(unit->vcpu);
 
             /*
              * The only caveat, in this case, is that if a vcpu active in
              * the hypervisor isn't migratable. In this case, the caller
              * should try again after releasing and reaquiring all locks.
              */
-            if ( v->processor == cpu )
+            if ( sched_unit_cpu(unit) == sched_get_resource_cpu(cpu) )
                 ret = -EAGAIN;
         }
     }
@@ -964,16 +967,16 @@ int cpu_disable_scheduler(unsigned int cpu)
 static int cpu_disable_scheduler_check(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
+    struct sched_unit *unit;
 
     c = per_cpu(cpupool, cpu);
     if ( c == NULL )
         return 0;
 
     for_each_domain_in_cpupool ( d, c )
-        for_each_vcpu ( d, v )
-            if ( v->sched_unit->affinity_broken )
+        for_each_sched_unit ( d, unit )
+            if ( unit->affinity_broken )
                 return -EADDRINUSE;
 
     return 0;
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [Xen-devel] [PATCH 30/60] xen: switch from for_each_vcpu() to for_each_sched_unit()
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli

Where appropriate switch from for_each_vcpu() to for_each_sched_unit()
in order to prepare core scheduling.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/domain.c   |   9 ++---
 xen/common/schedule.c | 109 ++++++++++++++++++++++++++------------------------
 2 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2d3427eb0f..f55ff06513 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -509,7 +509,7 @@ void domain_update_node_affinity(struct domain *d)
     cpumask_var_t dom_cpumask, dom_cpumask_soft;
     cpumask_t *dom_affinity;
     const cpumask_t *online;
-    struct vcpu *v;
+    struct sched_unit *unit;
     unsigned int cpu;
 
     /* Do we have vcpus already? If not, no need to update node-affinity. */
@@ -542,12 +542,11 @@ void domain_update_node_affinity(struct domain *d)
          * and the full mask of where it would prefer to run (the union of
          * the soft affinity of all its various vcpus). Let's build them.
          */
-        for_each_vcpu ( d, v )
+        for_each_sched_unit ( d, unit )
         {
-            cpumask_or(dom_cpumask, dom_cpumask,
-                       v->sched_unit->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask, unit->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->sched_unit->cpu_soft_affinity);
+                       unit->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 266932fe25..63c9c0a8fb 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -433,16 +433,17 @@ static void sched_move_irqs(struct sched_unit *unit)
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
+    struct sched_unit *unit;
     unsigned int new_p;
-    void **vcpu_priv;
+    void **unit_priv;
     void *domdata;
-    void *vcpudata;
+    void *unitdata;
     struct scheduler *old_ops;
     void *old_domdata;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        if ( v->sched_unit->affinity_broken )
+        if ( unit->affinity_broken )
             return -EBUSY;
     }
 
@@ -450,22 +451,21 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     if ( IS_ERR(domdata) )
         return PTR_ERR(domdata);
 
-    vcpu_priv = xzalloc_array(void *, d->max_vcpus);
-    if ( vcpu_priv == NULL )
+    unit_priv = xzalloc_array(void *, d->max_vcpus);
+    if ( unit_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
         return -ENOMEM;
     }
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        vcpu_priv[v->vcpu_id] = sched_alloc_vdata(c->sched, v->sched_unit,
-                                                  domdata);
-        if ( vcpu_priv[v->vcpu_id] == NULL )
+        unit_priv[unit->unit_id] = sched_alloc_vdata(c->sched, unit, domdata);
+        if ( unit_priv[unit->unit_id] == NULL )
         {
-            for_each_vcpu ( d, v )
-                xfree(vcpu_priv[v->vcpu_id]);
-            xfree(vcpu_priv);
+            for_each_sched_unit ( d, unit )
+                xfree(unit_priv[unit->unit_id]);
+            xfree(unit_priv);
             sched_free_domdata(c->sched, domdata);
             return -ENOMEM;
         }
@@ -476,30 +476,35 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     old_ops = dom_scheduler(d);
     old_domdata = d->sched_priv;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
-        sched_remove_unit(old_ops, v->sched_unit);
+        sched_remove_unit(old_ops, unit);
     }
 
     d->cpupool = c;
     d->sched_priv = domdata;
 
     new_p = cpumask_first(c->cpu_valid);
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
+        unsigned int unit_p = new_p;
 
-        vcpudata = v->sched_unit->priv;
+        unitdata = unit->priv;
 
-        migrate_timer(&v->periodic_timer, new_p);
-        migrate_timer(&v->singleshot_timer, new_p);
-        migrate_timer(&v->poll_timer, new_p);
+        for_each_sched_unit_vcpu ( unit, v )
+        {
+            migrate_timer(&v->periodic_timer, new_p);
+            migrate_timer(&v->singleshot_timer, new_p);
+            migrate_timer(&v->poll_timer, new_p);
+            new_p = cpumask_cycle(new_p, c->cpu_valid);
+        }
 
-        lock = unit_schedule_lock_irq(v->sched_unit);
+        lock = unit_schedule_lock_irq(unit);
 
-        sched_set_affinity(v, &cpumask_all, &cpumask_all);
+        sched_set_affinity(unit->vcpu, &cpumask_all, &cpumask_all);
 
-        sched_set_res(v->sched_unit, get_sched_res(new_p));
+        sched_set_res(unit, get_sched_res(unit_p));
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -507,15 +512,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        v->sched_unit->priv = vcpu_priv[v->vcpu_id];
+        unit->priv = unit_priv[unit->unit_id];
         if ( !d->is_dying )
-            sched_move_irqs(v->sched_unit);
-
-        new_p = cpumask_cycle(new_p, c->cpu_valid);
+            sched_move_irqs(unit);
 
-        sched_insert_unit(c->sched, v->sched_unit);
+        sched_insert_unit(c->sched, unit);
 
-        sched_free_vdata(old_ops, vcpudata);
+        sched_free_vdata(old_ops, unitdata);
     }
 
     domain_update_node_affinity(d);
@@ -524,7 +527,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     sched_free_domdata(old_ops, old_domdata);
 
-    xfree(vcpu_priv);
+    xfree(unit_priv);
 
     return 0;
 }
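
/*
 * Editor's note (illustrative, not part of the patch): new_p advancing
 * via cpumask_cycle() spreads the per-vcpu timers of a unit round-robin
 * over the pool's cpus, while the unit itself is placed on unit_p, the
 * cpu handed to its first vcpu. E.g., with a hypothetical pool where
 * cpu_valid = {1,2,5} and a unit of three vcpus (once units span
 * multiple vcpus):
 *
 *     unit_p = 1;  vcpu0 -> 1, vcpu1 -> 2, vcpu2 -> 5
 */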
@@ -829,15 +832,14 @@ void vcpu_force_reschedule(struct vcpu *v)
 void restore_vcpu_affinity(struct domain *d)
 {
     unsigned int cpu = smp_processor_id();
-    struct vcpu *v;
+    struct sched_unit *unit;
 
     ASSERT(system_state == SYS_STATE_resume);
 
-    for_each_vcpu ( d, v )
+    for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
-        unsigned int old_cpu = v->processor;
-        struct sched_unit *unit = v->sched_unit;
+        unsigned int old_cpu = sched_unit_cpu(unit);
         struct sched_resource *res;
 
         ASSERT(!unit_runnable(unit));
@@ -856,7 +858,8 @@ void restore_vcpu_affinity(struct domain *d)
         {
             if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                sched_set_affinity(unit->vcpu, unit->cpu_hard_affinity_saved,
+                                   NULL);
                 unit->affinity_broken = 0;
                 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
@@ -864,8 +867,8 @@ void restore_vcpu_affinity(struct domain *d)
 
             if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
             {
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
-                sched_set_affinity(v, &cpumask_all, NULL);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", unit->vcpu);
+                sched_set_affinity(unit->vcpu, &cpumask_all, NULL);
                 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
@@ -875,12 +878,12 @@ void restore_vcpu_affinity(struct domain *d)
         sched_set_res(unit, res);
 
         lock = unit_schedule_lock_irq(unit);
-        res = sched_pick_resource(vcpu_scheduler(v), unit);
+        res = sched_pick_resource(vcpu_scheduler(unit->vcpu), unit);
         sched_set_res(unit, res);
         spin_unlock_irq(lock);
 
-        if ( old_cpu != v->processor )
-            sched_move_irqs(v->sched_unit);
+        if ( old_cpu != sched_unit_cpu(unit) )
+            sched_move_irqs(unit);
     }
 
     domain_update_node_affinity(d);
@@ -894,7 +897,6 @@ void restore_vcpu_affinity(struct domain *d)
 int cpu_disable_scheduler(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
     int ret = 0;
@@ -905,10 +907,11 @@ int cpu_disable_scheduler(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
     {
-        for_each_vcpu ( d, v )
+        struct sched_unit *unit;
+
+        for_each_sched_unit ( d, unit )
         {
             unsigned long flags;
-            struct sched_unit *unit = v->sched_unit;
             spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);
 
             cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
@@ -923,14 +926,14 @@ int cpu_disable_scheduler(unsigned int cpu)
                     break;
                 }
 
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", unit->vcpu);
 
-                sched_set_affinity(v, &cpumask_all, NULL);
+                sched_set_affinity(unit->vcpu, &cpumask_all, NULL);
             }
 
-            if ( v->processor != cpu )
+            if ( sched_unit_cpu(unit) != sched_get_resource_cpu(cpu) )
             {
-                /* The vcpu is not on this cpu, so we can move on. */
+                /* The unit is not on this cpu, so we can move on. */
                 unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
@@ -943,17 +946,17 @@ int cpu_disable_scheduler(unsigned int cpu)
              *  * the scheduler will always find a suitable solution, or
              *    things would have failed before getting in here.
              */
-            vcpu_migrate_start(v);
+            vcpu_migrate_start(unit->vcpu);
             unit_schedule_unlock_irqrestore(lock, flags, unit);
 
-            vcpu_migrate_finish(v);
+            vcpu_migrate_finish(unit->vcpu);
 
             /*
             * The only caveat, in this case, is that a vcpu active in
             * the hypervisor may not be migratable. In this case, the caller
             * should try again after releasing and reacquiring all locks.
              */
-            if ( v->processor == cpu )
+            if ( sched_unit_cpu(unit) == sched_get_resource_cpu(cpu) )
                 ret = -EAGAIN;
         }
     }
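
            /*
             * Editor's sketch, not part of the patch: a hypothetical
             * caller honouring the -EAGAIN contract above would retry
             * after dropping and re-taking its locks, e.g.:
             *
             *     do {
             *         ret = cpu_disable_scheduler(cpu);
             *         ... release and reacquire the locks ...
             *     } while ( ret == -EAGAIN );
             */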
@@ -964,16 +967,16 @@ int cpu_disable_scheduler(unsigned int cpu)
 static int cpu_disable_scheduler_check(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
+    struct sched_unit *unit;
 
     c = per_cpu(cpupool, cpu);
     if ( c == NULL )
         return 0;
 
     for_each_domain_in_cpupool ( d, c )
-        for_each_vcpu ( d, v )
-            if ( v->sched_unit->affinity_broken )
+        for_each_sched_unit ( d, unit )
+            if ( unit->affinity_broken )
                 return -EADDRINUSE;
 
     return 0;
-- 
2.16.4


* [PATCH 31/60] xen/sched: add runstate counters to struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add counters to struct sched_unit summing up the runstates of the
associated vcpus.
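
For illustration (editor's sketch, not part of the patch): with these
counters a scheduler can check the state of all vcpus of a unit without
iterating over them. RUNSTATE_* are the existing values from
xen/include/public/vcpu.h; nr_vcpus is a made-up stand-in for the
number of vcpus in the unit (still 1 at this point of the series):

    static inline bool unit_all_vcpus_runnable(const struct sched_unit *unit,
                                               unsigned int nr_vcpus)
    {
        /* runstate_cnt[] is only ever changed with the schedule lock held. */
        return unit->runstate_cnt[RUNSTATE_running] +
               unit->runstate_cnt[RUNSTATE_runnable] == nr_vcpus;
    }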

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: add counters for each possible runstate
---
 xen/common/schedule.c   | 5 +++++
 xen/include/xen/sched.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 63c9c0a8fb..fd0ca1e137 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -174,6 +174,7 @@ static inline void vcpu_runstate_change(
     struct vcpu *v, int new_state, s_time_t new_entry_time)
 {
     s_time_t delta;
+    struct sched_unit *unit = v->sched_unit;
 
     ASSERT(v->runstate.state != new_state);
     ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
@@ -182,6 +183,9 @@ static inline void vcpu_runstate_change(
 
     trace_runstate_change(v, new_state);
 
+    unit->runstate_cnt[v->runstate.state]--;
+    unit->runstate_cnt[new_state]++;
+
     delta = new_entry_time - v->runstate.state_entry_time;
     if ( delta > 0 )
     {
@@ -305,6 +309,7 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     unit->vcpu = v;
     unit->unit_id = v->vcpu_id;
     unit->domain = d;
+    unit->runstate_cnt[v->runstate.state]++;
 
     for ( prev_unit = &d->sched_unit_list; *prev_unit;
           prev_unit = &(*prev_unit)->next_in_list )
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b14dff6fc4..69d3b32c7b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,6 +272,8 @@ struct sched_unit {
     uint64_t               last_run_time;
     /* Last time unit got (de-)scheduled. */
     uint64_t               state_entry_time;
+    /* Vcpu state summary. */
+    unsigned int           runstate_cnt[4];
 
     /* Currently running on a CPU? */
     bool                   is_running;
-- 
2.16.4


* [PATCH 32/60] xen/sched: rework and rename vcpu_force_reschedule()
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

vcpu_force_reschedule() is only used for modifying the periodic timer
of a vcpu. Forcing a vcpu to give up its physical cpu just for that
purpose is rather heavy-handed.

So instead of doing the reschedule dance, just operate on the timer
directly.

When modifying the timer of the currently running vcpu we can do so
right away. When it is for a foreign vcpu, pause that vcpu for the
duration of the modification, as we do for all other vcpu state
modifications.

Rename the function to vcpu_set_periodic_timer() as this now reflects
the functionality.
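
As can be seen in the hunks below, callers now simply do (sketch):

    vcpu_set_periodic_timer(v, MILLISECS(10));  /* 10 ms period */
    vcpu_set_periodic_timer(v, 0);              /* stop periodic events */

with the pause/unpause (foreign vcpu) respectively stop_timer/set_timer
(v == current) handling done inside the function.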

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: latch NOW() only after stopping the timer (Jan Beulich)
---
 xen/arch/x86/pv/shim.c  |  4 +---
 xen/common/domain.c     |  6 ++----
 xen/common/schedule.c   | 24 ++++++++++++++----------
 xen/include/xen/sched.h |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 324ca27f93..5edbcd9ac5 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -410,7 +410,7 @@ int pv_shim_shutdown(uint8_t reason)
         unmap_vcpu_info(v);
 
         /* Reset the periodic timer to the default value. */
-        v->periodic_period = MILLISECS(10);
+        vcpu_set_periodic_timer(v, MILLISECS(10));
         /* Stop the singleshot timer. */
         stop_timer(&v->singleshot_timer);
 
@@ -419,8 +419,6 @@ int pv_shim_shutdown(uint8_t reason)
 
         if ( v != current )
             vcpu_unpause_by_systemcontroller(v);
-        else
-            vcpu_force_reschedule(v);
     }
 
     return 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index f55ff06513..65f8a369c1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1438,15 +1438,13 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( set.period_ns > STIME_DELTA_MAX )
             return -EINVAL;
 
-        v->periodic_period = set.period_ns;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, set.period_ns);
 
         break;
     }
 
     case VCPUOP_stop_periodic_timer:
-        v->periodic_period = 0;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, 0);
         break;
 
     case VCPUOP_set_singleshot_timer:
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index fd0ca1e137..12f9852786 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -817,21 +817,25 @@ static void vcpu_migrate_finish(struct vcpu *v)
 }
 
 /*
- * Force a VCPU through a deschedule/reschedule path.
- * For example, using this when setting the periodic timer period means that
- * most periodic-timer state need only be touched from within the scheduler
- * which can thus be done without need for synchronisation.
+ * Set the periodic timer of a vcpu.
  */
-void vcpu_force_reschedule(struct vcpu *v)
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
 {
-    spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
+    s_time_t now;
 
-    if ( v->sched_unit->is_running )
-        vcpu_migrate_start(v);
+    if ( v != current )
+        vcpu_pause(v);
+    else
+        stop_timer(&v->periodic_timer);
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    now = NOW();
+    v->periodic_period = value;
+    v->periodic_last_event = now;
 
-    vcpu_migrate_finish(v);
+    if ( v != current )
+        vcpu_unpause(v);
+    else if ( value != 0 )
+        set_timer(&v->periodic_timer, now + value);
 }
 
 void restore_vcpu_affinity(struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 69d3b32c7b..0ab358f632 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -894,7 +894,7 @@ struct scheduler *scheduler_get_default(void);
 struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr);
 void scheduler_free(struct scheduler *sched);
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c);
-void vcpu_force_reschedule(struct vcpu *v);
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
 int cpu_disable_scheduler(unsigned int cpu);
 /* We need it in dom0_setup_vcpu */
 void sched_set_affinity(struct vcpu *v, const cpumask_t *hard,
-- 
2.16.4


* [PATCH 33/60] xen/sched: Change vcpu_migrate_*() to operate on schedule unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Now that vcpu_migrate_start() and vcpu_migrate_finish() are used only
to ensure a vcpu is running on a suitable processor, they can be
switched to operate on schedule units instead of vcpus.

While doing that, rename them accordingly and make the _start() variant
static.

vcpu_move_locked() is switched to operate on a schedule unit, too, and
is renamed to sched_unit_move_locked().
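
The resulting calling pattern (as documented in the updated comment in
the diff below) is:

    lock = unit_schedule_lock_irq(unit);
    sched_unit_migrate_start(unit);
    unit_schedule_unlock_irq(lock, unit);
    sched_unit_migrate_finish(unit);

with _finish() either doing the move right away or, if the unit is
still running, deferring it to context_saved().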

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 102 +++++++++++++++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 43 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 12f9852786..8121da15c6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -673,35 +673,40 @@ void vcpu_unblock(struct vcpu *v)
 }
 
 /*
- * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
+ * Do the actual movement of a unit from old to new CPU. Locks for *both*
  * CPUs need to have been taken already when calling this!
  */
-static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
+static void sched_unit_move_locked(struct sched_unit *unit,
+                                   unsigned int new_cpu)
 {
-    unsigned int old_cpu = v->processor;
+    unsigned int old_cpu = unit->res->processor;
+    struct vcpu *v;
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
      * once the switch occurs, v->is_urgent is no longer protected by
      * the per-CPU scheduler lock we are holding.
      */
-    if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
+    for_each_sched_unit_vcpu ( unit, v )
     {
-        atomic_inc(&get_sched_res(new_cpu)->urgent_count);
-        atomic_dec(&get_sched_res(old_cpu)->urgent_count);
+        if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
+        {
+            atomic_inc(&get_sched_res(new_cpu)->urgent_count);
+            atomic_dec(&get_sched_res(old_cpu)->urgent_count);
+        }
     }
 
     /*
      * Actual CPU switch to new CPU.  This is safe because the lock
      * pointer can't change while the current lock is held.
      */
-    sched_migrate(vcpu_scheduler(v), v->sched_unit, new_cpu);
+    sched_migrate(vcpu_scheduler(unit->vcpu), unit, new_cpu);
 }
 
 /*
  * Initiating migration
  *
- * In order to migrate, we need the vcpu in question to have stopped
+ * In order to migrate, we need the unit in question to have stopped
  * running and had sched_sleep() called (to take it off any
  * runqueues, for instance); and if it is currently running, it needs
  * to be scheduled out.  Finally, we need to hold the scheduling locks
@@ -717,37 +722,45 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
  * should be called like this:
  *
  *     lock = unit_schedule_lock_irq(unit);
- *     vcpu_migrate_start(v);
+ *     sched_unit_migrate_start(unit);
  *     unit_schedule_unlock_irq(lock, unit)
- *     vcpu_migrate_finish(v);
+ *     sched_unit_migrate_finish(unit);
  *
- * vcpu_migrate_finish() will do the work now if it can, or simply
- * return if it can't (because v is still running); in that case
- * vcpu_migrate_finish() will be called by context_saved().
+ * sched_unit_migrate_finish() will do the work now if it can, or simply
+ * return if it can't (because the unit is still running); in that case
+ * sched_unit_migrate_finish() will be called by context_saved().
  */
-static void vcpu_migrate_start(struct vcpu *v)
+static void sched_unit_migrate_start(struct sched_unit *unit)
 {
-    set_bit(_VPF_migrating, &v->pause_flags);
-    vcpu_sleep_nosync_locked(v);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        set_bit(_VPF_migrating, &v->pause_flags);
+        vcpu_sleep_nosync_locked(v);
+    }
 }
 
-static void vcpu_migrate_finish(struct vcpu *v)
+static void sched_unit_migrate_finish(struct sched_unit *unit)
 {
     unsigned long flags;
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
+    struct vcpu *v;
 
     /*
-     * If the vcpu is currently running, this will be handled by
+     * If the unit is currently running, this will be handled by
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( v->sched_unit->is_running ||
-         !test_bit(_VPF_migrating, &v->pause_flags) )
-        return;
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        if ( unit->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
+            return;
+    }
 
-    old_cpu = new_cpu = v->processor;
+    old_cpu = new_cpu = unit->res->processor;
     for ( ; ; )
     {
         /*
@@ -760,7 +773,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
 
         sched_spin_lock_double(old_lock, new_lock, &flags);
 
-        old_cpu = v->processor;
+        old_cpu = unit->res->processor;
         if ( old_lock == get_sched_res(old_cpu)->schedule_lock )
         {
             /*
@@ -769,15 +782,15 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) &&
-                 cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
+                 cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, unit->domain->cpupool->cpu_valid) )
                 break;
 
             /* Select a new CPU. */
-            new_cpu = sched_pick_resource(vcpu_scheduler(v),
-                                          v->sched_unit)->processor;
+            new_cpu = sched_pick_resource(vcpu_scheduler(unit->vcpu),
+                                          unit)->processor;
             if ( (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
+                 cpumask_test_cpu(new_cpu, unit->domain->cpupool->cpu_valid) )
                 break;
             pick_called = 1;
         }
@@ -798,22 +811,26 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( v->sched_unit->is_running ||
-         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
+    for_each_sched_unit_vcpu ( unit, v )
     {
-        sched_spin_unlock_double(old_lock, new_lock, flags);
-        return;
+        if ( unit->is_running ||
+             !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
+        {
+            sched_spin_unlock_double(old_lock, new_lock, flags);
+            return;
+        }
     }
 
-    vcpu_move_locked(v, new_cpu);
+    sched_unit_move_locked(unit, new_cpu);
 
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
     if ( old_cpu != new_cpu )
-        sched_move_irqs(v->sched_unit);
+        sched_move_irqs(unit);
 
     /* Wake on new CPU. */
-    vcpu_wake(v);
+    for_each_sched_unit_vcpu ( unit, v )
+        vcpu_wake(v);
 }
 
 /*
@@ -955,10 +972,9 @@ int cpu_disable_scheduler(unsigned int cpu)
              *  * the scheduler will always find a suitable solution, or
              *    things would have failed before getting in here.
              */
-            vcpu_migrate_start(unit->vcpu);
+            sched_unit_migrate_start(unit);
             unit_schedule_unlock_irqrestore(lock, flags, unit);
-
-            vcpu_migrate_finish(unit->vcpu);
+            sched_unit_migrate_finish(unit);
 
             /*
             * The only caveat, in this case, is that a vcpu active in
@@ -1042,14 +1058,14 @@ static int vcpu_set_affinity(
             ASSERT(which == unit->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
-        vcpu_migrate_start(v);
+        sched_unit_migrate_start(unit);
     }
 
     unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
-    vcpu_migrate_finish(v);
+    sched_unit_migrate_finish(unit);
 
     return ret;
 }
@@ -1293,13 +1309,13 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     }
 
     if ( ret == 0 )
-        vcpu_migrate_start(v);
+        sched_unit_migrate_start(unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
-    vcpu_migrate_finish(v);
+    sched_unit_migrate_finish(unit);
 
     return ret;
 }
@@ -1686,7 +1702,7 @@ void context_saved(struct vcpu *prev)
 
     sched_context_saved(vcpu_scheduler(prev), prev->sched_unit);
 
-    vcpu_migrate_finish(prev);
+    sched_unit_migrate_finish(prev->sched_unit);
 }
 
 /* The scheduler timer: force a run through the scheduler */
-- 
2.16.4


* [PATCH 34/60] xen/sched: move struct task_slice into struct sched_unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In order to prepare for multiple vcpus per schedule unit, move struct
task_slice in schedule() from the local stack into struct sched_unit
of the currently running unit. To make access easier for the
individual schedulers, add a pointer to the currently running unit as
a parameter of do_schedule().

While at it, switch the tasklet_work_scheduled parameter of
do_schedule() from bool_t to bool.

As struct task_slice is only ever modified with the local schedule
lock held, it is safe to set the fields directly in struct sched_unit
instead of using an on-stack copy for returning the data.
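
A minimal sketch of the new do_schedule() contract (editor's
illustration; pick_next_unit() is a made-up stand-in for a scheduler's
policy code):

    static void toy_schedule(const struct scheduler *ops,
                             struct sched_unit *prev, s_time_t now,
                             bool tasklet_work_scheduled)
    {
        struct sched_unit *next = pick_next_unit(ops, prev);

        prev->next_task = next;   /* was: task_slice.task */
        prev->next_time = -1;     /* was: task_slice.time; -ve: no limit */
        next->migrated = false;   /* was: task_slice.migrated */
    }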

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 20 +++++++-------------
 xen/common/sched_credit.c   | 25 +++++++++++--------------
 xen/common/sched_credit2.c  | 21 +++++++++------------
 xen/common/sched_null.c     | 26 ++++++++++++--------------
 xen/common/sched_rt.c       | 22 +++++++++++-----------
 xen/common/schedule.c       | 21 ++++++++++-----------
 xen/include/xen/sched-if.h  | 11 +++--------
 xen/include/xen/sched.h     |  6 ++++++
 8 files changed, 69 insertions(+), 83 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index e48f2b2eb9..34efcc07c9 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -497,18 +497,14 @@ a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param now       Current time
- *
- * @return          Address of the UNIT structure scheduled to be run next
- *                  Amount of time to execute the returned UNIT
- *                  Flag for whether the UNIT was migrated
  */
-static struct task_slice
+static void
 a653sched_do_schedule(
     const struct scheduler *ops,
+    struct sched_unit *prev,
     s_time_t now,
-    bool_t tasklet_work_scheduled)
+    bool tasklet_work_scheduled)
 {
-    struct task_slice ret;                      /* hold the chosen domain */
     struct sched_unit *new_task = NULL;
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
@@ -586,13 +582,11 @@ a653sched_do_schedule(
      * Return the amount of time the next domain has to run and the address
      * of the selected task's UNIT structure.
      */
-    ret.time = next_switch_time - now;
-    ret.task = new_task;
-    ret.migrated = 0;
-
-    BUG_ON(ret.time <= 0);
+    prev->next_time = next_switch_time - now;
+    prev->next_task = new_task;
+    new_task->migrated = false;
 
-    return ret;
+    BUG_ON(prev->next_time <= 0);
 }
 
 /**
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index e67b9bb23e..196f613420 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1681,7 +1681,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 
 static struct csched_unit *
 csched_load_balance(struct csched_private *prv, int cpu,
-    struct csched_unit *snext, bool_t *stolen)
+    struct csched_unit *snext, bool *stolen)
 {
     struct cpupool *c = per_cpu(cpupool, cpu);
     struct csched_unit *speer;
@@ -1797,7 +1797,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 /* As soon as one unit is found, balancing ends */
                 if ( speer != NULL )
                 {
-                    *stolen = 1;
+                    *stolen = true;
                     /*
                      * Next time we'll look for work to steal on this node, we
                      * will start from the next pCPU, with respect to this one,
@@ -1827,19 +1827,18 @@ csched_load_balance(struct csched_private *prv, int cpu,
  * This function is in the critical path. It is designed to be simple and
  * fast for the common case.
  */
-static struct task_slice
-csched_schedule(
-    const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+static void csched_schedule(
+    const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
+    bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct list_head * const runq = RUNQ(sched_cpu);
-    struct sched_unit *unit = current->sched_unit;
     struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_unit *snext;
-    struct task_slice ret;
     s_time_t runtime, tslice;
+    bool migrated = false;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED_UNIT_CHECK(unit);
@@ -1929,7 +1928,6 @@ csched_schedule(
                         (unsigned char *)&d);
         }
 
-        ret.migrated = 0;
         goto out;
     }
     tslice = prv->tslice;
@@ -1947,7 +1945,6 @@ csched_schedule(
     }
 
     snext = __runq_elem(runq->next);
-    ret.migrated = 0;
 
     /* Tasklet work (which runs in idle UNIT context) overrides all else. */
     if ( tasklet_work_scheduled )
@@ -1973,7 +1970,7 @@ csched_schedule(
     if ( snext->pri > CSCHED_PRI_TS_OVER )
         __runq_remove(snext);
     else
-        snext = csched_load_balance(prv, sched_cpu, snext, &ret.migrated);
+        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
@@ -1996,12 +1993,12 @@ out:
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_unit(snext->unit) ?
+    unit->next_time = (is_idle_unit(snext->unit) ?
                 -1 : tslice);
-    ret.task = snext->unit;
+    unit->next_task = snext->unit;
+    snext->unit->migrated = migrated;
 
-    CSCHED_UNIT_CHECK(ret.task);
-    return ret;
+    CSCHED_UNIT_CHECK(unit->next_task);
 }
 
 static void
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index cf502b3cbc..6695441588 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3443,19 +3443,18 @@ runq_candidate(struct csched2_runqueue_data *rqd,
  * This function is in the critical path. It is designed to be simple and
  * fast for the common case.
  */
-static struct task_slice
-csched2_schedule(
-    const struct scheduler *ops, s_time_t now, bool tasklet_work_scheduled)
+static void csched2_schedule(
+    const struct scheduler *ops, struct sched_unit *currunit, s_time_t now,
+    bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct csched2_runqueue_data *rqd;
-    struct sched_unit *currunit = current->sched_unit;
     struct csched2_unit * const scurr = csched2_unit(currunit);
     struct csched2_unit *snext = NULL;
     unsigned int skipped_units = 0;
-    struct task_slice ret;
     bool tickled;
+    bool migrated = false;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_UNIT_CHECK(currunit);
@@ -3540,8 +3539,6 @@ csched2_schedule(
          && unit_runnable(currunit) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
-    ret.migrated = 0;
-
     /* Accounting for non-idle tasks */
     if ( !is_idle_unit(snext->unit) )
     {
@@ -3591,7 +3588,7 @@ csched2_schedule(
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             sched_set_res(snext->unit, get_sched_res(sched_cpu));
             SCHED_STAT_CRANK(migrated);
-            ret.migrated = 1;
+            migrated = true;
         }
     }
     else
@@ -3622,11 +3619,11 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, sched_cpu, snext, now);
-    ret.task = snext->unit;
+    currunit->next_time = csched2_runtime(ops, sched_cpu, snext, now);
+    currunit->next_task = snext->unit;
+    snext->unit->migrated = migrated;
 
-    CSCHED2_UNIT_CHECK(ret.task);
-    return ret;
+    CSCHED2_UNIT_CHECK(currunit->next_task);
 }
 
 static void
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 6c350a342b..534d0aea5f 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -695,16 +695,14 @@ static inline void null_unit_check(struct sched_unit *unit)
  *  - the unit assigned to the pCPU, if there's one and it can run;
  *  - the idle unit, otherwise.
  */
-static struct task_slice null_schedule(const struct scheduler *ops,
-                                       s_time_t now,
-                                       bool_t tasklet_work_scheduled)
+static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
+                          s_time_t now, bool tasklet_work_scheduled)
 {
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct null_private *prv = null_priv(ops);
     struct null_unit *wvc;
-    struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
     NULL_UNIT_CHECK(current->sched_unit);
@@ -732,19 +730,18 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = sched_idle_unit(sched_cpu);
+        prev->next_task = sched_idle_unit(sched_cpu);
     }
     else
-        ret.task = per_cpu(npc, sched_cpu).unit;
-    ret.migrated = 0;
-    ret.time = -1;
+        prev->next_task = per_cpu(npc, sched_cpu).unit;
+    prev->next_time = -1;
 
     /*
      * We may be new in the cpupool, or just coming back online. In which
      * case, there may be units in the waitqueue that we can assign to us
      * and run.
      */
-    if ( unlikely(ret.task == NULL) )
+    if ( unlikely(prev->next_task == NULL) )
     {
         spin_lock(&prv->waitq_lock);
 
@@ -770,7 +767,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                 {
                     unit_assign(prv, wvc->unit, sched_cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->unit;
+                    prev->next_task = wvc->unit;
                     goto unlock;
                 }
             }
@@ -779,11 +776,12 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(ret.task == NULL || !unit_runnable(ret.task)) )
-        ret.task = sched_idle_unit(sched_cpu);
+    if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) )
+        prev->next_task = sched_idle_unit(sched_cpu);
 
-    NULL_UNIT_CHECK(ret.task);
-    return ret;
+    NULL_UNIT_CHECK(prev->next_task);
+
+    prev->next_task->migrated = false;
 }
 
 static inline void dump_unit(struct null_private *prv, struct null_unit *nvc)
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index ddd1badd38..aed498970e 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1055,16 +1055,16 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
  * schedule function for rt scheduler.
  * The lock is already grabbed in schedule.c, no need to lock here
  */
-static struct task_slice
-rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+static void
+rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
+            s_time_t now, bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct rt_private *prv = rt_priv(ops);
-    struct rt_unit *const scurr = rt_unit(current->sched_unit);
+    struct rt_unit *const scurr = rt_unit(currunit);
     struct rt_unit *snext = NULL;
-    struct task_slice ret = { .migrated = 0 };
-    struct sched_unit *currunit = current->sched_unit;
+    bool migrated = false;
 
     /* TRACE */
     {
@@ -1112,7 +1112,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         __set_bit(__RTDS_delayed_runq_add, &scurr->flags);
 
     snext->last_start = now;
-    ret.time =  -1; /* if an idle unit is picked */
+    currunit->next_time =  -1; /* if an idle unit is picked */
     if ( !is_idle_unit(snext->unit) )
     {
         if ( snext != scurr )
@@ -1123,13 +1123,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         if ( sched_unit_cpu(snext->unit) != sched_cpu )
         {
             sched_set_res(snext->unit, get_sched_res(sched_cpu));
-            ret.migrated = 1;
+            migrated = true;
         }
-        ret.time = snext->cur_budget; /* invoke the scheduler next time */
+        /* Invoke the scheduler next time. */
+        currunit->next_time = snext->cur_budget;
     }
-    ret.task = snext->unit;
-
-    return ret;
+    currunit->next_task = snext->unit;
+    snext->unit->migrated = migrated;
 }
 
 /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8121da15c6..c23a5629de 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1585,10 +1585,9 @@ static void schedule(void)
     s_time_t              now;
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
-    bool_t                tasklet_work_scheduled = 0;
+    bool                  tasklet_work_scheduled = false;
     struct sched_resource *sd;
     spinlock_t           *lock;
-    struct task_slice     next_slice;
     int cpu = smp_processor_id();
 
     ASSERT_NOT_IN_ATOMIC();
@@ -1604,12 +1603,12 @@ static void schedule(void)
         set_bit(_TASKLET_scheduled, tasklet_work);
         /* fallthrough */
     case TASKLET_enqueued|TASKLET_scheduled:
-        tasklet_work_scheduled = 1;
+        tasklet_work_scheduled = true;
         break;
     case TASKLET_scheduled:
         clear_bit(_TASKLET_scheduled, tasklet_work);
     case 0:
-        /*tasklet_work_scheduled = 0;*/
+        /*tasklet_work_scheduled = false;*/
         break;
     default:
         BUG();
@@ -1623,14 +1622,14 @@ static void schedule(void)
 
     /* get policy-specific decision on scheduling... */
     sched = this_cpu(scheduler);
-    next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
+    sched->do_schedule(sched, prev, now, tasklet_work_scheduled);
 
-    next = next_slice.task;
+    next = prev->next_task;
 
     sd->curr = next;
 
-    if ( next_slice.time >= 0 ) /* -ve means no limit */
-        set_timer(&sd->s_timer, now + next_slice.time);
+    if ( prev->next_time >= 0 ) /* -ve means no limit */
+        set_timer(&sd->s_timer, now + prev->next_time);
 
     if ( unlikely(prev == next) )
     {
@@ -1638,7 +1637,7 @@ static void schedule(void)
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
                  next->domain->domain_id, next->unit_id,
                  now - prev->state_entry_time,
-                 next_slice.time);
+                 prev->next_time);
         trace_continue_running(next->vcpu);
         return continue_running(prev->vcpu);
     }
@@ -1650,7 +1649,7 @@ static void schedule(void)
              next->domain->domain_id, next->unit_id,
              (next->vcpu->runstate.state == RUNSTATE_runnable) ?
              (now - next->state_entry_time) : 0,
-             next_slice.time);
+             prev->next_time);
 
     ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
@@ -1680,7 +1679,7 @@ static void schedule(void)
 
     stop_timer(&prev->vcpu->periodic_timer);
 
-    if ( next_slice.migrated )
+    if ( next->migrated )
         vcpu_move_irqs(next->vcpu);
 
     vcpu_periodic_timer_work(next->vcpu);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 8b349069db..7163ee869b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -190,12 +190,6 @@ static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
     return NULL;
 }
 
-struct task_slice {
-    struct sched_unit *task;
-    s_time_t           time;
-    bool_t             migrated;
-};
-
 struct scheduler {
     char *name;             /* full name for this scheduler      */
     char *opt_name;         /* option name for this scheduler    */
@@ -238,8 +232,9 @@ struct scheduler {
     void         (*context_saved)  (const struct scheduler *,
                                     struct sched_unit *);
 
-    struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
-                                      bool_t tasklet_work_scheduled);
+    void         (*do_schedule)    (const struct scheduler *,
+                                    struct sched_unit *, s_time_t,
+                                    bool tasklet_work_scheduled);
 
     struct sched_resource * (*pick_resource) (const struct scheduler *,
                                               struct sched_unit *);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0ab358f632..d0048711cf 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -281,6 +281,8 @@ struct sched_unit {
     bool                   affinity_broken;
     /* Does soft affinity actually play a role (given hard affinity)? */
     bool                   soft_aff_effective;
+    /* Item has been migrated to other cpu(s). */
+    bool                   migrated;
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t          cpu_hard_affinity;
     /* Used to change affinity temporarily. */
@@ -289,6 +291,10 @@ struct sched_unit {
     cpumask_var_t          cpu_hard_affinity_saved;
     /* Bitmask of CPUs on which this VCPU prefers to run. */
     cpumask_var_t          cpu_soft_affinity;
+
+    /* Next unit to run. */
+    struct sched_unit      *next_task;
+    s_time_t                next_time;
 };
 
 #define for_each_sched_unit(d, e)                                         \
-- 
2.16.4
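
To illustrate the new contract introduced by the hunks above: a
minimal, stand-alone sketch (stub types and toy names, not the
hypervisor's real definitions) of how a scheduler now publishes its
decision through the previously running unit instead of returning a
struct task_slice:

#include <stdbool.h>
#include <stdio.h>

typedef long s_time_t;

/* Stubbed-down sched_unit: only the fields this patch adds. */
struct sched_unit {
    int                unit_id;
    bool               migrated;    /* was task_slice.migrated */
    struct sched_unit *next_task;   /* was task_slice.task     */
    s_time_t           next_time;   /* was task_slice.time     */
};

/* Toy do_schedule(): publish the decision through @prev. */
static void toy_do_schedule(struct sched_unit *prev,
                            struct sched_unit *snext)
{
    prev->next_time = -1;           /* -ve means no time limit */
    prev->next_task = snext;
    snext->migrated = false;
}

int main(void)
{
    struct sched_unit prev = { .unit_id = 1 };
    struct sched_unit idle = { .unit_id = 0 };

    toy_do_schedule(&prev, &idle);
    printf("next: unit %d, slice %ld\n",
           prev.next_task->unit_id, (long)prev.next_time);
    return 0;
}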



* [PATCH 35/60] xen/sched: add code to sync scheduling of all vcpus of a sched unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

When switching sched units, synchronize all vcpus of the new unit so
they are scheduled at the same time.

A new variable sched_granularity holds the number of vcpus per
schedule unit.

As tasklets require scheduling the idle unit, the tasklet_work_scheduled
parameter of do_schedule() must be set to true if any cpu covered by
the current schedule() call has pending tasklet work.

For joining the other vcpus of a schedule unit we need a new softirq,
SCHED_SLAVE_SOFTIRQ, as a way to initiate a context switch without
calling the generic schedule() function to select the vcpu to switch
to, since we already know which vcpu we want to run. This has the
additional advantage of not losing any concurrent SCHEDULE_SOFTIRQ
events.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: move syncing after context_switch() to schedule.c
---
 xen/arch/arm/domain.c      |   2 +-
 xen/arch/x86/domain.c      |   3 +-
 xen/common/schedule.c      | 326 ++++++++++++++++++++++++++++++++++-----------
 xen/common/softirq.c       |   6 +-
 xen/include/xen/sched-if.h |   1 +
 xen/include/xen/sched.h    |  16 ++-
 xen/include/xen/softirq.h  |   1 +
 7 files changed, 269 insertions(+), 86 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index ff330b35e6..9140a79c60 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -313,7 +313,7 @@ static void schedule_tail(struct vcpu *prev)
 
     local_irq_enable();
 
-    context_saved(prev);
+    sched_context_switched(prev, current);
 
     update_runstate_area(current);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ac960ddd40..d3ee699da6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1721,7 +1721,6 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
-
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
     unsigned int cpu = smp_processor_id();
@@ -1797,7 +1796,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         }
     }
 
-    context_saved(prev);
+    sched_context_switched(prev, next);
 
     _update_runstate_area(next);
     /* Must be done with interrupts enabled */
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c23a5629de..9de315ada1 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -54,6 +54,10 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
  * */
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
+
+/* Number of vcpus per struct sched_unit. */
+static unsigned int sched_granularity = 1;
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
@@ -1574,134 +1578,304 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
     set_timer(&v->periodic_timer, periodic_next_event);
 }
 
-/*
- * The main function
- * - deschedule the current domain (scheduler independent).
- * - pick a new domain (scheduler dependent).
- */
-static void schedule(void)
+static void sched_switch_units(struct sched_resource *sd,
+                               struct sched_unit *next, struct sched_unit *prev,
+                               s_time_t now)
 {
-    struct sched_unit    *prev = current->sched_unit, *next = NULL;
-    s_time_t              now;
-    struct scheduler     *sched;
-    unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
-    bool                  tasklet_work_scheduled = false;
-    struct sched_resource *sd;
-    spinlock_t           *lock;
-    int cpu = smp_processor_id();
+    sd->curr = next;
 
-    ASSERT_NOT_IN_ATOMIC();
+    TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit_id,
+             now - prev->state_entry_time);
+    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit_id,
+             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+             (now - next->state_entry_time) : 0, prev->next_time);
 
-    SCHED_STAT_CRANK(sched_run);
+    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
-    sd = get_sched_res(cpu);
+    TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
+             next->domain->domain_id, next->unit_id);
+
+    sched_unit_runstate_change(prev, false, now);
+    prev->last_run_time = now;
+
+    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    sched_unit_runstate_change(next, true, now);
+
+    /*
+     * NB. Don't add any trace records from here until the actual context
+     * switch, else lost_records resume will not work properly.
+     */
+
+    ASSERT(!next->is_running);
+    next->vcpu->is_running = 1;
+    next->is_running = 1;
+}
+
+static bool sched_tasklet_check_cpu(unsigned int cpu)
+{
+    unsigned long *tasklet_work = &per_cpu(tasklet_work_to_do, cpu);
 
-    /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
     {
     case TASKLET_enqueued:
         set_bit(_TASKLET_scheduled, tasklet_work);
         /* fallthrough */
     case TASKLET_enqueued|TASKLET_scheduled:
-        tasklet_work_scheduled = true;
+        return true;
         break;
     case TASKLET_scheduled:
         clear_bit(_TASKLET_scheduled, tasklet_work);
+        /* fallthrough */
     case 0:
-        /*tasklet_work_scheduled = false;*/
+        /* return false; */
         break;
     default:
         BUG();
     }
 
-    lock = pcpu_schedule_lock_irq(cpu);
+    return false;
+}
 
-    now = NOW();
+static bool sched_tasklet_check(unsigned int cpu)
+{
+    bool tasklet_work_scheduled = false;
+    const cpumask_t *mask = get_sched_res(cpu)->cpus;
+    int cpu_iter;
 
-    stop_timer(&sd->s_timer);
+    for_each_cpu ( cpu_iter, mask )
+        if ( sched_tasklet_check_cpu(cpu_iter) )
+            tasklet_work_scheduled = true;
+
+    return tasklet_work_scheduled;
+}
+
+static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now,
+                                      unsigned int cpu)
+{
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
+    struct sched_unit *next;
 
     /* get policy-specific decision on scheduling... */
-    sched = this_cpu(scheduler);
-    sched->do_schedule(sched, prev, now, tasklet_work_scheduled);
+    sched->do_schedule(sched, prev, now, sched_tasklet_check(cpu));
 
     next = prev->next_task;
 
-    sd->curr = next;
-
     if ( prev->next_time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + prev->next_time);
 
-    if ( unlikely(prev == next) )
+    if ( likely(prev != next) )
+        sched_switch_units(sd, next, prev, now);
+
+    return next;
+}
+
+static void context_saved(struct vcpu *prev)
+{
+    struct sched_unit *unit = prev->sched_unit;
+
+    unit->is_running = 0;
+    unit->state_entry_time = NOW();
+
+    /* Check for migration request /after/ clearing running flag. */
+    smp_mb();
+
+    sched_context_saved(vcpu_scheduler(prev), unit);
+
+    sched_unit_migrate_finish(unit);
+}
+
+/*
+ * Rendezvous on end of context switch.
+ * As no lock is protecting this rendezvous function we need to use atomic
+ * access functions on the counter.
+ * The counter will be 0 in case no rendezvous is needed. For the rendezvous
+ * case it is initialised to the number of cpus to rendezvous plus 1. Each
+ * member entering decrements the counter. The last one will decrement it to
+ * 1 and perform the final needed action in that case (call of context_saved()
+ * if vcpu was switched), and then set the counter to zero. The other members
+ * will wait until the counter becomes zero until they proceed.
+ */
+void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
+{
+    struct sched_unit *next = vnext->sched_unit;
+
+    /* Clear running flag /after/ writing context to memory. */
+    smp_wmb();
+
+    vprev->is_running = 0;
+
+    if ( atomic_read(&next->rendezvous_out_cnt) )
+    {
+        int cnt = atomic_dec_return(&next->rendezvous_out_cnt);
+
+        /* Call context_saved() before releasing other waiters. */
+        if ( cnt == 1 )
+        {
+            if ( vprev != vnext )
+                context_saved(vprev);
+            atomic_set(&next->rendezvous_out_cnt, 0);
+        }
+        else
+            while ( atomic_read(&next->rendezvous_out_cnt) )
+                cpu_relax();
+    }
+    else if ( vprev != vnext )
+        context_saved(vprev);
+}
+
+static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
+                                 s_time_t now)
+{
+    if ( unlikely(vprev == vnext) )
     {
-        pcpu_schedule_unlock_irq(lock, cpu);
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
-                 next->domain->domain_id, next->unit_id,
-                 now - prev->state_entry_time,
-                 prev->next_time);
-        trace_continue_running(next->vcpu);
-        return continue_running(prev->vcpu);
+                 vnext->domain->domain_id, vnext->sched_unit->unit_id,
+                 now - vprev->runstate.state_entry_time,
+                 vprev->sched_unit->next_time);
+        sched_context_switched(vprev, vnext);
+        trace_continue_running(vnext);
+        return continue_running(vprev);
     }
 
-    TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id, prev->unit_id,
-             now - prev->state_entry_time);
-    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id, next->unit_id,
-             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
-             (now - next->state_entry_time) : 0,
-             prev->next_time);
+    SCHED_STAT_CRANK(sched_ctx);
 
-    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
+    stop_timer(&vprev->periodic_timer);
 
-    TRACE_4D(TRC_SCHED_SWITCH,
-             prev->domain->domain_id, prev->unit_id,
-             next->domain->domain_id, next->unit_id);
+    if ( vnext->sched_unit->migrated )
+        vcpu_move_irqs(vnext);
 
-    sched_unit_runstate_change(prev, false, now);
-    prev->last_run_time = now;
+    vcpu_periodic_timer_work(vnext);
 
-    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
-    sched_unit_runstate_change(next, true, now);
+    context_switch(vprev, vnext);
+}
 
-    /*
-     * NB. Don't add any trace records from here until the actual context
-     * switch, else lost_records resume will not work properly.
-     */
+/*
+ * Rendezvous before taking a scheduling decision.
+ * Called with schedule lock held, so all accesses to the rendezvous counter
+ * can be normal ones (no atomic accesses needed).
+ * The counter is initialized to the number of cpus to rendezvous initially.
+ * Each cpu entering will decrement the counter. In case the counter becomes
+ * zero do_schedule() is called and the rendezvous counter for leaving
+ * context_switch() is set. All other members will wait until the counter is
+ * becoming zero, dropping the schedule lock in between.
+ */
+static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
+                                                   spinlock_t *lock, int cpu,
+                                                   s_time_t now)
+{
+    struct sched_unit *next;
 
-    ASSERT(!next->is_running);
-    next->vcpu->is_running = 1;
-    next->is_running = 1;
-    next->state_entry_time = now;
+    if ( !--prev->rendezvous_in_cnt )
+    {
+        next = do_schedule(prev, now, cpu);
+        atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
+        return next;
+    }
 
-    pcpu_schedule_unlock_irq(lock, cpu);
+    while ( prev->rendezvous_in_cnt )
+    {
+        pcpu_schedule_unlock_irq(lock, cpu);
 
-    SCHED_STAT_CRANK(sched_ctx);
+        /* Coming from idle might need to do tasklet work. */
+        if ( is_idle_unit(prev) && sched_tasklet_check_cpu(cpu) )
+            do_tasklet();
+        else
+            cpu_relax();
+
+        pcpu_schedule_lock_irq(cpu);
+    }
+
+    return prev->next_task;
+}
+
+static void sched_slave(void)
+{
+    struct vcpu          *vprev = current;
+    struct sched_unit    *prev = vprev->sched_unit, *next;
+    s_time_t              now;
+    spinlock_t           *lock;
+    int cpu = smp_processor_id();
 
-    stop_timer(&prev->vcpu->periodic_timer);
+    ASSERT_NOT_IN_ATOMIC();
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    now = NOW();
+
+    if ( !prev->rendezvous_in_cnt )
+    {
+        pcpu_schedule_unlock_irq(lock, cpu);
+        return;
+    }
 
-    if ( next->migrated )
-        vcpu_move_irqs(next->vcpu);
+    stop_timer(&get_sched_res(cpu)->s_timer);
 
-    vcpu_periodic_timer_work(next->vcpu);
+    next = sched_wait_rendezvous_in(prev, lock, cpu, now);
 
-    context_switch(prev->vcpu, next->vcpu);
+    pcpu_schedule_unlock_irq(lock, cpu);
+
+    sched_context_switch(vprev, next->vcpu, now);
 }
 
-void context_saved(struct vcpu *prev)
+/*
+ * The main function
+ * - deschedule the current domain (scheduler independent).
+ * - pick a new domain (scheduler dependent).
+ */
+static void schedule(void)
 {
-    /* Clear running flag /after/ writing context to memory. */
-    smp_wmb();
+    struct vcpu          *vnext, *vprev = current;
+    struct sched_unit    *prev = vprev->sched_unit, *next = NULL;
+    s_time_t              now;
+    struct sched_resource *sd;
+    spinlock_t           *lock;
+    int cpu = smp_processor_id();
 
-    prev->is_running = 0;
-    prev->sched_unit->is_running = 0;
-    prev->sched_unit->state_entry_time = NOW();
+    ASSERT_NOT_IN_ATOMIC();
 
-    /* Check for migration request /after/ clearing running flag. */
-    smp_mb();
+    SCHED_STAT_CRANK(sched_run);
+
+    sd = get_sched_res(cpu);
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    if ( prev->rendezvous_in_cnt )
+    {
+        /*
+         * We have a race: sched_slave() should be called, so raise a softirq
+         * in order to re-enter schedule() later and call sched_slave() now.
+         */
+        pcpu_schedule_unlock_irq(lock, cpu);
+
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return sched_slave();
+    }
+
+    now = NOW();
 
-    sched_context_saved(vcpu_scheduler(prev), prev->sched_unit);
+    stop_timer(&sd->s_timer);
+
+    if ( sched_granularity > 1 )
+    {
+        cpumask_t mask;
+
+        prev->rendezvous_in_cnt = sched_granularity;
+        cpumask_andnot(&mask, sd->cpus, cpumask_of(cpu));
+        cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ);
+        next = sched_wait_rendezvous_in(prev, lock, cpu, now);
+    }
+    else
+    {
+        prev->rendezvous_in_cnt = 0;
+        next = do_schedule(prev, now, cpu);
+        atomic_set(&next->rendezvous_out_cnt, 0);
+    }
+
+    pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_unit_migrate_finish(prev->sched_unit);
+    vnext = next->vcpu;
+    sched_context_switch(vprev, vnext, now);
 }
 
 /* The scheduler timer: force a run through the scheduler */
@@ -1743,6 +1917,7 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( sd == NULL )
         return -ENOMEM;
     sd->processor = cpu;
+    sd->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sd);
 
     per_cpu(scheduler, cpu) = &ops;
@@ -1905,6 +2080,7 @@ void __init scheduler_init(void)
     int i;
 
     open_softirq(SCHEDULE_SOFTIRQ, schedule);
+    open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave);
 
     for ( i = 0; i < NUM_SCHEDULERS; i++)
     {
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 83c3c09bd5..2d66193203 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -33,8 +33,8 @@ static void __do_softirq(unsigned long ignore_mask)
     for ( ; ; )
     {
         /*
-         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move
-         * us to another processor.
+         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ or
+         * SCHED_SLAVE_SOFTIRQ may move us to another processor.
          */
         cpu = smp_processor_id();
 
@@ -55,7 +55,7 @@ void process_pending_softirqs(void)
 {
     ASSERT(!in_irq() && local_irq_is_enabled());
     /* Do not enter scheduler as it can preempt the calling context. */
-    __do_softirq(1ul<<SCHEDULE_SOFTIRQ);
+    __do_softirq((1ul << SCHEDULE_SOFTIRQ) | (1ul << SCHED_SLAVE_SOFTIRQ));
 }
 
 void do_softirq(void)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 7163ee869b..e66d866f41 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -41,6 +41,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned int        processor;
+    const cpumask_t    *cpus;           /* cpus covered by this struct     */
 };
 
 #define curr_on_cpu(c)    (get_sched_res(c)->curr)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d0048711cf..9ff9cb148d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -295,6 +295,12 @@ struct sched_unit {
     /* Next unit to run. */
     struct sched_unit      *next_task;
     s_time_t                next_time;
+
+    /* Number of vcpus not yet joined for context switch. */
+    unsigned int            rendezvous_in_cnt;
+
+    /* Number of vcpus not yet finished with context switch. */
+    atomic_t                rendezvous_out_cnt;
 };
 
 #define for_each_sched_unit(d, e)                                         \
@@ -695,10 +701,10 @@ void sync_local_execstate(void);
 
 /*
  * Called by the scheduler to switch to another VCPU. This function must
- * call context_saved(@prev) when the local CPU is no longer running in
- * @prev's context, and that context is saved to memory. Alternatively, if
- * implementing lazy context switching, it suffices to ensure that invoking
- * sync_vcpu_execstate() will switch and commit @prev's state.
+ * call sched_context_switched(@prev, @next) when the local CPU is no longer
+ * running in @prev's context, and that context is saved to memory.
+ * Alternatively, if implementing lazy context switching, it suffices to ensure
+ * that invoking sync_vcpu_execstate() will switch and commit @prev's state.
  */
 void context_switch(
     struct vcpu *prev,
@@ -710,7 +716,7 @@ void context_switch(
  * saved to memory. Alternatively, if implementing lazy context switching,
  * ensure that invoking sync_vcpu_execstate() will switch and commit @prev.
  */
-void context_saved(struct vcpu *prev);
+void sched_context_switched(struct vcpu *prev, struct vcpu *vnext);
 
 /* Called by the scheduler to continue running the current VCPU. */
 void continue_running(
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index c327c9b6cd..d7273b389b 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -4,6 +4,7 @@
 /* Low-latency softirqs come first in the following list. */
 enum {
     TIMER_SOFTIRQ = 0,
+    SCHED_SLAVE_SOFTIRQ,
     SCHEDULE_SOFTIRQ,
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
-- 
2.16.4
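
The rendezvous-out counter on the way out of the context switch can be
sketched the same way (again stand-in code, not from the patch): the
counter starts at the number of cpus plus 1; the member decrementing
it to 1 performs the final action and then zeroes it, releasing the
remaining waiters.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCPUS 2

/* 0 would mean: no rendezvous needed at all. */
static atomic_int rendezvous_out = NCPUS + 1;

static void *leave_context_switch(void *arg)
{
    long cpu = (long)arg;

    if (atomic_fetch_sub(&rendezvous_out, 1) == 2) {
        /*
         * We took the counter from 2 to 1: run the final action
         * (context_saved() in the hypervisor), then release all.
         */
        printf("cpu %ld runs context_saved()\n", cpu);
        atomic_store(&rendezvous_out, 0);
    } else {
        while (atomic_load(&rendezvous_out) != 0)
            ;                        /* cpu_relax() in the hypervisor */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NCPUS];

    for (long i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, leave_context_switch, (void *)i);
    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    return 0;
}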



* [PATCH 36/60] xen/sched: introduce unit_runnable_state()
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Tim Deegan, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Dario Faggioli

Today the vcpu runstate of a newly scheduled vcpu is always set to
"running", even if at that time vcpu_runnable() is already returning
false due to a race (e.g. with pausing the vcpu).

With core scheduling this can no longer work, as not all vcpus of a
schedule unit will necessarily be "running" when the unit is scheduled.
So the vcpu's new runstate has to be selected at the same time as the
runnability of the related schedule unit is probed.

For this purpose introduce a new helper unit_runnable_state() which
saves the new runstate of each tested vcpu in a new field of the
vcpu struct.
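
As a rough illustration of the intended semantics, here is a minimal
standalone model (not Xen code; the *_model names are invented for
this sketch):

  #include <stdbool.h>
  #include <stdio.h>

  enum runstate { RS_running, RS_runnable, RS_blocked, RS_offline };

  struct vcpu_model {
      bool blocked;            /* models v->pause_flags & VPF_blocked */
      bool runnable;           /* models vcpu_runnable(v) */
      enum runstate new_state; /* models the new v->new_state field */
  };

  /* Latch the new runstate at the moment runnability is probed. */
  static bool vcpu_runnable_state_model(struct vcpu_model *v)
  {
      v->new_state = v->runnable ? RS_running
                                 : v->blocked ? RS_blocked
                                              : RS_offline;
      return v->runnable;
  }

  int main(void)
  {
      /* The vcpu was paused after being picked, before being run. */
      struct vcpu_model v = { .blocked = true, .runnable = false };

      if ( !vcpu_runnable_state_model(&v) )
          printf("latched state: %d (RS_blocked)\n", v.new_state);
      return 0;
  }

The context switch path can then apply the latched new_state instead
of unconditionally setting RUNSTATE_running, closing the race above.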

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch
---
 xen/common/domain.c         |  1 +
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 49 ++++++++++++++++++++++++---------------------
 xen/common/sched_credit2.c  |  7 ++++---
 xen/common/sched_null.c     |  3 ++-
 xen/common/sched_rt.c       |  8 +++++++-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  | 14 +++++++++++++
 xen/include/xen/sched.h     |  1 +
 9 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 65f8a369c1..0daedca729 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -150,6 +150,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
+        v->new_state = RUNSTATE_running;
     }
     else
     {
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 34efcc07c9..759c42ab73 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -557,7 +557,7 @@ a653sched_do_schedule(
     if ( !((new_task != NULL)
            && (AUNIT(new_task) != NULL)
            && AUNIT(new_task)->awake
-           && unit_runnable(new_task)) )
+           && unit_runnable_state(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 196f613420..15339e6fae 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1899,7 +1899,7 @@ static void csched_schedule(
     if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && unit_runnable(unit)
+         && unit_runnable_state(unit)
          && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
@@ -1944,33 +1944,36 @@ static void csched_schedule(
         dec_nr_runnable(sched_cpu);
     }
 
-    snext = __runq_elem(runq->next);
-
-    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
-    if ( tasklet_work_scheduled )
-    {
-        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
-        snext->pri = CSCHED_PRI_TS_BOOST;
-    }
-
     /*
      * Clear YIELD flag before scheduling out
      */
     clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
-    /*
-     * SMP Load balance:
-     *
-     * If the next highest priority local runnable UNIT has already eaten
-     * through its credits, look on other PCPUs to see if we have more
-     * urgent work... If not, csched_load_balance() will return snext, but
-     * already removed from the runq.
-     */
-    if ( snext->pri > CSCHED_PRI_TS_OVER )
-        __runq_remove(snext);
-    else
-        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+    do {
+        snext = __runq_elem(runq->next);
+
+        /* Tasklet work (which runs in idle UNIT context) overrides all else. */
+        if ( tasklet_work_scheduled )
+        {
+            TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
+            snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
+            snext->pri = CSCHED_PRI_TS_BOOST;
+        }
+
+        /*
+         * SMP Load balance:
+         *
+         * If the next highest priority local runnable UNIT has already eaten
+         * through its credits, look on other PCPUs to see if we have more
+         * urgent work... If not, csched_load_balance() will return snext, but
+         * already removed from the runq.
+         */
+        if ( snext->pri > CSCHED_PRI_TS_OVER )
+            __runq_remove(snext);
+        else
+            snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+
+    } while ( !unit_runnable_state(snext->unit) );
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6695441588..3bfceefa46 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3288,7 +3288,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+    if ( !yield && prv->ratelimit_us && unit_runnable_state(scurr->unit) &&
          (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3342,7 +3342,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
+    if ( unit_runnable_state(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_unit(sched_idle_unit(cpu));
@@ -3402,7 +3402,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) &&
+             unit_runnable_state(svc->unit) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 534d0aea5f..e9336a2948 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -776,7 +776,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) )
+    if ( unlikely(prev->next_task == NULL ||
+                  !unit_runnable_state(prev->next_task)) )
         prev->next_task = sched_idle_unit(sched_cpu);
 
     NULL_UNIT_CHECK(prev->next_task);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index aed498970e..6258dd2f2a 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1094,12 +1094,18 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
     else
     {
         snext = runq_pick(ops, cpumask_of(sched_cpu));
+
         if ( snext == NULL )
             snext = rt_unit(sched_idle_unit(sched_cpu));
+        else if ( !unit_runnable_state(snext->unit) )
+        {
+            q_remove(snext);
+            snext = rt_unit(sched_idle_unit(sched_cpu));
+        }
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_unit(currunit) &&
-             unit_runnable(currunit) &&
+             unit_runnable_state(currunit) &&
              scurr->cur_budget > 0 &&
              ( is_idle_unit(snext->unit) ||
                compare_unit_priority(scurr, snext) > 0 ) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 9de315ada1..352803622d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -206,7 +206,7 @@ static inline void sched_unit_runstate_change(struct sched_unit *unit,
     struct vcpu *v = unit->vcpu;
 
     if ( running )
-        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+        vcpu_runstate_change(v, v->new_state, new_entry_time);
     else
         vcpu_runstate_change(v,
             ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index e66d866f41..805cefaa07 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -70,6 +70,20 @@ static inline bool unit_runnable(const struct sched_unit *unit)
     return vcpu_runnable(unit->vcpu);
 }
 
+static inline bool unit_runnable_state(const struct sched_unit *unit)
+{
+    struct vcpu *v;
+    bool runnable;
+
+    v = unit->vcpu;
+    runnable = vcpu_runnable(v);
+
+    v->new_state = runnable ? RUNSTATE_running
+                            : (v->pause_flags & VPF_blocked)
+                              ? RUNSTATE_blocked : RUNSTATE_offline;
+    return runnable;
+}
+
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9ff9cb148d..d5bf61ced0 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -174,6 +174,7 @@ struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    int              new_state;
 
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 37/60] xen/sched: add support for multiple vcpus per sched unit where missing
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Several places are still missing support for multiple vcpus per sched
unit. Add that missing support (with the exception of initial
allocation) and the helpers needed for it.
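
As a rough sketch of the fan-out pattern the helpers now follow (a
simplified standalone model, not Xen code; the list layout and names
are invented here):

  #include <stddef.h>
  #include <stdio.h>

  struct vcpu_model {
      unsigned long pause_flags;
      struct vcpu_model *next_in_unit;  /* next vcpu of the same unit */
  };

  struct unit_model {
      struct vcpu_model *vcpu;          /* first vcpu of the unit */
  };

  /* Simplified stand-in for for_each_sched_unit_vcpu(). */
  #define for_each_unit_vcpu_model(u, v) \
      for ( (v) = (u)->vcpu; (v) != NULL; (v) = (v)->next_in_unit )

  /* Helpers operate on all vcpus of the unit, not just unit->vcpu. */
  static void unit_set_pause_flag_model(struct unit_model *u,
                                        unsigned int bit)
  {
      struct vcpu_model *v;

      for_each_unit_vcpu_model ( u, v )
          v->pause_flags |= 1UL << bit;
  }

  int main(void)
  {
      struct vcpu_model v1 = { 0, NULL }, v0 = { 0, &v1 };
      struct unit_model u = { &v0 };

      unit_set_pause_flag_model(&u, 0);
      printf("flags: %lu %lu\n", v0.pause_flags, v1.pause_flags);
      return 0;
  }

The predicates work the other way round: a unit is runnable as soon as
any of its vcpus is runnable, so unit_runnable() ORs the per-vcpu
results (with the idle unit special-cased to true).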

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: fix vcpu_runstate_helper()
V1: add special handling for idle unit in unit_runnable() and
    unit_runnable_state()
---
 xen/common/schedule.c      | 26 ++++++++-------
 xen/include/xen/sched-if.h | 80 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 79 insertions(+), 27 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 352803622d..a4dbc19403 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -180,8 +180,9 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
     struct sched_unit *unit = v->sched_unit;
 
-    ASSERT(v->runstate.state != new_state);
     ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
+    if ( v->runstate.state == new_state )
+        return;
 
     vcpu_urgent_count_update(v);
 
@@ -203,15 +204,16 @@ static inline void vcpu_runstate_change(
 static inline void sched_unit_runstate_change(struct sched_unit *unit,
     bool running, s_time_t new_entry_time)
 {
-    struct vcpu *v = unit->vcpu;
+    struct vcpu *v;
 
-    if ( running )
-        vcpu_runstate_change(v, v->new_state, new_entry_time);
-    else
-        vcpu_runstate_change(v,
-            ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-             (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
-            new_entry_time);
+    for_each_sched_unit_vcpu ( unit, v )
+        if ( running )
+            vcpu_runstate_change(v, v->new_state, new_entry_time);
+        else
+            vcpu_runstate_change(v,
+                ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+                 (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
+                new_entry_time);
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
@@ -1590,7 +1592,7 @@ static void sched_switch_units(struct sched_resource *sd,
              (next->vcpu->runstate.state == RUNSTATE_runnable) ?
              (now - next->state_entry_time) : 0, prev->next_time);
 
-    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
+    ASSERT(unit_running(prev));
 
     TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
              next->domain->domain_id, next->unit_id);
@@ -1598,7 +1600,7 @@ static void sched_switch_units(struct sched_resource *sd,
     sched_unit_runstate_change(prev, false, now);
     prev->last_run_time = now;
 
-    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    ASSERT(!unit_running(next));
     sched_unit_runstate_change(next, true, now);
 
     /*
@@ -1720,7 +1722,7 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
             while ( atomic_read(&next->rendezvous_out_cnt) )
                 cpu_relax();
     }
-    else if ( vprev != vnext )
+    else if ( vprev != vnext && sched_granularity == 1 )
         context_saved(vprev);
 }
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 805cefaa07..98d4dc65e2 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -65,29 +65,61 @@ static inline bool is_idle_unit(const struct sched_unit *unit)
     return is_idle_vcpu(unit->vcpu);
 }
 
+static inline unsigned int unit_running(const struct sched_unit *unit)
+{
+    return unit->runstate_cnt[RUNSTATE_running];
+}
+
 static inline bool unit_runnable(const struct sched_unit *unit)
 {
-    return vcpu_runnable(unit->vcpu);
+    struct vcpu *v;
+
+    if ( is_idle_unit(unit) )
+        return true;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        if ( vcpu_runnable(v) )
+            return true;
+
+    return false;
 }
 
 static inline bool unit_runnable_state(const struct sched_unit *unit)
 {
     struct vcpu *v;
-    bool runnable;
+    bool runnable, ret = false;
 
-    v = unit->vcpu;
-    runnable = vcpu_runnable(v);
+    if ( is_idle_unit(unit) )
+        return true;
 
-    v->new_state = runnable ? RUNSTATE_running
-                            : (v->pause_flags & VPF_blocked)
-                              ? RUNSTATE_blocked : RUNSTATE_offline;
-    return runnable;
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        runnable = vcpu_runnable(v);
+
+        v->new_state = runnable ? RUNSTATE_running
+                                : (v->pause_flags & VPF_blocked)
+                                  ? RUNSTATE_blocked : RUNSTATE_offline;
+
+        if ( runnable )
+            ret = true;
+    }
+
+    return ret;
 }
 
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
-    unit->vcpu->processor = res->processor;
+    int cpu = cpumask_first(res->cpus);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        ASSERT(cpu < nr_cpu_ids);
+        v->processor = cpu;
+        cpu = cpumask_next(cpu, res->cpus);
+    }
+
     unit->res = res;
 }
 
@@ -99,25 +131,37 @@ static inline unsigned int sched_unit_cpu(struct sched_unit *unit)
 static inline void sched_set_pause_flags(struct sched_unit *unit,
                                          unsigned int bit)
 {
-    __set_bit(bit, &unit->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        __set_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_clear_pause_flags(struct sched_unit *unit,
                                            unsigned int bit)
 {
-    __clear_bit(bit, &unit->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        __clear_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_set_pause_flags_atomic(struct sched_unit *unit,
                                                 unsigned int bit)
 {
-    set_bit(bit, &unit->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        set_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_clear_pause_flags_atomic(struct sched_unit *unit,
                                                   unsigned int bit)
 {
-    clear_bit(bit, &unit->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        clear_bit(bit, &v->pause_flags);
 }
 
 static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
@@ -443,12 +487,18 @@ static inline int sched_adjust_cpupool(const struct scheduler *s,
 
 static inline void sched_unit_pause_nosync(struct sched_unit *unit)
 {
-    vcpu_pause_nosync(unit->vcpu);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        vcpu_pause_nosync(v);
 }
 
 static inline void sched_unit_unpause(struct sched_unit *unit)
 {
-    vcpu_unpause(unit->vcpu);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+        vcpu_unpause(v);
 }
 
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 38/60] x86: make loading of GDT at context switch more modular
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

In preparation for core scheduling, carve out the GDT related
functionality (writing GDT related PTEs, loading the default or full
GDT) into sub-functions.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
RFC V2: split off non-refactoring part
V1: constify pointers, use initializers (Jan Beulich)
---
 xen/arch/x86/domain.c | 57 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d3ee699da6..adc06154ee 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1626,6 +1626,37 @@ static inline bool need_full_gdt(const struct domain *d)
     return is_pv_domain(d) && !is_idle_domain(d);
 }
 
+static inline void write_full_gdt_ptes(seg_desc_t *gdt, const struct vcpu *v)
+{
+    unsigned long mfn = virt_to_mfn(gdt);
+    l1_pgentry_t *pl1e = pv_gdt_ptes(v);
+    unsigned int i;
+
+    for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
+        l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
+                  l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
+}
+
+static inline void load_full_gdt(const struct vcpu *v, unsigned int cpu)
+{
+    struct desc_ptr gdt_desc = {
+        .limit = LAST_RESERVED_GDT_BYTE,
+        .base = GDT_VIRT_START(v),
+    };
+
+    lgdt(&gdt_desc);
+}
+
+static inline void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
+{
+    struct desc_ptr gdt_desc = {
+        .limit = LAST_RESERVED_GDT_BYTE,
+        .base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY),
+    };
+
+    lgdt(&gdt_desc);
+}
+
 static void __context_switch(void)
 {
     struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
@@ -1634,7 +1665,6 @@ static void __context_switch(void)
     struct vcpu          *n = current;
     struct domain        *pd = p->domain, *nd = n->domain;
     seg_desc_t           *gdt;
-    struct desc_ptr       gdt_desc;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1676,25 +1706,13 @@ static void __context_switch(void)
 
     gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
                                     per_cpu(compat_gdt_table, cpu);
-    if ( need_full_gdt(nd) )
-    {
-        unsigned long mfn = virt_to_mfn(gdt);
-        l1_pgentry_t *pl1e = pv_gdt_ptes(n);
-        unsigned int i;
 
-        for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
-            l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
-                      l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
-    }
+    if ( need_full_gdt(nd) )
+        write_full_gdt_ptes(gdt, n);
 
     if ( need_full_gdt(pd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
-    {
-        gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
-        gdt_desc.base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY);
-
-        lgdt(&gdt_desc);
-    }
+        load_default_gdt(gdt, cpu);
 
     write_ptbase(n);
 
@@ -1707,12 +1725,7 @@ static void __context_switch(void)
 
     if ( need_full_gdt(nd) &&
          ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
-    {
-        gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
-        gdt_desc.base = GDT_VIRT_START(n);
-
-        lgdt(&gdt_desc);
-    }
+        load_full_gdt(n, cpu);
 
     if ( pd != nd )
         cpumask_clear_cpu(cpu, pd->dirty_cpumask);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 39/60] x86: optimize loading of GDT at context switch
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Instead of dynamically deciding whether the previous vcpu was using the
full or the default GDT, just add a percpu variable for that purpose.
This also removes the need to test twice whether the vcpu_ids differ.

Cache the need_full_gdt(nd) value in a local variable.
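
A rough standalone model of the flag-based approach (illustration
only; the real code additionally considers vcpu_id when both vcpus use
the full GDT, which is omitted here):

  #include <stdbool.h>
  #include <stdio.h>

  static bool full_gdt_loaded_model; /* models per_cpu(full_gdt_loaded) */
  static unsigned int lgdt_calls;

  static void load_gdt_model(bool full)
  {
      lgdt_calls++;                  /* stands in for lgdt() */
      full_gdt_loaded_model = full;
  }

  /* Switch-in path: reload only when the loaded state has to change. */
  static void context_switch_model(bool next_needs_full)
  {
      if ( full_gdt_loaded_model && !next_needs_full )
          load_gdt_model(false);     /* load_default_gdt() */
      if ( next_needs_full && !full_gdt_loaded_model )
          load_gdt_model(true);      /* load_full_gdt() */
  }

  int main(void)
  {
      context_switch_model(true);    /* idle -> PV: one lgdt */
      context_switch_model(true);    /* PV -> PV, same vcpu_id: none */
      context_switch_model(false);   /* PV -> HVM/idle: one lgdt */
      printf("lgdt calls: %u\n", lgdt_calls); /* prints 2 */
      return 0;
  }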

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
RFC V2: new patch (split from previous one)
V1: init percpu flag at cpu startup
    rename variable (Jan Beulich)
---
 xen/arch/x86/cpu/common.c  |  3 +++
 xen/arch/x86/domain.c      | 16 +++++++++++-----
 xen/include/asm-x86/desc.h |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 33f5d32557..8b90356fe5 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -49,6 +49,8 @@ unsigned int vaddr_bits __read_mostly = VADDR_BITS;
 static unsigned int cleared_caps[NCAPINTS];
 static unsigned int forced_caps[NCAPINTS];
 
+DEFINE_PER_CPU(bool, full_gdt_loaded);
+
 void __init setup_clear_cpu_cap(unsigned int cap)
 {
 	const uint32_t *dfs;
@@ -745,6 +747,7 @@ void load_system_tables(void)
 		offsetof(struct tss_struct, __cacheline_filler) - 1,
 		SYS_DESC_tss_busy);
 
+        per_cpu(full_gdt_loaded, cpu) = false;
 	lgdt(&gdtr);
 	lidt(&idtr);
 	ltr(TSS_ENTRY << 3);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index adc06154ee..98d2939daf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1645,6 +1645,8 @@ static inline void load_full_gdt(const struct vcpu *v, unsigned int cpu)
     };
 
     lgdt(&gdt_desc);
+
+    per_cpu(full_gdt_loaded, cpu) = true;
 }
 
 static inline void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
@@ -1655,6 +1657,8 @@ static inline void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu)
     };
 
     lgdt(&gdt_desc);
+
+    per_cpu(full_gdt_loaded, cpu) = false;
 }
 
 static void __context_switch(void)
@@ -1665,6 +1669,7 @@ static void __context_switch(void)
     struct vcpu          *n = current;
     struct domain        *pd = p->domain, *nd = n->domain;
     seg_desc_t           *gdt;
+    bool                  full_gdt;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1707,11 +1712,13 @@ static void __context_switch(void)
     gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
                                     per_cpu(compat_gdt_table, cpu);
 
-    if ( need_full_gdt(nd) )
+    full_gdt = need_full_gdt(nd);
+
+    if ( full_gdt )
         write_full_gdt_ptes(gdt, n);
 
-    if ( need_full_gdt(pd) &&
-         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
+    if ( per_cpu(full_gdt_loaded, cpu) &&
+         ((p->vcpu_id != n->vcpu_id) || !full_gdt) )
         load_default_gdt(gdt, cpu);
 
     write_ptbase(n);
@@ -1723,8 +1730,7 @@ static void __context_switch(void)
         svm_load_segs(0, 0, 0, 0, 0, 0, 0);
 #endif
 
-    if ( need_full_gdt(nd) &&
-         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
+    if ( full_gdt && !per_cpu(full_gdt_loaded, cpu) )
         load_full_gdt(n, cpu);
 
     if ( pd != nd )
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 85e83bcefb..ff9ac5f15d 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -208,6 +208,7 @@ extern seg_desc_t boot_cpu_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, gdt_table);
 extern seg_desc_t boot_cpu_compat_gdt_table[];
 DECLARE_PER_CPU(seg_desc_t *, compat_gdt_table);
+DECLARE_PER_CPU(bool, full_gdt_loaded);
 
 extern void load_TR(void);
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 40/60] xen/sched: modify cpupool_domain_cpumask() to be a unit mask
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

cpupool_domain_cpumask() is used by the scheduling code to select cpus
or to iterate over cpus. In order to support scheduling units spanning
multiple cpus, let cpupool_domain_cpumask() return a cpumask with only
one bit set per scheduling resource.
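
A small model of the mask derivation (illustration only; the real code
uses cpumask_t and sched_res_mask, here plain bitmasks with a
granularity of 2 cpus per scheduling resource):

  #include <stdio.h>

  int main(void)
  {
      /* cpus 0-7; one "resource" bit per 2-cpu unit: bits 0,2,4,6. */
      unsigned long sched_res_mask = 0x55;
      unsigned long cpu_valid      = 0x3c; /* pool owns cpus 2-5 */
      unsigned long res_valid      = cpu_valid & sched_res_mask;

      /* Iterating over res_valid visits each resource exactly once. */
      printf("res_valid = %#lx\n", res_valid); /* 0x14: cpus 2 and 4 */
      return 0;
  }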

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c       | 30 +++++++++++++++++++++---------
 xen/common/schedule.c      |  5 +++--
 xen/include/xen/sched-if.h |  5 ++++-
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 31ac323e40..ba76045937 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -38,26 +38,35 @@ DEFINE_PER_CPU(struct cpupool *, cpupool);
 
 #define cpupool_dprintk(x...) ((void)0)
 
+static void free_cpupool_struct(struct cpupool *c)
+{
+    if ( c )
+    {
+        free_cpumask_var(c->res_valid);
+        free_cpumask_var(c->cpu_valid);
+    }
+    xfree(c);
+}
+
 static struct cpupool *alloc_cpupool_struct(void)
 {
     struct cpupool *c = xzalloc(struct cpupool);
 
-    if ( !c || !zalloc_cpumask_var(&c->cpu_valid) )
+    if ( !c )
+        return NULL;
+
+    zalloc_cpumask_var(&c->cpu_valid);
+    zalloc_cpumask_var(&c->res_valid);
+
+    if ( !c->cpu_valid || !c->res_valid )
     {
-        xfree(c);
+        free_cpupool_struct(c);
         c = NULL;
     }
 
     return c;
 }
 
-static void free_cpupool_struct(struct cpupool *c)
-{
-    if ( c )
-        free_cpumask_var(c->cpu_valid);
-    xfree(c);
-}
-
 /*
  * find a cpupool by it's id. to be called with cpupool lock held
  * if exact is not specified, the first cpupool with an id larger or equal to
@@ -271,6 +280,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
         cpupool_cpu_moving = NULL;
     }
     cpumask_set_cpu(cpu, c->cpu_valid);
+    cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
 
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool(d, c)
@@ -393,6 +403,7 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
+    cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
     spin_unlock(&cpupool_lock);
 
     work_cpu = smp_processor_id();
@@ -509,6 +520,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
          * allowed only for CPUs in pool0.
          */
         cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
+        cpumask_and(cpupool0->res_valid, cpupool0->cpu_valid, sched_res_mask);
         ret = 0;
     }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a4dbc19403..8af6283758 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -57,6 +57,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_unit. */
 static unsigned int sched_granularity = 1;
+const cpumask_t *sched_res_mask = &cpumask_all;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -352,9 +353,9 @@ static unsigned int sched_select_initial_cpu(const struct vcpu *v)
     cpumask_clear(cpus);
     for_each_node_mask ( node, d->node_affinity )
         cpumask_or(cpus, cpus, &node_to_cpumask(node));
-    cpumask_and(cpus, cpus, cpupool_domain_cpumask(d));
+    cpumask_and(cpus, cpus, d->cpupool->cpu_valid);
     if ( cpumask_empty(cpus) )
-        cpumask_copy(cpus, cpupool_domain_cpumask(d));
+        cpumask_copy(cpus, d->cpupool->cpu_valid);
 
     if ( v->vcpu_id == 0 )
         cpu_ret = cpumask_first(cpus);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 98d4dc65e2..e93fe9f3be 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -22,6 +22,8 @@ extern cpumask_t cpupool_free_cpus;
 #define SCHED_DEFAULT_RATELIMIT_US 1000
 extern int sched_ratelimit_us;
 
+/* Scheduling resource mask. */
+extern const cpumask_t *sched_res_mask;
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -508,6 +510,7 @@ struct cpupool
 {
     int              cpupool_id;
     cpumask_var_t    cpu_valid;      /* all cpus assigned to pool */
+    cpumask_var_t    res_valid;      /* all scheduling resources of pool */
     struct cpupool   *next;
     unsigned int     n_dom;
     struct scheduler *sched;
@@ -524,7 +527,7 @@ static inline cpumask_t* cpupool_domain_cpumask(const struct domain *d)
      * be interested in calling this for the idle domain.
      */
     ASSERT(d->cpupool != NULL);
-    return d->cpupool->cpu_valid;
+    return d->cpupool->res_valid;
 }
 
 /*
-- 
2.16.4


* [PATCH 41/60] xen/sched: support allocating multiple vcpus into one sched unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

With a scheduling granularity greater than 1, multiple vcpus share the
same struct sched_unit. Support that.

Setting the initial processor must be done carefully: we can't use
sched_set_res(), as that relies on for_each_sched_unit_vcpu(), which in
turn requires the vcpu to already be a member of the domain's vcpu
linked list, which isn't yet the case here.

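As a sketch of the resulting mapping (illustrative only, assuming
sched_granularity == 2; unit_id is the lowest vcpu_id of the unit):

    /* Which unit a vcpu belongs to, mirroring the lookup in
     * sched_alloc_unit() below. */
    static unsigned int unit_id_of(unsigned int vcpu_id)
    {
        return (vcpu_id / sched_granularity) * sched_granularity;
    }
    /* vcpus 0,1 -> unit 0;  vcpus 2,3 -> unit 2;  ... */
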
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 85 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 65 insertions(+), 20 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8af6283758..837e183004 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -275,10 +275,26 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
-static void sched_free_unit(struct sched_unit *unit)
+static void sched_free_unit(struct sched_unit *unit, struct vcpu *v)
 {
     struct sched_unit *prev_unit;
     struct domain *d = unit->domain;
+    struct vcpu *vunit;
+    unsigned int cnt = 0;
+
+    /* Don't count the vcpu being released; it may not be in the list yet. */
+    for_each_sched_unit_vcpu ( unit, vunit )
+        if ( vunit != v )
+            cnt++;
+
+    v->sched_unit = NULL;
+    unit->runstate_cnt[v->runstate.state]--;
+
+    if ( cnt )
+        return;
+
+    if ( unit->vcpu == v )
+        unit->vcpu = v->next_in_list;
 
     if ( d->sched_unit_list == unit )
         d->sched_unit_list = unit->next_in_list;
@@ -294,8 +310,6 @@ static void sched_free_unit(struct sched_unit *unit)
         }
     }
 
-    unit->vcpu->sched_unit = NULL;
-
     free_cpumask_var(unit->cpu_hard_affinity);
     free_cpumask_var(unit->cpu_hard_affinity_tmp);
     free_cpumask_var(unit->cpu_hard_affinity_saved);
@@ -304,19 +318,38 @@ static void sched_free_unit(struct sched_unit *unit)
     xfree(unit);
 }
 
+static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v)
+{
+    v->sched_unit = unit;
+    if ( !unit->vcpu || unit->vcpu->vcpu_id > v->vcpu_id )
+    {
+        unit->vcpu = v;
+        unit->unit_id = v->vcpu_id;
+    }
+    unit->runstate_cnt[v->runstate.state]++;
+}
+
 static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 {
     struct sched_unit *unit, **prev_unit;
     struct domain *d = v->domain;
 
+    for_each_sched_unit ( d, unit )
+        if ( unit->vcpu->vcpu_id / sched_granularity ==
+             v->vcpu_id / sched_granularity )
+            break;
+
+    if ( unit )
+    {
+        sched_unit_add_vcpu(unit, v);
+        return unit;
+    }
+
     if ( (unit = xzalloc(struct sched_unit)) == NULL )
         return NULL;
 
-    v->sched_unit = unit;
-    unit->vcpu = v;
-    unit->unit_id = v->vcpu_id;
+    sched_unit_add_vcpu(unit, v);
     unit->domain = d;
-    unit->runstate_cnt[v->runstate.state]++;
 
     for ( prev_unit = &d->sched_unit_list; *prev_unit;
           prev_unit = &(*prev_unit)->next_in_list )
@@ -336,7 +369,7 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     return unit;
 
  fail:
-    sched_free_unit(unit);
+    sched_free_unit(unit, v);
     return NULL;
 }
 
@@ -386,20 +419,25 @@ int sched_init_vcpu(struct vcpu *v)
     else
         processor = sched_select_initial_cpu(v);
 
-    sched_set_res(unit, get_sched_res(processor));
-
     /* Initialise the per-vcpu timers. */
-    init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
-               v, v->processor);
-    init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn,
-               v, v->processor);
-    init_timer(&v->poll_timer, poll_timer_fn,
-               v, v->processor);
+    init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, processor);
+    init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, v, processor);
+    init_timer(&v->poll_timer, poll_timer_fn, v, processor);
+
+    /* If this is not the first vcpu of the unit we are done. */
+    if ( unit->priv != NULL )
+    {
+        v->processor = processor;
+        return 0;
+    }
+
+    /* The first vcpu of a unit can be set via sched_set_res(). */
+    sched_set_res(unit, get_sched_res(processor));
 
     unit->priv = sched_alloc_vdata(dom_scheduler(d), unit, d->sched_priv);
     if ( unit->priv == NULL )
     {
-        sched_free_unit(unit);
+        sched_free_unit(unit, v);
         return 1;
     }
 
@@ -553,9 +591,16 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&get_sched_res(v->processor)->urgent_count);
-    sched_remove_unit(vcpu_scheduler(v), unit);
-    sched_free_vdata(vcpu_scheduler(v), unit->priv);
-    sched_free_unit(unit);
+    /*
+     * Vcpus are being destroyed top-down. So being the first vcpu of a unit
+     * is the same as being the only one.
+     */
+    if ( unit->vcpu == v )
+    {
+        sched_remove_unit(vcpu_scheduler(v), unit);
+        sched_free_vdata(vcpu_scheduler(v), unit->priv);
+        sched_free_unit(unit, v);
+    }
 }
 
 int sched_init_domain(struct domain *d, int poolid)
-- 
2.16.4


* [PATCH 42/60] xen/sched: add a scheduler_percpu_init() function
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

For core scheduling support the scheduler cpu callback for CPU_STARTING
has to be moved into a dedicated function called by start_secondary(),
as it will then need to run before spin_debug_enable() due to
potentially calling xfree().

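The resulting bring-up order on a secondary cpu (a sketch following the
ARM hunk below; the x86 path is analogous):

    /*
     * start_secondary()
     *     setup_cpu_sibling_map(cpuid);
     *     scheduler_percpu_init(cpuid);  <- sched_init_pdata(), may xfree()
     *     notify_cpu_starting(cpuid);    <- CPU_STARTING notifiers run here
     */
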
Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: fix ARM build
---
 xen/arch/arm/smpboot.c  |  2 ++
 xen/arch/x86/smpboot.c  |  2 ++
 xen/common/schedule.c   | 19 ++++++++++++-------
 xen/include/xen/sched.h |  1 +
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index f756444362..9a6582f2a6 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -350,6 +350,8 @@ void start_secondary(unsigned long boot_phys_offset,
 
     setup_cpu_sibling_map(cpuid);
 
+    scheduler_percpu_init(cpuid);
+
     /* Run local notifiers */
     notify_cpu_starting(cpuid);
     /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 153bfbb4b7..7e95b2cdac 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -382,6 +382,8 @@ void start_secondary(void *unused)
 
     set_cpu_sibling_map(cpu);
 
+    scheduler_percpu_init(cpu);
+
     init_percpu_time();
 
     setup_secondary_APIC_clock();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 837e183004..b4e87e2a58 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2040,6 +2040,15 @@ static void cpu_schedule_down(unsigned int cpu)
     xfree(sd);
 }
 
+void scheduler_percpu_init(unsigned int cpu)
+{
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
+
+    if ( system_state != SYS_STATE_resume )
+        sched_init_pdata(sched, sd->sched_priv, cpu);
+}
+
 static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -2058,8 +2067,8 @@ static int cpu_schedule_callback(
      * data can avoid implementing alloc_pdata. init_pdata may, however, be
      * necessary/useful in this case too (e.g., it can contain the "register
      * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called during
-     * CPU_STARTING.
+     * during CPU_UP_PREPARE. init_pdata (if present) is called before
+     * CPU_STARTING in scheduler_percpu_init().
      *
      * On the other hand, at teardown, we need to reverse what has been done
      * during initialization, and then free the per-pCPU specific data. This
@@ -2082,10 +2091,6 @@ static int cpu_schedule_callback(
      */
     switch ( action )
     {
-    case CPU_STARTING:
-        if ( system_state != SYS_STATE_resume )
-            sched_init_pdata(sched, sd->sched_priv, cpu);
-        break;
     case CPU_UP_PREPARE:
         if ( system_state != SYS_STATE_resume )
             rc = cpu_schedule_up(cpu);
@@ -2206,7 +2211,7 @@ void __init scheduler_init(void)
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
-    sched_init_pdata(&ops, get_sched_res(0)->sched_priv, 0);
+    scheduler_percpu_init(0);
 }
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d5bf61ced0..af4c934e0b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -675,6 +675,7 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
+void scheduler_percpu_init(unsigned int cpu);
 int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4


* [PATCH 43/60] xen/sched: add a percpu resource index
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Add a percpu variable holding the index of the cpu within its
sched_resource structure. This index is used to get the correct vcpu
of a sched_unit on a specific cpu.

For now this index will be zero for all cpus, but with core scheduling
it will be possible to have higher values, too.

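A worked example of the lookup (hypothetical numbers, assuming core
scheduling with sched_granularity == 2): cpus 2 and 3 form one
sched_resource, with per_cpu(sched_res_idx, 2) == 0 and
per_cpu(sched_res_idx, 3) == 1. For a unit with unit_id == 4:

    sched_unit2vcpu_cpu(unit, 2);   /* d->vcpu[4 + 0], i.e. vcpu 4 */
    sched_unit2vcpu_cpu(unit, 3);   /* d->vcpu[4 + 1], i.e. vcpu 5 */
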
Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch (carved out from RFC V1 patch 49)
---
 xen/common/schedule.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index b4e87e2a58..58d3de340e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -68,6 +68,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
+static DEFINE_PER_CPU(unsigned int, sched_res_idx);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -78,6 +79,12 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
+static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
+                                               unsigned int cpu)
+{
+    return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)];
+}
+
 static inline struct scheduler *dom_scheduler(const struct domain *d)
 {
     if ( likely(d->cpupool != NULL) )
@@ -1863,7 +1870,7 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, next->vcpu, now);
+    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
 }
 
 /*
@@ -1922,7 +1929,7 @@ static void schedule(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    vnext = next->vcpu;
+    vnext = sched_unit2vcpu_cpu(next, cpu);
     sched_context_switch(vprev, vnext, now);
 }
 
-- 
2.16.4


* [PATCH 44/60] xen/sched: add fallback to idle vcpu when scheduling a unit
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

When scheduling a unit with multiple vcpus there is no guarantee that
all vcpus are available (e.g. above maxvcpus or vcpu offline). Fall back
to the idle vcpu of the current cpu in that case. This requires storing
the correct sched_unit pointer in the idle vcpu for as long as it is
used as a fallback vcpu.

In order to modify the runstates of the correct vcpus when switching
scheduling units, merge sched_unit_runstate_change() into
sched_switch_units() and loop over the affected physical cpus instead
of the unit's vcpus. This in turn requires an accessor for the
"current" variable of other cpus.

Today context_saved() is called when the previous and next vcpus differ
during a context switch. With an idle vcpu capable of substituting for
an offline vcpu this is problematic when switching to an idle scheduling
unit: an idle previous vcpu leaves us in doubt about which scheduling
unit was active before. So save the previous unit pointer in the
per-scheduling-resource area and use it being non-NULL as a hint that
context_saved() should be called.

When running an idle vcpu in a non-idle scheduling unit, use a
dedicated guest idle loop which performs no tasklets, memory scrubbing
or livepatching, in order to avoid populating the cpu caches with
memory used by other domains (as far as possible). Softirqs are
considered to be safe (timers might want to be excluded, but this can
be fine-tuned later).

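A sketch of the per-cpu fallback when switching to unit "next" (the
real logic lives in sched_unit2vcpu_cpu() and sched_switch_units()
below):

    struct vcpu *vnext = sched_unit2vcpu_cpu(next, cpu);
    /* vnext is idle_vcpu[cpu] if the matching guest vcpu is missing,
     * offline or not runnable. */
    if ( is_idle_vcpu(vnext) )
        vnext->sched_unit = next;   /* remember which unit it stands in for */
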
Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch (Andrew Cooper)
V1: use urgent_count to select correct idle routine (Jan Beulich)
---
 xen/arch/x86/domain.c         |  21 ++++++
 xen/common/schedule.c         | 155 ++++++++++++++++++++++++++++--------------
 xen/include/asm-arm/current.h |   1 +
 xen/include/asm-x86/current.h |   7 +-
 xen/include/asm-x86/smp.h     |   3 +
 xen/include/xen/sched-if.h    |   4 +-
 xen/include/xen/sched.h       |   1 +
 7 files changed, 140 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 98d2939daf..39a2c1a047 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -159,6 +159,23 @@ static void idle_loop(void)
     }
 }
 
+/*
+ * Idle loop for siblings of active schedule units.
+ * We don't do any standard idle work like tasklets, page scrubbing or
+ * livepatching.
+ */
+static void guest_idle_loop(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    for ( ; ; )
+    {
+        if ( !softirq_pending(cpu) )
+            sched_guest_idle(pm_idle, cpu);
+        do_softirq();
+    }
+}
+
 void startup_cpu_idle_loop(void)
 {
     struct vcpu *v = current;
@@ -172,6 +189,10 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
+    /* Idle vcpus might be attached to non-idle units! */
+    if ( !is_idle_domain(v->sched_unit->domain) )
+        reset_stack_and_jump(guest_idle_loop);
+
     reset_stack_and_jump(idle_loop);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 58d3de340e..e0103fdb1d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -82,7 +82,18 @@ static struct scheduler __read_mostly ops;
 static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
                                                unsigned int cpu)
 {
-    return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)];
+    unsigned int idx = unit->unit_id + per_cpu(sched_res_idx, cpu);
+    const struct domain *d = unit->domain;
+    struct vcpu *v;
+
+    if ( idx < d->max_vcpus && d->vcpu[idx] )
+    {
+        v = d->vcpu[idx];
+        if ( v->new_state == RUNSTATE_running )
+            return v;
+    }
+
+    return idle_vcpu[cpu];
 }
 
 static inline struct scheduler *dom_scheduler(const struct domain *d)
@@ -196,8 +207,11 @@ static inline void vcpu_runstate_change(
 
     trace_runstate_change(v, new_state);
 
-    unit->runstate_cnt[v->runstate.state]--;
-    unit->runstate_cnt[new_state]++;
+    if ( !is_idle_vcpu(v) )
+    {
+        unit->runstate_cnt[v->runstate.state]--;
+        unit->runstate_cnt[new_state]++;
+    }
 
     delta = new_entry_time - v->runstate.state_entry_time;
     if ( delta > 0 )
@@ -209,19 +223,11 @@ static inline void vcpu_runstate_change(
     v->runstate.state = new_state;
 }
 
-static inline void sched_unit_runstate_change(struct sched_unit *unit,
-    bool running, s_time_t new_entry_time)
+void sched_guest_idle(void (*idle) (void), unsigned int cpu)
 {
-    struct vcpu *v;
-
-    for_each_sched_unit_vcpu ( unit, v )
-        if ( running )
-            vcpu_runstate_change(v, v->new_state, new_entry_time);
-        else
-            vcpu_runstate_change(v,
-                ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-                 (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
-                new_entry_time);
+    atomic_inc(&get_sched_res(cpu)->urgent_count);
+    idle();
+    atomic_dec(&get_sched_res(cpu)->urgent_count);
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
@@ -461,6 +467,7 @@ int sched_init_vcpu(struct vcpu *v)
     if ( is_idle_domain(d) )
     {
         get_sched_res(v->processor)->curr = unit;
+        get_sched_res(v->processor)->sched_unit_idle = unit;
         v->is_running = 1;
         unit->is_running = 1;
         unit->state_entry_time = NOW();
@@ -1637,33 +1644,67 @@ static void sched_switch_units(struct sched_resource *sd,
                                struct sched_unit *next, struct sched_unit *prev,
                                s_time_t now)
 {
-    sd->curr = next;
-
-    TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit_id,
-             now - prev->state_entry_time);
-    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit_id,
-             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
-             (now - next->state_entry_time) : 0, prev->next_time);
+    int cpu;
 
     ASSERT(unit_running(prev));
 
-    TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
-             next->domain->domain_id, next->unit_id);
+    if ( prev != next )
+    {
+        sd->curr = next;
+        sd->prev = prev;
 
-    sched_unit_runstate_change(prev, false, now);
-    prev->last_run_time = now;
+        TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id,
+                 prev->unit_id, now - prev->state_entry_time);
+        TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id,
+                 next->unit_id,
+                 (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+                 (now - next->state_entry_time) : 0, prev->next_time);
+        TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
+                 next->domain->domain_id, next->unit_id);
 
-    ASSERT(!unit_running(next));
-    sched_unit_runstate_change(next, true, now);
+        prev->last_run_time = now;
 
-    /*
-     * NB. Don't add any trace records from here until the actual context
-     * switch, else lost_records resume will not work properly.
-     */
+        ASSERT(!unit_running(next));
+
+        /*
+         * NB. Don't add any trace records from here until the actual context
+         * switch, else lost_records resume will not work properly.
+         */
 
-    ASSERT(!next->is_running);
-    next->vcpu->is_running = 1;
-    next->is_running = 1;
+        ASSERT(!next->is_running);
+        next->is_running = 1;
+
+        if ( is_idle_unit(prev) )
+        {
+            prev->runstate_cnt[RUNSTATE_running] = 0;
+            prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity;
+        }
+        if ( is_idle_unit(next) )
+        {
+            next->runstate_cnt[RUNSTATE_running] = sched_granularity;
+            next->runstate_cnt[RUNSTATE_runnable] = 0;
+        }
+    }
+
+    for_each_cpu ( cpu, sd->cpus )
+    {
+        struct vcpu *vprev = get_cpu_current(cpu);
+        struct vcpu *vnext = sched_unit2vcpu_cpu(next, cpu);
+
+        if ( vprev != vnext || vprev->runstate.state != vnext->new_state )
+        {
+            vcpu_runstate_change(vprev,
+                ((vprev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+                 (vcpu_runnable(vprev) ? RUNSTATE_runnable : RUNSTATE_offline)),
+                now);
+            vcpu_runstate_change(vnext, vnext->new_state, now);
+        }
+
+        vnext->is_running = 1;
+
+        if ( is_idle_vcpu(vnext) )
+            vnext->sched_unit = next;
+    }
 }
 
 static bool sched_tasklet_check_cpu(unsigned int cpu)
@@ -1719,25 +1760,25 @@ static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now,
     if ( prev->next_time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + prev->next_time);
 
-    if ( likely(prev != next) )
-        sched_switch_units(sd, next, prev, now);
+    sched_switch_units(sd, next, prev, now);
 
     return next;
 }
 
-static void context_saved(struct vcpu *prev)
+static void context_saved(struct sched_unit *unit)
 {
-    struct sched_unit *unit = prev->sched_unit;
-
     unit->is_running = 0;
     unit->state_entry_time = NOW();
+    get_sched_res(smp_processor_id())->prev = NULL;
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    sched_context_saved(vcpu_scheduler(prev), unit);
+    sched_context_saved(vcpu_scheduler(unit->vcpu), unit);
 
-    sched_unit_migrate_finish(unit);
+    /* Idle never migrates and idle vcpus might belong to other units. */
+    if ( !is_idle_unit(unit) )
+        sched_unit_migrate_finish(unit);
 }
 
 /*
@@ -1754,11 +1795,13 @@ static void context_saved(struct vcpu *prev)
 void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
 {
     struct sched_unit *next = vnext->sched_unit;
+    struct sched_resource *sd = get_sched_res(smp_processor_id());
 
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
-    vprev->is_running = 0;
+    if ( vprev != vnext )
+        vprev->is_running = 0;
 
     if ( atomic_read(&next->rendezvous_out_cnt) )
     {
@@ -1767,20 +1810,23 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
         /* Call context_saved() before releasing other waiters. */
         if ( cnt == 1 )
         {
-            if ( vprev != vnext )
-                context_saved(vprev);
+            if ( sd->prev )
+                context_saved(sd->prev);
             atomic_set(&next->rendezvous_out_cnt, 0);
         }
         else
             while ( atomic_read(&next->rendezvous_out_cnt) )
                 cpu_relax();
     }
-    else if ( vprev != vnext && sched_granularity == 1 )
-        context_saved(vprev);
+    else if ( sd->prev )
+        context_saved(sd->prev);
+
+    if ( is_idle_vcpu(vprev) && vprev != vnext )
+        vprev->sched_unit = sd->sched_unit_idle;
 }
 
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
-                                 s_time_t now)
+                                 bool reset_idle_unit, s_time_t now)
 {
     if ( unlikely(vprev == vnext) )
     {
@@ -1789,6 +1835,11 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
                  now - vprev->runstate.state_entry_time,
                  vprev->sched_unit->next_time);
         sched_context_switched(vprev, vnext);
+
+        if ( reset_idle_unit )
+            vnext->sched_unit =
+                get_sched_res(smp_processor_id())->sched_unit_idle;
+
         trace_continue_running(vnext);
         return continue_running(vprev);
     }
@@ -1870,7 +1921,8 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
+    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu),
+                         is_idle_unit(next) && !is_idle_unit(prev), now);
 }
 
 /*
@@ -1930,7 +1982,8 @@ static void schedule(void)
     pcpu_schedule_unlock_irq(lock, cpu);
 
     vnext = sched_unit2vcpu_cpu(next, cpu);
-    sched_context_switch(vprev, vnext, now);
+    sched_context_switch(vprev, vnext,
+                         !is_idle_unit(prev) && is_idle_unit(next), now);
 }
 
 /* The scheduler timer: force a run through the scheduler */
@@ -2015,6 +2068,7 @@ static int cpu_schedule_up(unsigned int cpu)
         return -ENOMEM;
 
     sd->curr = idle_vcpu[cpu]->sched_unit;
+    sd->sched_unit_idle = idle_vcpu[cpu]->sched_unit;
 
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
@@ -2216,6 +2270,7 @@ void __init scheduler_init(void)
     if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
+    get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
     scheduler_percpu_init(0);
diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h
index c4af66fbb9..a7602eef8c 100644
--- a/xen/include/asm-arm/current.h
+++ b/xen/include/asm-arm/current.h
@@ -18,6 +18,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu);
 
 #define current            (this_cpu(curr_vcpu))
 #define set_current(vcpu)  do { current = (vcpu); } while (0)
+#define get_cpu_current(cpu)  (per_cpu(curr_vcpu, cpu))
 
 /* Per-VCPU state that lives at the top of the stack */
 struct cpu_info {
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index f3508c3c08..1e807d63cf 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -77,6 +77,11 @@ struct cpu_info {
     /* get_stack_bottom() must be 16-byte aligned */
 };
 
+static inline struct cpu_info *get_cpu_info_from_stack(unsigned long sp)
+{
+    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+}
+
 static inline struct cpu_info *get_cpu_info(void)
 {
 #ifdef __clang__
@@ -87,7 +92,7 @@ static inline struct cpu_info *get_cpu_info(void)
     register unsigned long sp asm("rsp");
 #endif
 
-    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+    return get_cpu_info_from_stack(sp);
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 9f533f9072..51a31ab00a 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -76,6 +76,9 @@ void set_nr_sockets(void);
 /* Representing HT and core siblings in each socket. */
 extern cpumask_t **socket_cpumask;
 
+#define get_cpu_current(cpu) \
+    (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
+
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index e93fe9f3be..a1aefa2a25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -39,6 +39,8 @@ struct sched_resource {
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
+    struct sched_unit  *sched_unit_idle;
+    struct sched_unit  *prev;           /* previous task                   */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
@@ -168,7 +170,7 @@ static inline void sched_clear_pause_flags_atomic(struct sched_unit *unit,
 
 static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
 {
-    return idle_vcpu[cpu]->sched_unit;
+    return get_sched_res(cpu)->sched_unit_idle;
 }
 
 static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index af4c934e0b..0581c7d44f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -921,6 +921,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu);
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
 int sched_has_urgent_vcpu(void);
+void sched_guest_idle(void (*idle) (void), unsigned int cpu);
 
 /*
  * Used by idle loop to decide whether there is work to do:
-- 
2.16.4


 }
 
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
-                                 s_time_t now)
+                                 bool reset_idle_unit, s_time_t now)
 {
     if ( unlikely(vprev == vnext) )
     {
@@ -1789,6 +1835,11 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
                  now - vprev->runstate.state_entry_time,
                  vprev->sched_unit->next_time);
         sched_context_switched(vprev, vnext);
+
+        if ( reset_idle_unit )
+            vnext->sched_unit =
+                get_sched_res(smp_processor_id())->sched_unit_idle;
+
         trace_continue_running(vnext);
         return continue_running(vprev);
     }
@@ -1870,7 +1921,8 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
+    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu),
+                         is_idle_unit(next) && !is_idle_unit(prev), now);
 }
 
 /*
@@ -1930,7 +1982,8 @@ static void schedule(void)
     pcpu_schedule_unlock_irq(lock, cpu);
 
     vnext = sched_unit2vcpu_cpu(next, cpu);
-    sched_context_switch(vprev, vnext, now);
+    sched_context_switch(vprev, vnext,
+                         !is_idle_unit(prev) && is_idle_unit(next), now);
 }
 
 /* The scheduler timer: force a run through the scheduler */
@@ -2015,6 +2068,7 @@ static int cpu_schedule_up(unsigned int cpu)
         return -ENOMEM;
 
     sd->curr = idle_vcpu[cpu]->sched_unit;
+    sd->sched_unit_idle = idle_vcpu[cpu]->sched_unit;
 
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
@@ -2216,6 +2270,7 @@ void __init scheduler_init(void)
     if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
+    get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
     scheduler_percpu_init(0);
diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h
index c4af66fbb9..a7602eef8c 100644
--- a/xen/include/asm-arm/current.h
+++ b/xen/include/asm-arm/current.h
@@ -18,6 +18,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu);
 
 #define current            (this_cpu(curr_vcpu))
 #define set_current(vcpu)  do { current = (vcpu); } while (0)
+#define get_cpu_current(cpu)  (per_cpu(curr_vcpu, cpu))
 
 /* Per-VCPU state that lives at the top of the stack */
 struct cpu_info {
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index f3508c3c08..1e807d63cf 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -77,6 +77,11 @@ struct cpu_info {
     /* get_stack_bottom() must be 16-byte aligned */
 };
 
+static inline struct cpu_info *get_cpu_info_from_stack(unsigned long sp)
+{
+    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+}
+
 static inline struct cpu_info *get_cpu_info(void)
 {
 #ifdef __clang__
@@ -87,7 +92,7 @@ static inline struct cpu_info *get_cpu_info(void)
     register unsigned long sp asm("rsp");
 #endif
 
-    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+    return get_cpu_info_from_stack(sp);
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 9f533f9072..51a31ab00a 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -76,6 +76,9 @@ void set_nr_sockets(void);
 /* Representing HT and core siblings in each socket. */
 extern cpumask_t **socket_cpumask;
 
+#define get_cpu_current(cpu) \
+    (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
+
 #endif /* !__ASSEMBLY__ */
 
 #endif
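
The x86 get_cpu_current() above works because struct cpu_info lives at
the top of each per-cpu stack and STACK_SIZE is a power of two:
rounding any in-stack address up to the stack top and subtracting one
structure locates it. A rough illustration (the addresses are
invented):

    /* sp may be any address within the stack, e.g. stack_base[cpu]: */
    sp                            /* 0xffff830000001234                   */
    sp | (STACK_SIZE - 1)         /* 0xffff830000007fff, last stack byte  */
    (...) + 1                     /* 0xffff830000008000, one past the top */
    (struct cpu_info *)(...) - 1  /* the cpu_info at the stack top        */

Since stack_base[cpu] points into that cpu's stack, its current vcpu
can be read without running on that cpu.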
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index e93fe9f3be..a1aefa2a25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -39,6 +39,8 @@ struct sched_resource {
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
+    struct sched_unit  *sched_unit_idle;
+    struct sched_unit  *prev;           /* previous task                   */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
@@ -168,7 +170,7 @@ static inline void sched_clear_pause_flags_atomic(struct sched_unit *unit,
 
 static inline struct sched_unit *sched_idle_unit(unsigned int cpu)
 {
-    return idle_vcpu[cpu]->sched_unit;
+    return get_sched_res(cpu)->sched_unit_idle;
 }
 
 static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index af4c934e0b..0581c7d44f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -921,6 +921,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu);
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
 int sched_has_urgent_vcpu(void);
+void sched_guest_idle(void (*idle) (void), unsigned int cpu);
 
 /*
  * Used by idle loop to decide whether there is work to do:
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 45/60] xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

vcpu_wake() and vcpu_sleep() need to be made core scheduling aware:
they might need to switch a single vcpu of an already scheduled unit
between running and not running.

Especially when vcpu_sleep() for a vcpu is called by another vcpu of
the same scheduling unit, special care must be taken to avoid a
deadlock: the vcpu to be put to sleep must be forced through a context
switch without the calling vcpu doing the same, as the caller (e.g.
spinning in vcpu_sleep_sync()) would never join the rendezvous needed
to reschedule the whole unit. For this purpose add a vcpu flag,
handled in sched_slave() and in sched_wait_rendezvous_in(), allowing a
vcpu of the currently running unit to switch state at a higher
priority than a normal schedule event.

Use the same mechanism when waking up a vcpu of a currently active
unit.
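
In outline (a sketch distilled from the hunks below; locking and error
handling omitted), the flag is raised by the waker/sleeper and
consumed from the slave softirq:

    /* Producer side, e.g. in vcpu_wake() with the unit already running: */
    if ( unit->is_running && !v->is_running && !v->force_context_switch )
    {
        v->force_context_switch = true;
        cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ);
    }

    /* Consumer side, in sched_slave() and sched_wait_rendezvous_in(): */
    v = unit2vcpu_cpu(prev, cpu);
    if ( v && v->force_context_switch )
    {
        v = sched_force_context_switch(vprev, v, cpu, now);
        if ( v )   /* switch just this vcpu; the unit stays scheduled */
            sched_context_switch(vprev, v, false, now);
    }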

While at it, make vcpu_sleep_nosync_locked() static, as it is used in
schedule.c only.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: add vcpu_sleep() handling and force_context_switch flag
---
 xen/common/schedule.c      | 144 ++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/sched-if.h |   9 ++-
 xen/include/xen/sched.h    |   2 +
 3 files changed, 136 insertions(+), 19 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e0103fdb1d..e189cc7c2b 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -79,21 +79,21 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
-static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
-                                               unsigned int cpu)
+static inline struct vcpu *unit2vcpu_cpu(struct sched_unit *unit,
+                                         unsigned int cpu)
 {
     unsigned int idx = unit->unit_id + per_cpu(sched_res_idx, cpu);
     const struct domain *d = unit->domain;
-    struct vcpu *v;
 
-    if ( idx < d->max_vcpus && d->vcpu[idx] )
-    {
-        v = d->vcpu[idx];
-        if ( v->new_state == RUNSTATE_running )
-            return v;
-    }
+    return (idx < d->max_vcpus && d->vcpu[idx]) ? d->vcpu[idx] : NULL;
+}
 
-    return idle_vcpu[cpu];
+static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
+                                               unsigned int cpu)
+{
+    struct vcpu *v = unit2vcpu_cpu(unit, cpu);
+
+    return (v && v->new_state == RUNSTATE_running) ? v : idle_vcpu[cpu];
 }
 
 static inline struct scheduler *dom_scheduler(const struct domain *d)
@@ -656,8 +656,10 @@ void sched_destroy_domain(struct domain *d)
     }
 }
 
-void vcpu_sleep_nosync_locked(struct vcpu *v)
+static void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
+    struct sched_unit *unit = v->sched_unit;
+
     ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
@@ -665,7 +667,14 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        sched_sleep(vcpu_scheduler(v), v->sched_unit);
+        if ( likely(!unit_runnable(unit)) )
+            sched_sleep(vcpu_scheduler(v), unit);
+        else if ( unit_running(unit) > 1 && v->is_running &&
+                  !v->force_context_switch )
+        {
+            v->force_context_switch = true;
+            cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ);
+        }
     }
 }
 
@@ -697,16 +706,22 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
+    struct sched_unit *unit = v->sched_unit;
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
-    lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
+    lock = unit_schedule_lock_irqsave(unit, &flags);
 
     if ( likely(vcpu_runnable(v)) )
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        sched_wake(vcpu_scheduler(v), v->sched_unit);
+        sched_wake(vcpu_scheduler(v), unit);
+        if ( unit->is_running && !v->is_running && !v->force_context_switch )
+        {
+            v->force_context_switch = true;
+            cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ);
+        }
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -714,7 +729,7 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
     }
 
-    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+    unit_schedule_unlock_irqrestore(lock, flags, unit);
 }
 
 void vcpu_unblock(struct vcpu *v)
@@ -1856,6 +1871,61 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
     context_switch(vprev, vnext);
 }
 
+/*
+ * Force a context switch of a single vcpu of a unit.
+ * Might be called either if a vcpu of an already running unit is woken up
+ * or if a vcpu of a running unit is put to sleep with other vcpus of the
+ * same unit still running.
+ */
+static struct vcpu *sched_force_context_switch(struct vcpu *vprev,
+                                               struct vcpu *v,
+                                               int cpu, s_time_t now)
+{
+    v->force_context_switch = false;
+
+    if ( vcpu_runnable(v) == v->is_running )
+        return NULL;
+
+    if ( vcpu_runnable(v) )
+    {
+        if ( is_idle_vcpu(vprev) )
+        {
+            vcpu_runstate_change(vprev, RUNSTATE_runnable, now);
+            vprev->sched_unit = get_sched_res(cpu)->sched_unit_idle;
+        }
+        vcpu_runstate_change(v, RUNSTATE_running, now);
+    }
+    else
+    {
+        /* Make sure not to switch the last vcpu of a unit away. */
+        if ( unit_running(v->sched_unit) == 1 )
+            return NULL;
+
+        vcpu_runstate_change(v, vcpu_runstate_blocked(v), now);
+        v = sched_unit2vcpu_cpu(vprev->sched_unit, cpu);
+        if ( v != vprev )
+        {
+            if ( is_idle_vcpu(vprev) )
+            {
+                vcpu_runstate_change(vprev, RUNSTATE_runnable, now);
+                vprev->sched_unit = get_sched_res(cpu)->sched_unit_idle;
+            }
+            else
+            {
+                v->sched_unit = vprev->sched_unit;
+                vcpu_runstate_change(v, RUNSTATE_running, now);
+            }
+        }
+    }
+
+    v->is_running = 1;
+
+    /* Make sure not to lose another slave call. */
+    raise_softirq(SCHED_SLAVE_SOFTIRQ);
+
+    return v;
+}
+
 /*
  * Rendezvous before taking a scheduling decision.
  * Called with schedule lock held, so all accesses to the rendezvous counter
@@ -1871,6 +1941,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
                                                    s_time_t now)
 {
     struct sched_unit *next;
+    struct vcpu *v;
 
     if ( !--prev->rendezvous_in_cnt )
     {
@@ -1879,8 +1950,28 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
         return next;
     }
 
+    v = unit2vcpu_cpu(prev, cpu);
     while ( prev->rendezvous_in_cnt )
     {
+        if ( v && v->force_context_switch )
+        {
+            struct vcpu *vprev = current;
+
+            v = sched_force_context_switch(vprev, v, cpu, now);
+
+            if ( v )
+            {
+                /* We'll come back another time, so adjust rendezvous_in_cnt. */
+                prev->rendezvous_in_cnt++;
+
+                pcpu_schedule_unlock_irq(lock, cpu);
+
+                sched_context_switch(vprev, v, false, now);
+            }
+
+            v = unit2vcpu_cpu(prev, cpu);
+        }
+
         pcpu_schedule_unlock_irq(lock, cpu);
 
         /* Coming from idle might need to do tasklet work. */
@@ -1897,10 +1988,11 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 
 static void sched_slave(void)
 {
-    struct vcpu          *vprev = current;
+    struct vcpu          *v, *vprev = current;
     struct sched_unit    *prev = vprev->sched_unit, *next;
     s_time_t              now;
     spinlock_t           *lock;
+    bool                  do_softirq = false;
     int cpu = smp_processor_id();
 
     ASSERT_NOT_IN_ATOMIC();
@@ -1909,9 +2001,29 @@ static void sched_slave(void)
 
     now = NOW();
 
+    v = unit2vcpu_cpu(prev, cpu);
+    if ( v && v->force_context_switch )
+    {
+        v = sched_force_context_switch(vprev, v, cpu, now);
+
+        if ( v )
+        {
+            pcpu_schedule_unlock_irq(lock, cpu);
+
+            sched_context_switch(vprev, v, false, now);
+        }
+
+        do_softirq = true;
+    }
+
     if ( !prev->rendezvous_in_cnt )
     {
         pcpu_schedule_unlock_irq(lock, cpu);
+
+        /* Check for failed forced context switch. */
+        if ( do_softirq )
+            raise_softirq(SCHEDULE_SOFTIRQ);
+
         return;
     }
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index a1aefa2a25..f5962cbcfb 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -88,6 +88,11 @@ static inline bool unit_runnable(const struct sched_unit *unit)
     return false;
 }
 
+static inline int vcpu_runstate_blocked(struct vcpu *v)
+{
+    return (v->pause_flags & VPF_blocked) ? RUNSTATE_blocked : RUNSTATE_offline;
+}
+
 static inline bool unit_runnable_state(const struct sched_unit *unit)
 {
     struct vcpu *v;
@@ -100,9 +105,7 @@ static inline bool unit_runnable_state(const struct sched_unit *unit)
     {
         runnable = vcpu_runnable(v);
 
-        v->new_state = runnable ? RUNSTATE_running
-                                : (v->pause_flags & VPF_blocked)
-                                  ? RUNSTATE_blocked : RUNSTATE_offline;
+        v->new_state = runnable ? RUNSTATE_running : vcpu_runstate_blocked(v);
 
         if ( runnable )
             ret = true;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 0581c7d44f..b6496f57f6 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -186,6 +186,8 @@ struct vcpu
     bool             is_running;
     /* VCPU should wake fast (do not deep sleep the CPU). */
     bool             is_urgent;
+    /* VCPU must context_switch without scheduling unit. */
+    bool             force_context_switch;
 
 #ifdef VCPU_TRAP_LAST
 #define VCPU_TRAP_NONE    0
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 46/60] xen/sched: carve out freeing sched_unit memory into dedicated function
@ 2019-05-28 10:32   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

A later patch will need a way to free a sched_unit structure without
any side effects.
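
After the split the two functions relate as sketched here (simplified
from the diff below):

    /* Pure memory release: unlink the unit from d->sched_unit_list,
     * free its cpumasks and xfree() it - no vcpu bookkeeping. */
    static void sched_free_unit_mem(struct sched_unit *unit);

    /* Detach v from the unit and fix up unit->vcpu; calls
     * sched_free_unit_mem() only when no other vcpu remains. */
    static void sched_free_unit(struct sched_unit *unit, struct vcpu *v);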

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch, carved out from RFC V1 patch 49
---
 xen/common/schedule.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e189cc7c2b..9bff4dc183 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -288,26 +288,10 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
-static void sched_free_unit(struct sched_unit *unit, struct vcpu *v)
+static void sched_free_unit_mem(struct sched_unit *unit)
 {
     struct sched_unit *prev_unit;
     struct domain *d = unit->domain;
-    struct vcpu *vunit;
-    unsigned int cnt = 0;
-
-    /* Don't count to be released vcpu, might be not in vcpu list yet. */
-    for_each_sched_unit_vcpu ( unit, vunit )
-        if ( vunit != v )
-            cnt++;
-
-    v->sched_unit = NULL;
-    unit->runstate_cnt[v->runstate.state]--;
-
-    if ( cnt )
-        return;
-
-    if ( unit->vcpu == v )
-        unit->vcpu = v->next_in_list;
 
     if ( d->sched_unit_list == unit )
         d->sched_unit_list = unit->next_in_list;
@@ -331,6 +315,26 @@ static void sched_free_unit(struct sched_unit *unit, struct vcpu *v)
     xfree(unit);
 }
 
+static void sched_free_unit(struct sched_unit *unit, struct vcpu *v)
+{
+    struct vcpu *vunit;
+    unsigned int cnt = 0;
+
+    /* Don't count the vcpu to be released; it might not be in the list yet. */
+    for_each_sched_unit_vcpu ( unit, vunit )
+        if ( vunit != v )
+            cnt++;
+
+    v->sched_unit = NULL;
+    unit->runstate_cnt[v->runstate.state]--;
+
+    if ( unit->vcpu == v )
+        unit->vcpu = v->next_in_list;
+
+    if ( !cnt )
+        sched_free_unit_mem(unit);
+}
+
 static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v)
 {
     v->sched_unit = unit;
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 47/60] xen/sched: move per-cpu variable scheduler to struct sched_resource
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Having a pointer to struct scheduler in struct sched_resource instead
of in a per-cpu variable is sufficient.
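
The mechanical change is the access pattern, e.g. in the
sched_credit.c hunk below:

    /* before */
    struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));

    /* after */
    struct sched_resource *sd = get_sched_res(cpu);
    struct csched_private *prv = CSCHED_PRIV(sd->scheduler);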

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/sched_credit.c  | 18 +++++++++++-------
 xen/common/sched_credit2.c |  3 ++-
 xen/common/schedule.c      | 21 ++++++++++-----------
 xen/include/xen/sched-if.h |  2 +-
 4 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 15339e6fae..5e60788112 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -350,9 +350,10 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = sched_unit_cpu(new->unit);
+    struct sched_resource *sd = get_sched_res(cpu);
     struct sched_unit *unit = new->unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
-    struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
+    struct csched_private *prv = CSCHED_PRIV(sd->scheduler);
     cpumask_t mask, idle_mask, *online;
     int balance_step, idlers_empty;
 
@@ -937,7 +938,8 @@ csched_unit_acct(struct csched_private *prv, unsigned int cpu)
 {
     struct sched_unit *currunit = current->sched_unit;
     struct csched_unit * const svc = CSCHED_UNIT(currunit);
-    const struct scheduler *ops = per_cpu(scheduler, cpu);
+    struct sched_resource *sd = get_sched_res(cpu);
+    const struct scheduler *ops = sd->scheduler;
 
     ASSERT( sched_unit_cpu(currunit) == cpu );
     ASSERT( svc->sdom != NULL );
@@ -993,8 +995,7 @@ csched_unit_acct(struct csched_private *prv, unsigned int cpu)
              * idlers. But, if we are here, it means there is someone running
              * on it, and hence the bit must be zero already.
              */
-            ASSERT(!cpumask_test_cpu(cpu,
-                                     CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers));
+            ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(sd->scheduler)->idlers));
             cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
         }
     }
@@ -1089,6 +1090,7 @@ csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
     struct csched_unit * const svc = CSCHED_UNIT(unit);
     unsigned int cpu = sched_unit_cpu(unit);
+    struct sched_resource *sd = get_sched_res(cpu);
 
     SCHED_STAT_CRANK(unit_sleep);
 
@@ -1101,7 +1103,7 @@ csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
          * But, we are here because unit is going to sleep while running on cpu,
          * so the bit must be zero already.
          */
-        ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers));
+        ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(sd->scheduler)->idlers));
         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
     }
     else if ( __unit_on_runq(svc) )
@@ -1581,8 +1583,9 @@ static void
 csched_tick(void *_cpu)
 {
     unsigned int cpu = (unsigned long)_cpu;
+    struct sched_resource *sd = get_sched_res(cpu);
     struct csched_pcpu *spc = CSCHED_PCPU(cpu);
-    struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
+    struct csched_private *prv = CSCHED_PRIV(sd->scheduler);
 
     spc->tick++;
 
@@ -1607,7 +1610,8 @@ csched_tick(void *_cpu)
 static struct csched_unit *
 csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 {
-    const struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
+    struct sched_resource *sd = get_sched_res(cpu);
+    const struct csched_private * const prv = CSCHED_PRIV(sd->scheduler);
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
     struct csched_unit *speer;
     struct list_head *iter;
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3bfceefa46..1764aa704e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3265,8 +3265,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                unsigned int *skipped)
 {
     struct list_head *iter, *temp;
+    struct sched_resource *sd = get_sched_res(cpu);
     struct csched2_unit *snext = NULL;
-    struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
+    struct csched2_private *prv = csched2_priv(sd->scheduler);
     bool yield = false, soft_aff_preempt = false;
 
     *skipped = 0;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 9bff4dc183..34c95d1dc6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -66,7 +66,6 @@ static void vcpu_singleshot_timer_fn(void *data);
 static void poll_timer_fn(void *data);
 
 /* This is global for now so that private implementations can reach it */
-DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
 static DEFINE_PER_CPU(unsigned int, sched_res_idx);
 
@@ -133,7 +132,7 @@ static inline struct scheduler *vcpu_scheduler(const struct vcpu *v)
      * for idle vCPUs, it is safe to use it, with no locks, to figure that out.
      */
     ASSERT(is_idle_domain(d));
-    return per_cpu(scheduler, v->processor);
+    return get_sched_res(v->processor)->scheduler;
 }
 #define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
 
@@ -1767,8 +1766,8 @@ static bool sched_tasklet_check(unsigned int cpu)
 static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now,
                                       unsigned int cpu)
 {
-    struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *sched = sd->scheduler;
     struct sched_unit *next;
 
     /* get policy-specific decision on scheduling... */
@@ -2144,7 +2143,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sd);
 
-    per_cpu(scheduler, cpu) = &ops;
+    sd->scheduler = &ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sd->_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
@@ -2203,7 +2202,7 @@ static int cpu_schedule_up(unsigned int cpu)
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct scheduler *sched = sd->scheduler;
 
     sched_free_pdata(sched, sd->sched_priv, cpu);
     sched_free_vdata(sched, idle_vcpu[cpu]->sched_unit->priv);
@@ -2219,8 +2218,8 @@ static void cpu_schedule_down(unsigned int cpu)
 
 void scheduler_percpu_init(unsigned int cpu)
 {
-    struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *sched = sd->scheduler;
 
     if ( system_state != SYS_STATE_resume )
         sched_init_pdata(sched, sd->sched_priv, cpu);
@@ -2230,8 +2229,8 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
-    struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *sched = sd->scheduler;
     int rc = 0;
 
     /*
@@ -2407,7 +2406,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
-    struct scheduler *old_ops = per_cpu(scheduler, cpu);
+    struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
     struct cpupool *old_pool = per_cpu(cpupool, cpu);
     struct sched_resource *sd = get_sched_res(cpu);
@@ -2473,7 +2472,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ppriv_old = sd->sched_priv;
     new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
 
-    per_cpu(scheduler, cpu) = new_ops;
+    sd->scheduler = new_ops;
     sd->sched_priv = ppriv;
 
     /*
@@ -2574,7 +2573,7 @@ void sched_tick_suspend(void)
     struct scheduler *sched;
     unsigned int cpu = smp_processor_id();
 
-    sched = per_cpu(scheduler, cpu);
+    sched = get_sched_res(cpu)->scheduler;
     sched_do_tick_suspend(sched, cpu);
     rcu_idle_enter(cpu);
     rcu_idle_timer_start();
@@ -2587,7 +2586,7 @@ void sched_tick_resume(void)
 
     rcu_idle_timer_stop();
     rcu_idle_exit(cpu);
-    sched = per_cpu(scheduler, cpu);
+    sched = get_sched_res(cpu)->scheduler;
     sched_do_tick_resume(sched, cpu);
 }
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index f5962cbcfb..ad6cf43425 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -36,6 +36,7 @@ extern const cpumask_t *sched_res_mask;
  * as the rest of the struct.  Just have the scheduler point to the
  * one it wants (This may be the one right in front of it).*/
 struct sched_resource {
+    struct scheduler   *scheduler;
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
@@ -50,7 +51,6 @@ struct sched_resource {
 
 #define curr_on_cpu(c)    (get_sched_res(c)->curr)
 
-DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 48/60] xen/sched: move per-cpu variable cpupool to struct sched_resource
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Having a pointer to struct cpupool in struct sched_resource instead
of in a per-cpu variable is sufficient.
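
Together with the previous patch this leaves struct sched_resource
holding the per-cpu scheduling state (abridged from the sched-if.h
hunks of both patches):

    struct sched_resource {
        struct scheduler   *scheduler;   /* was per_cpu(scheduler) */
        struct cpupool     *cpupool;     /* was per_cpu(cpupool)   */
        spinlock_t         *schedule_lock, _lock;
        struct sched_unit  *curr, *sched_unit_idle, *prev;
        /* ... sched_priv, s_timer, urgent_count, cpus ... */
    };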

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c       | 4 +---
 xen/common/sched_credit.c  | 2 +-
 xen/common/sched_rt.c      | 2 +-
 xen/common/schedule.c      | 8 ++++----
 xen/include/xen/sched-if.h | 2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index ba76045937..aa0428cdc0 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -34,8 +34,6 @@ static cpumask_t cpupool_locked_cpus;
 
 static DEFINE_SPINLOCK(cpupool_lock);
 
-DEFINE_PER_CPU(struct cpupool *, cpupool);
-
 #define cpupool_dprintk(x...) ((void)0)
 
 static void free_cpupool_struct(struct cpupool *c)
@@ -494,7 +492,7 @@ static int cpupool_cpu_add(unsigned int cpu)
      * (or unplugging would have failed) and that is the default behavior
      * anyway.
      */
-    per_cpu(cpupool, cpu) = NULL;
+    get_sched_res(cpu)->cpupool = NULL;
     ret = cpupool_assign_cpu_locked(cpupool0, cpu);
 
     spin_unlock(&cpupool_lock);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 5e60788112..969ac4cc20 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1687,7 +1687,7 @@ static struct csched_unit *
 csched_load_balance(struct csched_private *prv, int cpu,
     struct csched_unit *snext, bool *stolen)
 {
-    struct cpupool *c = per_cpu(cpupool, cpu);
+    struct cpupool *c = get_sched_res(cpu)->cpupool;
     struct csched_unit *speer;
     cpumask_t workers;
     cpumask_t *online;
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 6258dd2f2a..88c6b9615e 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -776,7 +776,7 @@ rt_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     if ( prv->repl_timer.cpu == cpu )
     {
-        struct cpupool *c = per_cpu(cpupool, cpu);
+        struct cpupool *c = get_sched_res(cpu)->cpupool;
         unsigned int new_cpu = cpumask_cycle(cpu, cpupool_online_cpumask(c));
 
         /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 34c95d1dc6..3c85861b15 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1011,7 +1011,7 @@ int cpu_disable_scheduler(unsigned int cpu)
     cpumask_t online_affinity;
     int ret = 0;
 
-    c = per_cpu(cpupool, cpu);
+    c = get_sched_res(cpu)->cpupool;
     if ( c == NULL )
         return ret;
 
@@ -1079,7 +1079,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
     struct cpupool *c;
     struct sched_unit *unit;
 
-    c = per_cpu(cpupool, cpu);
+    c = get_sched_res(cpu)->cpupool;
     if ( c == NULL )
         return 0;
 
@@ -2408,8 +2408,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
-    struct cpupool *old_pool = per_cpu(cpupool, cpu);
     struct sched_resource *sd = get_sched_res(cpu);
+    struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
 
     /*
@@ -2494,7 +2494,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
  out:
-    per_cpu(cpupool, cpu) = c;
+    get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
     if ( c != NULL )
         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index ad6cf43425..f75f9673e9 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -37,6 +37,7 @@ extern const cpumask_t *sched_res_mask;
  * one it wants (This may be the one right in front of it).*/
 struct sched_resource {
     struct scheduler   *scheduler;
+    struct cpupool     *cpupool;
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_unit  *curr;           /* current task                    */
@@ -51,7 +52,6 @@ struct sched_resource {
 
 #define curr_on_cpu(c)    (get_sched_res(c)->curr)
 
-DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
 
 static inline struct sched_resource *get_sched_res(unsigned int cpu)
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 49/60] xen/sched: reject switching smt on/off with core scheduling active
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

When core or socket scheduling is active, enabling or disabling SMT is
not possible, as that would require a major host reconfiguration.

Add a bool sched_disable_smt_switching which will be set when core or
socket scheduling is active.

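An illustrative sketch of the intended use; the place where the flag
actually gets set comes later in this series, so the condition below
is an assumption for illustration only:

    /* Assumption: lock out SMT switching once units span siblings. */
    if ( sched_granularity > 1 )
        sched_disable_smt_switching = true;

With the flag set, the sysctl handler rejects the SMT enable/disable
requests with -EOPNOTSUPP, as shown in the diff below.
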
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/arch/x86/sysctl.c   | 3 ++-
 xen/common/schedule.c   | 1 +
 xen/include/xen/sched.h | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 3f06fecbd8..034d78fe67 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -200,7 +200,8 @@ long arch_do_sysctl(
 
         case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
         case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
-            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 )
+            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 ||
+                 sched_disable_smt_switching )
             {
                 ret = -EOPNOTSUPP;
                 break;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3c85861b15..8607262a71 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -57,6 +57,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_unit. */
 static unsigned int sched_granularity = 1;
+bool sched_disable_smt_switching;
 const cpumask_t *sched_res_mask = &cpumask_all;
 
 /* Various timer handlers. */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b6496f57f6..7dc63c449b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1020,6 +1020,7 @@ static inline bool is_vcpu_online(const struct vcpu *v)
 }
 
 extern bool sched_smt_power_savings;
+extern bool sched_disable_smt_switching;
 
 extern enum cpufreq_controller {
     FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 50/60] xen/sched: prepare per-cpupool scheduling granularity
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

On- and offlining cpus with core scheduling is rather complicated, as
the cpus are taken on- or offline one by one, while the scheduler wants
to handle them per core.

As the future plan is to be able to select the scheduling granularity
per cpupool, prepare for that by storing the granularity in struct
cpupool and struct sched_resource (it is needed there for free cpus,
which are not associated with any cpupool). Free cpus will always use
granularity 1.

Store the selected granularity option (cpu, core or socket) in the
cpupool as well, as we will need it to select the appropriate cpu mask
when populating the cpupool with cpus.

This will make on- and offlining of cpus much easier and avoids
writing code which would need to be thrown away later.

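As a sketch of the resulting lookups (both taken from the hunks below),
the effective number of vcpus per unit is now derived from the pool or
the scheduling resource instead of the global value:

    /* Per-domain view: free cpus (no cpupool) use granularity 1. */
    unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;

    /* Per-cpu view, kept in the scheduling resource. */
    unsigned int gran = get_sched_res(cpu)->granularity;
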
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c       |  2 ++
 xen/common/schedule.c      | 23 +++++++++++++++--------
 xen/include/xen/sched-if.h | 12 ++++++++++++
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index aa0428cdc0..403036c092 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -177,6 +177,8 @@ static struct cpupool *cpupool_create(
             return NULL;
         }
     }
+    c->granularity = sched_granularity;
+    c->opt_granularity = opt_sched_granularity;
 
     *q = c;
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8607262a71..7fd83ffd4e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -56,7 +56,8 @@ int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_unit. */
-static unsigned int sched_granularity = 1;
+enum sched_gran opt_sched_granularity = SCHED_GRAN_cpu;
+unsigned int sched_granularity = 1;
 bool sched_disable_smt_switching;
 const cpumask_t *sched_res_mask = &cpumask_all;
 
@@ -350,10 +351,10 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 {
     struct sched_unit *unit, **prev_unit;
     struct domain *d = v->domain;
+    unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
 
     for_each_sched_unit ( d, unit )
-        if ( unit->vcpu->vcpu_id / sched_granularity ==
-             v->vcpu_id / sched_granularity )
+        if ( unit->vcpu->vcpu_id / gran == v->vcpu_id / gran )
             break;
 
     if ( unit )
@@ -1696,11 +1697,11 @@ static void sched_switch_units(struct sched_resource *sd,
         if ( is_idle_unit(prev) )
         {
             prev->runstate_cnt[RUNSTATE_running] = 0;
-            prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity;
+            prev->runstate_cnt[RUNSTATE_runnable] = 1;
         }
         if ( is_idle_unit(next) )
         {
-            next->runstate_cnt[RUNSTATE_running] = sched_granularity;
+            next->runstate_cnt[RUNSTATE_running] = 1;
             next->runstate_cnt[RUNSTATE_runnable] = 0;
         }
     }
@@ -1946,11 +1947,12 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 {
     struct sched_unit *next;
     struct vcpu *v;
+    unsigned int gran = get_sched_res(cpu)->granularity;
 
     if ( !--prev->rendezvous_in_cnt )
     {
         next = do_schedule(prev, now, cpu);
-        atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
+        atomic_set(&next->rendezvous_out_cnt, gran + 1);
         return next;
     }
 
@@ -2054,6 +2056,7 @@ static void schedule(void)
     struct sched_resource *sd;
     spinlock_t           *lock;
     int cpu = smp_processor_id();
+    unsigned int          gran = get_sched_res(cpu)->granularity;
 
     ASSERT_NOT_IN_ATOMIC();
 
@@ -2079,11 +2082,11 @@ static void schedule(void)
 
     stop_timer(&sd->s_timer);
 
-    if ( sched_granularity > 1 )
+    if ( gran > 1 )
     {
         cpumask_t mask;
 
-        prev->rendezvous_in_cnt = sched_granularity;
+        prev->rendezvous_in_cnt = gran;
         cpumask_andnot(&mask, sd->cpus, cpumask_of(cpu));
         cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ);
         next = sched_wait_rendezvous_in(prev, lock, cpu, now);
@@ -2150,6 +2153,9 @@ static int cpu_schedule_up(unsigned int cpu)
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
+    /* We start with cpu granularity. */
+    sd->granularity = 1;
+
     /* Boot CPU is dealt with later in schedule_init(). */
     if ( cpu == 0 )
         return 0;
@@ -2495,6 +2501,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
  out:
+    get_sched_res(cpu)->granularity = c ? c->granularity : 1;
     get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
     if ( c != NULL )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index f75f9673e9..a0f11d0c15 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -25,6 +25,15 @@ extern int sched_ratelimit_us;
 /* Scheduling resource mask. */
 extern const cpumask_t *sched_res_mask;
 
+/* Number of vcpus per struct sched_unit. */
+enum sched_gran {
+    SCHED_GRAN_cpu,
+    SCHED_GRAN_core,
+    SCHED_GRAN_socket
+};
+extern enum sched_gran opt_sched_granularity;
+extern unsigned int sched_granularity;
+
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
  * we have a per-cpu pointer, along with a pre-allocated set of
@@ -47,6 +56,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned int        processor;
+    unsigned int        granularity;
     const cpumask_t    *cpus;           /* cpus covered by this struct     */
 };
 
@@ -520,6 +530,8 @@ struct cpupool
     unsigned int     n_dom;
     struct scheduler *sched;
     atomic_t         refcnt;
+    unsigned int     granularity;
+    enum sched_gran  opt_granularity;
 };
 
 #define cpupool_online_cpumask(_pool) \
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 51/60] xen/sched: use one schedule lock for all free cpus
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

In order to prepare for always using cpu scheduling granularity for
free cpus, regardless of the granularity of any cpupool, use a single
fixed lock for all free cpus, shared by all schedulers. This allows
moving any number of free cpus to a cpupool while only one lock is
involved.

This requires dropping the ASSERTs regarding the lock in some
schedulers.

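The lock selection then becomes (sketch based on the hunks below):

    /* A cpu coming up outside any cpupool shares the global lock ... */
    sd->schedule_lock = &sched_free_cpu_lock;

    /* ... and only switches to a scheduler-provided lock in a pool. */
    sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
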
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c | 9 ---------
 xen/common/sched_null.c   | 7 -------
 xen/common/schedule.c     | 7 +++++--
 3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 969ac4cc20..0b14fa9e11 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -616,15 +616,6 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct sched_resource *sd = get_sched_res(cpu);
-
-    /*
-     * This is called either during during boot, resume or hotplug, in
-     * case Credit1 is the scheduler chosen at boot. In such cases, the
-     * scheduler lock for cpu is already pointing to the default per-cpu
-     * spinlock, as Credit1 needs it, so there is no remapping to be done.
-     */
-    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
 
     spin_lock_irqsave(&prv->lock, flags);
     init_pdata(prv, pdata, cpu);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index e9336a2948..1499c82422 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -169,17 +169,10 @@ static void init_pdata(struct null_private *prv, unsigned int cpu)
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct sched_resource *sd = get_sched_res(cpu);
 
     /* alloc_pdata is not implemented, so we want this to be NULL. */
     ASSERT(!pdata);
 
-    /*
-     * The scheduler lock points already to the default per-cpu spinlock,
-     * so there is no remapping to be done.
-     */
-    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
-
     init_pdata(prv, cpu);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7fd83ffd4e..44364ff4d2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -61,6 +61,9 @@ unsigned int sched_granularity = 1;
 bool sched_disable_smt_switching;
 const cpumask_t *sched_res_mask = &cpumask_all;
 
+/* Common lock for free cpus. */
+static DEFINE_SPINLOCK(sched_free_cpu_lock);
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
@@ -2149,7 +2152,7 @@ static int cpu_schedule_up(unsigned int cpu)
 
     sd->scheduler = &ops;
     spin_lock_init(&sd->_lock);
-    sd->schedule_lock = &sd->_lock;
+    sd->schedule_lock = &sched_free_cpu_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
@@ -2488,7 +2491,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * taking it, finds all the initializations we've done above in place.
      */
     smp_mb();
-    sd->schedule_lock = new_lock;
+    sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irq(old_lock);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 52/60] xen/sched: populate cpupool0 only after all cpus are up
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Dario Faggioli

With core or socket scheduling we need to know the number of siblings
per scheduling unit before we can set up the scheduler properly. In
order to prepare for that, populate cpupool0 only after all cpus are
up.

With that in place there is no need to create cpupool0 earlier, so do
that just before assigning the cpus. Initialize the free cpus with all
online cpus at that time in order to be able to register the cpu
notifier late, too.

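The resulting boot-time ordering is roughly (illustrative summary of
the change below):

    1. All cpus are brought online, so cpu_online_map is final.
    2. cpupool_init() creates cpupool0 and registers the cpu notifier.
    3. The free cpu mask is seeded from cpu_online_map and all free
       cpus are assigned to cpupool0 in one go.
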
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 403036c092..2a3e144700 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -775,18 +775,28 @@ static struct notifier_block cpu_nfb = {
     .notifier_call = cpu_callback
 };
 
-static int __init cpupool_presmp_init(void)
+static int __init cpupool_init(void)
 {
+    unsigned int cpu;
     int err;
-    void *cpu = (void *)(long)smp_processor_id();
+
     cpupool0 = cpupool_create(0, 0, &err);
     BUG_ON(cpupool0 == NULL);
     cpupool_put(cpupool0);
-    cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
     register_cpu_notifier(&cpu_nfb);
+
+    spin_lock(&cpupool_lock);
+
+    cpumask_copy(&cpupool_free_cpus, &cpu_online_map);
+
+    for_each_cpu ( cpu, &cpupool_free_cpus )
+        cpupool_assign_cpu_locked(cpupool0, cpu);
+
+    spin_unlock(&cpupool_lock);
+
     return 0;
 }
-presmp_initcall(cpupool_presmp_init);
+__initcall(cpupool_init);
 
 /*
  * Local variables:
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH 53/60] xen/sched: remove cpu from pool0 before removing it
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Today a cpu which is removed from the system is taken directly from
Pool0 to the offline state. This will conflict with core scheduling,
so remove it from Pool0 first. Additionally, accept removing a free
cpu instead of requiring it to be in Pool0.

For the resume-failed case we need to call the scheduler code for that
situation after the cpupool handling. Therefore move the scheduler
code into a function, call it from cpupool_cpu_remove_forced(), and
remove the CPU_RESUME_FAILED case from cpu_schedule_callback().

Note that schedule_cpu_switch() is now called in stop_machine context,
so we need to switch from the plain _irq to the _irqsave locking
variants.

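The resulting call flow for unassigning a cpu is roughly (sketch):

    cpupool_unassign_cpu(c, cpu)
      -> cpupool_unassign_cpu_prologue(c, cpu)    /* mark cpu as moving */
      -> continue_hypercall_on_cpu(work_cpu,
                                   cpupool_unassign_cpu_helper, c)
           -> cpupool_unassign_cpu_epilogue(c)    /* complete the move */

while a hot-unplugged cpu goes through cpupool_cpu_remove_prologue()
(CPU_DOWN_PREPARE) and cpupool_cpu_remove() (CPU_DYING).
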
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c       | 177 +++++++++++++++++++++++++++------------------
 xen/common/schedule.c      |  29 +++++---
 xen/include/xen/sched-if.h |   2 +
 3 files changed, 128 insertions(+), 80 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 2a3e144700..ab4a2be4fc 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -292,22 +292,14 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
     return 0;
 }
 
-static long cpupool_unassign_cpu_helper(void *info)
+static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
 {
     int cpu = cpupool_moving_cpu;
-    struct cpupool *c = info;
     struct domain *d;
-    long ret;
-
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
-                    cpupool_cpu_moving->cpupool_id, cpu);
+    int ret;
 
-    spin_lock(&cpupool_lock);
     if ( c != cpupool_cpu_moving )
-    {
-        ret = -EADDRNOTAVAIL;
-        goto out;
-    }
+        return -EADDRNOTAVAIL;
 
     /*
      * We need this for scanning the domain list, both in
@@ -342,39 +334,19 @@ static long cpupool_unassign_cpu_helper(void *info)
         domain_update_node_affinity(d);
     }
     rcu_read_unlock(&domlist_read_lock);
-out:
-    spin_unlock(&cpupool_lock);
-    cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
+
     return ret;
 }
 
-/*
- * unassign a specific cpu from a cpupool
- * we must be sure not to run on the cpu to be unassigned! to achieve this
- * the main functionality is performed via continue_hypercall_on_cpu on a
- * specific cpu.
- * if the cpu to be removed is the last one of the cpupool no active domain
- * must be bound to the cpupool. dying domains are moved to cpupool0 as they
- * might be zombies.
- * possible failures:
- * - last cpu and still active domains in cpupool
- * - cpu just being unplugged
- */
-static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
+static int cpupool_unassign_cpu_prologue(struct cpupool *c, unsigned int cpu)
 {
-    int work_cpu;
     int ret;
     struct domain *d;
 
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
-                    c->cpupool_id, cpu);
-
     spin_lock(&cpupool_lock);
     ret = -EADDRNOTAVAIL;
     if ( (cpupool_moving_cpu != -1) && (cpu != cpupool_moving_cpu) )
         goto out;
-    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
-        goto out;
 
     ret = 0;
     if ( !cpumask_test_cpu(cpu, c->cpu_valid) && (cpu != cpupool_moving_cpu) )
@@ -386,7 +358,7 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
         rcu_read_lock(&domlist_read_lock);
         for_each_domain_in_cpupool(d, c)
         {
-            if ( !d->is_dying )
+            if ( !d->is_dying && system_state == SYS_STATE_active )
             {
                 ret = -EBUSY;
                 break;
@@ -404,8 +376,58 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
     cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
+
+out:
     spin_unlock(&cpupool_lock);
 
+    return ret;
+}
+
+static long cpupool_unassign_cpu_helper(void *info)
+{
+    struct cpupool *c = info;
+    long ret;
+
+    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
+                    cpupool_cpu_moving->cpupool_id, cpu);
+    spin_lock(&cpupool_lock);
+
+    ret = cpupool_unassign_cpu_epilogue(c);
+
+    spin_unlock(&cpupool_lock);
+    cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
+
+    return ret;
+}
+
+/*
+ * unassign a specific cpu from a cpupool
+ * we must be sure not to run on the cpu to be unassigned! to achieve this
+ * the main functionality is performed via continue_hypercall_on_cpu on a
+ * specific cpu.
+ * if the cpu to be removed is the last one of the cpupool no active domain
+ * must be bound to the cpupool. dying domains are moved to cpupool0 as they
+ * might be zombies.
+ * possible failures:
+ * - last cpu and still active domains in cpupool
+ * - cpu just being unplugged
+ */
+static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
+{
+    int work_cpu;
+    int ret;
+
+    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
+                    c->cpupool_id, cpu);
+
+    ret = cpupool_unassign_cpu_prologue(c, cpu);
+    if ( ret )
+    {
+        cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n",
+                        c->cpupool_id, cpu, ret);
+        return ret;
+    }
+
     work_cpu = smp_processor_id();
     if ( work_cpu == cpu )
     {
@@ -414,12 +436,6 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
             work_cpu = cpumask_next(cpu, cpupool0->cpu_valid);
     }
     return continue_hypercall_on_cpu(work_cpu, cpupool_unassign_cpu_helper, c);
-
-out:
-    spin_unlock(&cpupool_lock);
-    cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n",
-                    c->cpupool_id, cpu, ret);
-    return ret;
 }
 
 /*
@@ -503,31 +519,54 @@ static int cpupool_cpu_add(unsigned int cpu)
 }
 
 /*
- * Called to remove a CPU from a pool. The CPU is locked, to forbid removing
- * it from pool0. In fact, if we want to hot-unplug a CPU, it must belong to
- * pool0, or we fail.
+ * This function is called in stop_machine context, so we can be sure no
+ * non-idle vcpu is active on the system.
  */
-static int cpupool_cpu_remove(unsigned int cpu)
+static void cpupool_cpu_remove(unsigned int cpu)
 {
-    int ret = -ENODEV;
+    int ret;
 
-    spin_lock(&cpupool_lock);
+    ASSERT(is_idle_vcpu(current));
 
-    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
     {
-        /*
-         * If we are not suspending, we are hot-unplugging cpu, and that is
-         * allowed only for CPUs in pool0.
-         */
-        cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
-        cpumask_and(cpupool0->res_valid, cpupool0->cpu_valid, sched_res_mask);
-        ret = 0;
+        ret = cpupool_unassign_cpu_epilogue(cpupool0);
+        BUG_ON(ret);
     }
+}
 
-    if ( !ret )
+/*
+ * Called before a CPU is being removed from the system.
+ * Removing a CPU is allowed for free CPUs or CPUs in Pool-0 (those are moved
+ * to free cpus actually before removing them).
+ * The CPU is locked, to forbid adding it again to another cpupool.
+ */
+static int cpupool_cpu_remove_prologue(unsigned int cpu)
+{
+    int ret = 0;
+
+    spin_lock(&cpupool_lock);
+
+    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
+        ret = -EBUSY;
+    else
         cpumask_set_cpu(cpu, &cpupool_locked_cpus);
+
     spin_unlock(&cpupool_lock);
 
+    if ( ret )
+        return  ret;
+
+    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    {
+        /* Cpupool0 is populated only after all cpus are up. */
+        ASSERT(system_state == SYS_STATE_active);
+
+        ret = cpupool_unassign_cpu_prologue(cpupool0, cpu);
+    }
+    else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        ret = -ENODEV;
+
     return ret;
 }
 
@@ -535,13 +574,13 @@ static int cpupool_cpu_remove(unsigned int cpu)
  * Called during resume for all cpus which didn't come up again. The cpu must
  * be removed from the cpupool it is assigned to. In case a cpupool will be
  * left without cpu we move all domains of that cpupool to cpupool0.
+ * As we are called with all domains still frozen there is no need to take the
+ * cpupool lock here.
  */
 static void cpupool_cpu_remove_forced(unsigned int cpu)
 {
     struct cpupool **c;
-    struct domain *d;
-
-    spin_lock(&cpupool_lock);
+    int ret;
 
     if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
         cpumask_clear_cpu(cpu, &cpupool_free_cpus);
@@ -551,19 +590,13 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
         {
             if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
             {
-                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
-                if ( cpumask_weight((*c)->cpu_valid) == 0 )
-                {
-                    if ( *c == cpupool0 )
-                        panic("No cpu left in cpupool0\n");
-                    for_each_domain_in_cpupool(d, *c)
-                        cpupool_move_domain_locked(d, cpupool0);
-                }
+                ret = cpupool_unassign_cpu(*c, cpu);
+                BUG_ON(ret);
             }
         }
     }
 
-    spin_unlock(&cpupool_lock);
+    sched_rm_cpu(cpu);
 }
 
 /*
@@ -631,7 +664,8 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
         if ( cpu >= nr_cpu_ids )
             goto addcpu_out;
         ret = -ENODEV;
-        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
+             cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
             goto addcpu_out;
         c = cpupool_find_by_id(op->cpupool_id);
         ret = -ENOENT;
@@ -759,7 +793,12 @@ static int cpu_callback(
     case CPU_DOWN_PREPARE:
         /* Suspend/Resume don't change assignments of cpus to cpupools. */
         if ( system_state <= SYS_STATE_active )
-            rc = cpupool_cpu_remove(cpu);
+            rc = cpupool_cpu_remove_prologue(cpu);
+        break;
+    case CPU_DYING:
+        /* Suspend/Resume don't change assignments of cpus to cpupools. */
+        if ( system_state <= SYS_STATE_active )
+            cpupool_cpu_remove(cpu);
         break;
     case CPU_RESUME_FAILED:
         cpupool_cpu_remove_forced(cpu);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 44364ff4d2..7a5ab4b1b6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2235,12 +2235,24 @@ void scheduler_percpu_init(unsigned int cpu)
         sched_init_pdata(sched, sd->sched_priv, cpu);
 }
 
+void sched_rm_cpu(unsigned int cpu)
+{
+    int rc;
+    struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *sched = sd->scheduler;
+
+    rcu_read_lock(&domlist_read_lock);
+    rc = cpu_disable_scheduler(cpu);
+    BUG_ON(rc);
+    rcu_read_unlock(&domlist_read_lock);
+    sched_deinit_pdata(sched, sd->sched_priv, cpu);
+    cpu_schedule_down(cpu);
+}
+
 static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
     int rc = 0;
 
     /*
@@ -2286,16 +2298,10 @@ static int cpu_schedule_callback(
         rc = cpu_disable_scheduler_check(cpu);
         rcu_read_unlock(&domlist_read_lock);
         break;
-    case CPU_RESUME_FAILED:
     case CPU_DEAD:
         if ( system_state == SYS_STATE_suspend )
             break;
-        rcu_read_lock(&domlist_read_lock);
-        rc = cpu_disable_scheduler(cpu);
-        BUG_ON(rc);
-        rcu_read_unlock(&domlist_read_lock);
-        sched_deinit_pdata(sched, sd->sched_priv, cpu);
-        cpu_schedule_down(cpu);
+        sched_rm_cpu(cpu);
         break;
     case CPU_UP_CANCELED:
         if ( system_state != SYS_STATE_resume )
@@ -2421,6 +2427,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct sched_resource *sd = get_sched_res(cpu);
     struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
+    unsigned long flags;
 
     /*
      * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
@@ -2476,7 +2483,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * that the lock itself changed, and retry acquiring the new one (which
      * will be the correct, remapped one, at that point).
      */
-    old_lock = pcpu_schedule_lock_irq(cpu);
+    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
     vpriv_old = idle->sched_unit->priv;
     ppriv_old = sd->sched_priv;
@@ -2494,7 +2501,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
-    spin_unlock_irq(old_lock);
+    spin_unlock_irqrestore(old_lock, flags);
 
     sched_do_tick_resume(new_ops, cpu);
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index a0f11d0c15..e04d249dfd 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -609,4 +609,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
         cpumask_copy(mask, unit->cpu_hard_affinity);
 }
 
+void sched_rm_cpu(unsigned int cpu);
+
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 202+ messages in thread

-        ret = 0;
+        ret = cpupool_unassign_cpu_epilogue(cpupool0);
+        BUG_ON(ret);
     }
+}
 
-    if ( !ret )
+/*
+ * Called before a CPU is being removed from the system.
+ * Removing a CPU is allowed for free CPUs or CPUs in Pool-0 (those are moved
+ * to free cpus actually before removing them).
+ * The CPU is locked, to forbid adding it again to another cpupool.
+ */
+static int cpupool_cpu_remove_prologue(unsigned int cpu)
+{
+    int ret = 0;
+
+    spin_lock(&cpupool_lock);
+
+    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
+        ret = -EBUSY;
+    else
         cpumask_set_cpu(cpu, &cpupool_locked_cpus);
+
     spin_unlock(&cpupool_lock);
 
+    if ( ret )
+        return ret;
+
+    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    {
+        /* Cpupool0 is populated only after all cpus are up. */
+        ASSERT(system_state == SYS_STATE_active);
+
+        ret = cpupool_unassign_cpu_prologue(cpupool0, cpu);
+    }
+    else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        ret = -ENODEV;
+
     return ret;
 }
 
@@ -535,13 +574,13 @@ static int cpupool_cpu_remove(unsigned int cpu)
  * Called during resume for all cpus which didn't come up again. The cpu must
  * be removed from the cpupool it is assigned to. In case a cpupool will be
  * left without cpu we move all domains of that cpupool to cpupool0.
+ * As we are called with all domains still frozen there is no need to take the
+ * cpupool lock here.
  */
 static void cpupool_cpu_remove_forced(unsigned int cpu)
 {
     struct cpupool **c;
-    struct domain *d;
-
-    spin_lock(&cpupool_lock);
+    int ret;
 
     if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
         cpumask_clear_cpu(cpu, &cpupool_free_cpus);
@@ -551,19 +590,13 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
         {
             if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
             {
-                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
-                if ( cpumask_weight((*c)->cpu_valid) == 0 )
-                {
-                    if ( *c == cpupool0 )
-                        panic("No cpu left in cpupool0\n");
-                    for_each_domain_in_cpupool(d, *c)
-                        cpupool_move_domain_locked(d, cpupool0);
-                }
+                ret = cpupool_unassign_cpu(*c, cpu);
+                BUG_ON(ret);
             }
         }
     }
 
-    spin_unlock(&cpupool_lock);
+    sched_rm_cpu(cpu);
 }
 
 /*
@@ -631,7 +664,8 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
         if ( cpu >= nr_cpu_ids )
             goto addcpu_out;
         ret = -ENODEV;
-        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
+             cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
             goto addcpu_out;
         c = cpupool_find_by_id(op->cpupool_id);
         ret = -ENOENT;
@@ -759,7 +793,12 @@ static int cpu_callback(
     case CPU_DOWN_PREPARE:
         /* Suspend/Resume don't change assignments of cpus to cpupools. */
         if ( system_state <= SYS_STATE_active )
-            rc = cpupool_cpu_remove(cpu);
+            rc = cpupool_cpu_remove_prologue(cpu);
+        break;
+    case CPU_DYING:
+        /* Suspend/Resume don't change assignments of cpus to cpupools. */
+        if ( system_state <= SYS_STATE_active )
+            cpupool_cpu_remove(cpu);
         break;
     case CPU_RESUME_FAILED:
         cpupool_cpu_remove_forced(cpu);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 44364ff4d2..7a5ab4b1b6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2235,12 +2235,24 @@ void scheduler_percpu_init(unsigned int cpu)
         sched_init_pdata(sched, sd->sched_priv, cpu);
 }
 
+void sched_rm_cpu(unsigned int cpu)
+{
+    int rc;
+    struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *sched = sd->scheduler;
+
+    rcu_read_lock(&domlist_read_lock);
+    rc = cpu_disable_scheduler(cpu);
+    BUG_ON(rc);
+    rcu_read_unlock(&domlist_read_lock);
+    sched_deinit_pdata(sched, sd->sched_priv, cpu);
+    cpu_schedule_down(cpu);
+}
+
 static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
     int rc = 0;
 
     /*
@@ -2286,16 +2298,10 @@ static int cpu_schedule_callback(
         rc = cpu_disable_scheduler_check(cpu);
         rcu_read_unlock(&domlist_read_lock);
         break;
-    case CPU_RESUME_FAILED:
     case CPU_DEAD:
         if ( system_state == SYS_STATE_suspend )
             break;
-        rcu_read_lock(&domlist_read_lock);
-        rc = cpu_disable_scheduler(cpu);
-        BUG_ON(rc);
-        rcu_read_unlock(&domlist_read_lock);
-        sched_deinit_pdata(sched, sd->sched_priv, cpu);
-        cpu_schedule_down(cpu);
+        sched_rm_cpu(cpu);
         break;
     case CPU_UP_CANCELED:
         if ( system_state != SYS_STATE_resume )
@@ -2421,6 +2427,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     struct sched_resource *sd = get_sched_res(cpu);
     struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
+    unsigned long flags;
 
     /*
      * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
@@ -2476,7 +2483,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * that the lock itself changed, and retry acquiring the new one (which
      * will be the correct, remapped one, at that point).
      */
-    old_lock = pcpu_schedule_lock_irq(cpu);
+    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
     vpriv_old = idle->sched_unit->priv;
     ppriv_old = sd->sched_priv;
@@ -2494,7 +2501,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
-    spin_unlock_irq(old_lock);
+    spin_unlock_irqrestore(old_lock, flags);
 
     sched_do_tick_resume(new_ops, cpu);
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index a0f11d0c15..e04d249dfd 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -609,4 +609,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
         cpumask_copy(mask, unit->cpu_hard_affinity);
 }
 
+void sched_rm_cpu(unsigned int cpu);
+
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4


* [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Instead of having a full blown scheduler running for the free cpus, add
a very minimalistic scheduler for that purpose which only ever schedules
the related idle vcpu. This has the big advantage of not needing any
per-cpu, per-domain or per-scheduling-unit data for free cpus, which in
turn simplifies moving cpus to and from cpupools a lot.

As this new scheduler is not user-selectable, don't register it as an
official scheduler, but just include it in schedule.c.
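
One detail worth noting: the alloc_vdata hook below returns a bare
sentinel instead of allocated memory. A minimal sketch of why this is
sufficient, assuming (as in the generic code) the returned value is only
ever NULL-checked and never dereferenced:

    /* generic pattern, cf. sched_init_vcpu(); "domdata" stands in for
     * the domain's scheduler data */
    unit->priv = sched_alloc_vdata(sched, unit, domdata);
    if ( unit->priv == NULL )   /* NULL signals allocation failure ... */
        return 1;
    /* ... so (void *)1UL passes as a valid "allocation"; the idle
     * scheduler never looks at it and its free_vdata is a no-op */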

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/arch/arm/smpboot.c  |   2 -
 xen/arch/x86/smpboot.c  |   2 -
 xen/common/schedule.c   | 143 +++++++++++++++++++++++-------------------------
 xen/include/xen/sched.h |   1 -
 4 files changed, 67 insertions(+), 81 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 9a6582f2a6..f756444362 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -350,8 +350,6 @@ void start_secondary(unsigned long boot_phys_offset,
 
     setup_cpu_sibling_map(cpuid);
 
-    scheduler_percpu_init(cpuid);
-
     /* Run local notifiers */
     notify_cpu_starting(cpuid);
     /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7e95b2cdac..153bfbb4b7 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -382,8 +382,6 @@ void start_secondary(void *unused)
 
     set_cpu_sibling_map(cpu);
 
-    scheduler_percpu_init(cpu);
-
     init_percpu_time();
 
     setup_secondary_APIC_clock();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7a5ab4b1b6..d3e4ae226c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -83,6 +83,57 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
+static spinlock_t *
+sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                        void *pdata, void *vdata)
+{
+    sched_idle_unit(cpu)->priv = NULL;
+
+    return &sched_free_cpu_lock;
+}
+
+static struct sched_resource *
+sched_idle_res_pick(const struct scheduler *ops, struct sched_unit *unit)
+{
+    return unit->res;
+}
+
+static void *
+sched_idle_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                       void *dd)
+{
+    /* Any non-NULL pointer is fine here. */
+    return (void *)1UL;
+}
+
+static void
+sched_idle_free_vdata(const struct scheduler *ops, void *priv)
+{
+}
+
+static void sched_idle_schedule(
+    const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
+    bool tasklet_work_scheduled)
+{
+    const unsigned int cpu = smp_processor_id();
+
+    unit->next_time = -1;
+    unit->next_task = sched_idle_unit(sched_get_resource_cpu(cpu));
+}
+
+static struct scheduler sched_idle_ops = {
+    .name           = "Idle Scheduler",
+    .opt_name       = "idle",
+    .sched_data     = NULL,
+
+    .pick_resource  = sched_idle_res_pick,
+    .do_schedule    = sched_idle_schedule,
+
+    .alloc_vdata    = sched_idle_alloc_vdata,
+    .free_vdata     = sched_idle_free_vdata,
+    .switch_sched   = sched_idle_switch_sched,
+};
+
 static inline struct vcpu *unit2vcpu_cpu(struct sched_unit *unit,
                                          unsigned int cpu)
 {
@@ -2141,7 +2192,6 @@ static void poll_timer_fn(void *data)
 static int cpu_schedule_up(unsigned int cpu)
 {
     struct sched_resource *sd;
-    void *sched_priv;
 
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
@@ -2150,7 +2200,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sd);
 
-    sd->scheduler = &ops;
+    sd->scheduler = &sched_idle_ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sched_free_cpu_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
@@ -2171,20 +2221,10 @@ static int cpu_schedule_up(unsigned int cpu)
         struct sched_unit *unit = idle->sched_unit;
 
         /*
-         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
-         * while its scheduler specific data (what is pointed by sched_priv)
-         * is. Also, at this stage of the resume path, we attach the pCPU
-         * to the default scheduler, no matter in what cpupool it was before
-         * suspend. To avoid inconsistency, let's allocate default scheduler
-         * data for the idle vCPU here. If the pCPU was in a different pool
-         * with a different scheduler, it is schedule_cpu_switch(), invoked
-         * later, that will set things up as appropriate.
+         * No need to allocate any scheduler data, as cpus coming online are
+         * free initially and the idle scheduler doesn't need any data areas
+         * allocated.
          */
-        ASSERT(unit->priv == NULL);
-
-        unit->priv = sched_alloc_vdata(&ops, unit, idle->domain->sched_priv);
-        if ( unit->priv == NULL )
-            return -ENOMEM;
 
         /* Update the resource pointer in the idle unit. */
         unit->res = sd;
@@ -2195,16 +2235,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->curr = idle_vcpu[cpu]->sched_unit;
     sd->sched_unit_idle = idle_vcpu[cpu]->sched_unit;
 
-    /*
-     * We don't want to risk calling xfree() on an sd->sched_priv
-     * (e.g., inside free_pdata, from cpu_schedule_down() called
-     * during CPU_UP_CANCELLED) that contains an IS_ERR value.
-     */
-    sched_priv = sched_alloc_pdata(&ops, cpu);
-    if ( IS_ERR(sched_priv) )
-        return PTR_ERR(sched_priv);
-
-    sd->sched_priv = sched_priv;
+    sd->sched_priv = NULL;
 
     return 0;
 }
@@ -2212,13 +2243,6 @@ static int cpu_schedule_up(unsigned int cpu)
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    sched_free_pdata(sched, sd->sched_priv, cpu);
-    sched_free_vdata(sched, idle_vcpu[cpu]->sched_unit->priv);
-
-    idle_vcpu[cpu]->sched_unit->priv = NULL;
-    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 
@@ -2226,26 +2250,14 @@ static void cpu_schedule_down(unsigned int cpu)
     xfree(sd);
 }
 
-void scheduler_percpu_init(unsigned int cpu)
-{
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    if ( system_state != SYS_STATE_resume )
-        sched_init_pdata(sched, sd->sched_priv, cpu);
-}
-
 void sched_rm_cpu(unsigned int cpu)
 {
     int rc;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
 
     rcu_read_lock(&domlist_read_lock);
     rc = cpu_disable_scheduler(cpu);
     BUG_ON(rc);
     rcu_read_unlock(&domlist_read_lock);
-    sched_deinit_pdata(sched, sd->sched_priv, cpu);
     cpu_schedule_down(cpu);
 }
 
@@ -2260,32 +2272,22 @@ static int cpu_schedule_callback(
      * allocating and initializing the per-pCPU scheduler specific data,
      * as well as "registering" this pCPU to the scheduler (which may
      * involve modifying some scheduler wide data structures).
-     * This happens by calling the alloc_pdata and init_pdata hooks, in
-     * this order. A scheduler that does not need to allocate any per-pCPU
-     * data can avoid implementing alloc_pdata. init_pdata may, however, be
-     * necessary/useful in this case too (e.g., it can contain the "register
-     * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called before
-     * CPU_STARTING in scheduler_percpu_init().
+     * As new pCPUs always start as "free" cpus with the minimal idle
+     * scheduler being in charge, we don't need any of that.
      *
      * On the other hand, at teardown, we need to reverse what has been done
-     * during initialization, and then free the per-pCPU specific data. This
-     * happens by calling the deinit_pdata and free_pdata hooks, in this
+     * during initialization, and then free the per-pCPU specific data. A
+     * pCPU brought down is not forced through "free" cpus, so here we need to
+     * use the appropriate hooks.
+     *
+     * This happens by calling the deinit_pdata and free_pdata hooks, in this
      * order. If no per-pCPU memory was allocated, there is no need to
      * provide an implementation of free_pdata. deinit_pdata may, however,
      * be necessary/useful in this case too (e.g., it can undo something done
      * on scheduler wide data structure during init_pdata). Both deinit_pdata
      * and free_pdata are called during CPU_DEAD.
      *
-     * If someting goes wrong during bringup, we go to CPU_UP_CANCELLED
-     * *before* having called init_pdata. In this case, as there is no
-     * initialization needing undoing, only free_pdata should be called.
-     * This means it is possible to call free_pdata just after alloc_pdata,
-     * without a init_pdata/deinit_pdata "cycle" in between the two.
-     *
-     * So, in summary, the usage pattern should look either
-     *  - alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata, or
-     *  - alloc_pdata-->free_pdata.
+     * If something goes wrong during bringup, we go to CPU_UP_CANCELLED.
      */
     switch ( action )
     {
@@ -2402,9 +2404,6 @@ void __init scheduler_init(void)
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
-    get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
-    scheduler_percpu_init(0);
 }
 
 /*
@@ -2412,18 +2411,14 @@ void __init scheduler_init(void)
  * cpupool, or subject it to the scheduler of a new cpupool.
  *
  * For the pCPUs that are removed from their cpupool, their scheduler becomes
- * &ops (the default scheduler, selected at boot, which also services the
- * default cpupool). However, as these pCPUs are not really part of any pool,
- * there won't be any scheduling event on them, not even from the default
- * scheduler. Basically, they will just sit idle until they are explicitly
- * added back to a cpupool.
+ * &sched_idle_ops (the idle scheduler).
  */
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
-    struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
+    struct scheduler *new_ops = (c == NULL) ? &sched_idle_ops : c->sched;
     struct sched_resource *sd = get_sched_res(cpu);
     struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
@@ -2443,9 +2438,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) ||
            (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid)));
 
-    if ( old_ops == new_ops )
-        goto out;
-
     /*
      * To setup the cpu for the new scheduler we need:
      *  - a valid instance of per-CPU scheduler specific data, as it is
@@ -2498,7 +2490,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * taking it, finds all the initializations we've done above in place.
      */
     smp_mb();
-    sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
+    sd->schedule_lock = new_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -2510,7 +2502,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
- out:
     get_sched_res(cpu)->granularity = c ? c->granularity : 1;
     get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 7dc63c449b..e689bba361 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -677,7 +677,6 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
-void scheduler_percpu_init(unsigned int cpu);
 int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4


* [PATCH 55/60] xen/sched: split schedule_cpu_switch()
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Instead of letting schedule_cpu_switch() handle moving cpus to and from
cpupools, split it into schedule_cpu_add() and schedule_cpu_rm().

This will allow us to drop allocating/freeing scheduler data for free
cpus, as the idle scheduler doesn't need such data.
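
The split leaves two narrow entry points; the resulting call sites in
cpupool.c look roughly like this (a sketch of the hunks below):

    /* moving a free cpu into pool c: */
    ret = schedule_cpu_add(cpu, c);   /* attach c->sched, reroute the lock */

    /* moving a cpu out of its pool back to free: */
    ret = schedule_cpu_rm(cpu);       /* fall back to sched_idle_ops */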

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c    |   4 +-
 xen/common/schedule.c   | 121 +++++++++++++++++++++++++++---------------------
 xen/include/xen/sched.h |   3 +-
 3 files changed, 72 insertions(+), 56 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index ab4a2be4fc..35108b9119 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -268,7 +268,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
 
     if ( (cpupool_moving_cpu == cpu) && (c != cpupool_cpu_moving) )
         return -EADDRNOTAVAIL;
-    ret = schedule_cpu_switch(cpu, c);
+    ret = schedule_cpu_add(cpu, c);
     if ( ret )
         return ret;
 
@@ -318,7 +318,7 @@ static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
      */
     if ( !ret )
     {
-        ret = schedule_cpu_switch(cpu, NULL);
+        ret = schedule_cpu_rm(cpu);
         if ( ret )
             cpumask_clear_cpu(cpu, &cpupool_free_cpus);
         else
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d3e4ae226c..0f667068a8 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -83,15 +83,6 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
-static spinlock_t *
-sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu,
-                        void *pdata, void *vdata)
-{
-    sched_idle_unit(cpu)->priv = NULL;
-
-    return &sched_free_cpu_lock;
-}
-
 static struct sched_resource *
 sched_idle_res_pick(const struct scheduler *ops, struct sched_unit *unit)
 {
@@ -131,7 +122,6 @@ static struct scheduler sched_idle_ops = {
 
     .alloc_vdata    = sched_idle_alloc_vdata,
     .free_vdata     = sched_idle_free_vdata,
-    .switch_sched   = sched_idle_switch_sched,
 };
 
 static inline struct vcpu *unit2vcpu_cpu(struct sched_unit *unit,
@@ -2407,36 +2397,22 @@ void __init scheduler_init(void)
 }
 
 /*
- * Move a pCPU outside of the influence of the scheduler of its current
- * cpupool, or subject it to the scheduler of a new cpupool.
- *
- * For the pCPUs that are removed from their cpupool, their scheduler becomes
- * &sched_idle_ops (the idle scheduler).
+ * Move a pCPU from free cpus (running the idle scheduler) to a cpupool
+ * using any "real" scheduler.
+ * The cpu is still marked as "free" and not yet valid for its cpupool.
  */
-int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
+int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
-    void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
-    struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
-    struct scheduler *new_ops = (c == NULL) ? &sched_idle_ops : c->sched;
+    void *ppriv, *vpriv;
+    struct scheduler *new_ops = c->sched;
     struct sched_resource *sd = get_sched_res(cpu);
-    struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
     unsigned long flags;
 
-    /*
-     * pCPUs only move from a valid cpupool to free (i.e., out of any pool),
-     * or from free to a valid cpupool. In the former case (which happens when
-     * c is NULL), we want the CPU to have been marked as free already, as
-     * well as to not be valid for the source pool any longer, when we get to
-     * here. In the latter case (which happens when c is a valid cpupool), we
-     * want the CPU to still be marked as free, as well as to not yet be valid
-     * for the destination pool.
-     */
-    ASSERT(c != old_pool && (c != NULL || old_pool != NULL));
     ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
-    ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) ||
-           (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid)));
+    ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid));
+    ASSERT(get_sched_res(cpu)->cpupool == NULL);
 
     /*
      * To setup the cpu for the new scheduler we need:
@@ -2461,52 +2437,91 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
         return -ENOMEM;
     }
 
-    sched_do_tick_suspend(old_ops, cpu);
-
     /*
-     * The actual switch, including (if necessary) the rerouting of the
-     * scheduler lock to whatever new_ops prefers,  needs to happen in one
-     * critical section, protected by old_ops' lock, or races are possible.
-     * It is, in fact, the lock of another scheduler that we are taking (the
-     * scheduler of the cpupool that cpu still belongs to). But that is ok
-     * as, anyone trying to schedule on this cpu will spin until when we
-     * release that lock (bottom of this function). When he'll get the lock
-     * --thanks to the loop inside *_schedule_lock() functions-- he'll notice
-     * that the lock itself changed, and retry acquiring the new one (which
-     * will be the correct, remapped one, at that point).
+     * The actual switch, including the rerouting of the scheduler lock to
+     * whatever new_ops prefers, needs to happen in one critical section,
+     * protected by old_ops' lock, or races are possible.
+     * It is, in fact, the lock of the idle scheduler that we are taking.
+     * But that is ok, as anyone trying to schedule on this cpu will spin
+     * until we release that lock (bottom of this function). Whoever then
+     * gets the lock --thanks to the loop inside *_schedule_lock() functions--
+     * will notice that the lock itself changed, and retry acquiring the new
+     * one (which will be the correct, remapped one, at that point).
      */
     old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
-    vpriv_old = idle->sched_unit->priv;
-    ppriv_old = sd->sched_priv;
     new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
 
     sd->scheduler = new_ops;
     sd->sched_priv = ppriv;
 
     /*
-     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
+     * Reroute the lock to the per pCPU lock as /last/ thing. In fact,
      * if it is free (and it can be) we want that anyone that manages
      * taking it, finds all the initializations we've done above in place.
      */
     smp_mb();
     sd->schedule_lock = new_lock;
 
-    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */
     spin_unlock_irqrestore(old_lock, flags);
 
     sched_do_tick_resume(new_ops, cpu);
 
+    sd->granularity = c->granularity;
+    sd->cpupool = c;
+    /* The cpu is added to a pool, trigger it to go pick up some work */
+    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+
+    return 0;
+}
+
+/*
+ * Remove a pCPU from its cpupool. Its scheduler becomes &sched_idle_ops
+ * (the idle scheduler).
+ * The cpu is already marked as "free" and not valid any longer for its
+ * cpupool.
+ */
+int schedule_cpu_rm(unsigned int cpu)
+{
+    struct vcpu *idle;
+    void *ppriv_old, *vpriv_old;
+    struct sched_resource *sd = get_sched_res(cpu);
+    struct scheduler *old_ops = sd->scheduler;
+    spinlock_t *old_lock;
+    unsigned long flags;
+
+    ASSERT(sd->cpupool != NULL);
+    ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
+    ASSERT(!cpumask_test_cpu(cpu, sd->cpupool->cpu_valid));
+
+    idle = idle_vcpu[cpu];
+
+    sched_do_tick_suspend(old_ops, cpu);
+
+    /* See comment in schedule_cpu_add() regarding lock switching. */
+    old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+
+    vpriv_old = idle->sched_unit->priv;
+    ppriv_old = sd->sched_priv;
+
+    idle->sched_unit->priv = NULL;
+    sd->scheduler = &sched_idle_ops;
+    sd->sched_priv = NULL;
+
+    smp_mb();
+    sd->schedule_lock = &sched_free_cpu_lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock_irqrestore(old_lock, flags);
+
     sched_deinit_pdata(old_ops, ppriv_old, cpu);
 
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
-    get_sched_res(cpu)->granularity = c ? c->granularity : 1;
-    get_sched_res(cpu)->cpupool = c;
-    /* When a cpu is added to a pool, trigger it to go pick up some work */
-    if ( c != NULL )
-        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+    sd->granularity = 1;
+    sd->cpupool = NULL;
 
     return 0;
 }
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e689bba361..9cee0ec9a3 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -908,7 +908,8 @@ struct scheduler;
 struct scheduler *scheduler_get_default(void);
 struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr);
 void scheduler_free(struct scheduler *sched);
-int schedule_cpu_switch(unsigned int cpu, struct cpupool *c);
+int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
+int schedule_cpu_rm(unsigned int cpu);
 void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
 int cpu_disable_scheduler(unsigned int cpu);
 /* We need it in dom0_setup_vcpu */
-- 
2.16.4


* [PATCH 56/60] xen/sched: protect scheduling resource via rcu
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

In order to be able to move cpus to cpupools with core scheduling
active, it is mandatory either to merge multiple cpus into one
scheduling resource or to split a scheduling resource with multiple
cpus into several resources. This in turn requires modifying the
cpu <-> scheduling resource relation. In order to be able to free
unused resources, protect struct sched_resource via RCU. This ensures
there are no users left when such a resource is freed.
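
The access pattern this enforces is a standard RCU read-side critical
section around each scheduling resource lookup (a sketch matching the
hunks below):

    rcu_read_lock(&sched_res_rculock);
    sd = get_sched_res(cpu);     /* sd stays valid inside the section */
    /* ... use sd ... */
    rcu_read_unlock(&sched_res_rculock);

A writer replacing a resource frees the old structure only after an RCU
grace period has elapsed, so no reader still inside its critical section
can be left with a dangling pointer.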

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c       |   4 +
 xen/common/schedule.c      | 200 ++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/sched-if.h |   7 +-
 3 files changed, 190 insertions(+), 21 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 35108b9119..1c7722552c 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -510,8 +510,10 @@ static int cpupool_cpu_add(unsigned int cpu)
      * (or unplugging would have failed) and that is the default behavior
      * anyway.
      */
+    rcu_read_lock(&sched_res_rculock);
     get_sched_res(cpu)->cpupool = NULL;
     ret = cpupool_assign_cpu_locked(cpupool0, cpu);
+    rcu_read_unlock(&sched_res_rculock);
 
     spin_unlock(&cpupool_lock);
 
@@ -596,7 +598,9 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
         }
     }
 
+    rcu_read_lock(&sched_res_rculock);
     sched_rm_cpu(cpu);
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0f667068a8..78eb055d07 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -73,6 +73,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
 static DEFINE_PER_CPU(unsigned int, sched_res_idx);
+DEFINE_RCU_READ_LOCK(sched_res_rculock);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -270,17 +271,25 @@ static inline void vcpu_runstate_change(
 
 void sched_guest_idle(void (*idle) (void), unsigned int cpu)
 {
+    rcu_read_lock(&sched_res_rculock);
     atomic_inc(&get_sched_res(cpu)->urgent_count);
+    rcu_read_unlock(&sched_res_rculock);
+
     idle();
+
+    rcu_read_lock(&sched_res_rculock);
     atomic_dec(&get_sched_res(cpu)->urgent_count);
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
-    spinlock_t *lock = likely(v == current)
-                       ? NULL : unit_schedule_lock_irq(v->sched_unit);
+    spinlock_t *lock;
     s_time_t delta;
 
+    rcu_read_lock(&sched_res_rculock);
+
+    lock = likely(v == current) ? NULL : unit_schedule_lock_irq(v->sched_unit);
     memcpy(runstate, &v->runstate, sizeof(*runstate));
     delta = NOW() - runstate->state_entry_time;
     if ( delta > 0 )
@@ -288,6 +297,8 @@ void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 
     if ( unlikely(lock != NULL) )
         unit_schedule_unlock_irq(lock, v->sched_unit);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 uint64_t get_cpu_idle_time(unsigned int cpu)
@@ -493,6 +504,8 @@ int sched_init_vcpu(struct vcpu *v)
         return 0;
     }
 
+    rcu_read_lock(&sched_res_rculock);
+
     /* The first vcpu of an unit can be set via sched_set_res(). */
     sched_set_res(unit, get_sched_res(processor));
 
@@ -500,6 +513,7 @@ int sched_init_vcpu(struct vcpu *v)
     if ( unit->priv == NULL )
     {
         sched_free_unit(unit, v);
+        rcu_read_unlock(&sched_res_rculock);
         return 1;
     }
 
@@ -526,6 +540,8 @@ int sched_init_vcpu(struct vcpu *v)
         sched_insert_unit(dom_scheduler(d), unit);
     }
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return 0;
 }
 
@@ -553,6 +569,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     void *unitdata;
     struct scheduler *old_ops;
     void *old_domdata;
+    int ret = 0;
 
     for_each_sched_unit ( d, unit )
     {
@@ -560,15 +577,21 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
             return -EBUSY;
     }
 
+    rcu_read_lock(&sched_res_rculock);
+
     domdata = sched_alloc_domdata(c->sched, d);
     if ( IS_ERR(domdata) )
-        return PTR_ERR(domdata);
+    {
+        ret = PTR_ERR(domdata);
+        goto out;
+    }
 
     unit_priv = xzalloc_array(void *, d->max_vcpus);
     if ( unit_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
-        return -ENOMEM;
+        ret = -ENOMEM;
+        goto out;
     }
 
     for_each_sched_unit ( d, unit )
@@ -580,7 +603,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
                 xfree(unit_priv[unit->unit_id]);
             xfree(unit_priv);
             sched_free_domdata(c->sched, domdata);
-            return -ENOMEM;
+            ret = -ENOMEM;
+            goto out;
         }
     }
 
@@ -642,7 +666,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     xfree(unit_priv);
 
-    return 0;
+out:
+    rcu_read_unlock(&sched_res_rculock);
+
+    return ret;
 }
 
 void sched_destroy_vcpu(struct vcpu *v)
@@ -660,9 +687,13 @@ void sched_destroy_vcpu(struct vcpu *v)
      */
     if ( unit->vcpu == v )
     {
+        rcu_read_lock(&sched_res_rculock);
+
         sched_remove_unit(vcpu_scheduler(v), unit);
         sched_free_vdata(vcpu_scheduler(v), unit->priv);
         sched_free_unit(unit, v);
+
+        rcu_read_unlock(&sched_res_rculock);
     }
 }
 
@@ -680,7 +711,12 @@ int sched_init_domain(struct domain *d, int poolid)
     SCHED_STAT_CRANK(dom_init);
     TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id);
 
+    rcu_read_lock(&sched_res_rculock);
+
     sdom = sched_alloc_domdata(dom_scheduler(d), d);
+
+    rcu_read_unlock(&sched_res_rculock);
+
     if ( IS_ERR(sdom) )
         return PTR_ERR(sdom);
 
@@ -698,9 +734,13 @@ void sched_destroy_domain(struct domain *d)
         SCHED_STAT_CRANK(dom_destroy);
         TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
 
+        rcu_read_lock(&sched_res_rculock);
+
         sched_free_domdata(dom_scheduler(d), d->sched_priv);
         d->sched_priv = NULL;
 
+        rcu_read_unlock(&sched_res_rculock);
+
         cpupool_rm_domain(d);
     }
 }
@@ -734,11 +774,15 @@ void vcpu_sleep_nosync(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_SLEEP, v->domain->domain_id, v->vcpu_id);
 
+    rcu_read_lock(&sched_res_rculock);
+
     lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
 
     vcpu_sleep_nosync_locked(v);
 
     unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void vcpu_sleep_sync(struct vcpu *v)
@@ -759,6 +803,8 @@ void vcpu_wake(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
+    rcu_read_lock(&sched_res_rculock);
+
     lock = unit_schedule_lock_irqsave(unit, &flags);
 
     if ( likely(vcpu_runnable(v)) )
@@ -779,6 +825,8 @@ void vcpu_wake(struct vcpu *v)
     }
 
     unit_schedule_unlock_irqrestore(lock, flags, unit);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void vcpu_unblock(struct vcpu *v)
@@ -812,6 +860,8 @@ static void sched_unit_move_locked(struct sched_unit *unit,
     unsigned int old_cpu = unit->res->processor;
     struct vcpu *v;
 
+    rcu_read_lock(&sched_res_rculock);
+
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
      * once the switch occurs, v->is_urgent is no longer protected by
@@ -831,6 +881,8 @@ static void sched_unit_move_locked(struct sched_unit *unit,
      * pointer can't change while the current lock is held.
      */
     sched_migrate(vcpu_scheduler(unit->vcpu), unit, new_cpu);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 /*
@@ -992,6 +1044,8 @@ void restore_vcpu_affinity(struct domain *d)
 
     ASSERT(system_state == SYS_STATE_resume);
 
+    rcu_read_lock(&sched_res_rculock);
+
     for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
@@ -1042,6 +1096,8 @@ void restore_vcpu_affinity(struct domain *d)
             sched_move_irqs(unit);
     }
 
+    rcu_read_unlock(&sched_res_rculock);
+
     domain_update_node_affinity(d);
 }
 
@@ -1057,9 +1113,11 @@ int cpu_disable_scheduler(unsigned int cpu)
     cpumask_t online_affinity;
     int ret = 0;
 
+    rcu_read_lock(&sched_res_rculock);
+
     c = get_sched_res(cpu)->cpupool;
     if ( c == NULL )
-        return ret;
+        goto out;
 
     for_each_domain_in_cpupool ( d, c )
     {
@@ -1116,6 +1174,9 @@ int cpu_disable_scheduler(unsigned int cpu)
         }
     }
 
+out:
+    rcu_read_unlock(&sched_res_rculock);
+
     return ret;
 }
 
@@ -1149,7 +1210,9 @@ void sched_set_affinity(
 {
     struct sched_unit *unit = v->sched_unit;
 
+    rcu_read_lock(&sched_res_rculock);
     sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft);
+    rcu_read_unlock(&sched_res_rculock);
 
     if ( hard )
         cpumask_copy(unit->cpu_hard_affinity, hard);
@@ -1169,6 +1232,8 @@ static int vcpu_set_affinity(
     spinlock_t *lock;
     int ret = 0;
 
+    rcu_read_lock(&sched_res_rculock);
+
     lock = unit_schedule_lock_irq(unit);
 
     if ( unit->affinity_broken )
@@ -1197,6 +1262,8 @@ static int vcpu_set_affinity(
 
     sched_unit_migrate_finish(unit);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return ret;
 }
 
@@ -1323,11 +1390,16 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
-    spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
+    spinlock_t *lock;
+
+    rcu_read_lock(&sched_res_rculock);
 
+    lock = unit_schedule_lock_irq(v->sched_unit);
     sched_yield(vcpu_scheduler(v), v->sched_unit);
     unit_schedule_unlock_irq(lock, v->sched_unit);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     SCHED_STAT_CRANK(vcpu_yield);
 
     TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id);
@@ -1413,6 +1485,8 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     spinlock_t *lock;
     int ret = -EINVAL;
 
+    rcu_read_lock(&sched_res_rculock);
+
     lock = unit_schedule_lock_irq(unit);
 
     if ( cpu < 0 )
@@ -1447,6 +1521,8 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
 
     sched_unit_migrate_finish(unit);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return ret;
 }
 
@@ -1652,9 +1728,13 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 
     /* NB: the pluggable scheduler code needs to take care
      * of locking by itself. */
+    rcu_read_lock(&sched_res_rculock);
+
     if ( (ret = sched_adjust_dom(dom_scheduler(d), d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return ret;
 }
 
@@ -1675,9 +1755,13 @@ long sched_adjust_global(struct xen_sysctl_scheduler_op *op)
     if ( pool == NULL )
         return -ESRCH;
 
+    rcu_read_lock(&sched_res_rculock);
+
     rc = ((op->sched_id == pool->sched->sched_id)
           ? sched_adjust_cpupool(pool->sched, op) : -EINVAL);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     cpupool_put(pool);
 
     return rc;
@@ -1859,8 +1943,11 @@ static void context_saved(struct sched_unit *unit)
 void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
 {
     struct sched_unit *next = vnext->sched_unit;
-    struct sched_resource *sd = get_sched_res(smp_processor_id());
+    struct sched_resource *sd;
 
+    rcu_read_lock(&sched_res_rculock);
+
+    sd = get_sched_res(smp_processor_id());
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
@@ -1887,6 +1974,8 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
 
     if ( is_idle_vcpu(vprev) && vprev != vnext )
         vprev->sched_unit = sd->sched_unit_idle;
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
@@ -1904,6 +1993,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
             vnext->sched_unit =
                 get_sched_res(smp_processor_id())->sched_unit_idle;
 
+        rcu_read_unlock(&sched_res_rculock);
+
         trace_continue_running(vnext);
         return continue_running(vprev);
     }
@@ -1917,6 +2008,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
 
     vcpu_periodic_timer_work(vnext);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     context_switch(vprev, vnext);
 }
 
@@ -2047,6 +2140,8 @@ static void sched_slave(void)
 
     ASSERT_NOT_IN_ATOMIC();
 
+    rcu_read_lock(&sched_res_rculock);
+
     lock = pcpu_schedule_lock_irq(cpu);
 
     now = NOW();
@@ -2070,6 +2165,8 @@ static void sched_slave(void)
     {
         pcpu_schedule_unlock_irq(lock, cpu);
 
+        rcu_read_unlock(&sched_res_rculock);
+
         /* Check for failed forced context switch. */
         if ( do_softirq )
             raise_softirq(SCHEDULE_SOFTIRQ);
@@ -2100,13 +2197,16 @@ static void schedule(void)
     struct sched_resource *sd;
     spinlock_t           *lock;
     int cpu = smp_processor_id();
-    unsigned int          gran = get_sched_res(cpu)->granularity;
+    unsigned int          gran;
 
     ASSERT_NOT_IN_ATOMIC();
 
     SCHED_STAT_CRANK(sched_run);
 
+    rcu_read_lock(&sched_res_rculock);
+
     sd = get_sched_res(cpu);
+    gran = sd->granularity;
 
     lock = pcpu_schedule_lock_irq(cpu);
 
@@ -2118,6 +2218,8 @@ static void schedule(void)
          */
         pcpu_schedule_unlock_irq(lock, cpu);
 
+        rcu_read_unlock(&sched_res_rculock);
+
         raise_softirq(SCHEDULE_SOFTIRQ);
         return sched_slave();
     }
@@ -2230,14 +2332,27 @@ static int cpu_schedule_up(unsigned int cpu)
     return 0;
 }
 
+static void sched_res_free(struct rcu_head *head)
+{
+    struct sched_resource *sd = container_of(head, struct sched_resource, rcu);
+
+    xfree(sd);
+}
+
 static void cpu_schedule_down(unsigned int cpu)
 {
-    struct sched_resource *sd = get_sched_res(cpu);
+    struct sched_resource *sd;
+
+    rcu_read_lock(&sched_res_rculock);
+
+    sd = get_sched_res(cpu);
 
     kill_timer(&sd->s_timer);
 
     set_sched_res(cpu, NULL);
-    xfree(sd);
+    call_rcu(&sd->rcu, sched_res_free);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void sched_rm_cpu(unsigned int cpu)
@@ -2257,6 +2372,8 @@ static int cpu_schedule_callback(
     unsigned int cpu = (unsigned long)hcpu;
     int rc = 0;
 
+    rcu_read_lock(&sched_res_rculock);
+
     /*
      * From the scheduler perspective, bringing up a pCPU requires
      * allocating and initializing the per-pCPU scheduler specific data,
@@ -2303,6 +2420,8 @@ static int cpu_schedule_callback(
         break;
     }
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
 }
 
@@ -2392,8 +2511,13 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
+
+    rcu_read_lock(&sched_res_rculock);
+
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 /*
@@ -2406,9 +2530,14 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
     struct vcpu *idle;
     void *ppriv, *vpriv;
     struct scheduler *new_ops = c->sched;
-    struct sched_resource *sd = get_sched_res(cpu);
+    struct sched_resource *sd;
     spinlock_t *old_lock, *new_lock;
     unsigned long flags;
+    int ret = 0;
+
+    rcu_read_lock(&sched_res_rculock);
+
+    sd = get_sched_res(cpu);
 
     ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
     ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid));
@@ -2428,13 +2557,18 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
     idle = idle_vcpu[cpu];
     ppriv = sched_alloc_pdata(new_ops, cpu);
     if ( IS_ERR(ppriv) )
-        return PTR_ERR(ppriv);
+    {
+        ret = PTR_ERR(ppriv);
+        goto out;
+    }
+
     vpriv = sched_alloc_vdata(new_ops, idle->sched_unit,
                               idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
         sched_free_pdata(new_ops, ppriv, cpu);
-        return -ENOMEM;
+        ret = -ENOMEM;
+        goto out;
     }
 
     /*
@@ -2473,7 +2607,10 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
     /* The  cpu is added to a pool, trigger it to go pick up some work */
     cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
 
-    return 0;
+out:
+    rcu_read_unlock(&sched_res_rculock);
+
+    return ret;
 }
 
 /*
@@ -2486,11 +2623,16 @@ int schedule_cpu_rm(unsigned int cpu)
 {
     struct vcpu *idle;
     void *ppriv_old, *vpriv_old;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *old_ops = sd->scheduler;
+    struct sched_resource *sd;
+    struct scheduler *old_ops;
     spinlock_t *old_lock;
     unsigned long flags;
 
+    rcu_read_lock(&sched_res_rculock);
+
+    sd = get_sched_res(cpu);
+    old_ops = sd->scheduler;
+
     ASSERT(sd->cpupool != NULL);
     ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
     ASSERT(!cpumask_test_cpu(cpu, sd->cpupool->cpu_valid));
@@ -2523,6 +2665,8 @@ int schedule_cpu_rm(unsigned int cpu)
     sd->granularity = 1;
     sd->cpupool = NULL;
 
+    rcu_read_unlock(&sched_res_rculock);
+
     return 0;
 }
 
@@ -2571,6 +2715,8 @@ void schedule_dump(struct cpupool *c)
 
     /* Locking, if necessary, must be handled withing each scheduler */
 
+    rcu_read_lock(&sched_res_rculock);
+
     if ( c != NULL )
     {
         sched = c->sched;
@@ -2590,6 +2736,8 @@ void schedule_dump(struct cpupool *c)
         for_each_cpu (i, cpus)
             sched_dump_cpu_state(sched, i);
     }
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void sched_tick_suspend(void)
@@ -2597,10 +2745,14 @@ void sched_tick_suspend(void)
     struct scheduler *sched;
     unsigned int cpu = smp_processor_id();
 
+    rcu_read_lock(&sched_res_rculock);
+
     sched = get_sched_res(cpu)->scheduler;
     sched_do_tick_suspend(sched, cpu);
     rcu_idle_enter(cpu);
     rcu_idle_timer_start();
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void sched_tick_resume(void)
@@ -2608,10 +2760,14 @@ void sched_tick_resume(void)
     struct scheduler *sched;
     unsigned int cpu = smp_processor_id();
 
+    rcu_read_lock(&sched_res_rculock);
+
     rcu_idle_timer_stop();
     rcu_idle_exit(cpu);
     sched = get_sched_res(cpu)->scheduler;
     sched_do_tick_resume(sched, cpu);
+
+    rcu_read_unlock(&sched_res_rculock);
 }
 
 void wait(void)
@@ -2626,7 +2782,13 @@ void wait(void)
  */
 int sched_has_urgent_vcpu(void)
 {
-    return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
+    int val;
+
+    rcu_read_lock(&sched_res_rculock);
+    val = atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
+    rcu_read_unlock(&sched_res_rculock);
+
+    return val;
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index e04d249dfd..abf6d0522d 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -10,6 +10,7 @@
 
 #include <xen/percpu.h>
 #include <xen/err.h>
+#include <xen/rcupdate.h>
 
 /* A global pointer to the initial cpupool (POOL0). */
 extern struct cpupool *cpupool0;
@@ -58,20 +59,22 @@ struct sched_resource {
     unsigned int        processor;
     unsigned int        granularity;
     const cpumask_t    *cpus;           /* cpus covered by this struct     */
+    struct rcu_head     rcu;
 };
 
 #define curr_on_cpu(c)    (get_sched_res(c)->curr)
 
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
+extern rcu_read_lock_t sched_res_rculock;
 
 static inline struct sched_resource *get_sched_res(unsigned int cpu)
 {
-    return per_cpu(sched_res, cpu);
+    return rcu_dereference(per_cpu(sched_res, cpu));
 }
 
 static inline void set_sched_res(unsigned int cpu, struct sched_resource *res)
 {
-    per_cpu(sched_res, cpu) = res;
+    rcu_assign_pointer(per_cpu(sched_res, cpu), res);
 }
 
 static inline bool is_idle_unit(const struct sched_unit *unit)
-- 
2.16.4


* [PATCH 57/60] xen/sched: support multiple cpus per scheduling resource
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Prepare for supporting multiple cpus per scheduling resource by
allocating the cpumask of each resource dynamically.

Modify sched_res_mask to have only one bit set per scheduling resource.
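
With only one bit set per resource, iterating over sched_res_mask
visits each scheduling resource exactly once, via its first cpu. A
short sketch of such a consumer (visit_resource() is a made-up helper,
purely for illustration):

    unsigned int cpu;

    /* One iteration per scheduling resource, not one per cpu. */
    for_each_cpu ( cpu, sched_res_mask )
        visit_resource(get_sched_res(cpu));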

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch (carved out from other patch)
---
 xen/common/schedule.c      | 16 ++++++++++++++--
 xen/include/xen/sched-if.h |  4 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 78eb055d07..6f6886d5c3 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -59,7 +59,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
 enum sched_gran opt_sched_granularity = SCHED_GRAN_cpu;
 unsigned int sched_granularity = 1;
 bool sched_disable_smt_switching;
-const cpumask_t *sched_res_mask = &cpumask_all;
+cpumask_var_t sched_res_mask;
 
 /* Common lock for free cpus. */
 static DEFINE_SPINLOCK(sched_free_cpu_lock);
@@ -2288,8 +2288,14 @@ static int cpu_schedule_up(unsigned int cpu)
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
         return -ENOMEM;
+    if ( !zalloc_cpumask_var(&sd->cpus) )
+    {
+        xfree(sd);
+        return -ENOMEM;
+    }
+
     sd->processor = cpu;
-    sd->cpus = cpumask_of(cpu);
+    cpumask_copy(sd->cpus, cpumask_of(cpu));
     set_sched_res(cpu, sd);
 
     sd->scheduler = &sched_idle_ops;
@@ -2301,6 +2307,8 @@ static int cpu_schedule_up(unsigned int cpu)
     /* We start with cpu granularity. */
     sd->granularity = 1;
 
+    cpumask_set_cpu(cpu, sched_res_mask);
+
     /* Boot CPU is dealt with later in schedule_init(). */
     if ( cpu == 0 )
         return 0;
@@ -2336,6 +2344,7 @@ static void sched_res_free(struct rcu_head *head)
 {
     struct sched_resource *sd = container_of(head, struct sched_resource, rcu);
 
+    free_cpumask_var(sd->cpus);
     xfree(sd);
 }
 
@@ -2484,6 +2493,9 @@ void __init scheduler_init(void)
         printk("Using '%s' (%s)\n", ops.name, ops.opt_name);
     }
 
+    if ( !zalloc_cpumask_var(&sched_res_mask) )
+        BUG();
+
     if ( cpu_schedule_up(0) )
         BUG();
     register_cpu_notifier(&cpu_schedule_nfb);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index abf6d0522d..714d793815 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -24,7 +24,7 @@ extern cpumask_t cpupool_free_cpus;
 extern int sched_ratelimit_us;
 
 /* Scheduling resource mask. */
-extern const cpumask_t *sched_res_mask;
+extern cpumask_var_t sched_res_mask;
 
 /* Number of vcpus per struct sched_unit. */
 enum sched_gran {
@@ -58,7 +58,7 @@ struct sched_resource {
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned int        processor;
     unsigned int        granularity;
-    const cpumask_t    *cpus;           /* cpus covered by this struct     */
+    cpumask_var_t       cpus;           /* cpus covered by this struct     */
     struct rcu_head     rcu;
 };
 
-- 
2.16.4


* [PATCH 58/60] xen/sched: support differing granularity in schedule_cpu_[add/rm]()
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

With core scheduling active, schedule_cpu_[add/rm]() has to cope with
differing scheduling granularity: a cpu not in any cpupool is subject
to granularity 1 (cpu scheduling), while a cpu in a cpupool might be
part of a scheduling resource with more than one cpu.

Handle that by keeping arrays of old/new pdata and vdata and looping
over those where appropriate.

Additionally, the scheduling resource(s) must be either merged or
split.
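
The set of cpus to merge is derived from the configured granularity;
a helper introduced below maps the boot option to the matching
sibling mask:

    mask = sched_get_opt_cpumask(c->opt_granularity, cpu);
    /*
     * SCHED_GRAN_cpu    -> cpumask_of(cpu)
     * SCHED_GRAN_core   -> per_cpu(cpu_sibling_mask, cpu)
     * SCHED_GRAN_socket -> per_cpu(cpu_core_mask, cpu)
     */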

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c  |  18 ++--
 xen/common/schedule.c | 228 +++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 205 insertions(+), 41 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 1c7722552c..ec81124182 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -535,6 +535,7 @@ static void cpupool_cpu_remove(unsigned int cpu)
         ret = cpupool_unassign_cpu_epilogue(cpupool0);
         BUG_ON(ret);
     }
+    cpumask_clear_cpu(cpu, &cpupool_free_cpus);
 }
 
 /*
@@ -584,20 +585,19 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
     struct cpupool **c;
     int ret;
 
-    if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
-        cpumask_clear_cpu(cpu, &cpupool_free_cpus);
-    else
+    for_each_cpupool ( c )
     {
-        for_each_cpupool(c)
+        if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
         {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
-            {
-                ret = cpupool_unassign_cpu(*c, cpu);
-                BUG_ON(ret);
-            }
+            ret = cpupool_unassign_cpu_prologue(*c, cpu);
+            BUG_ON(ret);
+            ret = cpupool_unassign_cpu_epilogue(*c);
+            BUG_ON(ret);
         }
     }
 
+    cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+
     rcu_read_lock(&sched_res_rculock);
     sched_rm_cpu(cpu);
     rcu_read_unlock(&sched_res_rculock);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 6f6886d5c3..b4e65e81b0 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -402,26 +402,30 @@ static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v)
     unit->runstate_cnt[v->runstate.state]++;
 }
 
-static struct sched_unit *sched_alloc_unit(struct vcpu *v)
+static struct sched_unit *sched_alloc_unit_mem(void)
 {
-    struct sched_unit *unit, **prev_unit;
-    struct domain *d = v->domain;
-    unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
+    struct sched_unit *unit;
 
-    for_each_sched_unit ( d, unit )
-        if ( unit->vcpu->vcpu_id / gran == v->vcpu_id / gran )
-            break;
+    unit = xzalloc(struct sched_unit);
+    if ( !unit )
+        return NULL;
 
-    if ( unit )
+    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
     {
-        sched_unit_add_vcpu(unit, v);
-        return unit;
+        sched_free_unit_mem(unit);
+        unit = NULL;
     }
 
-    if ( (unit = xzalloc(struct sched_unit)) == NULL )
-        return NULL;
+    return unit;
+}
+
+static void sched_domain_insert_unit(struct sched_unit *unit, struct domain *d)
+{
+    struct sched_unit **prev_unit;
 
-    sched_unit_add_vcpu(unit, v);
     unit->domain = d;
 
     for ( prev_unit = &d->sched_unit_list; *prev_unit;
@@ -432,18 +436,31 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 
     unit->next_in_list = *prev_unit;
     *prev_unit = unit;
+}
 
-    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
-         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
-         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
-        goto fail;
+static struct sched_unit *sched_alloc_unit(struct vcpu *v)
+{
+    struct sched_unit *unit;
+    struct domain *d = v->domain;
+    unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
 
-    return unit;
+    for_each_sched_unit ( d, unit )
+        if ( unit->vcpu->vcpu_id / gran == v->vcpu_id / gran )
+            break;
 
- fail:
-    sched_free_unit(unit, v);
-    return NULL;
+    if ( unit )
+    {
+        sched_unit_add_vcpu(unit, v);
+        return unit;
+    }
+
+    if ( (unit = sched_alloc_unit_mem()) == NULL )
+        return NULL;
+
+    sched_unit_add_vcpu(unit, v);
+    sched_domain_insert_unit(unit, d);
+
+    return unit;
 }
 
 static unsigned int sched_select_initial_cpu(const struct vcpu *v)
@@ -2281,18 +2298,28 @@ static void poll_timer_fn(void *data)
         vcpu_unblock(v);
 }
 
-static int cpu_schedule_up(unsigned int cpu)
+static struct sched_resource *sched_alloc_res(void)
 {
     struct sched_resource *sd;
 
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
-        return -ENOMEM;
+        return NULL;
     if ( !zalloc_cpumask_var(&sd->cpus) )
     {
         xfree(sd);
-        return -ENOMEM;
+        return NULL;
     }
+    return sd;
+}
+
+static int cpu_schedule_up(unsigned int cpu)
+{
+    struct sched_resource *sd;
+
+    sd = sched_alloc_res();
+    if ( sd == NULL )
+        return -ENOMEM;
 
     sd->processor = cpu;
     cpumask_copy(sd->cpus, cpumask_of(cpu));
@@ -2345,6 +2372,8 @@ static void sched_res_free(struct rcu_head *head)
     struct sched_resource *sd = container_of(head, struct sched_resource, rcu);
 
     free_cpumask_var(sd->cpus);
+    if ( sd->sched_unit_idle )
+        sched_free_unit_mem(sd->sched_unit_idle);
     xfree(sd);
 }
 
@@ -2359,6 +2388,8 @@ static void cpu_schedule_down(unsigned int cpu)
     kill_timer(&sd->s_timer);
 
     set_sched_res(cpu, NULL);
+    /* Keep idle unit. */
+    sd->sched_unit_idle = NULL;
     call_rcu(&sd->rcu, sched_res_free);
 
     rcu_read_unlock(&sched_res_rculock);
@@ -2438,6 +2469,30 @@ static struct notifier_block cpu_schedule_nfb = {
     .notifier_call = cpu_schedule_callback
 };
 
+static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt,
+                                              unsigned int cpu)
+{
+    const cpumask_t *mask;
+
+    switch ( opt )
+    {
+    case SCHED_GRAN_cpu:
+        mask = cpumask_of(cpu);
+        break;
+    case SCHED_GRAN_core:
+        mask = per_cpu(cpu_sibling_mask, cpu);
+        break;
+    case SCHED_GRAN_socket:
+        mask = per_cpu(cpu_core_mask, cpu);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        return NULL;
+    }
+
+    return mask;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
@@ -2596,6 +2651,46 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
      */
     old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
+    if ( c->granularity > 1 )
+    {
+        const cpumask_t *mask;
+        unsigned int cpu_iter, idx = 0;
+        struct sched_unit *old_unit, *master_unit;
+        struct sched_resource *sd_old;
+
+        /*
+         * We need to merge multiple idle_vcpu units and sched_resource structs
+         * into one. As the free cpus all share the same lock we are fine doing
+         * that now. The worst which could happen would be someone waiting for
+         * the lock, thus dereferencing sched_res->schedule_lock. This is the
+         * reason we are freeing struct sched_res via call_rcu() to avoid the
+         * lock pointer suddenly disappearing.
+         */
+        mask = sched_get_opt_cpumask(c->opt_granularity, cpu);
+        master_unit = idle_vcpu[cpu]->sched_unit;
+
+        for_each_cpu ( cpu_iter, mask )
+        {
+            if ( idx )
+                cpumask_clear_cpu(cpu_iter, sched_res_mask);
+
+            per_cpu(sched_res_idx, cpu_iter) = idx++;
+
+            if ( cpu == cpu_iter )
+                continue;
+
+            old_unit = idle_vcpu[cpu_iter]->sched_unit;
+            sd_old = get_sched_res(cpu_iter);
+            kill_timer(&sd_old->s_timer);
+            idle_vcpu[cpu_iter]->sched_unit = master_unit;
+            master_unit->runstate_cnt[RUNSTATE_running]++;
+            set_sched_res(cpu_iter, sd);
+            cpumask_set_cpu(cpu_iter, sd->cpus);
+
+            call_rcu(&sd_old->rcu, sched_res_free);
+        }
+    }
+
     new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
 
     sd->scheduler = new_ops;
@@ -2633,33 +2728,100 @@ out:
  */
 int schedule_cpu_rm(unsigned int cpu)
 {
-    struct vcpu *idle;
     void *ppriv_old, *vpriv_old;
-    struct sched_resource *sd;
+    struct sched_resource *sd, **sd_new = NULL;
+    struct sched_unit *unit;
     struct scheduler *old_ops;
     spinlock_t *old_lock;
     unsigned long flags;
+    int idx, ret = -ENOMEM;
+    unsigned int cpu_iter;
 
     rcu_read_lock(&sched_res_rculock);
 
     sd = get_sched_res(cpu);
     old_ops = sd->scheduler;
 
+    if ( sd->granularity > 1 )
+    {
+        sd_new = xmalloc_array(struct sched_resource *, sd->granularity - 1);
+        if ( !sd_new )
+            goto out;
+        for ( idx = 0; idx < sd->granularity - 1; idx++ )
+        {
+            sd_new[idx] = sched_alloc_res();
+            if ( sd_new[idx] )
+            {
+                sd_new[idx]->sched_unit_idle = sched_alloc_unit_mem();
+                if ( !sd_new[idx]->sched_unit_idle )
+                {
+                    sched_res_free(&sd_new[idx]->rcu);
+                    sd_new[idx] = NULL;
+                }
+            }
+            if ( !sd_new[idx] )
+            {
+                for ( idx--; idx >= 0; idx-- )
+                    sched_res_free(&sd_new[idx]->rcu);
+                goto out;
+            }
+            sd_new[idx]->curr = sd_new[idx]->sched_unit_idle;
+            sd_new[idx]->scheduler = &sched_idle_ops;
+            sd_new[idx]->granularity = 1;
+
+            /* We want the lock not to change when replacing the resource. */
+            sd_new[idx]->schedule_lock = sd->schedule_lock;
+        }
+    }
+
+    ret = 0;
     ASSERT(sd->cpupool != NULL);
     ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
     ASSERT(!cpumask_test_cpu(cpu, sd->cpupool->cpu_valid));
 
-    idle = idle_vcpu[cpu];
-
     sched_do_tick_suspend(old_ops, cpu);
 
     /* See comment in schedule_cpu_add() regarding lock switching. */
     old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
-    vpriv_old = idle->sched_unit->priv;
+    vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
     ppriv_old = sd->sched_priv;
 
-    idle->sched_unit->priv = NULL;
+    idx = 0;
+    for_each_cpu ( cpu_iter, sd->cpus )
+    {
+        per_cpu(sched_res_idx, cpu_iter) = 0;
+        if ( cpu_iter == cpu )
+        {
+            idle_vcpu[cpu_iter]->sched_unit->priv = NULL;
+        }
+        else
+        {
+            /* Initialize unit. */
+            unit = sd_new[idx]->sched_unit_idle;
+            unit->res = sd_new[idx];
+            unit->is_running = true;
+            sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]);
+            sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain);
+
+            /* Adjust cpu masks of resources (old and new). */
+            cpumask_clear_cpu(cpu_iter, sd->cpus);
+            cpumask_set_cpu(cpu_iter, sd_new[idx]->cpus);
+
+            /* Init timer. */
+            init_timer(&sd_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
+
+            /* Last resource initializations and insert resource pointer. */
+            sd_new[idx]->processor = cpu_iter;
+            set_sched_res(cpu_iter, sd_new[idx]);
+
+            /* Last action: set the new lock pointer. */
+            smp_mb();
+            sd_new[idx]->schedule_lock = &sched_free_cpu_lock;
+
+            idx++;
+        }
+    }
     sd->scheduler = &sched_idle_ops;
     sd->sched_priv = NULL;
 
@@ -2677,9 +2839,11 @@ int schedule_cpu_rm(unsigned int cpu)
     sd->granularity = 1;
     sd->cpupool = NULL;
 
+out:
     rcu_read_unlock(&sched_res_rculock);
+    xfree(sd_new);
 
-    return 0;
+    return ret;
 }
 
 struct scheduler *scheduler_get_default(void)
-- 
2.16.4


* [PATCH 59/60] xen/sched: support core scheduling for moving cpus to/from cpupools
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

With core scheduling active it is necessary to move multiple cpus to
or from a cpupool at the same time, as otherwise a scheduling resource
would end up split between pools.
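
The resulting pattern, as a condensed sketch of the assignment path
below (error handling and locking omitted):

    /* Expand the single cpu to its whole scheduling resource. */
    const cpumask_t *cpus = sched_get_opt_cpumask(c->opt_granularity, cpu);

    /* Add the resource to the pool once, via its first cpu. */
    ret = schedule_cpu_add(cpumask_first(cpus), c);

    /* All sibling cpus leave the free set and enter the pool together. */
    cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
    cpumask_or(c->cpu_valid, c->cpu_valid, cpus);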

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
---
 xen/common/cpupool.c       | 87 +++++++++++++++++++++++++++++++++++-----------
 xen/common/schedule.c      |  3 +-
 xen/include/xen/sched-if.h |  1 +
 3 files changed, 68 insertions(+), 23 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index ec81124182..4dac7dedca 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -265,23 +265,30 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
 {
     int ret;
     struct domain *d;
+    const cpumask_t *cpus;
+
+    cpus = sched_get_opt_cpumask(c->opt_granularity, cpu);
 
     if ( (cpupool_moving_cpu == cpu) && (c != cpupool_cpu_moving) )
         return -EADDRNOTAVAIL;
-    ret = schedule_cpu_add(cpu, c);
+    ret = schedule_cpu_add(cpumask_first(cpus), c);
     if ( ret )
         return ret;
 
-    cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+    rcu_read_lock(&sched_res_rculock);
+
+    cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
     if (cpupool_moving_cpu == cpu)
     {
         cpupool_moving_cpu = -1;
         cpupool_put(cpupool_cpu_moving);
         cpupool_cpu_moving = NULL;
     }
-    cpumask_set_cpu(cpu, c->cpu_valid);
+    cpumask_or(c->cpu_valid, c->cpu_valid, cpus);
     cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
 
+    rcu_read_unlock(&sched_res_rculock);
+
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool(d, c)
     {
@@ -295,6 +302,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
 static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
 {
     int cpu = cpupool_moving_cpu;
+    const cpumask_t *cpus;
     struct domain *d;
     int ret;
 
@@ -307,7 +315,10 @@ static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
      */
     rcu_read_lock(&domlist_read_lock);
     ret = cpu_disable_scheduler(cpu);
-    cpumask_set_cpu(cpu, &cpupool_free_cpus);
+
+    rcu_read_lock(&sched_res_rculock);
+    cpus = get_sched_res(cpu)->cpus;
+    cpumask_or(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
 
     /*
      * cpu_disable_scheduler() returning an error doesn't require resetting
@@ -320,7 +331,7 @@ static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
     {
         ret = schedule_cpu_rm(cpu);
         if ( ret )
-            cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+            cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
         else
         {
             cpupool_moving_cpu = -1;
@@ -328,6 +339,7 @@ static int cpupool_unassign_cpu_epilogue(struct cpupool *c)
             cpupool_cpu_moving = NULL;
         }
     }
+    rcu_read_unlock(&sched_res_rculock);
 
     for_each_domain_in_cpupool(d, c)
     {
@@ -342,6 +354,7 @@ static int cpupool_unassign_cpu_prologue(struct cpupool *c, unsigned int cpu)
 {
     int ret;
     struct domain *d;
+    const cpumask_t *cpus;
 
     spin_lock(&cpupool_lock);
     ret = -EADDRNOTAVAIL;
@@ -352,7 +365,11 @@ static int cpupool_unassign_cpu_prologue(struct cpupool *c, unsigned int cpu)
     if ( !cpumask_test_cpu(cpu, c->cpu_valid) && (cpu != cpupool_moving_cpu) )
         goto out;
 
-    if ( (c->n_dom > 0) && (cpumask_weight(c->cpu_valid) == 1) &&
+    rcu_read_lock(&sched_res_rculock);
+    cpus = get_sched_res(cpu)->cpus;
+
+    if ( (c->n_dom > 0) &&
+         (cpumask_weight(c->cpu_valid) == cpumask_weight(cpus)) &&
          (cpu != cpupool_moving_cpu) )
     {
         rcu_read_lock(&domlist_read_lock);
@@ -374,9 +391,10 @@ static int cpupool_unassign_cpu_prologue(struct cpupool *c, unsigned int cpu)
     cpupool_moving_cpu = cpu;
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
-    cpumask_clear_cpu(cpu, c->cpu_valid);
+    cpumask_andnot(c->cpu_valid, c->cpu_valid, cpus);
     cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
 
+    rcu_read_unlock(&sched_res_rculock);
 out:
     spin_unlock(&cpupool_lock);
 
@@ -428,12 +446,12 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
         return ret;
     }
 
-    work_cpu = smp_processor_id();
+    work_cpu = sched_get_resource_cpu(smp_processor_id());
     if ( work_cpu == cpu )
     {
         work_cpu = cpumask_first(cpupool0->cpu_valid);
         if ( work_cpu == cpu )
-            work_cpu = cpumask_next(cpu, cpupool0->cpu_valid);
+            work_cpu = cpumask_last(cpupool0->cpu_valid);
     }
     return continue_hypercall_on_cpu(work_cpu, cpupool_unassign_cpu_helper, c);
 }
@@ -499,6 +517,7 @@ void cpupool_rm_domain(struct domain *d)
 static int cpupool_cpu_add(unsigned int cpu)
 {
     int ret = 0;
+    const cpumask_t *cpus;
 
     spin_lock(&cpupool_lock);
     cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
@@ -512,7 +531,11 @@ static int cpupool_cpu_add(unsigned int cpu)
      */
     rcu_read_lock(&sched_res_rculock);
     get_sched_res(cpu)->cpupool = NULL;
-    ret = cpupool_assign_cpu_locked(cpupool0, cpu);
+
+    cpus = sched_get_opt_cpumask(cpupool0->opt_granularity, cpu);
+    if ( cpumask_subset(cpus, &cpupool_free_cpus) )
+        ret = cpupool_assign_cpu_locked(cpupool0, cpu);
+
     rcu_read_unlock(&sched_res_rculock);
 
     spin_unlock(&cpupool_lock);
@@ -547,27 +570,33 @@ static void cpupool_cpu_remove(unsigned int cpu)
 static int cpupool_cpu_remove_prologue(unsigned int cpu)
 {
     int ret = 0;
+    cpumask_t *cpus;
+    unsigned int master_cpu;
 
     spin_lock(&cpupool_lock);
 
-    if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
+    rcu_read_lock(&sched_res_rculock);
+    cpus = get_sched_res(cpu)->cpus;
+    master_cpu = sched_get_resource_cpu(cpu);
+    if ( cpumask_intersects(cpus, &cpupool_locked_cpus) )
         ret = -EBUSY;
     else
         cpumask_set_cpu(cpu, &cpupool_locked_cpus);
+    rcu_read_unlock(&sched_res_rculock);
 
     spin_unlock(&cpupool_lock);
 
     if ( ret )
         return  ret;
 
-    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    if ( cpumask_test_cpu(master_cpu, cpupool0->cpu_valid) )
     {
         /* Cpupool0 is populated only after all cpus are up. */
         ASSERT(system_state == SYS_STATE_active);
 
-        ret = cpupool_unassign_cpu_prologue(cpupool0, cpu);
+        ret = cpupool_unassign_cpu_prologue(cpupool0, master_cpu);
     }
-    else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+    else if ( !cpumask_test_cpu(master_cpu, &cpupool_free_cpus) )
         ret = -ENODEV;
 
     return ret;
@@ -657,27 +686,43 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
     case XEN_SYSCTL_CPUPOOL_OP_ADDCPU:
     {
         unsigned cpu;
+        const cpumask_t *cpus;
 
         cpu = op->cpu;
         cpupool_dprintk("cpupool_assign_cpu(pool=%d,cpu=%d)\n",
                         op->cpupool_id, cpu);
+
         spin_lock(&cpupool_lock);
+
+        c = cpupool_find_by_id(op->cpupool_id);
+        ret = -ENOENT;
+        if ( c == NULL )
+            goto addcpu_out;
         if ( cpu == XEN_SYSCTL_CPUPOOL_PAR_ANY )
-            cpu = cpumask_first(&cpupool_free_cpus);
+        {
+            for_each_cpu ( cpu, &cpupool_free_cpus )
+            {
+                cpus = sched_get_opt_cpumask(c->opt_granularity, cpu);
+                if ( cpumask_subset(cpus, &cpupool_free_cpus) )
+                    break;
+            }
+            ret = -ENODEV;
+            if ( cpu >= nr_cpu_ids )
+                goto addcpu_out;
+        }
         ret = -EINVAL;
         if ( cpu >= nr_cpu_ids )
             goto addcpu_out;
         ret = -ENODEV;
-        if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
-             cpumask_test_cpu(cpu, &cpupool_locked_cpus) )
-            goto addcpu_out;
-        c = cpupool_find_by_id(op->cpupool_id);
-        ret = -ENOENT;
-        if ( c == NULL )
+        cpus = sched_get_opt_cpumask(c->opt_granularity, cpu);
+        if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
+             cpumask_intersects(cpus, &cpupool_locked_cpus) )
             goto addcpu_out;
         ret = cpupool_assign_cpu_locked(c, cpu);
+
     addcpu_out:
         spin_unlock(&cpupool_lock);
+
         cpupool_dprintk("cpupool_assign_cpu(pool=%d,cpu=%d) ret %d\n",
                         op->cpupool_id, cpu, ret);
     }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index b4e65e81b0..fed71b26d2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2469,8 +2469,7 @@ static struct notifier_block cpu_schedule_nfb = {
     .notifier_call = cpu_schedule_callback
 };
 
-static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt,
-                                              unsigned int cpu)
+const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu)
 {
     const cpumask_t *mask;
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 714d793815..c125236068 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -613,5 +613,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
 }
 
 void sched_rm_cpu(unsigned int cpu);
+const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
 
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4


* [PATCH 60/60] xen/sched: add scheduling granularity enum
@ 2019-05-28 10:33   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add a scheduling granularity enum ("cpu", "core", "socket") for
specifying the scheduling granularity. Initially it is set to "cpu";
it can be modified via the new boot parameter (x86 only) "sched-gran".

According to the selected granularity, sched_granularity is set once
all cpus are online.

A test is added verifying that all scheduling resources hold the same
number of cpus. If they don't, fall back to the next smaller
granularity (core or cpu scheduling).

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- fixed freeing of sched_res when merging cpus
- rename parameter to "sched-gran" (Jan Beulich)
- rename parameter option from "thread" to "cpu" (Jan Beulich)

V1:
- rename scheduler_smp_init() to scheduler_gran_init(), let it be called
  by cpupool_init()
- avoid using literal cpu number 0 in scheduler_percpu_init() (Jan Beulich)
- style correction (Jan Beulich)
- fallback to smaller granularity instead of panic in case of
  unbalanced cpu configuration
---
 xen/common/cpupool.c       |  2 ++
 xen/common/schedule.c      | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched-if.h |  1 +
 3 files changed, 81 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 4dac7dedca..5a01f90d08 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -868,6 +868,8 @@ static int __init cpupool_init(void)
     unsigned int cpu;
     int err;
 
+    scheduler_gran_init();
+
     cpupool0 = cpupool_create(0, 0, &err);
     BUG_ON(cpupool0 == NULL);
     cpupool_put(cpupool0);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index fed71b26d2..9e9b7f5f48 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -34,6 +34,7 @@
 #include <xen/cpu.h>
 #include <xen/preempt.h>
 #include <xen/event.h>
+#include <xen/warning.h>
 #include <public/sched.h>
 #include <xsm/xsm.h>
 #include <xen/err.h>
@@ -61,6 +62,23 @@ unsigned int sched_granularity = 1;
 bool sched_disable_smt_switching;
 cpumask_var_t sched_res_mask;
 
+#ifdef CONFIG_X86
+static int __init sched_select_granularity(const char *str)
+{
+    if (strcmp("cpu", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_cpu;
+    else if (strcmp("core", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_core;
+    else if (strcmp("socket", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_socket;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+custom_param("sched-gran", sched_select_granularity);
+#endif
+
 /* Common lock for free cpus. */
 static DEFINE_SPINLOCK(sched_free_cpu_lock);
 
@@ -2492,6 +2510,66 @@ const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu)
     return mask;
 }
 
+static unsigned int __init sched_check_granularity(void)
+{
+    unsigned int cpu;
+    unsigned int siblings, gran = 0;
+
+    if ( opt_sched_granularity == SCHED_GRAN_cpu )
+        return 1;
+
+    for_each_online_cpu ( cpu )
+    {
+        siblings = cpumask_weight(sched_get_opt_cpumask(opt_sched_granularity,
+                                                        cpu));
+        if ( gran == 0 )
+            gran = siblings;
+        else if ( gran != siblings )
+            return 0;
+    }
+
+    sched_disable_smt_switching = true;
+
+    return gran;
+}
+
+/* Setup data for selected scheduler granularity. */
+void __init scheduler_gran_init(void)
+{
+    unsigned int gran = 0;
+    const char *fallback = NULL;
+
+    while ( gran == 0 )
+    {
+        gran = sched_check_granularity();
+
+        if ( gran == 0 )
+        {
+            switch ( opt_sched_granularity )
+            {
+            case SCHED_GRAN_core:
+                opt_sched_granularity = SCHED_GRAN_cpu;
+                fallback = "Asymmetric cpu configuration.\n"
+                           "Falling back to sched-gran=cpu.\n";
+                break;
+            case SCHED_GRAN_socket:
+                opt_sched_granularity = SCHED_GRAN_core;
+                fallback = "Asymmetric cpu configuration.\n"
+                           "Falling back to sched-gran=core.\n";
+                break;
+            default:
+                ASSERT_UNREACHABLE();
+                break;
+            }
+        }
+    }
+
+    if ( fallback )
+        warning_add(fallback);
+
+    sched_granularity = gran;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index c125236068..e16884fdf5 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -614,5 +614,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
 
 void sched_rm_cpu(unsigned int cpu);
 const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
+void scheduler_gran_init(void);
 
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4


* Re: [PATCH 11/60] xen/sched: move per cpu scheduler private data into struct sched_resource
@ 2019-05-28 11:32     ` Jan Beulich
  0 siblings, 0 replies; 202+ messages in thread
From: Jan Beulich @ 2019-05-28 11:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Joshua Whitehead, Meng Xu,
	xen-devel, Roger Pau Monne

>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
> This prepares support of larger scheduling granularities, e.g. core
> scheduling.
> 
> While at it move sched_has_urgent_vcpu() from include/asm-x86/cpuidle.h
> into schedule.c removing the need for including sched-if.h in
> cpuidle.h and multiple other C sources.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

(minor) x86 changes
Acked-by: Jan Beulich <jbeulich@suse.com>

> @@ -2074,6 +2072,16 @@ void wait(void)
>      schedule();
>  }
>  
> +/*
> + * vcpu is urgent if vcpu is polling event channel
> + *
> + * if urgent vcpu exists, CPU should not enter deep C state
> + */
> +int sched_has_urgent_vcpu(void)
> +{
> +    return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
> +}

Make this function's return type "bool" at the same time?
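
I.e. something along these lines (sketch only; the non-zero int from
atomic_read() converts to bool just fine):

    bool sched_has_urgent_vcpu(void)
    {
        return atomic_read(&get_sched_res(smp_processor_id())->urgent_count);
    }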

Jan



* Re: [PATCH 49/60] xen/sched: reject switching smt on/off with core scheduling active
@ 2019-05-28 11:44     ` Jan Beulich
  0 siblings, 0 replies; 202+ messages in thread
From: Jan Beulich @ 2019-05-28 11:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -200,7 +200,8 @@ long arch_do_sysctl(
>  
>          case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
>          case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
> -            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 )
> +            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 ||
> +                 sched_disable_smt_switching )
>              {
>                  ret = -EOPNOTSUPP;
>                  break;

I'm not convinced -EOPNOTSUPP is an appropriate error code for
this new case. -EPERM, -EACCES, or -EIO would all seem more
appropriate to me (and perhaps there are further ones).
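
E.g. a split of the condition along those lines (sketch only, picking
one of the alternatives named above):

    if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 )
    {
        ret = -EOPNOTSUPP;
        break;
    }
    if ( sched_disable_smt_switching )
    {
        ret = -EPERM;
        break;
    }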

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -57,6 +57,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
>  
>  /* Number of vcpus per struct sched_unit. */
>  static unsigned int sched_granularity = 1;
> +bool sched_disable_smt_switching;

__read_mostly (perhaps also the pre-existing variable in context)?
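
I.e. (sketch):

    static unsigned int sched_granularity __read_mostly = 1;
    bool sched_disable_smt_switching __read_mostly;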

Jan



* Re: [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-28 11:47     ` Jan Beulich
  0 siblings, 0 replies; 202+ messages in thread
From: Jan Beulich @ 2019-05-28 11:47 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
> Instead of having a full blown scheduler running for the free cpus
> add a very minimalistic scheduler for that purpose only ever scheduling
> the related idle vcpu. This has the big advantage of not needing any
> per-cpu, per-domain or per-scheduling unit data for free cpus and in
> turn simplifying moving cpus to and from cpupools a lot.

And the null scheduler is not minimalistic enough?

> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -350,8 +350,6 @@ void start_secondary(unsigned long boot_phys_offset,
>  
>      setup_cpu_sibling_map(cpuid);
>  
> -    scheduler_percpu_init(cpuid);
> -
>      /* Run local notifiers */
>      notify_cpu_starting(cpuid);
>      /*
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -382,8 +382,6 @@ void start_secondary(void *unused)
>  
>      set_cpu_sibling_map(cpu);
>  
> -    scheduler_percpu_init(cpu);
> -
>      init_percpu_time();
>  
>      setup_secondary_APIC_clock();

Seeing you revert here what an earlier patch in the series has added,
I don't suppose re-ordering could avoid this code churn?

Jan



* Re: [PATCH 60/60] xen/sched: add scheduling granularity enum
@ 2019-05-28 11:51     ` Jan Beulich
  0 siblings, 0 replies; 202+ messages in thread
From: Jan Beulich @ 2019-05-28 11:51 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel

>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
> @@ -61,6 +62,23 @@ unsigned int sched_granularity = 1;
>  bool sched_disable_smt_switching;
>  cpumask_var_t sched_res_mask;
>  
> +#ifdef CONFIG_X86
> +static int __init sched_select_granularity(const char *str)
> +{
> +    if (strcmp("cpu", str) == 0)
> +        opt_sched_granularity = SCHED_GRAN_cpu;
> +    else if (strcmp("core", str) == 0)
> +        opt_sched_granularity = SCHED_GRAN_core;
> +    else if (strcmp("socket", str) == 0)
> +        opt_sched_granularity = SCHED_GRAN_socket;
> +    else
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +custom_param("sched-gran", sched_select_granularity);
> +#endif

I'm surprised by the x86 dependency here: I didn't think HT or multi-
core are x86-only concepts. Even if Arm may not want this right now,
I think it would be better to have a dedicated Kconfig setting, which
for now only x86 would select.

Also there are several missing blanks here.
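
Combining the two remarks, something like the following is what I have
in mind - a sketch only, with CONFIG_SCHED_GRANULARITY being a made-up
symbol name (to be selected by X86 for now) and the missing blanks
added:

#ifdef CONFIG_SCHED_GRANULARITY
static int __init sched_select_granularity(const char *str)
{
    if ( strcmp("cpu", str) == 0 )
        opt_sched_granularity = SCHED_GRAN_cpu;
    else if ( strcmp("core", str) == 0 )
        opt_sched_granularity = SCHED_GRAN_core;
    else if ( strcmp("socket", str) == 0 )
        opt_sched_granularity = SCHED_GRAN_socket;
    else
        return -EINVAL;

    return 0;
}
custom_param("sched-gran", sched_select_granularity);
#endif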

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 49/60] xen/sched: reject switching smt on/off with core scheduling active
@ 2019-05-28 11:52       ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 11:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall,
	xen-devel, Ian Jackson, Roger Pau Monne

On 28/05/2019 13:44, Jan Beulich wrote:
>>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
>> --- a/xen/arch/x86/sysctl.c
>> +++ b/xen/arch/x86/sysctl.c
>> @@ -200,7 +200,8 @@ long arch_do_sysctl(
>>  
>>          case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
>>          case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
>> -            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 )
>> +            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 ||
>> +                 sched_disable_smt_switching )
>>              {
>>                  ret = -EOPNOTSUPP;
>>                  break;
> 
> I'm not convinced -EOPNOTSUPP is an appropriate error code for
> this new case. -EPERM, -EACCES, or -EIO would all seem more
> appropriate to me (and perhaps there are further ones).

I think -EIO or -EBUSY would be the best fit.
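
I.e. something like this (a sketch only, with -EBUSY picked for
illustration, keeping -EOPNOTSUPP for the genuinely unsupported cases):

        case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
        case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings < 2 )
            {
                ret = -EOPNOTSUPP;
                break;
            }
            if ( sched_disable_smt_switching )
            {
                /* Core scheduling active: SMT must not be switched. */
                ret = -EBUSY;
                break;
            }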

> 
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -57,6 +57,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
>>  
>>  /* Number of vcpus per struct sched_unit. */
>>  static unsigned int sched_granularity = 1;
>> +bool sched_disable_smt_switching;
> 
> __read_mostly (perhaps also the pre-existing variable in context)?

Yes, can do.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-28 11:58       ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 11:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall,
	xen-devel, Ian Jackson, Roger Pau Monne

On 28/05/2019 13:47, Jan Beulich wrote:
>>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
>> Instead of having a full blown scheduler running for the free cpus
>> add a very minimalistic scheduler for that purpose only ever scheduling
>> the related idle vcpu. This has the big advantage of not needing any
>> per-cpu, per-domain or per-scheduling unit data for free cpus and in
>> turn simplifying moving cpus to and from cpupools a lot.
> 
> And the null scheduler is not minimalistic enough?

The main disadvantages of the null scheduler are the need for private
data (which has to be allocated/freed on scheduler switch), its
not-yet-perfect stability, and its "complexity" (e.g. 900 lines vs. 50).
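
To give an idea of the scale, the whole free-cpu scheduler boils down
to something like the following (an illustrative sketch only - hook
names and signatures here are not the patch's verbatim code):

static void *sched_idle_alloc_vdata(const struct scheduler *ops,
                                    struct sched_unit *unit, void *dd)
{
    /* No private data is needed; any non-NULL pointer will do. */
    return (void *)1UL;
}

static void sched_idle_free_vdata(const struct scheduler *ops, void *priv)
{
    /* Nothing was allocated, so there is nothing to free. */
}

static struct task_slice sched_idle_schedule(const struct scheduler *ops,
                                             s_time_t now,
                                             bool tasklet_work_scheduled)
{
    /* Unconditionally run the local idle vcpu, with no timeout. */
    return (struct task_slice){
        .task = idle_vcpu[smp_processor_id()],
        .time = -1,
    };
}

static const struct scheduler sched_idle_ops = {
    .name        = "Idle Scheduler",
    .opt_name    = "idle",
    .do_schedule = sched_idle_schedule,
    .alloc_vdata = sched_idle_alloc_vdata,
    .free_vdata  = sched_idle_free_vdata,
};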

> 
>> --- a/xen/arch/arm/smpboot.c
>> +++ b/xen/arch/arm/smpboot.c
>> @@ -350,8 +350,6 @@ void start_secondary(unsigned long boot_phys_offset,
>>  
>>      setup_cpu_sibling_map(cpuid);
>>  
>> -    scheduler_percpu_init(cpuid);
>> -
>>      /* Run local notifiers */
>>      notify_cpu_starting(cpuid);
>>      /*
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -382,8 +382,6 @@ void start_secondary(void *unused)
>>  
>>      set_cpu_sibling_map(cpu);
>>  
>> -    scheduler_percpu_init(cpu);
>> -
>>      init_percpu_time();
>>  
>>      setup_secondary_APIC_clock();
> 
> Seeing you revert here what an earlier patch in the series has added,
> I don't suppose re-ordering could avoid this code churn?

This would require another major shuffling of patches. I'd like to avoid
that.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 60/60] xen/sched: add scheduling granularity enum
@ 2019-05-28 12:02       ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-28 12:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall,
	xen-devel, Ian Jackson

On 28/05/2019 13:51, Jan Beulich wrote:
>>>> On 28.05.19 at 12:33, <jgross@suse.com> wrote:
>> @@ -61,6 +62,23 @@ unsigned int sched_granularity = 1;
>>  bool sched_disable_smt_switching;
>>  cpumask_var_t sched_res_mask;
>>  
>> +#ifdef CONFIG_X86
>> +static int __init sched_select_granularity(const char *str)
>> +{
>> +    if (strcmp("cpu", str) == 0)
>> +        opt_sched_granularity = SCHED_GRAN_cpu;
>> +    else if (strcmp("core", str) == 0)
>> +        opt_sched_granularity = SCHED_GRAN_core;
>> +    else if (strcmp("socket", str) == 0)
>> +        opt_sched_granularity = SCHED_GRAN_socket;
>> +    else
>> +        return -EINVAL;
>> +
>> +    return 0;
>> +}
>> +custom_param("sched-gran", sched_select_granularity);
>> +#endif
> 
> I'm surprised by the x86 dependency here: I didn't think HT or multi-
> core are x86-only concepts. Even if Arm may not want this right now,
> I think it would be better to have a dedicated Kconfig setting, which
> for now only x86 would select.

Fine with me.

I just wanted to highlight that ARM is missing some code in the context
switching area.

> Also there are several missing blanks here.

Oh, indeed!


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-31 14:15         ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-05-31 14:15 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, xen-devel, Ian Jackson,
	Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 2270 bytes --]

On Tue, 2019-05-28 at 13:58 +0200, Juergen Gross wrote:
> On 28/05/2019 13:47, Jan Beulich wrote:
> > > > > On 28.05.19 at 12:33, <jgross@suse.com> wrote:
> > > Instead of having a full blown scheduler running for the free
> > > cpus
> > > add a very minimalistic scheduler for that purpose only ever
> > > scheduling
> > > the related idle vcpu. This has the big advantage of not needing
> > > any
> > > per-cpu, per-domain or per-scheduling unit data for free cpus and
> > > in
> > > turn simplifying moving cpus to and from cpupools a lot.
> > 
> > And the null scheduler is not minimalistic enough?
> 
> The main disadvantages of the null scheduler are the need for private
> data (which has to be allocated/freed on scheduler switch), its
> not-yet-perfect stability, and its "complexity" (e.g. 900 lines vs. 50).
> 
Yes, I absolutely agree with this approach of having an even simpler
idle scheduler for the idle vcpus, which is neither selectable nor
usable by the user for other things.

It would, actually, be rather useful in other ways and places, so I'm
rather happy to see it appearing (I started doing something like this
myself multiple times, but never finished :-/)

For instance, the fact that we put cpus that are not in any pool in the
default scheduler was weird (to say the least) from a conceptual point
of view; it forced us to do some extra checks (in the form of, e.g.,
cpumask operations) to avoid actually scheduling vcpus on them, and it
was causing accounting issues in Credit1.

Now that the free cpus can go and stay in their own "idle scheduler",
those things can all be solved in the (IMO) best and cleanest way.

And, yes, it has to be as simple as possible, even simpler than null...
And I think we can easily see why that's the case, just by looking at
what code this patch lets us remove (e.g., the need for some of the
checking of `system_state` in cpupool or scheduler code, just to
mention one).

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-31 15:52     ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-05-31 15:52 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 1851 bytes --]

On Tue, 2019-05-28 at 12:33 +0200, Juergen Gross wrote:
> Instead of having a full blown scheduler running for the free cpus
> add a very minimalistic scheduler for that purpose only ever
> scheduling
> the related idle vcpu. This has the big advantage of not needing any
> per-cpu, per-domain or per-scheduling unit data for free cpus and in
> turn simplifying moving cpus to and from cpupools a lot.
> 
> As this new scheduler is not user selectable don't register it as an
> official scheduler, but just include it in schedule.c.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V1: new patch
> ---

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -83,6 +83,57 @@ extern const struct scheduler
> *__start_schedulers_array[], *__end_schedulers_arr
>  
>  static struct scheduler __read_mostly ops;
>  
> [...]
>
> +static void *
> +sched_idle_alloc_vdata(const struct scheduler *ops, struct
> sched_unit *unit,
> +                       void *dd)
> +{
> +    /* Any non-NULL pointer is fine here. */
> +    return (void *)1UL;
> +}
> +
I think the proper thing to do, here, would be to convert alloc_vdata()
to PTR_ERR() & IS_ERR().
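
In code, the idea would be roughly this (a hypothetical sketch;
sched_alloc_vdata() stands in for whatever the wrapper ends up being
called, using the existing PTR_ERR()/IS_ERR() helpers from xen/err.h):

    data = sched_alloc_vdata(ops, unit, domdata);
    if ( IS_ERR(data) )
        return PTR_ERR(data);
    unit->priv = data;   /* NULL can then be a valid "no data" value */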

That's another patch, of course, and given that this series is rather
big already, I'm not sure about asking for it to be done in the context
of this work.

I can do it myself, either right now or after this series is merged
(for the sake of not making rebasing 60 patches more complicated than
it must be already, for you :-D).

Let me know what you think.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
@ 2019-05-31 16:44       ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-05-31 16:44 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Roger Pau Monné

On 31/05/2019 17:52, Dario Faggioli wrote:
> On Tue, 2019-05-28 at 12:33 +0200, Juergen Gross wrote:
>> Instead of having a full blown scheduler running for the free cpus
>> add a very minimalistic scheduler for that purpose only ever
>> scheduling
>> the related idle vcpu. This has the big advantage of not needing any
>> per-cpu, per-domain or per-scheduling unit data for free cpus and in
>> turn simplifying moving cpus to and from cpupools a lot.
>>
>> As this new scheduler is not user selectable don't register it as an
>> official scheduler, but just include it in schedule.c.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V1: new patch
>> ---
> 
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -83,6 +83,57 @@ extern const struct scheduler
>> *__start_schedulers_array[], *__end_schedulers_arr
>>  
>>  static struct scheduler __read_mostly ops;
>>  
>> [...]
>>
>> +static void *
>> +sched_idle_alloc_vdata(const struct scheduler *ops, struct
>> sched_unit *unit,
>> +                       void *dd)
>> +{
>> +    /* Any non-NULL pointer is fine here. */
>> +    return (void *)1UL;
>> +}
>> +
> I think the proper thing to do, here, would be to convert alloc_vdata()
> to PTR_ERR() & IS_ERR().
> 
> That's another patch, of course, and given that this series is rather
> big already, I'm not sure about asking for it to be done in the context
> of this work.
> 
> I can do it myself, either right now or after this series is merged
> (for the sake of not making rebasing 60 patches more complicated than
> it must be already, for you :-D).
> 
> Let me know what you think.

There is only one place left where alloc_vdata is being called for the
idle scheduler, and that is sched_init_vcpu() of an idle vcpu.

I have planned to do a followup series cleaning up the scheduling stuff
where I can add a patch special-casing that and removing
sched_idle_alloc_vdata().
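
Something along these lines, maybe (a sketch only; the exact names and
the failure handling are to be worked out in that series):

    if ( is_idle_vcpu(v) )
        /* The idle "scheduler" needs no per-unit private data at all. */
        unit->priv = NULL;
    else
    {
        unit->priv = sched_alloc_vdata(ops, unit, d->sched_priv);
        if ( unit->priv == NULL )
            return 1;
    }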


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 01/60] xen/sched: only allow schedulers with all mandatory functions available
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-06-11 16:03   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-06-11 16:03 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 738 bytes --]

On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Some functions of struct scheduler are mandatory. Test those in the
> scheduler initialization loop to be present and drop schedulers not
> complying.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
As discussed in the other thread already, I personally do like this
(and things like this) quite a bit... Thanks for doing it!
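
For the archive, the idea is essentially a sanity loop at scheduler
registration time; a sketch (array and hook names are illustrative, not
the patch's actual ones):

    for ( i = 0; i < NUM_SCHEDULERS; i++ )
    {
        const struct scheduler *s = schedulers[i];

        if ( !s->do_schedule || !s->alloc_vdata || !s->free_vdata )
        {
            printk("scheduler %s lacks mandatory hooks, dropping it\n",
                   s->opt_name);
            schedulers[i] = NULL;
        }
    }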

Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards 
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 02/60] xen/sched: add inline wrappers for calling per-scheduler functions
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-06-11 16:21   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-06-11 16:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 666 bytes --]

On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Instead of using the SCHED_OP() macro to call the different scheduler
> specific functions add inline wrappers for that purpose.
> 
Yep, a cleanup that we had been meaning to make for quite some time. :-)

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-06-11 16:55   ` Dario Faggioli
  2019-06-12  7:40     ` Jan Beulich
  -1 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-06-11 16:55 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1422 bytes --]

On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Instead of setting the scheduler percpu lock address in each of the
> switch_sched instances of the different schedulers do that in
> schedule_cpu_switch() which is the single caller of that function.
> For that purpose let sched_switch_sched() just return the new lock
> address.
> 
This looks good to me. The only actual functional/behavioral difference
I've been able to spot is the fact that, in Credit2, we currently
switch the lock pointer while still holding the write lock on the
global scheduler. After this change, we don't any longer.

That being said, I've tried to think about how this could be a problem,
but failed at imagining such a scenario, so:

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

I'm wondering whether it makes sense for the above to be quickly
mentioned in the changelog, but am leaning toward "not really
necessary". In particular, I don't think it's worth respinning the patch
just for that... So, either just something that can be added while
committing, or forget it.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-11 16:55   ` Dario Faggioli
@ 2019-06-12  7:40     ` Jan Beulich
  2019-06-12  8:06       ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Jan Beulich @ 2019-06-12  7:40 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Robert VanVossen, Tim Deegan,
	Julien Grall, Joshua Whitehead, Meng Xu, xen-devel

>>> On 11.06.19 at 18:55, <dfaggioli@suse.com> wrote:
> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>> Instead of setting the scheduler percpu lock address in each of the
>> switch_sched instances of the different schedulers do that in
>> schedule_cpu_switch() which is the single caller of that function.
>> For that purpose let sched_switch_sched() just return the new lock
>> address.
>> 
> This looks good to me. The only actual functional/behavioral difference
> I've been able to spot is the fact that, in Credit2, we currently
> switch the lock pointer while still holding the write lock on the
> global scheduler. After this change, we don't any longer.
> 
> That being said, I've tried to think about how this could be a problem,
> but failed at imagining such a scenario, so:
> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
> 
> I'm wondering whether it makes sense for the above to be quickly
> mentioned in the changelog, but am leaning toward "not really
> necessary". In particular, I don't think it's worth respinning the patch
> just for that... So, either just something that can be added while
> committing, or forget it.

I'd be happy to add something while committing, but one of you
would need to propose the wording of this "something".

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
  (?)
@ 2019-06-12  8:05   ` Andrew Cooper
  2019-06-12  8:19     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Andrew Cooper @ 2019-06-12  8:05 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 28/05/2019 11:32, Juergen Gross wrote:
> @@ -1870,8 +1871,19 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>      old_lock = pcpu_schedule_lock_irq(cpu);
>  
>      vpriv_old = idle->sched_priv;
> -    ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
> -    sched_switch_sched(new_ops, cpu, ppriv, vpriv);
> +    ppriv_old = sd->sched_priv;
> +    new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
> +
> +    per_cpu(scheduler, cpu) = new_ops;
> +    sd->sched_priv = ppriv;
> +
> +    /*
> +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
> +     * if it is free (and it can be) we want that anyone that manages
> +     * taking it, finds all the initializations we've done above in place.
> +     */
> +    smp_mb();

I realise you're just moving existing code, but this barrier sticks out
like a sore thumb.

A full memory barrier is a massive overhead for what should be
smp_wmb().  The matching barrier is actually hidden in the implicit
semantics of managing to lock sd->schedule_lock (which is trial and error
anyway), but the only thing that matters here is that all other written
data is in place first.

Beyond that, local causality will cause all reads to be in order (not
that they are important) due to logic dependencies.  Any that miss out on
this are an optimisation-waiting-to-happen, as the compiler could elide
them fully.
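
I.e., on the code quoted above (illustrative only):

    per_cpu(scheduler, cpu) = new_ops;
    sd->sched_priv = ppriv;

    /*
     * Publish the initialisations above before rerouting the lock
     * pointer.  The matching (read side) ordering comes for free with
     * successfully acquiring sd->schedule_lock.
     */
    smp_wmb();
    sd->schedule_lock = new_lock;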

~Andrew

> +    sd->schedule_lock = new_lock;
>  
>      /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
>      spin_unlock_irq(old_lock);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12  7:40     ` Jan Beulich
@ 2019-06-12  8:06       ` Juergen Gross
  2019-06-12  9:44         ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-06-12  8:06 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Robert VanVossen, Tim Deegan,
	Julien Grall, Joshua Whitehead, Meng Xu, xen-devel

On 12.06.19 09:40, Jan Beulich wrote:
>>>> On 11.06.19 at 18:55, <dfaggioli@suse.com> wrote:
>> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>>> Instead of setting the scheduler percpu lock address in each of the
>>> switch_sched instances of the different schedulers do that in
>>> schedule_cpu_switch() which is the single caller of that function.
>>> For that purpose let sched_switch_sched() just return the new lock
>>> address.
>>>
>> This looks good to me. The only actual functional/behavioral difference
>> I've been able to spot is the fact that, in Credit2, we currently
>> switch the lock pointer while still holding the write lock on the
>> global scheduler. After this change, we don't any longer.
>>
>> That being said, I've tried to think about how this could be a problem,
>> but failed at imagining such a scenario, so:
>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
>>
>> I'm wondering whether it makes sense for the above to be quickly
>> mentioned in the changelog, but am leaning toward "not really
>> necessary". In particular, I don't think it's worth respinning the patch
>> just for that... So, either just something that can be added while
>> committing, or forget it.
> 
> I'd be happy to add something while committing, but one of you
> would need to propose the wording of this "something".

What about:

It should be noted that in credit2 the lock used to be set while still
holding the global scheduler write lock, which will no longer be true
with the new scheme applied. This is actually no problem, as the write
lock is meant to guard the call of init_pdata(), which is still the case.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12  8:05   ` Andrew Cooper
@ 2019-06-12  8:19     ` Juergen Gross
  2019-06-12  9:32       ` Jan Beulich
       [not found]       ` <5D00C6960200007800237622@suse.com>
  0 siblings, 2 replies; 202+ messages in thread
From: Juergen Gross @ 2019-06-12  8:19 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 12.06.19 10:05, Andrew Cooper wrote:
> On 28/05/2019 11:32, Juergen Gross wrote:
>> @@ -1870,8 +1871,19 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>>       old_lock = pcpu_schedule_lock_irq(cpu);
>>   
>>       vpriv_old = idle->sched_priv;
>> -    ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
>> -    sched_switch_sched(new_ops, cpu, ppriv, vpriv);
>> +    ppriv_old = sd->sched_priv;
>> +    new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
>> +
>> +    per_cpu(scheduler, cpu) = new_ops;
>> +    sd->sched_priv = ppriv;
>> +
>> +    /*
>> +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
>> +     * if it is free (and it can be) we want that anyone that manages
>> +     * taking it, finds all the initializations we've done above in place.
>> +     */
>> +    smp_mb();
> 
> I realise you're just moving existing code, but this barrier sticks out
> like a sore thumb.
> 
> A full memory barrier is a massive overhead for what should be
> smp_wmb().  The matching barrier is actually hidden in the implicit
> semantics of managing to lock sd->schedule_lock (which is trial and error
> anyway), but the only thing that matters here is that all other written
> data is in place first.
> 
> Beyond that, local causality will cause all reads to be in order (not
> that they are important) due to logic dependencies.  Any that miss out on
> this are an optimisation-waiting-to-happen, as the compiler could elide
> them fully.

Not that it would really matter for performance (switching cpus between
cpupools is a _very_ rare operation), I'm fine transforming the barrier
into smp_wmb().


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12  8:19     ` Juergen Gross
@ 2019-06-12  9:32       ` Jan Beulich
       [not found]       ` <5D00C6960200007800237622@suse.com>
  1 sibling, 0 replies; 202+ messages in thread
From: Jan Beulich @ 2019-06-12  9:32 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Tim Deegan, Ian Jackson, Robert VanVossen, Dario Faggioli,
	Julien Grall, Joshua Whitehead, Meng Xu, xen-devel

>>> On 12.06.19 at 10:19, <jgross@suse.com> wrote:
> On 12.06.19 10:05, Andrew Cooper wrote:
>> On 28/05/2019 11:32, Juergen Gross wrote:
>>> @@ -1870,8 +1871,19 @@ int schedule_cpu_switch(unsigned int cpu, struct 
> cpupool *c)
>>>       old_lock = pcpu_schedule_lock_irq(cpu);
>>>   
>>>       vpriv_old = idle->sched_priv;
>>> -    ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
>>> -    sched_switch_sched(new_ops, cpu, ppriv, vpriv);
>>> +    ppriv_old = sd->sched_priv;
>>> +    new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
>>> +
>>> +    per_cpu(scheduler, cpu) = new_ops;
>>> +    sd->sched_priv = ppriv;
>>> +
>>> +    /*
>>> +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
>>> +     * if it is free (and it can be) we want that anyone that manages
>>> +     * taking it, finds all the initializations we've done above in place.
>>> +     */
>>> +    smp_mb();
>> 
>> I realise you're just moving existing code, but this barrier sticks out
>> like a sore thumb.
>> 
>> A full memory barrier is a massive overhead for what should be
>> smp_wmb().  The matching barrier is actually hidden in the implicit
>> semantics of managing to lock sd->schedule_lock (which is trial and error
>> anyway), but the only thing that matters here is that all other written
>> data is in place first.
>> 
>> Beyond that, local causality will cause all reads to be in order (not
>> that they are important) due to logic dependencies.  Any that miss out on
>> this are an optimisation-waiting-to-happen, as the compiler could elide
>> them fully.
> 
> Not that it would really matter for performance (switching cpus between
> cpupools is a _very_ rare operation), I'm fine transforming the barrier
> into smp_wmb().

This again is a change easy enough to make while committing. I'm
recording the above in case I end up being the committer.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 49/60] xen/sched: reject switching smt on/off with core scheduling active
  2019-05-28 11:52       ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-06-12  9:36       ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-06-12  9:36 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, xen-devel, Ian Jackson,
	Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1302 bytes --]

On Tue, 2019-05-28 at 13:52 +0200, Juergen Gross wrote:
> On 28/05/2019 13:44, Jan Beulich wrote:
> > > > > On 28.05.19 at 12:33, <jgross@suse.com> wrote:
> > > --- a/xen/arch/x86/sysctl.c
> > > +++ b/xen/arch/x86/sysctl.c
> > > @@ -200,7 +200,8 @@ long arch_do_sysctl(
> > >  
> > >          case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
> > >          case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
> > > -            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings
> > > < 2 )
> > > +            if ( !cpu_has_htt || boot_cpu_data.x86_num_siblings
> > > < 2 ||
> > > +                 sched_disable_smt_switching )
> > >              {
> > >                  ret = -EOPNOTSUPP;
> > >                  break;
> > 
> > I'm not convinced -EOPNOTSUPP is an appropriate error code for
> > this new case. -EPERM, -EACCES, or -EIO would all seem more
> > appropriate to me (and perhaps there are further ones).
> 
> I think -EIO or -EBUSY would be the best fit.
> 
I agree, with mild preference for EBUSY.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12  8:06       ` Juergen Gross
@ 2019-06-12  9:44         ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-06-12  9:44 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Robert VanVossen, Tim Deegan,
	Julien Grall, Joshua Whitehead, Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1471 bytes --]

On Wed, 2019-06-12 at 10:06 +0200, Juergen Gross wrote:
> On 12.06.19 09:40, Jan Beulich wrote:
> > > > > On 11.06.19 at 18:55, <dfaggioli@suse.com> wrote:
> > > On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > 
> > > Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
> > > 
> > > I'm wondering whether it makes sense for the above to be quickly
> > > mentioned in the changelog, but am leaning toward "not really
> > > necessary". In particular, I don't think it's worth respinning the
> > > patch
> > > just for that... So, either just something that can be added
> > > while
> > > committing, or forget it.
> > 
> > I'd be happy to add something while committing, but one of you
> > would need to propose the wording of this "something".
> 
> What about:
> 
> It should be noted that in credit2 the lock used to be set while
> still
> holding the global scheduler write lock, which will no longer be true
> with the new scheme applied. This is actually no problem as the write
> lock is meant to guard the call of init_pdata(), which is still the case.
> 
I'm ok with this.

Thanks (to both) and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
       [not found]       ` <5D00C6960200007800237622@suse.com>
@ 2019-06-12  9:56         ` Dario Faggioli
  2019-06-12 10:14           ` Jan Beulich
  0 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-06-12  9:56 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross, Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Tim Deegan, Ian Jackson, Robert VanVossen, Julien Grall,
	Joshua Whitehead, Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2312 bytes --]

On Wed, 2019-06-12 at 10:32 +0100, Jan Beulich wrote:
> > > > On 12.06.19 at 10:19, <jgross@suse.com> wrote:
> > On 12.06.19 10:05, Andrew Cooper wrote:
> > > On 28/05/2019 11:32, Juergen Gross wrote:
> > > > 
> > > > +    per_cpu(scheduler, cpu) = new_ops;
> > > > +    sd->sched_priv = ppriv;
> > > > +
> > > > +    /*
> > > > +     * (Re?)route the lock to the per pCPU lock as /last/
> > > > thing. In fact,
> > > > +     * if it is free (and it can be) we want that anyone that
> > > > manages
> > > > +     * taking it, finds all the initializations we've done
> > > > above in place.
> > > > +     */
> > > > +    smp_mb();
> > > 
> > > A full memory barrier is a massive overhead for what should be
> > > smp_wmb().  The matching barrier is actually hidden in the
> > > implicit
> > > semantics of managing to lock sd->schedule_lock (which is trial
> > > and error
> > > anyway), but the only thing that matters here is that all other
> > > written
> > > data is in place first.
> > > 
> > Not that it would really matter for performance (switching cpus
> > between
> > cpupools is a _very_ rare operation), I'm fine transforming the
> > barrier
> > into smp_wmb().
> 
> This again is a change easy enough to make while committing. I'm
> recording the above in case I end up being the committer.
> 
I'm fine (i.e., my Rev-by: remains valid) with this being turned into an
smp_wmb(), and it being done upon commit (thanks Jan for the offer to do
that!).

If we do it, however, I think the comment needs to be adjusted too, and
the commit message needs to briefly mention this new change we're
doing.

Maybe, for the comment, add something like:

"The smp_wmb() corresponds to the barrier implicit in successfully
taking the lock."

And, for the changelog, add:

"While there, turn the full barrier, which was overkill, into an
smp_wmb(), matching with the one implicit in managing to take the
lock."

Or something similar (and again, R-b: still stands with such changes).

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12  9:56         ` Dario Faggioli
@ 2019-06-12 10:14           ` Jan Beulich
  2019-06-12 11:27             ` Andrew Cooper
  0 siblings, 1 reply; 202+ messages in thread
From: Jan Beulich @ 2019-06-12 10:14 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Tim Deegan, Julien Grall, Joshua Whitehead, Meng Xu, xen-devel

>>> On 12.06.19 at 11:56, <dfaggioli@suse.com> wrote:
> On Wed, 2019-06-12 at 10:32 +0100, Jan Beulich wrote:
>> > > > On 12.06.19 at 10:19, <jgross@suse.com> wrote:
>> > On 12.06.19 10:05, Andrew Cooper wrote:
>> > > On 28/05/2019 11:32, Juergen Gross wrote:
>> > > > 
>> > > > +    per_cpu(scheduler, cpu) = new_ops;
>> > > > +    sd->sched_priv = ppriv;
>> > > > +
>> > > > +    /*
>> > > > +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
>> > > > +     * if it is free (and it can be) we want that anyone that manages
>> > > > +     * taking it, finds all the initializations we've done above in place.
>> > > > +     */
>> > > > +    smp_mb();
>> > > 
>> > > A full memory barrier is a massive overhead for what should be
>> > > smp_wmb().  The matching barrier is actually hidden in the implicit
>> > > semantics of managing to lock sd->schedule_lock (which is trial and
>> > > error anyway), but the only thing that matters here is that all other
>> > > written data is in place first.
>> > > 
>> > Not that it would really matter for performance (switching cpus
>> > between cpupools is a _very_ rare operation), I'm fine transforming
>> > the barrier into smp_wmb().
>> 
>> This again is a change easy enough to make while committing. I'm
>> recording the above in case I end up being the committer.
>> 
> I'm fine (i.e., my Rev-by: remains valid) with this being turned into a
> wmb(), and it being done upon commit (thanks Jan for the offer to do
> that!).
> 
> If we do it, however, I think the comment needs to be adjusted too, and
> the commit message needs to briefly mention this new change we're
> doing.
> 
> Maybe, for the comment, add something like:
> 
> "The smp_wmb() corresponds to the barrier implicit in successfully
> taking the lock."

I'm not entirely happy with this one: Taking a lock is an implicit full
barrier, so cannot alone be used to reason why a write barrier is
sufficient here.

> And, for the changelog, add:
> 
> "While there, turn the full barrier, which was overkill, into an
> smp_wmb(), matching with the one implicit in managing to take the
> lock."

This one reads fine to me.

Jan




* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12 10:14           ` Jan Beulich
@ 2019-06-12 11:27             ` Andrew Cooper
  2019-06-12 13:32               ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Andrew Cooper @ 2019-06-12 11:27 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Julien Grall, Joshua Whitehead, Meng Xu, xen-devel

On 12/06/2019 11:14, Jan Beulich wrote:
>>>> On 12.06.19 at 11:56, <dfaggioli@suse.com> wrote:
>> On Wed, 2019-06-12 at 10:32 +0100, Jan Beulich wrote:
>>>>>> On 12.06.19 at 10:19, <jgross@suse.com> wrote:
>>>> On 12.06.19 10:05, Andrew Cooper wrote:
>>>>> On 28/05/2019 11:32, Juergen Gross wrote:
>>>>>> +    per_cpu(scheduler, cpu) = new_ops;
>>>>>> +    sd->sched_priv = ppriv;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
>>>>>> +     * if it is free (and it can be) we want that anyone that manages
>>>>>> +     * taking it, finds all the initializations we've done above in place.
>>>>>> +     */
>>>>>> +    smp_mb();
>>>>> A full memory barrier is a massive overhead for what should be
>>>>> smp_wmb().  The matching barrier is actually hidden in the implicit
>>>>> semantics of managing to lock sd->schedule_lock (which is trial and
>>>>> error anyway), but the only thing that matters here is that all other
>>>>> written data is in place first.
>>>>>
>>>> Not that it would really matter for performance (switching cpus
>>>> between cpupools is a _very_ rare operation), I'm fine transforming
>>>> the barrier into smp_wmb().
>>> This again is a change easy enough to make while committing. I'm
>>> recording the above in case I end up being the committer.
>>>
>> I'm fine (i.e., my Rev-by: remains valid) with this being turned into a
>> wmb(), and it being done upon commit (thanks Jan for the offer to do
>> that!).
>>
>> If we do it, however, I think the comment needs to be adjusted too, and
>> the commit message needs to briefly mention this new change we're
>> doing.
>>
>> Maybe, for the comment, add something like:
>>
>> "The smp_wmb() corresponds to the barrier implicit in successfully
>> taking the lock."
> I'm not entirely happy with this one: Taking a lock is an implicit full
> barrier, so cannot alone be used to reason why a write barrier is
> sufficient here.

It is a consequence of our extra magic scheduler locks which protect the
pointer used to locate them, and the ensuing double-step in ???
(seriously - I can't figure out the function names created by the
sched_lock() monstrosity) which take the lock, re-read the lock pointer
and drop on a mismatch.

I've gone with:

+    /*
+     * The data above is protected under new_lock, which may be unlocked.
+     * Another CPU can take new_lock as soon as sd->schedule_lock is
+     * visible, and must observe all prior initialisation.
+     */
+    smp_wmb();

~Andrew
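
As a minimal sketch of the pairing settled on above (illustrative, not
the committed Xen code; the reader side paraphrases what the
schedule-lock helpers do):

    /* Writer: initialise everything, then publish the lock pointer. */
    per_cpu(scheduler, cpu) = new_ops;
    sd->sched_priv = ppriv;

    smp_wmb();                     /* order all writes above... */
    sd->schedule_lock = new_lock;  /* ...before rerouting the pointer */

    /* Reader: take the lock through the pointer, then re-check it.
     * Successfully taking the spinlock implies a full barrier, which is
     * what the smp_wmb() above pairs with. */
    for ( ; ; )
    {
        spinlock_t *lock = sd->schedule_lock;

        spin_lock(lock);
        if ( lock == sd->schedule_lock )
            break;             /* initialisation guaranteed visible */
        spin_unlock(lock);     /* pointer rerouted under us: retry */
    }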


* Re: [Xen-devel] [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address
  2019-06-12 11:27             ` Andrew Cooper
@ 2019-06-12 13:32               ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-06-12 13:32 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Julien Grall, Joshua Whitehead, Meng Xu, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 920 bytes --]

On Wed, 2019-06-12 at 12:27 +0100, Andrew Cooper wrote:
> It is a consequence of our extra magic scheduler locks which protect
> the pointer used to locate them, and the ensuing double-step in ???
> (seriously - I can't figure out the function names created by the
> sched_lock() monstrosity) which take the lock, re-read the lock
> pointer and drop on a mismatch.
> 
Just FTR, they're:

vcpu_schedule_lock()
vcpu_schedule_lock_irq()
vcpu_schedule_lock_irqsave()

and:

pcpu_schedule_lock()
pcpu_schedule_lock_irq()
pcpu_schedule_lock_irqsave()

and the corresponding *_schedule_unlock_*() ones, of course. ;-P

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
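
For reference, those names are produced by token pasting; a sketch of
the generating macro, paraphrased from memory of
xen/include/xen/sched-if.h (treat the details as illustrative):

    #define sched_lock(kind, param, cpu, irq, arg...)                          \
    static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
    {                                                                          \
        for ( ; ; )                                                            \
        {                                                                      \
            spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock;      \
                                                                               \
            /* Take the lock, then re-check the pointer: it may have been      \
             * rerouted (e.g. by a cpupool move) in the meantime. */           \
            spin_lock##irq(lock, ## arg);                                      \
            if ( likely(lock == per_cpu(schedule_data, cpu).schedule_lock) )   \
                return lock;                                                   \
            spin_unlock##irq(lock, ## arg);                                    \
        }                                                                      \
    }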



* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-06-13  7:18   ` Andrii Anisov
  2019-06-13  7:29     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:18 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Hello Juergen,

Please note that this patch will clash with [1].

On 28.05.19 13:32, Juergen Gross wrote:
> vcpu->last_run_time is primarily used by sched_credit, so move it to
> struct sched_unit, too.

`last_run_time` has already been moved into credit's private data in current staging.


[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=608639ffa0a0d6f219e14ba7397ab2cc018b93c9

-- 
Sincerely,
Andrii Anisov.


* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:18   ` Andrii Anisov
@ 2019-06-13  7:29     ` Juergen Gross
  2019-06-13  7:34       ` Andrii Anisov
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-06-13  7:29 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Hi Andrii,

On 13.06.19 09:18, Andrii Anisov wrote:
> Hello Juergen,
> 
> Please note that this patch will clash with [1].
> 
> On 28.05.19 13:32, Juergen Gross wrote:
>> vcpu->last_run_time is primarily used by sched_credit, so move it to
>> struct sched_unit, too.
> 
> `last_run_time` has already been moved into credit's private data in current staging.

Thanks for the heads up, but I've rebased already. :-)


Juergen


* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:29     ` Juergen Gross
@ 2019-06-13  7:34       ` Andrii Anisov
  2019-06-13  8:39         ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:34 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné



On 13.06.19 10:29, Juergen Gross wrote:
> Thanks for the heads up, but I've rebased already. :-)

Oh, great. I'm just wondering whether you have already put it on your github?
I'm playing with scheduling on my side, and I have a strong feeling I should base my work on your series ;)

-- 
Sincerely,
Andrii Anisov.


* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:34       ` Andrii Anisov
@ 2019-06-13  8:39         ` Juergen Gross
  2019-06-13  8:49           ` Andrii Anisov
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-06-13  8:39 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

On 13.06.19 09:34, Andrii Anisov wrote:
> 
> 
> On 13.06.19 10:29, Juergen Gross wrote:
>> Thanks for the heads up, but I've rebased already. :-)
> 
> Oh, great. I'm just wondering whether you have already put it on your github?

github.com/jgross1/xen sched-v1-rebase

Only compile tested on x86 up to now, but rebase was rather easy.


Juergen


* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  8:39         ` Juergen Gross
@ 2019-06-13  8:49           ` Andrii Anisov
  0 siblings, 0 replies; 202+ messages in thread
From: Andrii Anisov @ 2019-06-13  8:49 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, xen-devel, Dario Faggioli,
	Roger Pau Monné



On Thu, 13 Jun 2019 at 11:39, Juergen Gross <jgross@suse.com> wrote:

> github.com/jgross1/xen sched-v1-rebase
>
> Only compile tested on x86 up to now, but rebase was rather easy.
>

Cool, will take it and check for ARM.
Thank you.

Sincerely,
Andrii Anisov.


* Re: [Xen-devel] [PATCH 38/60] x86: make loading of GDT at context switch more modular
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-02 15:38   ` Andrew Cooper
  -1 siblings, 0 replies; 202+ messages in thread
From: Andrew Cooper @ 2019-07-02 15:38 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

Subject "load of the GDT"

On 28/05/2019 11:32, Juergen Gross wrote:
> In preparation for core scheduling carve out the GDT related

"for core scheduling, carve"

> functionality (writing GDT related PTEs, loading default of full GDT)
> into sub-functions.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC V2: split off non-refactoring part
> V1: constify pointers, use initializers (Jan Beulich)
> ---
>  xen/arch/x86/domain.c | 57 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 35 insertions(+), 22 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index d3ee699da6..adc06154ee 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1626,6 +1626,37 @@ static inline bool need_full_gdt(const struct domain *d)
>      return is_pv_domain(d) && !is_idle_domain(d);
>  }
>  
> +static inline void write_full_gdt_ptes(seg_desc_t *gdt, const struct vcpu *v)

We typically don't use inline for local static functions.

All can be fixed on commit.  Acked-by: Andrew Cooper
<andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH 39/60] x86: optimize loading of GDT at context switch
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-02 16:09   ` Andrew Cooper
  2019-07-03  6:30     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Andrew Cooper @ 2019-07-02 16:09 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 28/05/2019 11:32, Juergen Gross wrote:
> Instead of dynamically decide whether the previous vcpu was using full

"deciding"

> or default GDT just add a percpu variable for that purpose. This at

"was using a full or default GDT, just add"

> once removes the need for testing vcpu_ids to differ twice.
>
> Cache the need_full_gdt(nd) value in a local variable.

What's the point of doing this?  I know the logic is rather complicated
in __context_switch(), but at least it is visually consistent.  After
this change, it is asymmetric and harder to follow.

>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC V2: new patch (split from previous one)
> V1: init percpu flag at cpu startup
>     rename variable (Jan Beulich)
> ---
>  xen/arch/x86/cpu/common.c  |  3 +++
>  xen/arch/x86/domain.c      | 16 +++++++++++-----
>  xen/include/asm-x86/desc.h |  1 +
>  3 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index 33f5d32557..8b90356fe5 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -49,6 +49,8 @@ unsigned int vaddr_bits __read_mostly = VADDR_BITS;
>  static unsigned int cleared_caps[NCAPINTS];
>  static unsigned int forced_caps[NCAPINTS];
>  
> +DEFINE_PER_CPU(bool, full_gdt_loaded);
> +
>  void __init setup_clear_cpu_cap(unsigned int cap)
>  {
>  	const uint32_t *dfs;
> @@ -745,6 +747,7 @@ void load_system_tables(void)
>  		offsetof(struct tss_struct, __cacheline_filler) - 1,
>  		SYS_DESC_tss_busy);
>  
> +        per_cpu(full_gdt_loaded, cpu) = false;

Indentation.  (Although I've got half a mind to do a blanket convert of
files like this to Xen style.  They are almost completely diverged from
their Linux heritage.)

~Andrew


* Re: [Xen-devel] [PATCH 39/60] x86: optimize loading of GDT at context switch
  2019-07-02 16:09   ` Andrew Cooper
@ 2019-07-03  6:30     ` Juergen Gross
  2019-07-03 12:21       ` Andrew Cooper
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-03  6:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 02.07.19 18:09, Andrew Cooper wrote:
> On 28/05/2019 11:32, Juergen Gross wrote:
>> Instead of dynamically decide whether the previous vcpu was using full
> 
> "deciding"
> 
>> or default GDT just add a percpu variable for that purpose. This at
> 
> "was using a full or default GDT, just add"
> 
>> once removes the need for testing vcpu_ids to differ twice.
>>
>> Cache the need_full_gdt(nd) value in a local variable.
> 
> What's the point of doing this?  I know the logic is rather complicated
> in __context_switch(), but at least it is visually consistent.  After
> this change, it is asymmetric and harder to follow.

This is a hot path. need_full_gdt() needs two compares, of which one is
using evaluate_nospec().


Juergen
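
To illustrate the trade-off, a sketch of the percpu flag and its use;
update_gdt() is a hypothetical wrapper added here for illustration (the
actual patch open-codes the equivalent logic in __context_switch()):

    DEFINE_PER_CPU(bool, full_gdt_loaded);

    /* Load either the default or the vcpu's full GDT, remembering which. */
    static void load_gdt(seg_desc_t *gdt, bool full)
    {
        const struct desc_ptr gdt_desc = {
            .limit = LAST_RESERVED_GDT_BYTE,
            .base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY),
        };

        lgdt(&gdt_desc);
        this_cpu(full_gdt_loaded) = full;
    }

    /* Hypothetical context-switch helper: a single boolean test replaces
     * the two previous-vcpu comparisons on this hot path. */
    static void update_gdt(const struct domain *nd, seg_desc_t *gdt)
    {
        bool want_full = need_full_gdt(nd);

        if ( want_full != this_cpu(full_gdt_loaded) )
            load_gdt(gdt, want_full);
    }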


* Re: [Xen-devel] [PATCH 39/60] x86: optimize loading of GDT at context switch
  2019-07-03  6:30     ` Juergen Gross
@ 2019-07-03 12:21       ` Andrew Cooper
  2019-07-05  7:30         ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Andrew Cooper @ 2019-07-03 12:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 03/07/2019 07:30, Juergen Gross wrote:
> On 02.07.19 18:09, Andrew Cooper wrote:
>> On 28/05/2019 11:32, Juergen Gross wrote:
>>> Instead of dynamically decide whether the previous vcpu was using full
>>
>> "deciding"
>>
>>> or default GDT just add a percpu variable for that purpose. This at
>>
>> "was using a full or default GDT, just add"
>>
>>> once removes the need for testing vcpu_ids to differ twice.
>>>
>>> Cache the need_full_gdt(nd) value in a local variable.
>>
>> What's the point of doing this?  I know the logic is rather complicated
>> in __context_switch(), but at least it is visually consistent.  After
>> this change, it is asymmetric and harder to follow.
>
> This is a hot path. need_full_gdt() needs two compares, of which one is
> using evaluate_nospec().

Urgh.  So evaluate_nospec() is already broken here because
need_full_gdt() isn't always_inline, but surely this isn't the only
example impacted in __context_switch()?  The choice of 'gdt' is
similarly impacted by the looks of things.

I'd recommend not worrying about evaluate_nospec() for now.  There are
several fundamental problems atm, and Xen 4.13 cannot ship with it in
this state.

~Andrew
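
The inlining point can be illustrated with a short sketch (simplified,
under the assumption of how evaluate_nospec() expands; the real helpers
differ in detail): the speculation barrier hardens the branch in the
function where the macro expands, so an out-of-line predicate leaves the
caller's own branch unprotected.

    /* If this helper is not inlined, the hardened branch is the one
     * *inside* it; callers then branch on the returned bool, and that
     * second branch can still be speculated past. */
    static always_inline bool need_full_gdt(const struct domain *d)
    {
        /* is_pv_domain() wraps its type check in evaluate_nospec(). */
        return is_pv_domain(d) && !is_idle_domain(d);
    }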


* Re: [Xen-devel] [PATCH 39/60] x86: optimize loading of GDT at context switch
  2019-07-03 12:21       ` Andrew Cooper
@ 2019-07-05  7:30         ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-07-05  7:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 03.07.19 14:21, Andrew Cooper wrote:
> On 03/07/2019 07:30, Juergen Gross wrote:
>> On 02.07.19 18:09, Andrew Cooper wrote:
>>> On 28/05/2019 11:32, Juergen Gross wrote:
>>>> Instead of dynamically decide whether the previous vcpu was using full
>>>
>>> "deciding"
>>>
>>>> or default GDT just add a percpu variable for that purpose. This at
>>>
>>> "was using a full or default GDT, just add"
>>>
>>>> once removes the need for testing vcpu_ids to differ twice.
>>>>
>>>> Cache the need_full_gdt(nd) value in a local variable.
>>>
>>> What's the point of doing this?  I know the logic is rather complicated
>>> in __context_switch(), but at least it is visually consistent.  After
>>> this change, it is asymmetric and harder to follow.
>>
>> This is a hot path. need_full_gdt() needs two compares, of which one is
>> using evaluate_nospec().
> 
> Urgh.  So evaluate_nospec() is already broken here because
> need_full_gdt() isn't always_inline, but surely this isn't the only
> example impacted in __context_switch()?  The choice of 'gdt' is
> similarly impacted by the looks of things.
> 
> I'd recommend not worrying about evaluate_nospec() for now.  There are
> several fundamental problems atm, and Xen 4.13 cannot ship with it in
> this state.

I did a small performance test with this patch and then removed latching
of need_full_gdt(nd) in the local variable:

On an 8 cpu system I started 2 mini-os domains (1 vcpu each) doing a busy
loop sending events to dom0. On dom0 I did a build of the hypervisor via
"make -j 8" and measured the time for that build, then took the average
of 5 such builds (doing a make clean in between).

            elapsed  user   system
Unpatched:  66.51  232.93  109.21
latched:    64.82  232.33  109.18
unlatched:  63.39  231.81  107.49

As there is a small advantage to not latching, I'll remove the full_gdt
local variable.


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-05-28 10:32 ` [Xen-devel] " Juergen Gross
                   ` (60 preceding siblings ...)
  (?)
@ 2019-07-05 13:17 ` Sergey Dyasli
  2019-07-05 13:22   ` Juergen Gross
                     ` (3 more replies)
  -1 siblings, 4 replies; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-05 13:17 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Sergey Dyasli,
	Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Ian Jackson, Roger Pau Monné

Hi Juergen,

I did some testing of this series (with sched-gran=core) and am posting a couple
of crash backtraces here for your information.

Additionally, resuming a Debian 7 guest after suspend is broken.

I will be able to provide any additional information only after XenSummit :)

1) This crash is quite likely to happen:

[2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.233576] CPU:    2
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.236348] RIP:    e008:[<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.243458] RFLAGS: 0000000000000202   CONTEXT: hypervisor (d34v0)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.250129] rax: 0000000000000001   rbx: ffff8305f29e6000   rcx: ffff8305f29e6128
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.258101] rdx: 0000000000000000   rsi: 0000000000000296   rdi: ffff8308066f9128
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.266076] rbp: ffff8308066f7cb8   rsp: ffff8308066f7ca8   r8:  00000000deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.274052] r9:  00000000deadf00d   r10: 0000000000000000   r11: 0000000000000000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.282026] r12: 0000000000000000   r13: ffff8305f29e6000   r14: 0000000000000000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.289994] r15: 0000000000000003   cr0: 000000008005003b   cr4: 00000000001526e0
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.297970] cr3: 00000005f2de3000   cr2: 00000000c012ae78
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.303864] fsb: 0000000004724000   gsb: 00000000c52c4a20   gss: 0000000000000000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.311836] ds: 007b   es: 007b   fs: 00d8   gs: 00e0   ss: 0000   cs: e008
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.319290] Xen code around <ffff82d08023d578> (vcpu_sleep_sync+0x50/0x71):
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.326744]  ec 01 00 00 09 d0 48 98 <48> 0b 83 20 01 00 00 74 09 80 bb 07 01 00 00 00
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.335152] Xen stack trace from rsp=ffff8308066f7ca8:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.340783]    ffff82d0802aede4 ffff8305f29e6000 ffff8308066f7cc8 ffff82d080208370
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.348844]    ffff8308066f7ce8 ffff82d08023e25d 0000000000000001 ffff8305f33f0000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.356904]    ffff8308066f7d58 ffff82d080209682 0000031c63c966ad 00000000ed601000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.364963]    0000000092920063 0000000000000009 ffff8305f33f0000 0000000000000001
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.373024]    0000000000000292 ffff82d080242ee2 0000000000000001 ffff8305f29e6000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.381084]    0000000000000000 ffff8305f33f0000 ffff8308066f7e28 ffff82d08024f970
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.389144]    ffff8305f33f00d4 000000000000000c ffff8305f33f0000 00000000deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.397207]    ffff8308066f7da8 ffff82d0802b3754 ffff82d080209d46 ffff82d08020b6e7
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.405262]    ffff8308066f7e28 ffff82d08020c658 00000002ec86be74 0000000000000002
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.413325]    ffff8305c33d8300 ffff8305f33f00d4 aaaaaaaaaaaaaaaa 0000000c00000008
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.421383]    0000000000000009 ffff83081cca1000 ffff82d08038835a ffff8308066f7ef8
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.429445]    ffff8306a2b11000 00000000deadf00d 0000000000000180 0000000000000003
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.437503]    ffff8308066f7ec8 ffff82d080383964 ffff82d08038835a ffff82d000000007
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.445565]    ffff82d000000001 ffff82d000000000 ffff82d0deadf00d ffff82d0deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.453624]    ffff82d08038835a ffff82d08038834e ffff82d08038835a ffff82d08038834e
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.461683]    ffff82d08038835a ffff82d08038834e ffff82d08038835a ffff8308066f7ef8
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.469744]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.477803]    ffff8308066f7ee8 ffff82d080385644 ffff82d08038835a ffff8306a2b11000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.485865]    00007cf7f99080e7 ffff82d08038839b 0000000000000000 0000000000000000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.493923]    0000000000000000 0000000000000000 0000000000000001 0000000000000007
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.554764]
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.556760] CPU1 @ beef:fffff88000a5f495 (0000000000000000)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.562825] CPU0 @ e008:ffff82d080253c51 (ns16550.c#ns16550_interrupt+0/0x79)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.570452] CPU7 @ e033:ffffffff810fb49b (0000000000000000)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.576518] CPU6 @ e008:ffff82d080279bc1 (domain.c#default_idle+0xc3/0xda)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.583887] CPU3 @ 0061:c04b668a (0000000000000000)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.589259] CPU4 @ 0061:c053468e (0000000000000000)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.594636] CPU5 @ e008:ffff82d080279bc1 (domain.c#default_idle+0xc3/0xda)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.602537]
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.604532] ****************************************
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.609991] Panic on CPU 2:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.613283] FATAL TRAP: vector = 2 (nmi)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.617706] [error_code=0000]
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.621257] ****************************************
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.626715]
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.628711] Reboot in five seconds...

2) This one has been seen only once so far:

[2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
[2019-07-05 00:37:16 UTC] (XEN) [24907.514180] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
[2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:    30
[2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP:    e008:[<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
[2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0000000000000202   CONTEXT: hypervisor
[2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0000000000000002   rbx: ffff83202782e880   rcx: 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: ffff83202782e904   rsi: ffff832027823000   rdi: ffff832027823000
[2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: ffff83403cab7d20   rsp: ffff83403cab7d00   r8:  0000000000000000
[2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9:  0000000000000000   r10: 0200200200200200   r11: 0100100100100100
[2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: ffff832027823000   r13: ffff832027823000   r14: ffff83202782e7b0
[2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: ffff83202782e880   cr0: 000000008005003b   cr4: 00000000000426e0
[2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: 00000000bd8a1000   cr2: 000000001851b798
[2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
[2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
[2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around <ffff82d0802406fc> (sched_context_switched+0xaf/0x101):
[2019-07-05 00:37:16 UTC] (XEN) [24907.990277]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
[2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=ffff83403cab7d00:
[2019-07-05 00:37:16 UTC] (XEN) [24908.061298]    ffff832027823000 ffff832027823000 0000000000000000 ffff83202782e880
[2019-07-05 00:37:16 UTC] (XEN) [24908.098529]    ffff83403cab7d60 ffff82d0802407c0 0000000000000082 ffff83202782e7c8
[2019-07-05 00:37:16 UTC] (XEN) [24908.135622]    000000000000001e ffff83202782e7c8 000000000000001e ffff82d080602628
[2019-07-05 00:37:16 UTC] (XEN) [24908.172671]    ffff83403cab7dc0 ffff82d080240d83 000000000000df99 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.210212]    ffff832027823000 000016a62dc8c6bc 000000fc00000000 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.247181]    ffff83202782e7c8 ffff82d080602628 ffff82d0805da460 000000000000001e
[2019-07-05 00:37:16 UTC] (XEN) [24908.284279]    ffff83403cab7e60 ffff82d080240ea4 00000002802aecc5 ffff832027823000
[2019-07-05 00:37:16 UTC] (XEN) [24908.321128]    ffff83202782e7b0 ffff83202782e880 ffff83403cab7e10 ffff82d080273b4e
[2019-07-05 00:37:16 UTC] (XEN) [24908.358308]    ffff83403cab7e10 ffff82d080242f7f ffff83403cab7e60 ffff82d08024663a
[2019-07-05 00:37:17 UTC] (XEN) [24908.395662]    ffff83403cab7ea0 ffff82d0802ec32a ffff8340000000ff ffff82d0805bc880
[2019-07-05 00:37:17 UTC] (XEN) [24908.432376]    ffff82d0805bb980 ffffffffffffffff ffff83403cab7fff 000000000000001e
[2019-07-05 00:37:17 UTC] (XEN) [24908.469812]    ffff83403cab7e90 ffff82d080242575 0000000000000f00 ffff82d0805bb980
[2019-07-05 00:37:17 UTC] (XEN) [24908.508373]    000000000000001e ffff82d0806026f0 ffff83403cab7ea0 ffff82d0802425ca
[2019-07-05 00:37:17 UTC] (XEN) [24908.549856]    ffff83403cab7ef0 ffff82d08027a601 ffff82d080242575 0000001e7ffde000
[2019-07-05 00:37:17 UTC] (XEN) [24908.588022]    ffff832027823000 ffff832027823000 ffff83127ffde000 ffff83203ffe5000
[2019-07-05 00:37:17 UTC] (XEN) [24908.625217]    000000000000001e ffff831204092000 ffff83403cab7d78 00000000ffffffed
[2019-07-05 00:37:17 UTC] (XEN) [24908.662932]    ffffffff81800000 0000000000000000 ffffffff81800000 0000000000000000
[2019-07-05 00:37:17 UTC] (XEN) [24908.703246]    ffffffff818f4580 ffff880039118848 00000e6a3c4b2698 00000000148900db
[2019-07-05 00:37:17 UTC] (XEN) [24908.743671]    0000000000000000 ffffffff8101e650 ffffffff8185c3e0 0000000000000000
[2019-07-05 00:37:17 UTC] (XEN) [24908.781927]    0000000000000000 0000000000000000 0000beef0000beef ffffffff81054eb2
[2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
[2019-07-05 00:37:17 UTC] (XEN) [24908.836789]    [<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
[2019-07-05 00:37:17 UTC] (XEN) [24908.869916]    [<ffff82d0802407c0>] schedule.c#sched_context_switch+0x72/0x151
[2019-07-05 00:37:17 UTC] (XEN) [24908.907384]    [<ffff82d080240d83>] schedule.c#sched_slave+0x2a3/0x2b2
[2019-07-05 00:37:17 UTC] (XEN) [24908.941241]    [<ffff82d080240ea4>] schedule.c#schedule+0x112/0x2a1
[2019-07-05 00:37:17 UTC] (XEN) [24908.973939]    [<ffff82d080242575>] softirq.c#__do_softirq+0x85/0x90
[2019-07-05 00:37:17 UTC] (XEN) [24909.007101]    [<ffff82d0802425ca>] do_softirq+0x13/0x15
[2019-07-05 00:37:17 UTC] (XEN) [24909.035971]    [<ffff82d08027a601>] domain.c#idle_loop+0xad/0xc0
[2019-07-05 00:37:17 UTC] (XEN) [24909.070546]
[2019-07-05 00:37:17 UTC] (XEN) [24909.080286] CPU0 @ e008:ffff82d0802431ba (stop_machine.c#stopmachine_wait_state+0x1a/0x24)
[2019-07-05 00:37:17 UTC] (XEN) [24909.122896] CPU1 @ e008:ffff82d0802406f8 (sched_context_switched+0xab/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.159518] CPU3 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.199607] CPU2 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.235773] CPU5 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.276039] CPU4 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.312371] CPU7 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.352930] CPU6 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.388928] CPU8 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.424664] CPU9 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.465376] CPU10 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.507449] CPU11 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.544703] CPU13 @ e008:ffff82d0802431f2 (stop_machine.c#stopmachine_action+0x2e/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.588884] CPU12 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.625781] CPU15 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.666649] CPU14 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.703396] CPU17 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.744089] CPU16 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.781117] CPU23 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.821692] CPU22 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:18 UTC] (XEN) [24909.858139] CPU27 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:18 UTC] (XEN) [24909.898704] CPU26 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24909.936069] CPU19 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24909.977291] CPU18 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.014078] CPU31 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.055692] CPU21 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.100486] CPU24 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.136824] CPU25 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.177529] CPU29 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
[2019-07-05 00:37:19 UTC] (XEN) [24910.218420] CPU28 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.255219] CPU20 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
[2019-07-05 00:37:19 UTC] (XEN) [24910.292152]
[2019-07-05 00:37:19 UTC] (XEN) [24910.301667] ****************************************
[2019-07-05 00:37:19 UTC] (XEN) [24910.327892] Panic on CPU 30:
[2019-07-05 00:37:19 UTC] (XEN) [24910.344165] FATAL TRAP: vector = 2 (nmi)
[2019-07-05 00:37:19 UTC] (XEN) [24910.365476] [error_code=0000]
[2019-07-05 00:37:19 UTC] (XEN) [24910.382509] ****************************************
[2019-07-05 00:37:19 UTC] (XEN) [24910.408547]
[2019-07-05 00:37:19 UTC] (XEN) [24910.418129] Reboot in five seconds...

Thanks,
Sergey


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-05 13:17 ` [Xen-devel] [PATCH 00/60] xen: add core scheduling support Sergey Dyasli
@ 2019-07-05 13:22   ` Juergen Gross
  2019-07-05 13:56   ` Dario Faggioli
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-07-05 13:22 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Roger Pau Monné

On 05.07.19 15:17, Sergey Dyasli wrote:
> Hi Juergen,
> 
> I did some testing of this series (with sched-gran=core) and am posting a couple
> of crash backtraces here for your information.
> 
> Additionally, resuming a Debian 7 guest after suspend is broken.
> 
> I will be able to provide any additional information only after XenSummit :)

Thanks for the reports!

I will look at this after XenSummit. :-)


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-05 13:17 ` [Xen-devel] [PATCH 00/60] xen: add core scheduling support Sergey Dyasli
  2019-07-05 13:22   ` Juergen Gross
@ 2019-07-05 13:56   ` Dario Faggioli
  2019-07-15 14:08     ` Sergey Dyasli
  2019-07-16 15:45   ` Sergey Dyasli
  2019-07-25 16:01   ` Sergey Dyasli
  3 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-07-05 13:56 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
	Roger Pau Monné



On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
> 1) This crash is quite likely to happen:
> 
> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
> [...]
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>
Mmm... vcpu_set_periodic_timer?

What kernel is this and when does this crash happen?

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-05-28 10:32 ` [Xen-devel] " Juergen Gross
                   ` (61 preceding siblings ...)
  (?)
@ 2019-07-11 13:40 ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-11 13:40 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3, roger.pau



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Add support for core- and socket-scheduling in the Xen hypervisor.
> 
> [...]
>
> I have done some very basic performance testing: on a 4 cpu system
> (2 cores with 2 threads each) I did a "make -j 4" for building the
> Xen
> hypervisor. This test has been run on dom0, once with no other
> guest active and once with another guest with 4 vcpus running the
> same
> test. The results are (always elapsed time, system time, user time):
> 
> sched-gran=cpu,    no other guest: 116.10 177.65 207.84
> sched-gran=core,   no other guest: 114.04 175.47 207.45
> sched-gran=cpu,    other guest:    202.30 334.21 384.63
> sched-gran=core,   other guest:    207.24 293.04 371.37
> 
> The performance tests have been performed with credit2, the other
> schedulers are tested only briefly to be able to create a domain in a
> cpupool.
> 
I have done some more, and I'd like to report the results here.

For those that are attending the Xen-Project Dev Summit, and have seen
Juergen's talk about core-scheduling, these are the numbers he had in
his slides.

There are quite a few numbers, and there are multiple ways to show them.

We arranged them in two different ways, and I'm showing both. Since
it's quite likely that the result will be poor in mail clients, here
are the links to view in browser or download text files:

http://xenbits.xen.org/people/dariof/benchmarks/results/2019/07-July/xen/core-sched/mmtests/boxes/wayrath/summary.txt

http://xenbits.xen.org/people/dariof/benchmarks/results/2019/07-July/xen/core-sched/mmtests/boxes/wayrath/summary-5columns.txt

They're the same numbers in the two files, but 'summary-5columns.txt'
has them arranged in tables that show, for each combination of
benchmark and configuration, the differences between the various
options (i.e., no core-scheduling, core-scheduling not used,
core-scheduling in use, etc).

The 'summary.txt' file contains some more data (such as the results of
runs done on baremetal), arranged in different tables. It also contains
some of my thoughts and analysis about what the numbers tell us.

It's quite hard to come up with a concise summary, as results vary a
lot on a case by case basis, and there are a few things that need to
be investigated more.

I'll try anyway, but please, if you are interested in the subject, do
have a look at the numbers themselves, even if there's a lot of them:

- Overhead: the cost of having this patch series applied, and not 
  using core-scheduling, seems acceptable to me. In most cases, the 
  overhead is within the noise margin of the measurements. There are a 
  couple of benchmarks where this is not the case. But that means we 
  can try to figure out why this happens only there and, potentially, 
  optimize and tune.

- PV vs. HVM: there seem to be some differences, in some of the 
  results, for different types of guest (well, for PV I used dom0). In 
  general, HVM seems to behave a little worse, i.e., suffers from more 
  overhead and perf degradation, but this is not the case for all 
  benchmarks, so it's hard to tell whether it's something specific or 
  an actual trend.
  I don't have the numbers for proper PV guests and for PVH. I expect 
  the former to be close to dom0 numbers and the latter to HVM 
  numbers, but I'll try to do those runs as well (as soon as the 
  testbox is free again).

- HT vs. noHT: even before considering core-scheduling at all, the 
  debate is still open about whether or not Hyperthreading helps in the 
  first place. These numbers show that this very much depends on the 
  workload and on the load, which is no big surprise.
  It is quite a solid trend, however, that when load is high (look, 
  for instance, at runs that saturate the CPU, or at oversubscribed 
  runs), Hyperthreading lets us achieve better results.

- Regressions versus no core-scheduling: this happens, as it could 
  have been expected. It does not happen 100% of the time, and 
  mileage may vary, but in most benchmarks and in most configurations, 
  we do regress.

- Core-scheduling vs. no-Hyperthreading: this is again a mixed bag. 
  There are cases where things are faster in one setup, and cases 
  where it is the other one that wins, especially in the 
  non-overloaded case.

- Core-scheduling and overloading: when more vCPUs than pCPUs are used 
  (and there is actual overload, i.e., the vCPUs actually generate 
  more load than there are pCPUs to satisfy), core-scheduling shows 
  pretty decent performance. This is easy to see, comparing core-
  scheduling with no-Hyperthreading, in the overloaded cases. In most 
  benchmarks, both configurations perform worse than default, but 
  core-scheduling regresses a lot less than no-Hyperthreading. And 
  this, I think, is quite important!

- Core-scheduling and HT-aware scheduling: currently, the scheduler 
  tends to spread vCPUs among cores. That is, if we have 2 vCPUs and 2 
  cores with two threads each, the vCPUs run faster if we put each one 
  on a core, because each one of them has the full core resources 
  available. With core-scheduling implemented this way, this happens a 
  lot less. Actually, most of the time work is run on siblings, 
  leaving full cores entirely idle. This is what hurts pretty much the 
  results of the runs done inside an HVM guest with 4 vCPUs. It must 
  be said, however:
  * that this also means performance is more consistent, and 
    independent of the actual CPU load. So, if you have a VM and share 
    a host with other VMs, the performance of your workload depends 
    less on whether the others are running or idle;
  * I expect this issue to be addressed, at least partially, by 
    exposing a topology to the guest that takes into account that, 
    thanks to core-scheduling, some vCPUs are actually SMT-siblings.  
    In fact, at that point, it will be the guest's scheduler job to 
    decide whether to consolidate or spread the work. And if it 
    decides to spread it, that means tasks will run on vCPUs that are 
    not virtual SMT-siblings, and hence run on different cores (if
    they are available) at the host level.

Anyway, I've written way too much. Have a look at the results, and feel
free to share what you think about them.

Regards,
Dario
---
XEN CORE-SCHEDULING BENCHMARK
=============================

Code: 
 - Juergen's v1 patches, rebased
   git: https://github.com/jgross1/xen/tree/sched-v1-rebase

Host:
 - CPU: 1 socket, 4 cores, 2 threads
 - RAM: 32 GB
 - distro: openSUSE Tumbleweed
 - Kernel: 5.1.7
 - HW Bugs Mitigations: fully disabled (both in Xen and dom0)
 - Filesys: XFS

VMs:
 - vCPUs: either 8 or 4
 - distro: openSUSE Tumbleweed
 - Kernel: 5.1.7
 - HW Bugs Mitigations: fully disabled
 - Filesys: XFS

Configurations:
 - BM-HT: baremetal, HyperThreading enabled (8 CPUs)
 - BM-noHT: baremetal, Hyperthreading disabled (4 CPUs)
 - v-XEN-HT: Xen staging (no coresched patches applied), Xen dom0, 8 vCPUs, Hyperthreading enabled (8 CPUs)
 - v-XEN-noHT: Xen staging (no coresched patches applied), Xen dom0, 4 vCPUs, Hyperthreading disabled (4 CPUs)
 - XEN-HT: Xen with coresched patches applied, Xen dom0, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=cpu (default)
 - XEN-noHT: Xen with coresched patches applied, Xen dom0, 4 vCPUs, Hyperthreading disabled (4 CPUs), sched-gran=cpu (default)
 - XEN-csc: Xen with coresched patches applied, Xen dom0, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=core
 - v-HVM-v8-HT: Xen staging (no coresched patches applied), HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs)
 - v-HVM-v8-noHT: Xen staging (no coresched patches applied), HVM guest, 8 vCPUs, Hyperthreading disabled (4 CPUs)
 - HVM-v8-HT: Xen with coresched patches applied, HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=cpu (default)
 - HVM-v8-noHT: Xen with coresched patches applied, HVM guest, 8 vCPUs, Hyperthreading disabled (4 CPUs), sched-gran=cpu (default)
 - HVM-v8-csc: Xen with coresched patches applied, HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=core
 - v-HVM-v4-HT: Xen staging (no coresched patches applied), HVM guest, 4 vCPUs, Hyperthreading enabled (8 CPUs)
 - v-HVM-v4-noHT: Xen staging (no coresched patches applied), HVM guest, 4 vCPUs, Hyperthreading disabled (4 CPUs)
 - HVM-v4-HT: Xen with coresched patches applied, HVM guest, 4 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=cpu (default)
 - HVM-v4-noHT: Xen with coresched patches applied, HVM guest, 4 vCPUs, Hyperthreading disabled (4 CPUs), sched-gran=cpu (default)
 - HVM-v4-csc: Xen with coresched patches applied, HVM guest, 4 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=core
 - v-HVMx2-v8-HT: Xen staging (no coresched patches applied), 2x HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs)
 - v-HVMx2-v8-noHT: Xen staging (no coresched patches applied), 2x HVM guest, 8 vCPUs, Hyperthreading disabled (4 CPUs)
 - HVMx2-v8-HT: Xen with coresched patches applied, 2x HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=cpu (default)
 - HVMx2-v8-noHT: Xen with coresched patches applied, 2x HVM guest, 8 vCPUs, Hyperthreading disabled (4 CPUs), sched-gran=cpu (default)
 - HVMx2-v8-csc: Xen with coresched patches applied, 2x HVM guest, 8 vCPUs, Hyperthreading enabled (8 CPUs), sched-gran=core

Benchmarks:
 - Suite: MMTests, https://github.com/gormanm/mmtests (e.g., see: https://lkml.org/lkml/2019/4/25/588)
   * Stream: pure memory benchmark (various kinds of mem-ops done in parallel). Parallelism is NR_CPUS/2 tasks (i.e., 4 on this box). Ideally, we wouldn't see any difference between HT, noHT and csc;
   * Kernbench: builds a kernel, with varying number of compile jobs. HT is, in general, known to help, at least a little, as it lets us do more parallel builds;
   * Hackbench: communication (via pipes, in this case) between group of processes. As we deal with _groups_ of tasks, we're already in saturation with 1 group, hence we expect noHT to suffer;
   * mutilate: load generator for memcached, with high request rate;
   * netperf-{udp,tcp,unix}: two communicating tasks. Without any pinning (either at hypervisor or dom0/guest level), we expect HT to play a role, as depending on where the two tasks are scheduled (i.e., whether on two threads of the same core, or not) performance may vary;
  * pgioperfbench: Postgres microbenchmark (I wanted to use pgbench, but had issues);
  * iozone: I have the numbers, but they're not shown here (I need time to look at them and present them properly).

Raw results:
 - On my test box. I'll put them somewhere soon... :-)

Result comparisons:
 1) BM-HT vs. BM-noHT: check whether, on baremetal, for the given benchmark, HT has a positive or negative impact on performance
 2) v-XEN-HT vs. XEN-HT: check the overhead introduced by having the coresched patches applied, but not in use, in case HT is enabled. This tells us how expensive it is to have these patches in, for someone that does not need or want to use them, and keeps HT enabled;
 3) v-XEN-noHT vs. XEN-noHT: check the overhead introduced by having the coresched patches applied, but not in use, in case HT is disabled. This tells us how expensive it is to have these patches in, for someone that does not need or want them, and keeps HT disabled;
 4) XEN-HT vs. XEN-noHT vs. XEN-csc: with coresched patches applied, compare sched-gran=cpu and HT enabled (current default), with turning HT off and with sched-gran=core (and HT enabled, of course). This tells us how much perf we lose (or gain!) when using core-scheduling, as compared to both the default and to disabling hyperthreading. Ideally, we'd end up a bit slower than default, but not as slow(er) as with HT disabled.
 5) HVM-v8-HT vs. HVM-v8-noHT vs. HVM-v8-csc: same as above, but benchmarks are running in an HVM guest with 8 vCPUs.
 6) HVM-v4-HT vs. HVM-v4-noHT vs. HVM-v4-csc: same as 5), but the HVM VM has now only 4 vCPUs.
 7) HVMx2-v8-HT vs. HVMx2-v8-noHT vs. HVMx2-v8-csc: now there are 2 HVM guests, both with 8 vCPUs. One runs synthetic workloads (CPU and memory) producing ~600% CPU load. The other runs our benchmarks. So, the host is overloaded by a factor of 2x, in terms of number of vCPUs vs. number of pCPUs (not counting dom0!). In terms of CPU load it's almost the same, although harder to tell exactly, as the synthetic load is ~600% (and not 800%) and the various benchmarks have variable CPU load.

In all the tables below, the values in parentheses are the difference relative to the first (baseline) column and, as far as I understand MMTests' output, results flagged as statistically significant are marked with asterisks (e.g., * -31.22%*).

STREAM
======
1)                            BM-HT                BM-noHT
MB/sec copy     32623.20 (   0.00%)    33906.18 (   3.93%)
MB/sec scale    22056.18 (   0.00%)    23005.44 (   4.30%)
MB/sec add      25827.70 (   0.00%)    26707.06 (   3.40%)
MB/sec triad    25771.70 (   0.00%)    26705.98 (   3.63%)

I was expecting nearly identical performance between 'HT' and 'noHT'. For sure, I wasn't expecting 'noHT' to regress. It's actually going a little faster, which does indeed make sense (as we don't use any task pinning).

2)                         v-XEN-HT                 XEN-HT
MB/sec copy     33076.00 (   0.00%)    33112.76 (   0.11%)
MB/sec scale    24170.98 (   0.00%)    24225.04 (   0.22%)
MB/sec add      27060.58 (   0.00%)    27195.52 (   0.50%)
MB/sec triad    27172.34 (   0.00%)    27296.42 (   0.46%)

3)                       v-XEN-noHT               XEN-noHT
MB/sec copy     33054.78 (   0.00%)    33070.62 (   0.05%)
MB/sec scale    24198.12 (   0.00%)    24335.76 (   0.57%)
MB/sec add      27084.58 (   0.00%)    27360.86 (   1.02%)
MB/sec triad    27199.12 (   0.00%)    27468.54 (   0.99%)

4)                           XEN-HT               XEN-noHT                XEN-csc
MB/sec copy     33112.76 (   0.00%)    33070.62 (  -0.13%)    31811.06 (  -3.93%)
MB/sec scale    24225.04 (   0.00%)    24335.76 (   0.46%)    22745.18 (  -6.11%)
MB/sec add      27195.52 (   0.00%)    27360.86 (   0.61%)    25088.74 (  -7.75%)
MB/sec triad    27296.42 (   0.00%)    27468.54 (   0.63%)    25170.02 (  -7.79%)

I wasn't expecting a degradation for 'csc' (as, in fact, we don't see any in 'noHT'). Perhaps I should re-run the 'csc' case with 2 threads, and see if it makes any difference.

5)                        HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
MB/sec copy     33338.56 (   0.00%)    34168.32 (   2.49%)    29568.34 ( -11.31%)
MB/sec scale    22001.94 (   0.00%)    23023.06 (   4.64%)    18635.30 ( -15.30%)
MB/sec add      25442.48 (   0.00%)    26676.96 (   4.85%)    21632.14 ( -14.98%)
MB/sec triad    26376.48 (   0.00%)    26542.96 (   0.63%)    21751.54 ( -17.53%)

The (unexpected) impact is way more than just "noticeable". Furthermore, the 'HT' and 'noHT' results are only a little worse than the PV (well, dom0) case, while 'csc' is much worse. I suspect this may be caused by the benchmark's tasks running on sibling threads even when there are fully idle cores, but this needs more investigation.

6)                        HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
MB/sec copy     31917.34 (   0.00%)    33820.44 (   5.96%)    31186.82 (  -2.29%)
MB/sec scale    20741.28 (   0.00%)    22320.82 (   7.62%)    19471.54 (  -6.12%)
MB/sec add      25120.08 (   0.00%)    26553.58 (   5.71%)    22348.62 ( -11.03%)
MB/sec triad    24979.40 (   0.00%)    26075.40 (   4.39%)    21988.46 ( -11.97%)

Not as bad as the 8 vCPU case, but still not great.

7)                      HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
MB/sec copy     28821.94 (   0.00%)    22508.84 ( -21.90%)    27223.16 (  -5.55%)
MB/sec scale    20544.64 (   0.00%)    15269.06 ( -25.68%)    17813.02 ( -13.30%)
MB/sec add      22774.16 (   0.00%)    16733.48 ( -26.52%)    19659.98 ( -13.67%)
MB/sec triad    22163.52 (   0.00%)    16508.20 ( -25.52%)    19722.54 ( -11.01%)

Under load, core-scheduling does better than 'noHT'. Good! :-D


KERNBENCH
=========
Runs were done with 2, 4, 8 and 16 threads. So, with HT on, we saturate at 8 threads; with HT off, at 4.
------------------------------------------------------------
1)                                 BM-HT                BM-noHT
Amean     elsp-2       199.46 (   0.00%)      196.25 *   1.61%*
Amean     elsp-4       114.96 (   0.00%)      108.73 *   5.42%*
Amean     elsp-8        83.65 (   0.00%)      109.77 * -31.22%*
Amean     elsp-16       84.46 (   0.00%)      112.61 * -33.33%*
Stddev    elsp-2         0.09 (   0.00%)        0.05 (  46.12%)
Stddev    elsp-4         0.08 (   0.00%)        0.19 (-126.04%)
Stddev    elsp-8         0.15 (   0.00%)        0.15 (  -2.80%)
Stddev    elsp-16        0.03 (   0.00%)        0.50 (-1328.97%)

Things are (slightly) better on 'noHT' before saturation, but get substantially worse after that point.

2)                              v-XEN-HT                 XEN-HT
Amean     elsp-2       234.13 (   0.00%)      233.63 (   0.21%)
Amean     elsp-4       139.89 (   0.00%)      140.21 *  -0.23%*
Amean     elsp-8        98.15 (   0.00%)       98.22 (  -0.08%)
Amean     elsp-16       99.05 (   0.00%)       99.65 *  -0.61%*
Stddev    elsp-2         0.40 (   0.00%)        0.14 (  64.63%)
Stddev    elsp-4         0.11 (   0.00%)        0.14 ( -33.77%)
Stddev    elsp-8         0.23 (   0.00%)        0.40 ( -68.71%)
Stddev    elsp-16        0.05 (   0.00%)        0.47 (-855.23%)

3)                            v-XEN-noHT               XEN-noHT
Amean     elsp-2       233.23 (   0.00%)      233.16 (   0.03%)
Amean     elsp-4       130.00 (   0.00%)      130.43 (  -0.33%)
Amean     elsp-8       132.86 (   0.00%)      132.34 *   0.39%*
Amean     elsp-16      135.61 (   0.00%)      135.51 (   0.07%)
Stddev    elsp-2         0.10 (   0.00%)        0.06 (  39.87%)
Stddev    elsp-4         0.04 (   0.00%)        0.67 (-1795.94%)
Stddev    elsp-8         0.37 (   0.00%)        0.04 (  89.24%)
Stddev    elsp-16        0.20 (   0.00%)        0.34 ( -69.82%)

4)                                XEN-HT               XEN-noHT                XEN-csc
Amean     elsp-2       233.63 (   0.00%)      233.16 *   0.20%*      248.12 *  -6.20%*
Amean     elsp-4       140.21 (   0.00%)      130.43 *   6.98%*      145.65 *  -3.88%*
Amean     elsp-8        98.22 (   0.00%)      132.34 * -34.73%*       98.15 (   0.07%)
Amean     elsp-16       99.65 (   0.00%)      135.51 * -35.98%*       99.58 (   0.07%)
Stddev    elsp-2         0.14 (   0.00%)        0.06 (  56.88%)        0.19 ( -36.79%)
Stddev    elsp-4         0.14 (   0.00%)        0.67 (-369.64%)        0.57 (-305.25%)
Stddev    elsp-8         0.40 (   0.00%)        0.04 (  89.89%)        0.03 (  91.88%)
Stddev    elsp-16        0.47 (   0.00%)        0.34 (  27.80%)        0.47 (   0.53%)

'csc' proves to be quite effective for this workload. It does cause some small regressions at low job count, but it lets us get back all the perf we lost when disabling Hyperthreading.

5)                             HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     elsp-2       202.45 (   0.00%)      205.87 *  -1.69%*      218.87 *  -8.11%*
Amean     elsp-4       121.90 (   0.00%)      115.41 *   5.32%*      128.78 *  -5.64%*
Amean     elsp-8        85.94 (   0.00%)      125.71 * -46.28%*       87.26 *  -1.54%*
Amean     elsp-16       87.37 (   0.00%)      128.52 * -47.09%*       88.03 *  -0.76%*
Stddev    elsp-2         0.09 (   0.00%)        0.34 (-299.25%)        1.30 (-1433.24%)
Stddev    elsp-4         0.44 (   0.00%)        0.27 (  38.66%)        0.60 ( -36.58%)
Stddev    elsp-8         0.14 (   0.00%)        0.22 ( -63.81%)        0.35 (-153.50%)
Stddev    elsp-16        0.02 (   0.00%)        0.40 (-1627.17%)        0.32 (-1300.67%)

The trend is pretty much the same as in the dom0 case.

6)                             HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Amean     elsp-2       206.05 (   0.00%)      208.49 *  -1.18%*      237.58 * -15.30%*
Amean     elsp-4       114.53 (   0.00%)      115.46 *  -0.82%*      162.61 * -41.98%*
Amean     elsp-8       127.22 (   0.00%)      117.63 *   7.54%*      166.22 * -30.66%*
Amean     elsp-16      133.70 (   0.00%)      120.53 *   9.85%*      170.05 * -27.19%*
Stddev    elsp-2         0.26 (   0.00%)        0.24 (   7.38%)        0.34 ( -30.97%)
Stddev    elsp-4         0.29 (   0.00%)        0.23 (  20.88%)        0.24 (  17.16%)
Stddev    elsp-8         2.21 (   0.00%)        0.12 (  94.48%)        0.24 (  89.01%)
Stddev    elsp-16        2.00 (   0.00%)        0.57 (  71.34%)        0.34 (  82.87%)

Quite bad, this time. Basically, with only 4 vCPUs in the VM, we never saturate the machine, even in the 'noHT' case. In fact, 'noHT' performs similarly to 'HT' (even a little better, actually), as could have been expected. OTOH, 'csc' regresses badly.

I did some more investigation, and found out that, with core-scheduling, we no longer spread the work among cores. E.g., with 4 cores and 4 busy vCPUs, in the 'HT' case we run (most of the time) one vCPU on each core. In the 'csc' case, they often run on only 2 cores, leaving the other 2 cores idle. This means lower, but also more deterministic, performance. Whether or not this is something we like needs to be discussed.
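
FTR, this kind of placement is easy to observe from dom0 with the xl toolstack (the domain name below is just an example):

  # sample, once per second, which pCPU each vCPU is running on
  watch -n1 'xl vcpu-list benchmark-guest'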

7)                           HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Amean     elsp-2       288.82 (   0.00%)      437.79 * -51.58%*      339.86 * -17.67%*
Amean     elsp-4       196.18 (   0.00%)      294.95 * -50.35%*      242.91 * -23.82%*
Amean     elsp-8       153.27 (   0.00%)      229.25 * -49.57%*      182.50 * -19.07%*
Amean     elsp-16      156.59 (   0.00%)      235.63 * -50.48%*      185.82 * -18.67%*
Stddev    elsp-2         0.20 (   0.00%)        0.34 ( -67.23%)        2.56 (-1175.73%)
Stddev    elsp-4         0.48 (   0.00%)        0.07 (  86.25%)        0.41 (  15.93%)
Stddev    elsp-8         0.39 (   0.00%)        0.81 (-109.26%)        1.30 (-235.77%)
Stddev    elsp-16        0.88 (   0.00%)        0.87 (   1.97%)        1.75 ( -98.28%)

In overload, 'csc' regresses noticeably, as compared to 'HT'. But it improves things a lot, as compared to 'noHT'.


HACKBENCH-PROCESS-PIPE
======================
1)                            BM-HT                BM-noHT
Amean     1       0.3838 (   0.00%)      0.6752 * -75.92%*
Amean     3       1.1718 (   0.00%)      1.9740 * -68.46%*
Amean     5       1.9334 (   0.00%)      2.5968 * -34.31%*
Amean     7       2.6592 (   0.00%)      2.6656 (  -0.24%)
Amean     12      3.0228 (   0.00%)      3.2494 (  -7.50%)
Amean     18      3.4848 (   0.00%)      3.7932 *  -8.85%*
Amean     24      4.1308 (   0.00%)      5.2044 * -25.99%*
Amean     30      4.7290 (   0.00%)      5.8558 * -23.83%*
Amean     32      4.9450 (   0.00%)      6.2686 * -26.77%*
Stddev    1       0.0391 (   0.00%)      0.0308 (  21.27%)
Stddev    3       0.0349 (   0.00%)      0.1551 (-344.67%)
Stddev    5       0.1620 (   0.00%)      0.1590 (   1.90%)
Stddev    7       0.1166 (   0.00%)      0.3178 (-172.48%)
Stddev    12      0.3413 (   0.00%)      0.4880 ( -43.00%)
Stddev    18      0.2202 (   0.00%)      0.0784 (  64.40%)
Stddev    24      0.4053 (   0.00%)      0.3042 (  24.95%)
Stddev    30      0.1251 (   0.00%)      0.1655 ( -32.33%)
Stddev    32      0.0344 (   0.00%)      0.2019 (-486.80%)

'noHT' hurts (as could have been anticipated), even quite significantly in some cases (actually, I'm not sure why 7, 12 and 18 groups went so much better than others...)

2)                         v-XEN-HT                 XEN-HT
Amean     1       0.7356 (   0.00%)      0.7470 (  -1.55%)
Amean     3       2.3506 (   0.00%)      2.4632 (  -4.79%)
Amean     5       4.1084 (   0.00%)      4.0418 (   1.62%)
Amean     7       5.8314 (   0.00%)      5.9784 (  -2.52%)
Amean     12      9.9694 (   0.00%)     10.4374 *  -4.69%*
Amean     18     14.2844 (   0.00%)     14.1366 (   1.03%)
Amean     24     19.0304 (   0.00%)     17.6314 (   7.35%)
Amean     30     21.8334 (   0.00%)     21.9030 (  -0.32%)
Amean     32     22.8946 (   0.00%)     22.0216 *   3.81%*
Stddev    1       0.0213 (   0.00%)      0.0327 ( -53.61%)
Stddev    3       0.1634 (   0.00%)      0.0952 (  41.73%)
Stddev    5       0.3934 (   0.00%)      0.1954 (  50.33%)
Stddev    7       0.1796 (   0.00%)      0.2809 ( -56.40%)
Stddev    12      0.1695 (   0.00%)      0.3725 (-119.74%)
Stddev    18      0.1392 (   0.00%)      0.4003 (-187.67%)
Stddev    24      1.5323 (   0.00%)      0.7330 (  52.16%)
Stddev    30      0.4214 (   0.00%)      1.0668 (-153.14%)
Stddev    32      0.8221 (   0.00%)      0.3849 (  53.18%)

With the core-scheduling patches applied, even if 'csc' is not used, there were runs where the overhead was noticeable (but also other runs where patched Xen was faster, so...)

3)                       v-XEN-noHT               XEN-noHT
Amean     1       1.2154 (   0.00%)      1.2200 (  -0.38%)
Amean     3       3.9274 (   0.00%)      3.9302 (  -0.07%)
Amean     5       6.6430 (   0.00%)      6.6584 (  -0.23%)
Amean     7       9.2772 (   0.00%)      9.8144 (  -5.79%)
Amean     12     14.8220 (   0.00%)     14.4464 (   2.53%)
Amean     18     21.6266 (   0.00%)     20.2166 *   6.52%*
Amean     24     27.2974 (   0.00%)     26.1082 *   4.36%*
Amean     30     32.2406 (   0.00%)     31.1702 *   3.32%*
Amean     32     34.8984 (   0.00%)     33.8204 *   3.09%*
Stddev    1       0.0727 (   0.00%)      0.0560 (  22.95%)
Stddev    3       0.2564 (   0.00%)      0.1332 (  48.04%)
Stddev    5       0.6043 (   0.00%)      0.5454 (   9.75%)
Stddev    7       0.6785 (   0.00%)      0.4508 (  33.56%)
Stddev    12      0.5074 (   0.00%)      0.8115 ( -59.92%)
Stddev    18      0.9622 (   0.00%)      0.8868 (   7.84%)
Stddev    24      0.4904 (   0.00%)      0.8453 ( -72.38%)
Stddev    30      0.5542 (   0.00%)      0.6499 ( -17.27%)
Stddev    32      0.4986 (   0.00%)      0.7965 ( -59.77%)

4)                           XEN-HT               XEN-noHT                XEN-csc
Amean     1       0.7470 (   0.00%)      1.2200 * -63.32%*      0.7160 *   4.15%*
Amean     3       2.4632 (   0.00%)      3.9302 * -59.56%*      2.2284 *   9.53%*
Amean     5       4.0418 (   0.00%)      6.6584 * -64.74%*      4.0492 (  -0.18%)
Amean     7       5.9784 (   0.00%)      9.8144 * -64.16%*      5.8718 (   1.78%)
Amean     12     10.4374 (   0.00%)     14.4464 * -38.41%*     10.1488 (   2.77%)
Amean     18     14.1366 (   0.00%)     20.2166 * -43.01%*     14.7450 *  -4.30%*
Amean     24     17.6314 (   0.00%)     26.1082 * -48.08%*     18.0072 (  -2.13%)
Amean     30     21.9030 (   0.00%)     31.1702 * -42.31%*     21.1058 (   3.64%)
Amean     32     22.0216 (   0.00%)     33.8204 * -53.58%*     22.7300 (  -3.22%)
Stddev    1       0.0327 (   0.00%)      0.0560 ( -71.05%)      0.0168 (  48.71%)
Stddev    3       0.0952 (   0.00%)      0.1332 ( -39.92%)      0.0944 (   0.88%)
Stddev    5       0.1954 (   0.00%)      0.5454 (-179.08%)      0.3064 ( -56.79%)
Stddev    7       0.2809 (   0.00%)      0.4508 ( -60.47%)      0.5762 (-105.11%)
Stddev    12      0.3725 (   0.00%)      0.8115 (-117.83%)      0.3594 (   3.53%)
Stddev    18      0.4003 (   0.00%)      0.8868 (-121.54%)      0.2580 (  35.56%)
Stddev    24      0.7330 (   0.00%)      0.8453 ( -15.32%)      0.8976 ( -22.45%)
Stddev    30      1.0668 (   0.00%)      0.6499 (  39.08%)      0.8666 (  18.76%)
Stddev    32      0.3849 (   0.00%)      0.7965 (-106.96%)      1.4428 (-274.86%)

This looks similar to 'kernbench'. 'noHT' hurts the workload badly, but 'csc' seems to come quite a bit to the rescue!

5)                        HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     1       0.8992 (   0.00%)      1.1826 * -31.52%*      0.9462 *  -5.23%*
Amean     3       2.4524 (   0.00%)      2.4990 (  -1.90%)      2.5514 *  -4.04%*
Amean     5       3.6710 (   0.00%)      3.5258 (   3.96%)      3.9270 *  -6.97%*
Amean     7       4.6588 (   0.00%)      4.6430 (   0.34%)      5.0420 (  -8.23%)
Amean     12      7.7872 (   0.00%)      7.6310 (   2.01%)      7.6498 (   1.76%)
Amean     18     10.9800 (   0.00%)     12.2772 ( -11.81%)     11.1940 (  -1.95%)
Amean     24     16.5552 (   0.00%)     18.1368 (  -9.55%)     15.1922 (   8.23%)
Amean     30     17.1778 (   0.00%)     24.0352 * -39.92%*     17.7776 (  -3.49%)
Amean     32     19.1966 (   0.00%)     25.3104 * -31.85%*     19.8716 (  -3.52%)
Stddev    1       0.0306 (   0.00%)      0.0440 ( -44.08%)      0.0419 ( -36.94%)
Stddev    3       0.0681 (   0.00%)      0.1289 ( -89.35%)      0.0975 ( -43.30%)
Stddev    5       0.1214 (   0.00%)      0.2722 (-124.14%)      0.2080 ( -71.33%)
Stddev    7       0.5810 (   0.00%)      0.2687 (  53.75%)      0.3038 (  47.71%)
Stddev    12      1.4867 (   0.00%)      1.3196 (  11.24%)      1.5775 (  -6.11%)
Stddev    18      1.9110 (   0.00%)      1.8377 (   3.83%)      1.3230 (  30.77%)
Stddev    24      1.4038 (   0.00%)      1.2938 (   7.84%)      1.2301 (  12.37%)
Stddev    30      2.0035 (   0.00%)      1.3529 (  32.47%)      2.4480 ( -22.19%)
Stddev    32      1.8277 (   0.00%)      1.1786 (  35.52%)      2.4069 ( -31.69%)

Similar to the dom0 case, but it's rather weird how the 'noHT' case is, in some cases, quite good. Even better than 'HT'. :-O

6)                        HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Amean     1       1.4962 (   0.00%)      1.4424 (   3.60%)      1.6824 * -12.44%*
Amean     3       2.5826 (   0.00%)      2.9692 ( -14.97%)      2.9984 ( -16.10%)
Amean     5       3.4330 (   0.00%)      3.5064 (  -2.14%)      4.7144 * -37.33%*
Amean     7       5.3896 (   0.00%)      4.5122 (  16.28%)      5.7828 (  -7.30%)
Amean     12      7.6862 (   0.00%)      8.1164 (  -5.60%)     10.3868 * -35.14%*
Amean     18     10.8628 (   0.00%)     10.8600 (   0.03%)     14.5226 * -33.69%*
Amean     24     16.1058 (   0.00%)     13.9238 (  13.55%)     17.3746 (  -7.88%)
Amean     30     17.9462 (   0.00%)     20.1588 ( -12.33%)     22.2760 * -24.13%*
Amean     32     20.8922 (   0.00%)     15.2004 *  27.24%*     22.5168 (  -7.78%)
Stddev    1       0.0353 (   0.00%)      0.0996 (-181.94%)      0.0573 ( -62.13%)
Stddev    3       0.4556 (   0.00%)      0.6097 ( -33.81%)      0.4727 (  -3.73%)
Stddev    5       0.7936 (   0.00%)      0.6409 (  19.25%)      1.0294 ( -29.71%)
Stddev    7       1.3387 (   0.00%)      0.9335 (  30.27%)      1.5652 ( -16.92%)
Stddev    12      1.1304 (   0.00%)      1.8932 ( -67.48%)      1.6302 ( -44.21%)
Stddev    18      2.6187 (   0.00%)      2.2309 (  14.81%)      0.9304 (  64.47%)
Stddev    24      2.2138 (   0.00%)      1.6892 (  23.70%)      1.4706 (  33.57%)
Stddev    30      1.6264 (   0.00%)      2.1632 ( -33.01%)      1.6574 (  -1.91%)
Stddev    32      3.5272 (   0.00%)      2.5166 (  28.65%)      2.0956 (  40.59%)

7)                      HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Amean     1       1.4736 (   0.00%)      1.2224 *  17.05%*      1.6544 * -12.27%*
Amean     3       3.1764 (   0.00%)      2.4984 *  21.34%*      2.9720 *   6.43%*
Amean     5       4.7694 (   0.00%)      3.4870 *  26.89%*      4.4406 *   6.89%*
Amean     7       6.3958 (   0.00%)      4.6320 *  27.58%*      5.9150 *   7.52%*
Amean     12     11.0484 (   0.00%)      7.6534 *  30.73%*      9.7064 (  12.15%)
Amean     18     16.8658 (   0.00%)     12.8028 *  24.09%*     17.0654 (  -1.18%)
Amean     24     23.6798 (   0.00%)     18.8692 *  20.32%*     24.1742 (  -2.09%)
Amean     30     29.6202 (   0.00%)     25.4126 *  14.21%*     30.5034 (  -2.98%)
Amean     32     31.7582 (   0.00%)     28.5946 *   9.96%*     33.0222 *  -3.98%*
Stddev    1       0.0309 (   0.00%)      0.0294 (   5.09%)      0.0526 ( -69.92%)
Stddev    3       0.1102 (   0.00%)      0.0670 (  39.25%)      0.0956 (  13.30%)
Stddev    5       0.2023 (   0.00%)      0.1089 (  46.16%)      0.0694 (  65.68%)
Stddev    7       0.3783 (   0.00%)      0.1408 (  62.79%)      0.2421 (  36.01%)
Stddev    12      1.5160 (   0.00%)      0.3483 (  77.02%)      1.3210 (  12.87%)
Stddev    18      1.3000 (   0.00%)      1.2858 (   1.09%)      0.8736 (  32.80%)
Stddev    24      0.8749 (   0.00%)      0.8663 (   0.98%)      0.8197 (   6.31%)
Stddev    30      1.3074 (   0.00%)      1.4297 (  -9.36%)      1.2882 (   1.46%)
Stddev    32      1.2521 (   0.00%)      1.0791 (  13.81%)      0.7319 (  41.54%)

'noHT' becomes _super_fast_ (as compared to 'HT') when the system is overloaded. And that is (continues to be) quite weird. As for 'csc', it's not too bad, as it performs in line with 'HT'.


MUTILATE
========
1)                           BM-HT                BM-noHT
Hmean     1    60932.67 (   0.00%)    60817.19 (  -0.19%)
Hmean     3   150762.70 (   0.00%)   123570.43 * -18.04%*
Hmean     5   197564.84 (   0.00%)   176716.37 * -10.55%*
Hmean     7   178089.27 (   0.00%)   170859.21 *  -4.06%*
Hmean     8   151372.16 (   0.00%)   160742.09 *   6.19%*
Stddev    1      591.29 (   0.00%)      513.06 (  13.23%)
Stddev    3     1358.38 (   0.00%)     1783.77 ( -31.32%)
Stddev    5     3477.60 (   0.00%)     5380.83 ( -54.73%)
Stddev    7     2676.96 (   0.00%)     5119.18 ( -91.23%)
Stddev    8      696.32 (   0.00%)     2888.47 (-314.82%)

It's less clear-cut whether or not HT helps here...

2)                        v-XEN-HT                 XEN-HT
Hmean     1    31121.40 (   0.00%)    31063.01 (  -0.19%)
Hmean     3    75847.21 (   0.00%)    75373.02 (  -0.63%)
Hmean     5   113823.13 (   0.00%)   113127.84 (  -0.61%)
Hmean     7   112553.93 (   0.00%)   112467.16 (  -0.08%)
Hmean     8   103472.02 (   0.00%)   103246.75 (  -0.22%)
Stddev    1      147.14 (   0.00%)      175.64 ( -19.37%)
Stddev    3      920.80 (   0.00%)     1017.40 ( -10.49%)
Stddev    5      339.23 (   0.00%)      781.99 (-130.52%)
Stddev    7      248.83 (   0.00%)      404.74 ( -62.66%)
Stddev    8      132.22 (   0.00%)      274.16 (-107.35%)

3)                      v-XEN-noHT               XEN-noHT
Hmean     1    31182.90 (   0.00%)    31107.06 (  -0.24%)
Hmean     3    83473.18 (   0.00%)    84387.92 *   1.10%*
Hmean     5    90322.68 (   0.00%)    91396.40 (   1.19%)
Hmean     7    91082.97 (   0.00%)    91296.28 (   0.23%)
Hmean     8    89529.42 (   0.00%)    89442.61 (  -0.10%)
Stddev    1      369.37 (   0.00%)      336.37 (   8.93%)
Stddev    3      300.44 (   0.00%)      333.46 ( -10.99%)
Stddev    5      842.22 (   0.00%)      816.23 (   3.08%)
Stddev    7      907.80 (   0.00%)     1084.18 ( -19.43%)
Stddev    8      362.90 (   0.00%)      890.80 (-145.47%)

4)                          XEN-HT               XEN-noHT                XEN-csc
Hmean     1    31063.01 (   0.00%)    31107.06 (   0.14%)    29609.98 *  -4.68%*
Hmean     3    75373.02 (   0.00%)    84387.92 *  11.96%*    73419.60 *  -2.59%*
Hmean     5   113127.84 (   0.00%)    91396.40 * -19.21%*   100843.37 * -10.86%*
Hmean     7   112467.16 (   0.00%)    91296.28 * -18.82%*   108439.16 *  -3.58%*
Hmean     8   103246.75 (   0.00%)    89442.61 * -13.37%*   101164.18 *  -2.02%*
Stddev    1      175.64 (   0.00%)      336.37 ( -91.51%)      271.28 ( -54.46%)
Stddev    3     1017.40 (   0.00%)      333.46 (  67.22%)      367.95 (  63.83%)
Stddev    5      781.99 (   0.00%)      816.23 (  -4.38%)      549.77 (  29.70%)
Stddev    7      404.74 (   0.00%)     1084.18 (-167.87%)      376.92 (   6.87%)
Stddev    8      274.16 (   0.00%)      890.80 (-224.93%)      987.59 (-260.23%)

And it's a mixed bag even here, but it clearly looks like 'csc' helps (i.e., it lets us achieve better perf than 'noHT' at and above saturation).

5)                       HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     1    39583.53 (   0.00%)    38828.65 *  -1.91%*    35103.76 * -11.32%*
Hmean     3    87210.00 (   0.00%)    57874.31 * -33.64%*    47160.22 * -45.92%*
Hmean     5   130676.61 (   0.00%)    42883.21 * -67.18%*    85504.60 * -34.57%*
Hmean     7   157586.93 (   0.00%)    59146.02 * -62.47%*   136717.21 * -13.24%*
Hmean     8   174113.49 (   0.00%)    75503.09 * -56.64%*   159666.07 *  -8.30%*
Stddev    1      160.68 (   0.00%)      232.01 ( -44.39%)      547.91 (-240.99%)
Stddev    3     1158.62 (   0.00%)      654.05 (  43.55%)    18841.25 (-1526.19%)
Stddev    5      298.35 (   0.00%)      542.51 ( -81.84%)     1274.93 (-327.32%)
Stddev    7     2569.12 (   0.00%)     1034.22 (  59.74%)     4163.80 ( -62.07%)
Stddev    8      982.48 (   0.00%)      692.10 (  29.56%)     1324.30 ( -34.79%)

'csc' helps, and it lets us achieve better performance than 'noHT', overall. Absolute performance looks worse than in dom0, but the 'noHT' case seems to be affected even more.

6)                       HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     1    37156.88 (   0.00%)    36938.92 (  -0.59%)    34869.19 *  -6.16%*
Hmean     3    91170.71 (   0.00%)    91469.09 (   0.33%)    76214.51 * -16.40%*
Hmean     5   119020.55 (   0.00%)   126949.11 *   6.66%*   109062.82 *  -8.37%*
Hmean     7   119924.66 (   0.00%)   133372.57 *  11.21%*   110466.18 *  -7.89%*
Hmean     8   121329.34 (   0.00%)   136100.78 (  12.17%)   114071.36 *  -5.98%*
Stddev    1      214.02 (   0.00%)      362.58 ( -69.42%)      516.53 (-141.35%)
Stddev    3     2554.47 (   0.00%)     1549.81 (  39.33%)      748.72 (  70.69%)
Stddev    5     1061.32 (   0.00%)     2250.11 (-112.01%)     2377.12 (-123.98%)
Stddev    7     5486.68 (   0.00%)     3214.60 (  41.41%)     4147.45 (  24.41%)
Stddev    8     1857.60 (   0.00%)    12837.92 (-591.10%)     3400.78 ( -83.07%)

7)                     HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     1    30194.08 (   0.00%)     2457.66 * -91.86%*     4940.93 * -83.64%*
Hmean     3    33860.02 (   0.00%)    11209.64 * -66.89%*    23274.26 * -31.26%*
Hmean     5    45606.53 (   0.00%)    15984.89 * -64.95%*    38599.75 ( -15.36%)
Hmean     7    70754.93 (   0.00%)    23045.58 * -67.43%*    55799.74 * -21.14%*
Hmean     8    88529.15 (   0.00%)    36532.64 * -58.73%*    70367.33 * -20.52%*
Stddev    1      214.55 (   0.00%)        2.87 (  98.66%)    13159.62 (-6033.62%)
Stddev    3      440.61 (   0.00%)      179.15 (  59.34%)     4463.14 (-912.95%)
Stddev    5      900.27 (   0.00%)      161.20 (  82.09%)     6286.95 (-598.34%)
Stddev    7      706.22 (   0.00%)      708.81 (  -0.37%)     3845.49 (-444.52%)
Stddev    8     3197.46 (   0.00%)      818.97 (  74.39%)     4477.51 ( -40.03%)

Similar to kernbench: 'csc' a lot better than 'noHT' in overload.


PGIOPERFBENCH
=============
1)                                BM-HT                BM-noHT
Amean     commit        7.72 (   0.00%)        9.73 * -26.14%*
Amean     read          6.99 (   0.00%)        8.15 * -16.66%*
Amean     wal           0.02 (   0.00%)        0.02 (   3.63%)
Stddev    commit        1.35 (   0.00%)        5.48 (-304.84%)
Stddev    read          0.61 (   0.00%)        3.82 (-521.73%)
Stddev    wal           0.04 (   0.00%)        0.04 (   1.36%)

Commit and read runs are suffering without HT.

2)                             v-XEN-HT                 XEN-HT
Amean     commit        7.88 (   0.00%)        8.78 * -11.43%*
Amean     read          6.85 (   0.00%)        7.71 * -12.54%*
Amean     wal           0.07 (   0.00%)        0.07 (   1.15%)
Stddev    commit        1.86 (   0.00%)        3.59 ( -92.56%)
Stddev    read          0.53 (   0.00%)        2.89 (-450.88%)
Stddev    wal           0.05 (   0.00%)        0.05 (  -0.86%)

Significant overhead introduced by core scheduler patches, with sched-gran=cpu and HT enabled.

3)                           v-XEN-noHT               XEN-noHT
Amean     commit        8.53 (   0.00%)        7.64 *  10.47%*
Amean     read          7.30 (   0.00%)        6.86 *   6.02%*
Amean     wal           0.08 (   0.00%)        0.08 (   2.61%)
Stddev    commit        4.27 (   0.00%)        1.35 (  68.46%)
Stddev    read          3.12 (   0.00%)        0.39 (  87.45%)
Stddev    wal           0.04 (   0.00%)        0.04 (   6.28%)

While not so much (actually, things are faster) with sched-gran=cpu and HT disabled.

4)                               XEN-HT               XEN-noHT                XEN-csc
Amean     commit        8.78 (   0.00%)        7.64 *  13.03%*        8.20 (   6.63%)
Amean     read          7.71 (   0.00%)        6.86 *  10.95%*        7.21 *   6.43%*
Amean     wal           0.07 (   0.00%)        0.08 * -12.33%*        0.08 (  -7.96%)
Stddev    commit        3.59 (   0.00%)        1.35 (  62.41%)        3.50 (   2.37%)
Stddev    read          2.89 (   0.00%)        0.39 (  86.46%)        2.50 (  13.52%)
Stddev    wal           0.05 (   0.00%)        0.04 (  11.34%)        0.04 (   6.71%)

Interestingly, when running under Xen and without HT, the 'commit' and 'read' runs are not suffering any longer (they're actually faster!). 'wal' is. With core scheduling, 'wal' still slows down, but a little bit less, and 'commit' and 'read' are still fine (although they speed up less).

5)                            HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     commit        7.83 (   0.00%)        7.74 (   1.16%)        7.68 (   1.87%)
Amean     read          6.97 (   0.00%)        6.97 (  -0.03%)        6.92 *   0.62%*
Amean     wal           0.04 (   0.00%)        0.01 *  79.85%*        0.07 * -84.69%*
Stddev    commit        1.60 (   0.00%)        1.54 (   3.79%)        1.59 (   0.35%)
Stddev    read          0.53 (   0.00%)        0.55 (  -3.61%)        0.68 ( -27.35%)
Stddev    wal           0.05 (   0.00%)        0.03 (  45.88%)        0.05 (   2.11%)

'commit' and 'read' look similar to the dom0 case. 'wal' is weird, as it did very, very well in the 'noHT' case... I'm not sure how significant this benchmark is, to be honest.

6)                            HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Stddev    commit        1.62 (   0.00%)        1.51 (   7.24%)        1.65 (  -1.80%)
Stddev    read          0.55 (   0.00%)        0.66 ( -19.98%)        0.57 (  -2.02%)
Stddev    wal           0.05 (   0.00%)        0.05 (  -6.25%)        0.05 (  -5.64%)

7)                          HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Amean     commit        7.94 (   0.00%)        8.31 *  -4.57%*       10.28 * -29.45%*
Amean     read          6.98 (   0.00%)        7.91 * -13.20%*        9.47 * -35.58%*
Amean     wal           0.01 (   0.00%)        0.07 *-415.01%*        0.24 *-1752.46%*
Stddev    commit        1.75 (   0.00%)        1.30 (  26.10%)        2.17 ( -23.63%)
Stddev    read          0.67 (   0.00%)        0.79 ( -17.48%)        2.21 (-228.37%)
Stddev    wal           0.03 (   0.00%)        0.09 (-157.29%)        0.39 (-1057.77%)

Here, on the other hand, 'csc' regresses against 'noHT'. We've seen 'wal' fluctuating a bit but, in this case, even if we ignored it, we'd see that 'commit' and 'read' also slow down significantly.


NETPERF-UDP
===========
1)                                   BM-HT                BM-noHT
Hmean     send-64        395.57 (   0.00%)      402.32 *   1.70%*
Hmean     send-256      1474.96 (   0.00%)     1515.87 *   2.77%*
Hmean     send-2048    10592.86 (   0.00%)    11137.39 *   5.14%*
Hmean     send-8192    27861.86 (   0.00%)    29965.72 *   7.55%*
Hmean     recv-64        395.51 (   0.00%)      402.30 *   1.72%*
Hmean     recv-256      1474.96 (   0.00%)     1515.61 *   2.76%*
Hmean     recv-2048    10591.01 (   0.00%)    11135.47 *   5.14%*
Hmean     recv-8192    27858.04 (   0.00%)    29956.95 *   7.53%*
Stddev    send-64          4.61 (   0.00%)        3.01 (  34.70%)
Stddev    send-256        28.10 (   0.00%)       11.44 (  59.29%)
Stddev    send-2048      164.91 (   0.00%)      112.24 (  31.94%)
Stddev    send-8192      930.19 (   0.00%)      416.78 (  55.19%)
Stddev    recv-64          4.73 (   0.00%)        3.00 (  36.49%)
Stddev    recv-256        28.10 (   0.00%)       11.86 (  57.79%)
Stddev    recv-2048      163.52 (   0.00%)      111.83 (  31.61%)
Stddev    recv-8192      930.48 (   0.00%)      415.22 (  55.38%)

'noHT' being consistently --although marginally, in most cases-- better seems to reveal non-ideality in how SMT is handled in the Linux scheduler.

2)                                v-XEN-HT                 XEN-HT
Hmean     send-64        223.35 (   0.00%)      220.23 *  -1.39%*
Hmean     send-256       876.26 (   0.00%)      863.81 *  -1.42%*
Hmean     send-2048     6575.95 (   0.00%)     6542.13 (  -0.51%)
Hmean     send-8192    20882.75 (   0.00%)    20999.88 (   0.56%)
Hmean     recv-64        223.35 (   0.00%)      220.23 *  -1.39%*
Hmean     recv-256       876.22 (   0.00%)      863.81 *  -1.42%*
Hmean     recv-2048     6575.00 (   0.00%)     6541.22 (  -0.51%)
Hmean     recv-8192    20881.63 (   0.00%)    20996.13 (   0.55%)
Stddev    send-64          1.16 (   0.00%)        0.94 (  19.15%)
Stddev    send-256         7.50 (   0.00%)        7.71 (  -2.79%)
Stddev    send-2048       29.58 (   0.00%)       57.84 ( -95.56%)
Stddev    send-8192      106.01 (   0.00%)      172.69 ( -62.90%)
Stddev    recv-64          1.16 (   0.00%)        0.94 (  19.15%)
Stddev    recv-256         7.50 (   0.00%)        7.70 (  -2.70%)
Stddev    recv-2048       28.84 (   0.00%)       56.58 ( -96.20%)
Stddev    recv-8192      105.96 (   0.00%)      167.07 ( -57.68%)

3)                              v-XEN-noHT               XEN-noHT
Hmean     send-64        222.53 (   0.00%)      220.12 *  -1.08%*
Hmean     send-256       874.65 (   0.00%)      866.59 (  -0.92%)
Hmean     send-2048     6595.97 (   0.00%)     6568.38 (  -0.42%)
Hmean     send-8192    21537.45 (   0.00%)    21163.69 *  -1.74%*
Hmean     recv-64        222.52 (   0.00%)      220.08 *  -1.10%*
Hmean     recv-256       874.55 (   0.00%)      866.47 (  -0.92%)
Hmean     recv-2048     6595.64 (   0.00%)     6566.75 (  -0.44%)
Hmean     recv-8192    21534.73 (   0.00%)    21157.86 *  -1.75%*
Stddev    send-64          1.45 (   0.00%)        1.64 ( -13.12%)
Stddev    send-256         4.96 (   0.00%)        8.92 ( -79.63%)
Stddev    send-2048       35.06 (   0.00%)       64.88 ( -85.05%)
Stddev    send-8192      199.63 (   0.00%)      161.66 (  19.02%)
Stddev    recv-64          1.44 (   0.00%)        1.62 ( -12.50%)
Stddev    recv-256         4.94 (   0.00%)        9.03 ( -82.65%)
Stddev    recv-2048       34.99 (   0.00%)       65.80 ( -88.02%)
Stddev    recv-8192      201.10 (   0.00%)      162.03 (  19.43%)

4)                                  XEN-HT               XEN-noHT                XEN-csc
Hmean     send-64        220.23 (   0.00%)      220.12 (  -0.05%)      199.70 *  -9.32%*
Hmean     send-256       863.81 (   0.00%)      866.59 (   0.32%)      791.34 *  -8.39%*
Hmean     send-2048     6542.13 (   0.00%)     6568.38 (   0.40%)     5758.09 * -11.98%*
Hmean     send-8192    20999.88 (   0.00%)    21163.69 (   0.78%)    19706.48 *  -6.16%*
Hmean     recv-64        220.23 (   0.00%)      220.08 (  -0.07%)      199.70 *  -9.32%*
Hmean     recv-256       863.81 (   0.00%)      866.47 (   0.31%)      791.09 *  -8.42%*
Hmean     recv-2048     6541.22 (   0.00%)     6566.75 (   0.39%)     5757.50 * -11.98%*
Hmean     recv-8192    20996.13 (   0.00%)    21157.86 (   0.77%)    19703.65 *  -6.16%*
Stddev    send-64          0.94 (   0.00%)        1.64 ( -74.89%)       21.30 (-2173.95%)
Stddev    send-256         7.71 (   0.00%)        8.92 ( -15.74%)       63.89 (-729.17%)
Stddev    send-2048       57.84 (   0.00%)       64.88 ( -12.18%)      112.05 ( -93.72%)
Stddev    send-8192      172.69 (   0.00%)      161.66 (   6.39%)      276.33 ( -60.02%)
Stddev    recv-64          0.94 (   0.00%)        1.62 ( -72.94%)       21.30 (-2173.74%)
Stddev    recv-256         7.70 (   0.00%)        9.03 ( -17.23%)       64.08 (-732.24%)
Stddev    recv-2048       56.58 (   0.00%)       65.80 ( -16.28%)      112.00 ( -97.94%)
Stddev    recv-8192      167.07 (   0.00%)      162.03 (   3.01%)      276.84 ( -65.70%)

In this case, 'csc' is performing worse than 'noHT', pretty consistently. This again may be due to sub-optimal placement of the vCPUs, with core-scheduling enabled.

5)                               HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     send-64        518.85 (   0.00%)      487.85 *  -5.97%*      505.10 (  -2.65%)
Hmean     send-256      1955.77 (   0.00%)     1784.00 *  -8.78%*     1931.18 (  -1.26%)
Hmean     send-2048    12849.78 (   0.00%)    13027.81 (   1.39%)    13485.59 (   4.95%)
Hmean     send-8192    37659.20 (   0.00%)    35773.79 *  -5.01%*    35040.88 *  -6.95%*
Hmean     recv-64        518.32 (   0.00%)      487.76 *  -5.89%*      504.86 (  -2.60%)
Hmean     recv-256      1954.42 (   0.00%)     1782.90 *  -8.78%*     1930.33 (  -1.23%)
Hmean     recv-2048    12830.91 (   0.00%)    13018.47 (   1.46%)    13475.39 (   5.02%)
Hmean     recv-8192    37619.84 (   0.00%)    35768.42 *  -4.92%*    35035.40 *  -6.87%*
Stddev    send-64         29.69 (   0.00%)        6.43 (  78.34%)       17.36 (  41.53%)
Stddev    send-256        64.17 (   0.00%)       26.82 (  58.21%)       33.37 (  47.99%)
Stddev    send-2048     1342.98 (   0.00%)      246.21 (  81.67%)      353.97 (  73.64%)
Stddev    send-8192      224.20 (   0.00%)      467.40 (-108.48%)      928.70 (-314.24%)
Stddev    recv-64         29.96 (   0.00%)        6.54 (  78.16%)       17.64 (  41.11%)
Stddev    recv-256        64.29 (   0.00%)       27.28 (  57.56%)       34.17 (  46.85%)
Stddev    recv-2048     1353.73 (   0.00%)      251.38 (  81.43%)      368.14 (  72.81%)
Stddev    recv-8192      206.98 (   0.00%)      467.59 (-125.91%)      928.73 (-348.71%)

This looks different from the dom0 case. Basically, here, 'noHT' and 'csc' look on par (maybe 'csc' is even a little bit better). Another difference is that 'noHT' slows down more, as compared to 'HT'.

6)                               HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     send-64        513.95 (   0.00%)      511.96 (  -0.39%)      458.46 * -10.80%*
Hmean     send-256      1928.30 (   0.00%)     1930.57 (   0.12%)     1831.70 *  -5.01%*
Hmean     send-2048    13509.70 (   0.00%)    13802.90 *   2.17%*    12739.81 *  -5.70%*
Hmean     send-8192    35966.73 (   0.00%)    36086.45 (   0.33%)    35293.89 (  -1.87%)
Hmean     recv-64        513.90 (   0.00%)      511.91 (  -0.39%)      458.34 * -10.81%*
Hmean     recv-256      1927.94 (   0.00%)     1930.48 (   0.13%)     1831.50 *  -5.00%*
Hmean     recv-2048    13506.19 (   0.00%)    13799.61 *   2.17%*    12734.53 *  -5.71%*
Hmean     recv-8192    35936.42 (   0.00%)    36069.50 (   0.37%)    35266.17 (  -1.87%)
Stddev    send-64         12.88 (   0.00%)        3.89 (  69.78%)       16.11 ( -25.10%)
Stddev    send-256        37.13 (   0.00%)       19.62 (  47.16%)       30.54 (  17.75%)
Stddev    send-2048       86.00 (   0.00%)      100.64 ( -17.03%)      311.99 (-262.80%)
Stddev    send-8192      926.89 (   0.00%)     1088.50 ( -17.44%)     2801.65 (-202.26%)
Stddev    recv-64         12.93 (   0.00%)        3.96 (  69.36%)       16.15 ( -24.90%)
Stddev    recv-256        37.76 (   0.00%)       19.75 (  47.68%)       30.68 (  18.73%)
Stddev    recv-2048       88.84 (   0.00%)      100.51 ( -13.13%)      313.79 (-253.20%)

7)                             HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     send-64        359.65 (   0.00%)      251.84 * -29.98%*      348.95 (  -2.97%)
Hmean     send-256      1376.16 (   0.00%)      969.58 * -29.54%*     1293.19 *  -6.03%*
Hmean     send-2048    10212.34 (   0.00%)     6893.41 * -32.50%*     8933.39 * -12.52%*
Hmean     send-8192    30442.79 (   0.00%)    18372.42 * -39.65%*    25486.82 * -16.28%*
Hmean     recv-64        359.01 (   0.00%)      240.67 * -32.96%*      319.95 * -10.88%*
Hmean     recv-256      1372.71 (   0.00%)      923.23 * -32.74%*     1206.20 * -12.13%*
Hmean     recv-2048    10185.94 (   0.00%)     6258.59 * -38.56%*     8371.03 * -17.82%*
Hmean     recv-8192    30272.47 (   0.00%)    16888.08 * -44.21%*    22719.70 * -24.95%*
Stddev    send-64          6.49 (   0.00%)        7.35 ( -13.35%)       18.06 (-178.42%)
Stddev    send-256        29.41 (   0.00%)       15.84 (  46.15%)       25.77 (  12.38%)
Stddev    send-2048       97.12 (   0.00%)      238.24 (-145.32%)      276.14 (-184.34%)
Stddev    send-8192      680.48 (   0.00%)      336.65 (  50.53%)      482.57 (  29.08%)
Stddev    recv-64          6.80 (   0.00%)        3.56 (  47.64%)        6.62 (   2.60%)
Stddev    recv-256        31.00 (   0.00%)       10.11 (  67.39%)       14.94 (  51.80%)
Stddev    recv-2048       99.42 (   0.00%)       96.49 (   2.94%)       42.24 (  57.51%)
Stddev    recv-8192      759.75 (   0.00%)      108.46 (  85.72%)      745.22 (   1.91%)

And even for this benchmark, for which 'csc' does not perform very well in general, it does well vs. 'noHT' under load.


NETPERF-TCP
===========
1)                              BM-HT                BM-noHT
Hmean     64       2495.25 (   0.00%)     2509.94 *   0.59%*
Hmean     256      7331.05 (   0.00%)     7350.96 (   0.27%)
Hmean     2048    32003.90 (   0.00%)    31975.79 (  -0.09%)
Hmean     8192    50482.11 (   0.00%)    51254.29 (   1.53%)
Stddev    64         12.54 (   0.00%)        9.28 (  26.01%)
Stddev    256        20.90 (   0.00%)       35.47 ( -69.76%)
Stddev    2048      160.50 (   0.00%)      186.36 ( -16.12%)
Stddev    8192      944.33 (   0.00%)      365.62 (  61.28%)

Basically on par.

2)                           v-XEN-HT                 XEN-HT
Hmean     64        674.18 (   0.00%)      677.61 (   0.51%)
Hmean     256      2424.16 (   0.00%)     2448.40 *   1.00%*
Hmean     2048    15114.37 (   0.00%)    15160.59 (   0.31%)
Hmean     8192    32626.11 (   0.00%)    32056.40 (  -1.75%)
Stddev    64          5.99 (   0.00%)        5.93 (   0.98%)
Stddev    256        12.07 (   0.00%)       15.11 ( -25.17%)
Stddev    2048      106.56 (   0.00%)      180.05 ( -68.97%)
Stddev    8192      649.77 (   0.00%)      329.45 (  49.30%)

3)                         v-XEN-noHT               XEN-noHT
Hmean     64        674.89 (   0.00%)      679.64 (   0.70%)
Hmean     256      2426.14 (   0.00%)     2443.24 *   0.70%*
Hmean     2048    15273.97 (   0.00%)    15372.31 *   0.64%*
Hmean     8192    32895.01 (   0.00%)    32473.01 *  -1.28%*
Stddev    64          3.90 (   0.00%)        6.01 ( -54.19%)
Stddev    256         8.14 (   0.00%)       18.76 (-130.35%)
Stddev    2048       35.02 (   0.00%)       86.00 (-145.62%)
Stddev    8192      123.38 (   0.00%)      448.67 (-263.64%)

4)                             XEN-HT               XEN-noHT                XEN-csc
Hmean     64        677.61 (   0.00%)      679.64 (   0.30%)      660.08 (  -2.59%)
Hmean     256      2448.40 (   0.00%)     2443.24 (  -0.21%)     2402.64 *  -1.87%*
Hmean     2048    15160.59 (   0.00%)    15372.31 *   1.40%*    14591.20 *  -3.76%*
Hmean     8192    32056.40 (   0.00%)    32473.01 (   1.30%)    32024.74 (  -0.10%)
Stddev    64          5.93 (   0.00%)        6.01 (  -1.39%)       31.04 (-423.45%)
Stddev    256        15.11 (   0.00%)       18.76 ( -24.13%)       46.90 (-210.39%)
Stddev    2048      180.05 (   0.00%)       86.00 (  52.23%)      339.19 ( -88.39%)
Stddev    8192      329.45 (   0.00%)      448.67 ( -36.18%)     1120.36 (-240.07%)

Also basically on par. Well, 'csc' is a little worse, but not as much as in the UDP case. This suggests that looking at the differences in how the two protocols are handled, from a scheduling point of view, could help understand what is happening.

5)                          HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     64       1756.72 (   0.00%)     1687.44 *  -3.94%*     1644.27 *  -6.40%*
Hmean     256      6335.31 (   0.00%)     6142.90 *  -3.04%*     5850.62 *  -7.65%*
Hmean     2048    29738.54 (   0.00%)    28871.57 *  -2.92%*    27437.72 *  -7.74%*
Hmean     8192    46926.01 (   0.00%)    46011.63 *  -1.95%*    42627.47 *  -9.16%*
Stddev    64         35.93 (   0.00%)       13.81 (  61.56%)       14.56 (  59.48%)
Stddev    256       110.09 (   0.00%)       53.63 (  51.28%)      142.11 ( -29.08%)
Stddev    2048      172.37 (   0.00%)      189.21 (  -9.77%)     1004.81 (-482.94%)
Stddev    8192      453.49 (   0.00%)      239.72 (  47.14%)     1262.32 (-178.36%)

'csc' is not vastly, but consistently, worse than 'noHT'.

6)                          HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Stddev    64         20.10 (   0.00%)       10.70 (  46.75%)       45.83 (-128.07%)
Stddev    256        47.00 (   0.00%)       32.37 (  31.14%)       45.15 (   3.95%)
Stddev    2048      205.29 (   0.00%)      148.42 (  27.70%)     1950.42 (-850.08%)
Stddev    8192      515.92 (   0.00%)       62.10 (  87.96%)    11909.19 (-2208.33%)
CoeffVar  64          1.19 (   0.00%)        0.64 (  46.37%)        2.79 (-135.02%)
CoeffVar  256         0.77 (   0.00%)        0.53 (  31.25%)        0.76 (   1.48%)
CoeffVar  2048        0.71 (   0.00%)        0.51 (  27.61%)        7.00 (-887.77%)
CoeffVar  8192        1.13 (   0.00%)        0.13 (  88.04%)       24.76 (-2096.20%)

7)                        HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     64       1272.66 (   0.00%)      996.12 * -21.73%*      968.44 * -23.90%*
Hmean     256      4376.58 (   0.00%)     3565.25 * -18.54%*     4088.77 *  -6.58%*
Hmean     2048    24077.77 (   0.00%)    17496.21 * -27.33%*    21709.20 *  -9.84%*
Hmean     8192    43332.46 (   0.00%)    32506.09 * -24.98%*    36700.18 * -15.31%*
Stddev    64         13.14 (   0.00%)        3.25 (  75.23%)      221.08 (-1582.74%)
Stddev    256        38.64 (   0.00%)       43.54 ( -12.66%)      316.66 (-719.41%)
Stddev    2048      202.53 (   0.00%)      117.33 (  42.07%)      208.33 (  -2.86%)
Stddev    8192      274.71 (   0.00%)      299.39 (  -8.98%)      412.99 ( -50.33%)

Mixed. Some transfer sizes are better with 'csc' than 'noHT', some are worse.


NETPERF-UNIX
============
1)                              BM-HT                BM-noHT
Hmean     64        998.53 (   0.00%)     1000.92 (   0.24%)
Hmean     256      3672.82 (   0.00%)     3822.29 *   4.07%*
Hmean     2048     7951.98 (   0.00%)     7987.46 (   0.45%)
Hmean     8192     8218.97 (   0.00%)     8329.87 (   1.35%)
Stddev    64         38.66 (   0.00%)       15.94 (  58.78%)
Stddev    256        99.05 (   0.00%)       93.25 (   5.86%)
Stddev    2048       87.53 (   0.00%)       40.49 (  53.75%)
Stddev    8192      133.15 (   0.00%)       74.00 (  44.42%)

Substantially on par.

2)                           v-XEN-HT                 XEN-HT
Hmean     64        499.62 (   0.00%)      492.89 (  -1.35%)
Hmean     256      1900.97 (   0.00%)     1852.77 (  -2.54%)
Hmean     2048     2892.22 (   0.00%)     2765.65 *  -4.38%*
Hmean     8192     3105.85 (   0.00%)     2996.64 *  -3.52%*
Stddev    64          4.49 (   0.00%)       10.64 (-137.15%)
Stddev    256        87.54 (   0.00%)       54.63 (  37.59%)
Stddev    2048       35.26 (   0.00%)       18.20 (  48.39%)
Stddev    8192       21.32 (   0.00%)       15.74 (  26.15%)

3)                         v-XEN-noHT               XEN-noHT
Hmean     64        509.03 (   0.00%)      485.34 *  -4.65%*
Hmean     256      1894.53 (   0.00%)     1835.34 *  -3.12%*
Hmean     2048     2906.15 (   0.00%)     2747.32 *  -5.47%*
Hmean     8192     3158.33 (   0.00%)     2968.85 *  -6.00%*
Stddev    64          3.39 (   0.00%)       20.01 (-489.71%)
Stddev    256        51.32 (   0.00%)       15.13 (  70.52%)
Stddev    2048       33.35 (   0.00%)        9.37 (  71.89%)
Stddev    8192       32.53 (   0.00%)       11.30 (  65.26%)

In both the 'HT' and 'noHT' cases, the overhead of just having the patches applied is not too big, but it's there. It may be worth studying this workload, trying to figure out which paths are introducing such overhead (with the goal of optimizing them).
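
A possible way of doing that (just a sketch; the flags and the scheduling-class event mask are from memory, so double-check them) is comparing scheduler traces taken on patched and unpatched Xen:

  # record ~10 seconds of scheduling-class events, then summarize
  xentrace -D -e 0x0002f000 -T 10 sched.trace
  xenalyze -s sched.trace > sched.report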

4)                             XEN-HT               XEN-noHT                XEN-csc
Hmean     64        492.89 (   0.00%)      485.34 (  -1.53%)      451.13 *  -8.47%*
Hmean     256      1852.77 (   0.00%)     1835.34 (  -0.94%)     1272.07 * -31.34%*
Hmean     2048     2765.65 (   0.00%)     2747.32 *  -0.66%*     3086.19 (  11.59%)
Hmean     8192     2996.64 (   0.00%)     2968.85 *  -0.93%*     2731.31 *  -8.85%*
Stddev    64         10.64 (   0.00%)       20.01 ( -88.04%)       25.28 (-137.53%)
Stddev    256        54.63 (   0.00%)       15.13 (  72.31%)      393.61 (-620.45%)
Stddev    2048       18.20 (   0.00%)        9.37 (  48.48%)      946.82 (-5103.09%)
Stddev    8192       15.74 (   0.00%)       11.30 (  28.21%)      104.54 (-564.07%)

A mixed bag, with higher (both positive and negative, mostly negative) peaks.

5)                          HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     64        275.22 (   0.00%)      288.06 (   4.66%)      277.23 (   0.73%)
Hmean     256       977.34 (   0.00%)      877.63 ( -10.20%)      761.25 * -22.11%*
Hmean     2048     4702.72 (   0.00%)     5819.12 (  23.74%)     5531.96 (  17.63%)
Hmean     8192     5962.89 (   0.00%)     6462.60 (   8.38%)     5387.48 (  -9.65%)
Stddev    64         48.37 (   0.00%)       46.27 (   4.34%)       59.79 ( -23.61%)
Stddev    256       123.98 (   0.00%)       48.93 (  60.54%)      163.00 ( -31.47%)
Stddev    2048     1181.41 (   0.00%)      761.99 (  35.50%)      827.15 (  29.99%)
Stddev    8192     1708.12 (   0.00%)     1247.83 (  26.95%)     1065.82 (  37.60%)

What we see here is that 'noHT' deviates more from 'HT'; for 3 configurations out of 4, for the better (i.e., 'noHT' is faster than 'HT'). 'csc' is, again, generally worse than 'noHT'.

6)                          HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     64        225.32 (   0.00%)      239.36 (   6.23%)      203.05 (  -9.88%)
Hmean     256       707.19 (   0.00%)      782.19 (  10.60%)      751.77 (   6.30%)
Hmean     2048     2719.26 (   0.00%)     4156.29 *  52.85%*     3624.27 (  33.28%)
Hmean     8192     4263.18 (   0.00%)     5724.08 *  34.27%*     3579.16 ( -16.04%)
Stddev    64         46.49 (   0.00%)       31.12 (  33.07%)       11.06 (  76.21%)
Stddev    256       119.32 (   0.00%)      130.87 (  -9.67%)       87.78 (  26.44%)
Stddev    2048     1030.34 (   0.00%)      907.53 (  11.92%)     1715.05 ( -66.46%)
Stddev    8192      821.15 (   0.00%)      275.68 (  66.43%)     1131.40 ( -37.78%)

7)                        HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     64        243.29 (   0.00%)      165.06 * -32.16%*      188.01 * -22.72%*
Hmean     256       711.91 (   0.00%)      472.92 * -33.57%*      504.46 * -29.14%*
Hmean     2048     3163.60 (   0.00%)     1955.93 * -38.17%*     2880.11 (  -8.96%)
Hmean     8192     4277.51 (   0.00%)     2377.98 * -44.41%*     3086.59 * -27.84%*
Stddev    64         21.43 (   0.00%)       19.08 (  10.96%)       29.45 ( -37.42%)
Stddev    256        86.25 (   0.00%)       50.56 (  41.38%)      112.78 ( -30.76%)
Stddev    2048      311.30 (   0.00%)      317.03 (  -1.84%)      610.38 ( -96.07%)
Stddev    8192      208.86 (   0.00%)      314.18 ( -50.42%)      580.38 (-177.88%)
---
STREAM
======
Dom0
----
                           v-XEN-HT             v-XEN-noHT                 XEN-HT                XEN-noHT                 XEN-csc
MB/sec copy     33076.00 (   0.00%)    33054.78 (  -0.06%)    33112.76 (   0.11%)    33070.62 (  -0.02%)    31811.06 (  -3.82%)
MB/sec scale    24170.98 (   0.00%)    24198.12 (   0.11%)    24225.04 (   0.22%)    24335.76 (   0.68%)    22745.18 (  -5.90%)
MB/sec add      27060.58 (   0.00%)    27084.58 (   0.09%)    27195.52 (   0.50%)    27360.86 (   1.11%)    25088.74 (  -7.29%)
MB/sec triad    27172.34 (   0.00%)    27199.12 (   0.10%)    27296.42 (   0.46%)    27468.54 (   1.09%)    25170.02 (  -7.37%)

HVM-v8
------
                        v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT             HVM-v8-noHT              HVM-v8-csc
MB/sec copy     32690.84 (   0.00%)    33611.24 (   2.82%)    33338.56 (   1.98%)    34168.32 (   4.52%)    29568.34 (  -9.55%)
MB/sec scale    21899.94 (   0.00%)    22677.44 (   3.55%)    22001.94 (   0.47%)    23023.06 (   5.13%)    18635.30 ( -14.91%)
MB/sec add      25256.10 (   0.00%)    26822.30 (   6.20%)    25442.48 (   0.74%)    26676.96 (   5.63%)    21632.14 ( -14.35%)
MB/sec triad    25104.72 (   0.00%)    26821.96 (   6.84%)    26376.48 (   5.07%)    26542.96 (   5.73%)    21751.54 ( -13.36%)

HVM-v4
------
                        v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT             HVM-v4-noHT              HVM-v4-csc
MB/sec copy     34027.40 (   0.00%)    32887.74 (  -3.35%)    31917.34 (  -6.20%)    33820.44 (  -0.61%)    31186.82 (  -8.35%)
MB/sec scale    22269.94 (   0.00%)    20785.18 (  -6.67%)    20741.28 (  -6.86%)    22320.82 (   0.23%)    19471.54 ( -12.57%)
MB/sec add      26549.00 (   0.00%)    25561.32 (  -3.72%)    25120.08 (  -5.38%)    26553.58 (   0.02%)    22348.62 ( -15.82%)
MB/sec triad    26431.74 (   0.00%)    26365.78 (  -0.25%)    24979.40 (  -5.49%)    26075.40 (  -1.35%)    21988.46 ( -16.81%)

HVMx2
-----
                      v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT           HVMx2-v8-noHT            HVMx2-v8-csc
MB/sec copy     29073.86 (   0.00%)    22403.74 ( -22.94%)    28821.94 (  -0.87%)    22508.84 ( -22.58%)    27223.16 (  -6.37%)
MB/sec scale    20555.88 (   0.00%)    15184.40 ( -26.13%)    20544.64 (  -0.05%)    15269.06 ( -25.72%)    17813.02 ( -13.34%)
MB/sec add      22447.72 (   0.00%)    16661.36 ( -25.78%)    22774.16 (   1.45%)    16733.48 ( -25.46%)    19659.98 ( -12.42%)
MB/sec triad    22261.18 (   0.00%)    16654.90 ( -25.18%)    22163.52 (  -0.44%)    16508.20 ( -25.84%)    19722.54 ( -11.40%)


KERNBENCH
=========
Dom0
----
                                v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Amean     elsp-2       234.13 (   0.00%)      233.23 *   0.38%*      233.63 (   0.21%)      233.16 *   0.42%*      248.12 *  -5.98%*
Amean     elsp-4       139.89 (   0.00%)      130.00 *   7.07%*      140.21 *  -0.23%*      130.43 *   6.77%*      145.65 *  -4.12%*
Amean     elsp-8        98.15 (   0.00%)      132.86 * -35.37%*       98.22 (  -0.08%)      132.34 * -34.84%*       98.15 (  -0.01%)
Amean     elsp-16       99.05 (   0.00%)      135.61 * -36.90%*       99.65 *  -0.61%*      135.51 * -36.81%*       99.58 (  -0.53%)
Stddev    elsp-2         0.40 (   0.00%)        0.10 (  74.63%)        0.14 (  64.63%)        0.06 (  84.75%)        0.19 (  51.61%)
Stddev    elsp-4         0.11 (   0.00%)        0.04 (  66.87%)        0.14 ( -33.77%)        0.67 (-528.22%)        0.57 (-442.09%)
Stddev    elsp-8         0.23 (   0.00%)        0.37 ( -58.43%)        0.40 ( -68.71%)        0.04 (  82.95%)        0.03 (  86.30%)
Stddev    elsp-16        0.05 (   0.00%)        0.20 (-306.12%)        0.47 (-855.23%)        0.34 (-589.65%)        0.47 (-850.20%)

HVM-v8
------
                             v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     elsp-2       204.73 (   0.00%)      206.77 *  -1.00%*      202.45 *   1.11%*      205.87 *  -0.56%*      218.87 *  -6.91%*
Amean     elsp-4       123.10 (   0.00%)      115.66 *   6.04%*      121.90 *   0.98%*      115.41 *   6.25%*      128.78 *  -4.61%*
Amean     elsp-8        87.42 (   0.00%)      125.83 * -43.94%*       85.94 *   1.70%*      125.71 * -43.79%*       87.26 (   0.19%)
Amean     elsp-16       87.78 (   0.00%)      128.51 * -46.41%*       87.37 (   0.46%)      128.52 * -46.42%*       88.03 (  -0.29%)
Stddev    elsp-2         0.08 (   0.00%)        0.11 ( -32.29%)        0.09 (  -5.22%)        0.34 (-320.09%)        1.30 (-1513.29%)
Stddev    elsp-4         0.41 (   0.00%)        0.07 (  82.57%)        0.44 (  -8.26%)        0.27 (  33.60%)        0.60 ( -47.86%)
Stddev    elsp-8         1.11 (   0.00%)        0.42 (  61.72%)        0.14 (  87.69%)        0.22 (  79.83%)        0.35 (  68.79%)
Stddev    elsp-16        0.45 (   0.00%)        0.41 (   8.89%)        0.02 (  94.90%)        0.40 (  12.00%)        0.32 (  28.63%)

HVM-v4
------
                             v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Amean     elsp-2       208.43 (   0.00%)      205.99 *   1.17%*      206.05 *   1.14%*      208.49 (  -0.03%)      237.58 * -13.99%*
Amean     elsp-4       116.18 (   0.00%)      113.77 *   2.07%*      114.53 *   1.42%*      115.46 (   0.62%)      162.61 * -39.96%*
Amean     elsp-8       135.79 (   0.00%)      116.07 *  14.52%*      127.22 *   6.31%*      117.63 *  13.37%*      166.22 * -22.41%*
Amean     elsp-16      139.37 (   0.00%)      119.35 *  14.37%*      133.70 *   4.07%*      120.53 *  13.52%*      170.05 * -22.01%*
Stddev    elsp-2         0.12 (   0.00%)        0.08 (  31.03%)        0.26 (-118.28%)        0.24 (-102.17%)        0.34 (-185.87%)
Stddev    elsp-4         0.64 (   0.00%)        0.02 (  96.39%)        0.29 (  54.24%)        0.23 (  63.79%)        0.24 (  62.09%)
Stddev    elsp-8         1.69 (   0.00%)        0.21 (  87.34%)        2.21 ( -31.12%)        0.12 (  92.76%)        0.24 (  85.59%)
Stddev    elsp-16        1.55 (   0.00%)        0.13 (  91.93%)        2.00 ( -28.78%)        0.57 (  63.08%)        0.34 (  77.94%)

HVMx2
-----
                           v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Amean     elsp-2       288.41 (   0.00%)      433.37 * -50.26%*      288.82 (  -0.14%)      437.79 * -51.79%*      339.86 * -17.84%*
Amean     elsp-4       195.93 (   0.00%)      292.53 * -49.30%*      196.18 (  -0.13%)      294.95 * -50.54%*      242.91 * -23.98%*
Amean     elsp-8       152.94 (   0.00%)      226.94 * -48.38%*      153.27 (  -0.21%)      229.25 * -49.89%*      182.50 * -19.33%*
Amean     elsp-16      156.21 (   0.00%)      233.02 * -49.17%*      156.59 (  -0.24%)      235.63 * -50.84%*      185.82 * -18.95%*
Stddev    elsp-2         0.52 (   0.00%)        0.36 (  30.55%)        0.20 (  61.63%)        0.34 (  35.83%)        2.56 (-389.55%)
Stddev    elsp-4         0.58 (   0.00%)        0.32 (  44.47%)        0.48 (  16.07%)        0.07 (  88.46%)        0.41 (  29.44%)
Stddev    elsp-8         0.15 (   0.00%)        0.87 (-477.96%)        0.39 (-157.52%)        0.81 (-438.90%)        1.30 (-764.70%)
Stddev    elsp-16        0.71 (   0.00%)        0.16 (  77.40%)        0.88 ( -25.43%)        0.87 ( -22.96%)        1.75 (-148.69%)
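
A note on reading these tables: every result column is compared against
the baseline in the leftmost column, with the relative difference given
in percent. The sign is normalised so that negative always means worse
than baseline (for lower-is-better metrics such as elapsed time the raw
difference is flipped). As I understand the tooling's convention, values
inside *...* are the ones flagged as statistically significant, while
parenthesised ones are within noise. As a worked check, KERNBENCH HVMx2
elsp-2: (433.37 - 288.41) / 288.41 = +50.26% more elapsed time, reported
as -50.26%. Below is a minimal sketch of the arithmetic behind the
Amean/Hmean/Stddev/CoeffVar rows (illustration only, not part of the
series; the actual tooling may use the sample rather than the population
form of the standard deviation):

#include <math.h>
#include <stdbool.h>

/* Arithmetic mean: the "Amean" rows (KERNBENCH, HACKBENCH, ...). */
static double amean(const double *s, unsigned int n)
{
    double sum = 0.0;

    for ( unsigned int i = 0; i < n; i++ )
        sum += s[i];

    return sum / n;
}

/* Harmonic mean: the "Hmean" rows (MUTILATE, NETPERF, ...). */
static double hmean(const double *s, unsigned int n)
{
    double inv = 0.0;

    for ( unsigned int i = 0; i < n; i++ )
        inv += 1.0 / s[i];

    return n / inv;
}

/* Population standard deviation: the "Stddev" rows. */
static double stddev(const double *s, unsigned int n)
{
    double m = amean(s, n), var = 0.0;

    for ( unsigned int i = 0; i < n; i++ )
        var += (s[i] - m) * (s[i] - m);

    return sqrt(var / n);
}

/* Coefficient of variation: the "CoeffVar" rows, in percent. */
static double coeffvar(const double *s, unsigned int n)
{
    return stddev(s, n) / amean(s, n) * 100.0;
}

/* Relative difference vs. baseline in percent, sign-flipped for
 * lower-is-better metrics so that negative is always a regression. */
static double delta_pct(double base, double val, bool lower_is_better)
{
    double d = (val - base) / base * 100.0;

    return lower_is_better ? -d : d;
}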


HACKBENCH-PROCESS-PIPE
======================
Dom0
----
                           v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Amean     1       0.7356 (   0.00%)      1.2154 * -65.23%*      0.7470 (  -1.55%)      1.2200 * -65.85%*      0.7160 (   2.66%)
Amean     3       2.3506 (   0.00%)      3.9274 * -67.08%*      2.4632 (  -4.79%)      3.9302 * -67.20%*      2.2284 (   5.20%)
Amean     5       4.1084 (   0.00%)      6.6430 * -61.69%*      4.0418 (   1.62%)      6.6584 * -62.07%*      4.0492 (   1.44%)
Amean     7       5.8314 (   0.00%)      9.2772 * -59.09%*      5.9784 (  -2.52%)      9.8144 * -68.30%*      5.8718 (  -0.69%)
Amean     12      9.9694 (   0.00%)     14.8220 * -48.67%*     10.4374 *  -4.69%*     14.4464 * -44.91%*     10.1488 (  -1.80%)
Amean     18     14.2844 (   0.00%)     21.6266 * -51.40%*     14.1366 (   1.03%)     20.2166 * -41.53%*     14.7450 *  -3.22%*
Amean     24     19.0304 (   0.00%)     27.2974 * -43.44%*     17.6314 (   7.35%)     26.1082 * -37.19%*     18.0072 (   5.38%)
Amean     30     21.8334 (   0.00%)     32.2406 * -47.67%*     21.9030 (  -0.32%)     31.1702 * -42.76%*     21.1058 (   3.33%)
Amean     32     22.8946 (   0.00%)     34.8984 * -52.43%*     22.0216 *   3.81%*     33.8204 * -47.72%*     22.7300 (   0.72%)
Stddev    1       0.0213 (   0.00%)      0.0727 (-241.00%)      0.0327 ( -53.61%)      0.0560 (-162.76%)      0.0168 (  21.21%)
Stddev    3       0.1634 (   0.00%)      0.2564 ( -56.92%)      0.0952 (  41.73%)      0.1332 (  18.47%)      0.0944 (  42.24%)
Stddev    5       0.3934 (   0.00%)      0.6043 ( -53.61%)      0.1954 (  50.33%)      0.5454 ( -38.63%)      0.3064 (  22.12%)
Stddev    7       0.1796 (   0.00%)      0.6785 (-277.74%)      0.2809 ( -56.40%)      0.4508 (-150.96%)      0.5762 (-220.78%)
Stddev    12      0.1695 (   0.00%)      0.5074 (-199.31%)      0.3725 (-119.74%)      0.8115 (-378.66%)      0.3594 (-111.98%)
Stddev    18      0.1392 (   0.00%)      0.9622 (-591.50%)      0.4003 (-187.67%)      0.8868 (-537.31%)      0.2580 ( -85.37%)
Stddev    24      1.5323 (   0.00%)      0.4904 (  68.00%)      0.7330 (  52.16%)      0.8453 (  44.83%)      0.8976 (  41.42%)
Stddev    30      0.4214 (   0.00%)      0.5542 ( -31.51%)      1.0668 (-153.14%)      0.6499 ( -54.22%)      0.8666 (-105.64%)
Stddev    32      0.8221 (   0.00%)      0.4986 (  39.35%)      0.3849 (  53.18%)      0.7965 (   3.11%)      1.4428 ( -75.50%)

HVM-v8
------
                        v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     1       0.9110 (   0.00%)      1.1916 * -30.80%*      0.8992 (   1.30%)      1.1826 * -29.81%*      0.9462 (  -3.86%)
Amean     3       2.5652 (   0.00%)      2.5224 (   1.67%)      2.4524 *   4.40%*      2.4990 (   2.58%)      2.5514 (   0.54%)
Amean     5       3.7944 (   0.00%)      3.6360 (   4.17%)      3.6710 (   3.25%)      3.5258 (   7.08%)      3.9270 (  -3.49%)
Amean     7       4.9098 (   0.00%)      4.7782 (   2.68%)      4.6588 (   5.11%)      4.6430 (   5.43%)      5.0420 (  -2.69%)
Amean     12      7.7490 (   0.00%)      7.8662 (  -1.51%)      7.7872 (  -0.49%)      7.6310 (   1.52%)      7.6498 (   1.28%)
Amean     18     11.5028 (   0.00%)     13.6472 * -18.64%*     10.9800 (   4.54%)     12.2772 (  -6.73%)     11.1940 (   2.68%)
Amean     24     14.5122 (   0.00%)     18.7738 * -29.37%*     16.5552 * -14.08%*     18.1368 * -24.98%*     15.1922 (  -4.69%)
Amean     30     19.9878 (   0.00%)     23.7490 * -18.82%*     17.1778 *  14.06%*     24.0352 * -20.25%*     17.7776 *  11.06%*
Amean     32     17.0554 (   0.00%)     25.6216 * -50.23%*     19.1966 ( -12.55%)     25.3104 * -48.40%*     19.8716 * -16.51%*
Stddev    1       0.0403 (   0.00%)      0.0441 (  -9.28%)      0.0306 (  24.17%)      0.0440 (  -9.25%)      0.0419 (  -3.84%)
Stddev    3       0.0681 (   0.00%)      0.0243 (  64.28%)      0.0681 (   0.11%)      0.1289 ( -89.14%)      0.0975 ( -43.14%)
Stddev    5       0.2333 (   0.00%)      0.1336 (  42.73%)      0.1214 (  47.96%)      0.2722 ( -16.65%)      0.2080 (  10.83%)
Stddev    7       0.3640 (   0.00%)      0.3416 (   6.16%)      0.5810 ( -59.58%)      0.2687 (  26.18%)      0.3038 (  16.56%)
Stddev    12      1.4176 (   0.00%)      0.7817 (  44.85%)      1.4867 (  -4.88%)      1.3196 (   6.91%)      1.5775 ( -11.28%)
Stddev    18      2.0262 (   0.00%)      1.4042 (  30.70%)      1.9110 (   5.69%)      1.8377 (   9.30%)      1.3230 (  34.71%)
Stddev    24      1.9163 (   0.00%)      1.2319 (  35.71%)      1.4038 (  26.75%)      1.2938 (  32.49%)      1.2301 (  35.81%)
Stddev    30      0.8934 (   0.00%)      0.5216 (  41.62%)      2.0035 (-124.27%)      1.3529 ( -51.44%)      2.4480 (-174.03%)
Stddev    32      1.9707 (   0.00%)      1.1169 (  43.32%)      1.8277 (   7.26%)      1.1786 (  40.20%)      2.4069 ( -22.13%)

HVM-v4
------
                        v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Amean     1       1.4166 (   0.00%)      1.5276 (  -7.84%)      1.4962 (  -5.62%)      1.4424 (  -1.82%)      1.6824 * -18.76%*
Amean     3       2.8812 (   0.00%)      2.5782 (  10.52%)      2.5826 (  10.36%)      2.9692 (  -3.05%)      2.9984 (  -4.07%)
Amean     5       3.6958 (   0.00%)      2.9828 *  19.29%*      3.4330 (   7.11%)      3.5064 (   5.12%)      4.7144 * -27.56%*
Amean     7       4.0368 (   0.00%)      4.3646 (  -8.12%)      5.3896 ( -33.51%)      4.5122 ( -11.78%)      5.7828 * -43.25%*
Amean     12      8.5918 (   0.00%)      7.5160 (  12.52%)      7.6862 (  10.54%)      8.1164 (   5.53%)     10.3868 * -20.89%*
Amean     18     11.8664 (   0.00%)      9.6676 (  18.53%)     10.8628 (   8.46%)     10.8600 (   8.48%)     14.5226 * -22.38%*
Amean     24     15.1868 (   0.00%)     14.3044 (   5.81%)     16.1058 (  -6.05%)     13.9238 (   8.32%)     17.3746 ( -14.41%)
Amean     30     18.1732 (   0.00%)     13.3870 *  26.34%*     17.9462 (   1.25%)     20.1588 ( -10.93%)     22.2760 * -22.58%*
Amean     32     20.1886 (   0.00%)     18.9476 (   6.15%)     20.8922 (  -3.49%)     15.2004 *  24.71%*     22.5168 ( -11.53%)
Stddev    1       0.1570 (   0.00%)      0.0459 (  70.75%)      0.0353 (  77.49%)      0.0996 (  36.54%)      0.0573 (  63.51%)
Stddev    3       0.2794 (   0.00%)      0.3201 ( -14.55%)      0.4556 ( -63.07%)      0.6097 (-118.20%)      0.4727 ( -69.16%)
Stddev    5       0.5446 (   0.00%)      0.2571 (  52.80%)      0.7936 ( -45.73%)      0.6409 ( -17.68%)      1.0294 ( -89.03%)
Stddev    7       1.0122 (   0.00%)      0.5749 (  43.20%)      1.3387 ( -32.26%)      0.9335 (   7.78%)      1.5652 ( -54.63%)
Stddev    12      1.3820 (   0.00%)      1.3625 (   1.41%)      1.1304 (  18.21%)      1.8932 ( -36.98%)      1.6302 ( -17.96%)
Stddev    18      1.7693 (   0.00%)      2.4127 ( -36.37%)      2.6187 ( -48.01%)      2.2309 ( -26.09%)      0.9304 (  47.41%)
Stddev    24      2.4119 (   0.00%)      3.0559 ( -26.70%)      2.2138 (   8.21%)      1.6892 (  29.96%)      1.4706 (  39.03%)
Stddev    30      1.5649 (   0.00%)      1.5202 (   2.85%)      1.6264 (  -3.93%)      2.1632 ( -38.24%)      1.6574 (  -5.91%)
Stddev    32      1.8988 (   0.00%)      2.5294 ( -33.21%)      3.5272 ( -85.76%)      2.5166 ( -32.54%)      2.0956 ( -10.36%)

HVMx2
-----
                    v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT           HVMx2-v8-noHT            HVMx2-v8-csc
Amean     1       1.4746 (   0.00%)      1.1486 *  22.11%*      1.4736 (   0.07%)      1.2224 *  17.10%*      1.6544 * -12.19%*
Amean     3       3.2278 (   0.00%)      2.3356 *  27.64%*      3.1764 (   1.59%)      2.4984 *  22.60%*      2.9720 *   7.92%*
Amean     5       4.8324 (   0.00%)      3.4158 *  29.31%*      4.7694 (   1.30%)      3.4870 *  27.84%*      4.4406 *   8.11%*
Amean     7       6.2532 (   0.00%)      4.4590 *  28.69%*      6.3958 (  -2.28%)      4.6320 *  25.93%*      5.9150 (   5.41%)
Amean     12     11.6814 (   0.00%)      7.5236 *  35.59%*     11.0484 (   5.42%)      7.6534 *  34.48%*      9.7064 *  16.91%*
Amean     18     16.7618 (   0.00%)     11.7750 *  29.75%*     16.8658 (  -0.62%)     12.8028 *  23.62%*     17.0654 (  -1.81%)
Amean     24     23.3174 (   0.00%)     17.8512 *  23.44%*     23.6798 (  -1.55%)     18.8692 *  19.08%*     24.1742 (  -3.67%)
Amean     30     29.7604 (   0.00%)     25.2918 *  15.02%*     29.6202 (   0.47%)     25.4126 *  14.61%*     30.5034 (  -2.50%)
Amean     32     31.9166 (   0.00%)     26.6520 *  16.49%*     31.7582 (   0.50%)     28.5946 *  10.41%*     33.0222 *  -3.46%*
Stddev    1       0.0496 (   0.00%)      0.0159 (  67.85%)      0.0309 (  37.60%)      0.0294 (  40.77%)      0.0526 (  -6.03%)
Stddev    3       0.0469 (   0.00%)      0.1044 (-122.37%)      0.1102 (-134.80%)      0.0670 ( -42.65%)      0.0956 (-103.57%)
Stddev    5       0.1882 (   0.00%)      0.1189 (  36.79%)      0.2023 (  -7.51%)      0.1089 (  42.12%)      0.0694 (  63.10%)
Stddev    7       0.3875 (   0.00%)      0.1279 (  66.99%)      0.3783 (   2.36%)      0.1408 (  63.67%)      0.2421 (  37.52%)
Stddev    12      1.1041 (   0.00%)      0.3356 (  69.61%)      1.5160 ( -37.30%)      0.3483 (  68.45%)      1.3210 ( -19.64%)
Stddev    18      1.1648 (   0.00%)      0.9867 (  15.29%)      1.3000 ( -11.61%)      1.2858 ( -10.39%)      0.8736 (  25.00%)
Stddev    24      1.8746 (   0.00%)      1.7346 (   7.47%)      0.8749 (  53.33%)      0.8663 (  53.79%)      0.8197 (  56.27%)
Stddev    30      0.8898 (   0.00%)      0.8720 (   2.01%)      1.3074 ( -46.92%)      1.4297 ( -60.67%)      1.2882 ( -44.77%)
Stddev    32      0.8406 (   0.00%)      0.9333 ( -11.03%)      1.2521 ( -48.95%)      1.0791 ( -28.38%)      0.7319 (  12.93%)


MUTILATE
========
Dom0
----
                          v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Hmean     1    31121.40 (   0.00%)    31182.90 (   0.20%)    31063.01 (  -0.19%)    31107.06 (  -0.05%)    29609.98 *  -4.86%*
Hmean     3    75847.21 (   0.00%)    83473.18 *  10.05%*    75373.02 (  -0.63%)    84387.92 *  11.26%*    73419.60 *  -3.20%*
Hmean     5   113823.13 (   0.00%)    90322.68 * -20.65%*   113127.84 (  -0.61%)    91396.40 * -19.70%*   100843.37 * -11.40%*
Hmean     7   112553.93 (   0.00%)    91082.97 * -19.08%*   112467.16 (  -0.08%)    91296.28 * -18.89%*   108439.16 *  -3.66%*
Hmean     8   103472.02 (   0.00%)    89529.42 * -13.47%*   103246.75 (  -0.22%)    89442.61 * -13.56%*   101164.18 *  -2.23%*
Stddev    1      147.14 (   0.00%)      369.37 (-151.04%)      175.64 ( -19.37%)      336.37 (-128.61%)      271.28 ( -84.38%)
Stddev    3      920.80 (   0.00%)      300.44 (  67.37%)     1017.40 ( -10.49%)      333.46 (  63.79%)      367.95 (  60.04%)
Stddev    5      339.23 (   0.00%)      842.22 (-148.27%)      781.99 (-130.52%)      816.23 (-140.61%)      549.77 ( -62.06%)
Stddev    7      248.83 (   0.00%)      907.80 (-264.83%)      404.74 ( -62.66%)     1084.18 (-335.72%)      376.92 ( -51.48%)
Stddev    8      132.22 (   0.00%)      362.90 (-174.47%)      274.16 (-107.35%)      890.80 (-573.74%)      987.59 (-646.94%)

HVM-v8
------
                       v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     1    40028.72 (   0.00%)    37434.58 *  -6.48%*    39583.53 *  -1.11%*    38828.65 *  -3.00%*    35103.76 * -12.30%*
Hmean     3    86535.13 (   0.00%)    55588.26 * -35.76%*    87210.00 (   0.78%)    57874.31 * -33.12%*    47160.22 * -45.50%*
Hmean     5   129214.62 (   0.00%)    40925.40 * -68.33%*   130676.61 (   1.13%)    42883.21 * -66.81%*    85504.60 * -33.83%*
Hmean     7   156682.49 (   0.00%)    51225.56 * -67.31%*   157586.93 (   0.58%)    59146.02 * -62.25%*   136717.21 * -12.74%*
Hmean     8   170156.78 (   0.00%)    67430.68 * -60.37%*   174113.49 *   2.33%*    75503.09 * -55.63%*   159666.07 *  -6.17%*
Stddev    1      151.77 (   0.00%)      288.75 ( -90.25%)      160.68 (  -5.87%)      232.01 ( -52.87%)      547.91 (-261.01%)
Stddev    3      322.96 (   0.00%)      557.82 ( -72.72%)     1158.62 (-258.75%)      654.05 (-102.51%)    18841.25 (-5733.90%)
Stddev    5     1396.24 (   0.00%)     1741.63 ( -24.74%)      298.35 (  78.63%)      542.51 (  61.14%)     1274.93 (   8.69%)
Stddev    7      808.54 (   0.00%)     2547.30 (-215.05%)     2569.12 (-217.75%)     1034.22 ( -27.91%)     4163.80 (-414.98%)
Stddev    8     2411.73 (   0.00%)     3538.44 ( -46.72%)      982.48 (  59.26%)      692.10 (  71.30%)     1324.30 (  45.09%)

HVM-v4
------
                       v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     1    38127.59 (   0.00%)    37939.84 (  -0.49%)    37156.88 *  -2.55%*    36938.92 *  -3.12%*    34869.19 *  -8.55%*
Hmean     3    91447.41 (   0.00%)    91291.57 (  -0.17%)    91170.71 (  -0.30%)    91469.09 (   0.02%)    76214.51 * -16.66%*
Hmean     5   119169.44 (   0.00%)   130797.85 *   9.76%*   119020.55 (  -0.12%)   126949.11 *   6.53%*   109062.82 *  -8.48%*
Hmean     7   122588.88 (   0.00%)   129802.50 (   5.88%)   119924.66 (  -2.17%)   133372.57 *   8.80%*   110466.18 *  -9.89%*
Hmean     8   127027.63 (   0.00%)   137217.76 *   8.02%*   121329.34 (  -4.49%)   136100.78 (   7.14%)   114071.36 * -10.20%*
Stddev    1      100.77 (   0.00%)      271.37 (-169.31%)      214.02 (-112.39%)      362.58 (-259.82%)      516.53 (-412.61%)
Stddev    3      980.00 (   0.00%)     1264.82 ( -29.06%)     2554.47 (-160.66%)     1549.81 ( -58.14%)      748.72 (  23.60%)
Stddev    5      545.35 (   0.00%)     2910.89 (-433.76%)     1061.32 ( -94.61%)     2250.11 (-312.60%)     2377.12 (-335.89%)
Stddev    7     6082.18 (   0.00%)     9179.69 ( -50.93%)     5486.68 (   9.79%)     3214.60 (  47.15%)     4147.45 (  31.81%)
Stddev    8     6445.70 (   0.00%)      433.52 (  93.27%)     1857.60 (  71.18%)    12837.92 ( -99.17%)     3400.78 (  47.24%)

HVMx2
-----
                     v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     1    30318.06 (   0.00%)     1864.40 * -93.85%*    30194.08 (  -0.41%)     2457.66 * -91.89%*     4940.93 * -83.70%*
Hmean     3    34617.67 (   0.00%)     7414.77 * -78.58%*    33860.02 (  -2.19%)    11209.64 * -67.62%*    23274.26 * -32.77%*
Hmean     5    44382.41 (   0.00%)    11534.71 * -74.01%*    45606.53 (   2.76%)    15984.89 * -63.98%*    38599.75 ( -13.03%)
Hmean     7    65170.13 (   0.00%)    23530.68 * -63.89%*    70754.93 *   8.57%*    23045.58 * -64.64%*    55799.74 * -14.38%*
Hmean     8    85619.81 (   0.00%)    36969.39 * -56.82%*    88529.15 (   3.40%)    36532.64 * -57.33%*    70367.33 * -17.81%*
Stddev    1      532.15 (   0.00%)        3.55 (  99.33%)      214.55 (  59.68%)        2.87 (  99.46%)    13159.62 (-2372.92%)
Stddev    3      520.12 (   0.00%)       66.14 (  87.28%)      440.61 (  15.29%)      179.15 (  65.56%)     4463.14 (-758.09%)
Stddev    5      730.81 (   0.00%)       52.70 (  92.79%)      900.27 ( -23.19%)      161.20 (  77.94%)     6286.95 (-760.27%)
Stddev    7     1851.68 (   0.00%)     1217.81 (  34.23%)      706.22 (  61.86%)      708.81 (  61.72%)     3845.49 (-107.68%)
Stddev    8     2002.34 (   0.00%)      868.00 (  56.65%)     3197.46 ( -59.69%)      818.97 (  59.10%)     4477.51 (-123.61%)


PGIOPERFBENCH
=============
Dom0
----
                               v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Amean     commit        7.88 (   0.00%)        8.53 (  -8.25%)        8.78 * -11.43%*        7.64 (   3.09%)        8.20 (  -4.04%)
Amean     read          6.85 (   0.00%)        7.30 *  -6.63%*        7.71 * -12.54%*        6.86 (  -0.21%)        7.21 *  -5.30%*
Amean     wal           0.07 (   0.00%)        0.08 * -14.01%*        0.07 (   1.15%)        0.08 * -11.04%*        0.08 (  -6.71%)
Stddev    commit        1.86 (   0.00%)        4.27 (-129.47%)        3.59 ( -92.56%)        1.35 (  27.62%)        3.50 ( -88.00%)
Stddev    read          0.53 (   0.00%)        3.12 (-494.65%)        2.89 (-450.88%)        0.39 (  25.38%)        2.50 (-376.42%)
Stddev    wal           0.05 (   0.00%)        0.04 (   4.58%)        0.05 (  -0.86%)        0.04 (  10.58%)        0.04 (   5.91%)

HVM-v8
------
                            v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Amean     commit        7.71 (   0.00%)        7.64 (   0.79%)        7.83 (  -1.62%)        7.74 (  -0.44%)        7.68 (   0.28%)
Amean     read          6.96 (   0.00%)        6.88 *   1.12%*        6.97 (  -0.12%)        6.97 (  -0.15%)        6.92 *   0.51%*
Amean     wal           0.04 (   0.00%)        0.00 *  94.25%*        0.04 *  19.02%*        0.01 *  83.68%*        0.07 * -49.56%*
Stddev    commit        1.55 (   0.00%)        1.69 (  -8.82%)        1.60 (  -3.09%)        1.54 (   0.81%)        1.59 (  -2.73%)
Stddev    read          0.61 (   0.00%)        0.81 ( -32.60%)        0.53 (  12.82%)        0.55 (   9.68%)        0.68 ( -11.02%)
Stddev    wal           0.05 (   0.00%)        0.02 (  68.16%)        0.05 (   3.33%)        0.03 (  47.69%)        0.05 (   5.37%)

HVM-v4
------
                            v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Amean     commit        7.85 (   0.00%)        7.57 (   3.57%)        7.71 (   1.77%)        7.67 (   2.33%)        7.82 (   0.38%)
Amean     read          6.92 (   0.00%)        6.83 *   1.26%*        6.90 (   0.23%)        6.90 (   0.28%)        6.91 (   0.10%)
Amean     wal           0.05 (   0.00%)        0.05 (  -1.47%)        0.07 * -29.63%*        0.05 (   5.48%)        0.04 (  13.85%)
Stddev    commit        1.78 (   0.00%)        1.47 (  17.89%)        1.62 (   8.98%)        1.51 (  15.57%)        1.65 (   7.34%)
Stddev    read          0.61 (   0.00%)        0.53 (  13.75%)        0.55 (   9.47%)        0.66 (  -8.62%)        0.57 (   7.64%)
Stddev    wal           0.05 (   0.00%)        0.05 (   0.07%)        0.05 (   5.86%)        0.05 (  -0.02%)        0.05 (   0.56%)

HVMx2
-----
                          v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Amean     commit        7.86 (   0.00%)        8.16 *  -3.81%*        7.94 (  -1.01%)        8.31 *  -5.63%*       10.28 * -30.76%*
Amean     read          7.00 (   0.00%)        7.52 *  -7.32%*        6.98 (   0.30%)        7.91 * -12.85%*        9.47 * -35.17%*
Amean     wal           0.00 (   0.00%)        0.00 (  -2.18%)        0.01 *-231.22%*        0.07 *-1605.80%*        0.24 *-6035.64%*
Stddev    commit        1.63 (   0.00%)        1.43 (  11.96%)        1.75 (  -7.70%)        1.30 (  20.41%)        2.17 ( -33.15%)
Stddev    read          0.62 (   0.00%)        0.69 ( -12.39%)        0.67 (  -8.79%)        0.79 ( -27.80%)        2.21 (-257.25%)
Stddev    wal           0.02 (   0.00%)        0.02 ( -16.40%)        0.03 ( -73.40%)        0.09 (-346.15%)        0.39 (-1907.58%)


NETPERF-UDP
===========
Dom0
----
                                  v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Hmean     send-64        223.35 (   0.00%)      222.53 (  -0.36%)      220.23 *  -1.39%*      220.12 *  -1.44%*      199.70 * -10.59%*
Hmean     send-256       876.26 (   0.00%)      874.65 (  -0.18%)      863.81 *  -1.42%*      866.59 (  -1.10%)      791.34 *  -9.69%*
Hmean     send-2048     6575.95 (   0.00%)     6595.97 (   0.30%)     6542.13 (  -0.51%)     6568.38 (  -0.12%)     5758.09 * -12.44%*
Hmean     send-8192    20882.75 (   0.00%)    21537.45 *   3.14%*    20999.88 (   0.56%)    21163.69 *   1.35%*    19706.48 *  -5.63%*
Hmean     recv-64        223.35 (   0.00%)      222.52 (  -0.37%)      220.23 *  -1.39%*      220.08 *  -1.46%*      199.70 * -10.59%*
Hmean     recv-256       876.22 (   0.00%)      874.55 (  -0.19%)      863.81 *  -1.42%*      866.47 (  -1.11%)      791.09 *  -9.72%*
Hmean     recv-2048     6575.00 (   0.00%)     6595.64 (   0.31%)     6541.22 (  -0.51%)     6566.75 (  -0.13%)     5757.50 * -12.43%*
Hmean     recv-8192    20881.63 (   0.00%)    21534.73 *   3.13%*    20996.13 (   0.55%)    21157.86 *   1.32%*    19703.65 *  -5.64%*
Stddev    send-64          1.16 (   0.00%)        1.45 ( -25.00%)        0.94 (  19.15%)        1.64 ( -41.40%)       21.30 (-1738.47%)
Stddev    send-256         7.50 (   0.00%)        4.96 (  33.77%)        7.71 (  -2.79%)        8.92 ( -18.97%)       63.89 (-752.31%)
Stddev    send-2048       29.58 (   0.00%)       35.06 ( -18.54%)       57.84 ( -95.56%)       64.88 (-119.37%)      112.05 (-278.84%)
Stddev    send-8192      106.01 (   0.00%)      199.63 ( -88.31%)      172.69 ( -62.90%)      161.66 ( -52.49%)      276.33 (-160.67%)
Stddev    recv-64          1.16 (   0.00%)        1.44 ( -24.28%)        0.94 (  19.15%)        1.62 ( -39.82%)       21.30 (-1738.30%)
Stddev    recv-256         7.50 (   0.00%)        4.94 (  34.09%)        7.70 (  -2.70%)        9.03 ( -20.39%)       64.08 (-754.67%)
Stddev    recv-2048       28.84 (   0.00%)       34.99 ( -21.34%)       56.58 ( -96.20%)       65.80 (-128.14%)      112.00 (-288.34%)
Stddev    recv-8192      105.96 (   0.00%)      201.10 ( -89.80%)      167.07 ( -57.68%)      162.03 ( -52.92%)      276.84 (-161.28%)

HVM-v8
------
                               v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     send-64        496.94 (   0.00%)      447.92 *  -9.86%*      518.85 (   4.41%)      487.85 (  -1.83%)      505.10 (   1.64%)
Hmean     send-256      1952.73 (   0.00%)     1720.99 * -11.87%*     1955.77 (   0.16%)     1784.00 *  -8.64%*     1931.18 (  -1.10%)
Hmean     send-2048    13443.20 (   0.00%)    12361.56 *  -8.05%*    12849.78 (  -4.41%)    13027.81 (  -3.09%)    13485.59 (   0.32%)
Hmean     send-8192    37350.97 (   0.00%)    34440.52 *  -7.79%*    37659.20 (   0.83%)    35773.79 *  -4.22%*    35040.88 *  -6.18%*
Hmean     recv-64        495.88 (   0.00%)      447.65 *  -9.73%*      518.32 (   4.52%)      487.76 (  -1.64%)      504.86 (   1.81%)
Hmean     recv-256      1951.43 (   0.00%)     1720.97 * -11.81%*     1954.42 (   0.15%)     1782.90 *  -8.64%*     1930.33 (  -1.08%)
Hmean     recv-2048    13418.66 (   0.00%)    12358.40 *  -7.90%*    12830.91 (  -4.38%)    13018.47 (  -2.98%)    13475.39 (   0.42%)
Hmean     recv-8192    37326.31 (   0.00%)    34420.53 *  -7.78%*    37619.84 (   0.79%)    35768.42 *  -4.17%*    35035.40 *  -6.14%*
Stddev    send-64         15.80 (   0.00%)       11.38 (  27.94%)       29.69 ( -87.96%)        6.43 (  59.29%)       17.36 (  -9.89%)
Stddev    send-256        18.04 (   0.00%)       17.00 (   5.75%)       64.17 (-255.76%)       26.82 ( -48.67%)       33.37 ( -85.02%)
Stddev    send-2048      924.84 (   0.00%)       47.89 (  94.82%)     1342.98 ( -45.21%)      246.21 (  73.38%)      353.97 (  61.73%)
Stddev    send-8192      477.71 (   0.00%)      673.42 ( -40.97%)      224.20 (  53.07%)      467.40 (   2.16%)      928.70 ( -94.41%)
Stddev    recv-64         16.27 (   0.00%)       11.73 (  27.93%)       29.96 ( -84.12%)        6.54 (  59.80%)       17.64 (  -8.43%)
Stddev    recv-256        19.80 (   0.00%)       17.02 (  14.06%)       64.29 (-224.73%)       27.28 ( -37.81%)       34.17 ( -72.59%)
Stddev    recv-2048      948.89 (   0.00%)       48.42 (  94.90%)     1353.73 ( -42.66%)      251.38 (  73.51%)      368.14 (  61.20%)
Stddev    recv-8192      500.51 (   0.00%)      685.63 ( -36.99%)      206.98 (  58.65%)      467.59 (   6.58%)      928.73 ( -85.56%)

HVM-v4
------
                               v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     send-64        501.41 (   0.00%)      502.90 (   0.30%)      513.95 (   2.50%)      511.96 *   2.11%*      458.46 *  -8.57%*
Hmean     send-256      1881.72 (   0.00%)     1898.71 (   0.90%)     1928.30 (   2.48%)     1930.57 (   2.60%)     1831.70 (  -2.66%)
Hmean     send-2048    13544.27 (   0.00%)    13626.68 (   0.61%)    13509.70 (  -0.26%)    13802.90 (   1.91%)    12739.81 *  -5.94%*
Hmean     send-8192    36486.91 (   0.00%)    36897.96 (   1.13%)    35966.73 (  -1.43%)    36086.45 (  -1.10%)    35293.89 (  -3.27%)
Hmean     recv-64        501.13 (   0.00%)      502.67 (   0.31%)      513.90 (   2.55%)      511.91 *   2.15%*      458.34 *  -8.54%*
Hmean     recv-256      1880.67 (   0.00%)     1897.95 (   0.92%)     1927.94 (   2.51%)     1930.48 (   2.65%)     1831.50 (  -2.61%)
Hmean     recv-2048    13540.53 (   0.00%)    13623.45 (   0.61%)    13506.19 (  -0.25%)    13799.61 (   1.91%)    12734.53 *  -5.95%*
Hmean     recv-8192    36462.71 (   0.00%)    36888.91 (   1.17%)    35936.42 (  -1.44%)    36069.50 (  -1.08%)    35266.17 (  -3.28%)
Stddev    send-64         11.94 (   0.00%)        7.15 (  40.09%)       12.88 (  -7.85%)        3.89 (  67.41%)       16.11 ( -34.93%)
Stddev    send-256        68.31 (   0.00%)       53.85 (  21.17%)       37.13 (  45.65%)       19.62 (  71.28%)       30.54 (  55.30%)
Stddev    send-2048      310.13 (   0.00%)      314.36 (  -1.36%)       86.00 (  72.27%)      100.64 (  67.55%)      311.99 (  -0.60%)
Stddev    send-8192      443.37 (   0.00%)      260.08 (  41.34%)      926.89 (-109.06%)     1088.50 (-145.50%)     2801.65 (-531.89%)
Stddev    recv-64         12.13 (   0.00%)        7.36 (  39.32%)       12.93 (  -6.59%)        3.96 (  67.34%)       16.15 ( -33.13%)
Stddev    recv-256        69.46 (   0.00%)       54.17 (  22.01%)       37.76 (  45.64%)       19.75 (  71.56%)       30.68 (  55.82%)
Stddev    recv-2048      314.18 (   0.00%)      318.63 (  -1.41%)       88.84 (  71.72%)      100.51 (  68.01%)      313.79 (   0.13%)
Stddev    recv-8192      461.94 (   0.00%)      261.06 (  43.49%)      937.92 (-103.04%)     1098.48 (-137.80%)     2822.43 (-511.00%)

HVMx2
-----
                             v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     send-64        334.22 (   0.00%)      238.90 * -28.52%*      359.65 *   7.61%*      251.84 * -24.65%*      348.95 (   4.41%)
Hmean     send-256      1303.20 (   0.00%)      916.12 * -29.70%*     1376.16 *   5.60%*      969.58 * -25.60%*     1293.19 (  -0.77%)
Hmean     send-2048     9759.14 (   0.00%)     6606.50 * -32.30%*    10212.34 *   4.64%*     6893.41 * -29.36%*     8933.39 *  -8.46%*
Hmean     send-8192    29504.19 (   0.00%)    19067.47 * -35.37%*    30442.79 *   3.18%*    18372.42 * -37.73%*    25486.82 * -13.62%*
Hmean     recv-64        333.71 (   0.00%)      217.71 * -34.76%*      359.01 *   7.58%*      240.67 * -27.88%*      319.95 *  -4.13%*
Hmean     recv-256      1300.47 (   0.00%)      804.88 * -38.11%*     1372.71 *   5.55%*      923.23 * -29.01%*     1206.20 *  -7.25%*
Hmean     recv-2048     9740.73 (   0.00%)     5448.32 * -44.07%*    10185.94 *   4.57%*     6258.59 * -35.75%*     8371.03 * -14.06%*
Hmean     recv-8192    29416.78 (   0.00%)    14566.32 * -50.48%*    30272.47 *   2.91%*    16888.08 * -42.59%*    22719.70 * -22.77%*
Stddev    send-64          8.65 (   0.00%)        6.86 (  20.73%)        6.49 (  25.00%)        7.35 (  14.99%)       18.06 (-108.81%)
Stddev    send-256        35.52 (   0.00%)       10.83 (  69.50%)       29.41 (  17.19%)       15.84 (  55.41%)       25.77 (  27.45%)
Stddev    send-2048      167.61 (   0.00%)       42.08 (  74.89%)       97.12 (  42.06%)      238.24 ( -42.14%)      276.14 ( -64.75%)
Stddev    send-8192      291.17 (   0.00%)      305.17 (  -4.81%)      680.48 (-133.70%)      336.65 ( -15.62%)      482.57 ( -65.73%)
Stddev    recv-64          8.88 (   0.00%)        6.20 (  30.19%)        6.80 (  23.42%)        3.56 (  59.90%)        6.62 (  25.41%)
Stddev    recv-256        35.77 (   0.00%)       10.66 (  70.21%)       31.00 (  13.35%)       10.11 (  71.74%)       14.94 (  58.23%)
Stddev    recv-2048      169.57 (   0.00%)       52.21 (  69.21%)       99.42 (  41.37%)       96.49 (  43.10%)       42.24 (  75.09%)
Stddev    recv-8192      308.98 (   0.00%)      156.60 (  49.32%)      759.75 (-145.89%)      108.46 (  64.90%)      745.22 (-141.18%)


NETPERF-TCP
===========
Dom0
----
                             v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Hmean     64        674.18 (   0.00%)      674.89 (   0.11%)      677.61 (   0.51%)      679.64 (   0.81%)      660.08 (  -2.09%)
Hmean     256      2424.16 (   0.00%)     2426.14 (   0.08%)     2448.40 *   1.00%*     2443.24 *   0.79%*     2402.64 (  -0.89%)
Hmean     2048    15114.37 (   0.00%)    15273.97 *   1.06%*    15160.59 (   0.31%)    15372.31 *   1.71%*    14591.20 *  -3.46%*
Hmean     8192    32626.11 (   0.00%)    32895.01 (   0.82%)    32056.40 (  -1.75%)    32473.01 (  -0.47%)    32024.74 (  -1.84%)
Stddev    64          5.99 (   0.00%)        3.90 (  34.89%)        5.93 (   0.98%)        6.01 (  -0.40%)       31.04 (-418.30%)
Stddev    256        12.07 (   0.00%)        8.14 (  32.55%)       15.11 ( -25.17%)       18.76 ( -55.37%)       46.90 (-288.51%)
Stddev    2048      106.56 (   0.00%)       35.02 (  67.14%)      180.05 ( -68.97%)       86.00 (  19.29%)      339.19 (-218.32%)
Stddev    8192      649.77 (   0.00%)      123.38 (  81.01%)      329.45 (  49.30%)      448.67 (  30.95%)     1120.36 ( -72.42%)

HVM-v8
------
                          v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     64       1682.93 (   0.00%)     1685.11 (   0.13%)     1756.72 *   4.38%*     1687.44 (   0.27%)     1644.27 *  -2.30%*
Hmean     256      6203.93 (   0.00%)     6125.84 *  -1.26%*     6335.31 *   2.12%*     6142.90 *  -0.98%*     5850.62 *  -5.69%*
Hmean     2048    29253.76 (   0.00%)    28671.53 *  -1.99%*    29738.54 *   1.66%*    28871.57 *  -1.31%*    27437.72 *  -6.21%*
Hmean     8192    46374.69 (   0.00%)    45277.03 *  -2.37%*    46926.01 *   1.19%*    46011.63 *  -0.78%*    42627.47 *  -8.08%*
Stddev    64         22.20 (   0.00%)       20.70 (   6.78%)       35.93 ( -61.82%)       13.81 (  37.79%)       14.56 (  34.43%)
Stddev    256        21.74 (   0.00%)       72.01 (-231.20%)      110.09 (-406.36%)       53.63 (-146.70%)      142.11 (-553.63%)
Stddev    2048      286.90 (   0.00%)      147.25 (  48.68%)      172.37 (  39.92%)      189.21 (  34.05%)     1004.81 (-250.23%)
Stddev    8192      259.59 (   0.00%)      232.26 (  10.53%)      453.49 ( -74.70%)      239.72 (   7.65%)     1262.32 (-386.28%)

HVM-v4
------
                          v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     64       1685.06 (   0.00%)     1699.80 (   0.87%)     1693.76 (   0.52%)     1681.73 (  -0.20%)     1642.82 *  -2.51%*
Hmean     256      6210.06 (   0.00%)     6134.29 *  -1.22%*     6131.33 *  -1.27%*     6141.35 *  -1.11%*     5977.71 *  -3.74%*
Hmean     2048    28890.04 (   0.00%)    28999.17 (   0.38%)    28965.94 (   0.26%)    28931.03 (   0.14%)    27758.99 (  -3.92%)
Hmean     8192    45729.95 (   0.00%)    45801.43 (   0.16%)    45750.33 (   0.04%)    46037.28 *   0.67%*    46301.78 (   1.25%)
Stddev    64         20.94 (   0.00%)       22.24 (  -6.24%)       20.10 (   4.03%)       10.70 (  48.90%)       45.83 (-118.87%)
Stddev    256        26.43 (   0.00%)       51.40 ( -94.47%)       47.00 ( -77.85%)       32.37 ( -22.46%)       45.15 ( -70.82%)
Stddev    2048      169.64 (   0.00%)      149.97 (  11.60%)      205.29 ( -21.01%)      148.42 (  12.51%)     1950.42 (-1049.71%)
Stddev    8192      227.60 (   0.00%)      116.14 (  48.97%)      515.92 (-126.67%)       62.10 (  72.72%)    11909.19 (-5132.40%)
CoeffVar  64          1.24 (   0.00%)        1.31 (  -5.31%)        1.19 (   4.52%)        0.64 (  48.79%)        2.79 (-124.39%)
CoeffVar  256         0.43 (   0.00%)        0.84 ( -96.86%)        0.77 ( -80.12%)        0.53 ( -23.83%)        0.76 ( -77.46%)
CoeffVar  2048        0.59 (   0.00%)        0.52 (  11.93%)        0.71 ( -20.69%)        0.51 (  12.63%)        7.00 (-1092.17%)
CoeffVar  8192        0.50 (   0.00%)        0.25 (  49.05%)        1.13 (-126.56%)        0.13 (  72.90%)       24.76 (-4875.60%)

HVMx2
-----
                        v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     64       1252.34 (   0.00%)     1020.91 * -18.48%*     1272.66 *   1.62%*      996.12 * -20.46%*      968.44 * -22.67%*
Hmean     256      4369.71 (   0.00%)     3680.51 * -15.77%*     4376.58 (   0.16%)     3565.25 * -18.41%*     4088.77 *  -6.43%*
Hmean     2048    23712.23 (   0.00%)    18259.78 * -22.99%*    24077.77 *   1.54%*    17496.21 * -26.21%*    21709.20 *  -8.45%*
Hmean     8192    43290.16 (   0.00%)    35700.62 * -17.53%*    43332.46 (   0.10%)    32506.09 * -24.91%*    36700.18 * -15.22%*
Stddev    64         13.74 (   0.00%)        3.98 (  71.03%)       13.14 (   4.38%)        3.25 (  76.32%)      221.08 (-1509.09%)
Stddev    256        40.68 (   0.00%)        5.67 (  86.06%)       38.64 (   5.01%)       43.54 (  -7.02%)      316.66 (-678.40%)
Stddev    2048      304.70 (   0.00%)       62.11 (  79.62%)      202.53 (  33.53%)      117.33 (  61.49%)      208.33 (  31.63%)
Stddev    8192      659.30 (   0.00%)      116.45 (  82.34%)      274.71 (  58.33%)      299.39 (  54.59%)      412.99 (  37.36%)


NETPERF-UNIX
============
Dom0
----
                             v-XEN-HT             v-XEN-noHT                 XEN-HT               XEN-noHT                XEN-csc
Hmean     64        499.62 (   0.00%)      509.03 *   1.88%*      492.89 (  -1.35%)      485.34 (  -2.86%)      451.13 *  -9.70%*
Hmean     256      1900.97 (   0.00%)     1894.53 (  -0.34%)     1852.77 (  -2.54%)     1835.34 (  -3.45%)     1272.07 * -33.08%*
Hmean     2048     2892.22 (   0.00%)     2906.15 (   0.48%)     2765.65 *  -4.38%*     2747.32 *  -5.01%*     3086.19 (   6.71%)
Hmean     8192     3105.85 (   0.00%)     3158.33 *   1.69%*     2996.64 *  -3.52%*     2968.85 *  -4.41%*     2731.31 * -12.06%*
Stddev    64          4.49 (   0.00%)        3.39 (  24.38%)       10.64 (-137.15%)       20.01 (-345.94%)       25.28 (-463.31%)
Stddev    256        87.54 (   0.00%)       51.32 (  41.37%)       54.63 (  37.59%)       15.13 (  82.72%)      393.61 (-349.64%)
Stddev    2048       35.26 (   0.00%)       33.35 (   5.41%)       18.20 (  48.39%)        9.37 (  73.41%)      946.82 (-2585.23%)
Stddev    8192       21.32 (   0.00%)       32.53 ( -52.60%)       15.74 (  26.15%)       11.30 (  46.98%)      104.54 (-390.43%)

HVM-v8
------
                          v-HVM-v8-HT          v-HVM-v8-noHT              HVM-v8-HT            HVM-v8-noHT             HVM-v8-csc
Hmean     64        312.04 (   0.00%)      291.15 (  -6.69%)      275.22 ( -11.80%)      288.06 (  -7.68%)      277.23 ( -11.15%)
Hmean     256       853.06 (   0.00%)      929.15 (   8.92%)      977.34 (  14.57%)      877.63 (   2.88%)      761.25 ( -10.76%)
Hmean     2048     5564.16 (   0.00%)     5332.08 (  -4.17%)     4702.72 ( -15.48%)     5819.12 (   4.58%)     5531.96 (  -0.58%)
Hmean     8192     6570.41 (   0.00%)     8275.74 *  25.95%*     5962.89 (  -9.25%)     6462.60 (  -1.64%)     5387.48 ( -18.00%)
Stddev    64         35.97 (   0.00%)       34.70 (   3.53%)       48.37 ( -34.46%)       46.27 ( -28.62%)       59.79 ( -66.21%)
Stddev    256       143.59 (   0.00%)       68.75 (  52.12%)      123.98 (  13.65%)       48.93 (  65.92%)      163.00 ( -13.52%)
Stddev    2048     1182.43 (   0.00%)      842.67 (  28.73%)     1181.41 (   0.09%)      761.99 (  35.56%)      827.15 (  30.05%)
Stddev    8192     1331.69 (   0.00%)     1099.39 (  17.44%)     1708.12 ( -28.27%)     1247.83 (   6.30%)     1065.82 (  19.96%)

HVM-v4
------
                          v-HVM-v4-HT          v-HVM-v4-noHT              HVM-v4-HT            HVM-v4-noHT             HVM-v4-csc
Hmean     64        248.93 (   0.00%)      245.55 (  -1.36%)      225.32 (  -9.49%)      239.36 (  -3.84%)      203.05 * -18.43%*
Hmean     256       717.78 (   0.00%)      741.28 (   3.27%)      707.19 (  -1.48%)      782.19 (   8.97%)      751.77 (   4.74%)
Hmean     2048     4658.71 (   0.00%)     5209.89 (  11.83%)     2719.26 * -41.63%*     4156.29 ( -10.78%)     3624.27 ( -22.20%)
Hmean     8192     4874.84 (   0.00%)     4483.10 (  -8.04%)     4263.18 ( -12.55%)     5724.08 *  17.42%*     3579.16 * -26.58%*
Stddev    64         23.09 (   0.00%)       58.27 (-152.37%)       46.49 (-101.39%)       31.12 ( -34.79%)       11.06 (  52.09%)
Stddev    256       104.17 (   0.00%)      133.06 ( -27.73%)      119.32 ( -14.54%)      130.87 ( -25.62%)       87.78 (  15.74%)
Stddev    2048      417.11 (   0.00%)      521.74 ( -25.09%)     1030.34 (-147.02%)      907.53 (-117.58%)     1715.05 (-311.18%)
Stddev    8192      825.38 (   0.00%)     1426.19 ( -72.79%)      821.15 (   0.51%)      275.68 (  66.60%)     1131.40 ( -37.08%)

HVMx2
-----
                        v-HVMx2-v8-HT        v-HVMx2-v8-noHT            HVMx2-v8-HT          HVMx2-v8-noHT           HVMx2-v8-csc
Hmean     64        230.53 (   0.00%)      148.42 * -35.62%*      243.29 (   5.53%)      165.06 * -28.40%*      188.01 * -18.45%*
Hmean     256       790.90 (   0.00%)      482.51 * -38.99%*      711.91 (  -9.99%)      472.92 * -40.21%*      504.46 * -36.22%*
Hmean     2048     3783.93 (   0.00%)     2172.49 * -42.59%*     3163.60 * -16.39%*     1955.93 * -48.31%*     2880.11 * -23.89%*
Hmean     8192     4051.12 (   0.00%)     2764.79 * -31.75%*     4277.51 (   5.59%)     2377.98 * -41.30%*     3086.59 * -23.81%*
Stddev    64         26.71 (   0.00%)       15.99 (  40.13%)       21.43 (  19.76%)       19.08 (  28.56%)       29.45 ( -10.26%)
Stddev    256       112.58 (   0.00%)       24.49 (  78.25%)       86.25 (  23.39%)       50.56 (  55.09%)      112.78 (  -0.18%)
Stddev    2048      535.34 (   0.00%)      395.47 (  26.13%)      311.30 (  41.85%)      317.03 (  40.78%)      610.38 ( -14.02%)
Stddev    8192      490.41 (   0.00%)      835.97 ( -70.46%)      208.86 (  57.41%)      314.18 (  35.94%)      580.38 ( -18.35%)

-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-05 13:56   ` Dario Faggioli
@ 2019-07-15 14:08     ` Sergey Dyasli
  2019-07-18 14:48       ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-15 14:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel, Juergen Gross
  Cc: Sergey Dyasli <sergey.dyasli@citrix.com>,
	Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
	Roger Pau Monné

On 05/07/2019 14:56, Dario Faggioli wrote:
> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>> 1) This crash is quite likely to happen:
>>
>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects
>> that CPU2 is stuck!
>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-
>> 8.0.6-d  x86_64  debug=y   Not tainted ]----
>> [...]
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.517326]    [<ffff82d08023e25d>]
>> vcpu_set_periodic_timer+0x27/0x73
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>> [2019-07-04 18:22:47 UTC] (XEN) [
>> 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>
> Mmm... vcpu_set_periodic_timer?
> 
> What kernel is this and when does this crash happen?

Hi Dario,

I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
which has the following kernel:

# uname -a

Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux

All I need to do is suspend and resume the VM.
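
For reference, the loop the watchdog is catching here is the busy-wait
in vcpu_sleep_sync(). Roughly (a simplified sketch of the hypervisor
side, based on xen/common/schedule.c; details vary between versions):

void vcpu_sleep_sync(struct vcpu *v)
{
    vcpu_sleep_nosync(v);

    /* Spin until the vcpu has really been taken off its pcpu. */
    while ( !vcpu_runnable(v) && v->is_running )
        cpu_relax();

    sync_vcpu_execstate(v);
}

If, with core scheduling, the target vcpu's schedule unit is never
fully descheduled, this loop spins forever and the watchdog NMI fires,
which would match the trace above.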

Thanks,
Sergey

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-05 13:17 ` [Xen-devel] [PATCH 00/60] xen: add core scheduling support Sergey Dyasli
  2019-07-05 13:22   ` Juergen Gross
  2019-07-05 13:56   ` Dario Faggioli
@ 2019-07-16 15:45   ` Sergey Dyasli
  2019-07-19 13:35     ` Juergen Gross
  2019-07-25 16:01   ` Sergey Dyasli
  3 siblings, 1 reply; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-16 15:45 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Sergey Dyasli <sergey.dyasli@citrix.com>,
	Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Ian Jackson, Roger Pau Monné

On 05/07/2019 14:17, Sergey Dyasli wrote:
> [2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
> [2019-07-05 00:37:16 UTC] (XEN) [24907.514180] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
> [2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:    30
> [2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP:    e008:[<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
> [2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0000000000000202   CONTEXT: hypervisor
> [2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0000000000000002   rbx: ffff83202782e880   rcx: 000000000000001e
> [2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: ffff83202782e904   rsi: ffff832027823000   rdi: ffff832027823000
> [2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: ffff83403cab7d20   rsp: ffff83403cab7d00   r8:  0000000000000000
> [2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9:  0000000000000000   r10: 0200200200200200   r11: 0100100100100100
> [2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: ffff832027823000   r13: ffff832027823000   r14: ffff83202782e7b0
> [2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: ffff83202782e880   cr0: 000000008005003b   cr4: 00000000000426e0
> [2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: 00000000bd8a1000   cr2: 000000001851b798
> [2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> [2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> [2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around <ffff82d0802406fc> (sched_context_switched+0xaf/0x101):
> [2019-07-05 00:37:16 UTC] (XEN) [24907.990277]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
> [2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=ffff83403cab7d00:
> [2019-07-05 00:37:16 UTC] (XEN) [24908.061298]    ffff832027823000 ffff832027823000 0000000000000000 ffff83202782e880
> [2019-07-05 00:37:16 UTC] (XEN) [24908.098529]    ffff83403cab7d60 ffff82d0802407c0 0000000000000082 ffff83202782e7c8
> [2019-07-05 00:37:16 UTC] (XEN) [24908.135622]    000000000000001e ffff83202782e7c8 000000000000001e ffff82d080602628
> [2019-07-05 00:37:16 UTC] (XEN) [24908.172671]    ffff83403cab7dc0 ffff82d080240d83 000000000000df99 000000000000001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.210212]    ffff832027823000 000016a62dc8c6bc 000000fc00000000 000000000000001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.247181]    ffff83202782e7c8 ffff82d080602628 ffff82d0805da460 000000000000001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.284279]    ffff83403cab7e60 ffff82d080240ea4 00000002802aecc5 ffff832027823000
> [2019-07-05 00:37:16 UTC] (XEN) [24908.321128]    ffff83202782e7b0 ffff83202782e880 ffff83403cab7e10 ffff82d080273b4e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.358308]    ffff83403cab7e10 ffff82d080242f7f ffff83403cab7e60 ffff82d08024663a
> [2019-07-05 00:37:17 UTC] (XEN) [24908.395662]    ffff83403cab7ea0 ffff82d0802ec32a ffff8340000000ff ffff82d0805bc880
> [2019-07-05 00:37:17 UTC] (XEN) [24908.432376]    ffff82d0805bb980 ffffffffffffffff ffff83403cab7fff 000000000000001e
> [2019-07-05 00:37:17 UTC] (XEN) [24908.469812]    ffff83403cab7e90 ffff82d080242575 0000000000000f00 ffff82d0805bb980
> [2019-07-05 00:37:17 UTC] (XEN) [24908.508373]    000000000000001e ffff82d0806026f0 ffff83403cab7ea0 ffff82d0802425ca
> [2019-07-05 00:37:17 UTC] (XEN) [24908.549856]    ffff83403cab7ef0 ffff82d08027a601 ffff82d080242575 0000001e7ffde000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.588022]    ffff832027823000 ffff832027823000 ffff83127ffde000 ffff83203ffe5000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.625217]    000000000000001e ffff831204092000 ffff83403cab7d78 00000000ffffffed
> [2019-07-05 00:37:17 UTC] (XEN) [24908.662932]    ffffffff81800000 0000000000000000 ffffffff81800000 0000000000000000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.703246]    ffffffff818f4580 ffff880039118848 00000e6a3c4b2698 00000000148900db
> [2019-07-05 00:37:17 UTC] (XEN) [24908.743671]    0000000000000000 ffffffff8101e650 ffffffff8185c3e0 0000000000000000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.781927]    0000000000000000 0000000000000000 0000beef0000beef ffffffff81054eb2
> [2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
> [2019-07-05 00:37:17 UTC] (XEN) [24908.836789]    [<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
> [2019-07-05 00:37:17 UTC] (XEN) [24908.869916]    [<ffff82d0802407c0>] schedule.c#sched_context_switch+0x72/0x151
> [2019-07-05 00:37:17 UTC] (XEN) [24908.907384]    [<ffff82d080240d83>] schedule.c#sched_slave+0x2a3/0x2b2
> [2019-07-05 00:37:17 UTC] (XEN) [24908.941241]    [<ffff82d080240ea4>] schedule.c#schedule+0x112/0x2a1
> [2019-07-05 00:37:17 UTC] (XEN) [24908.973939]    [<ffff82d080242575>] softirq.c#__do_softirq+0x85/0x90
> [2019-07-05 00:37:17 UTC] (XEN) [24909.007101]    [<ffff82d0802425ca>] do_softirq+0x13/0x15
> [2019-07-05 00:37:17 UTC] (XEN) [24909.035971]    [<ffff82d08027a601>] domain.c#idle_loop+0xad/0xc0
> [2019-07-05 00:37:17 UTC] (XEN) [24909.070546]
> [2019-07-05 00:37:17 UTC] (XEN) [24909.080286] CPU0 @ e008:ffff82d0802431ba (stop_machine.c#stopmachine_wait_state+0x1a/0x24)
> [2019-07-05 00:37:17 UTC] (XEN) [24909.122896] CPU1 @ e008:ffff82d0802406f8 (sched_context_switched+0xab/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.159518] CPU3 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.199607] CPU2 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.235773] CPU5 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.276039] CPU4 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.312371] CPU7 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.352930] CPU6 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.388928] CPU8 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.424664] CPU9 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.465376] CPU10 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.507449] CPU11 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.544703] CPU13 @ e008:ffff82d0802431f2 (stop_machine.c#stopmachine_action+0x2e/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.588884] CPU12 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.625781] CPU15 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.666649] CPU14 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.703396] CPU17 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.744089] CPU16 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.781117] CPU23 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.821692] CPU22 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.858139] CPU27 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
> [2019-07-05 00:37:18 UTC] (XEN) [24909.898704] CPU26 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:19 UTC] (XEN) [24909.936069] CPU19 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:19 UTC] (XEN) [24909.977291] CPU18 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.014078] CPU31 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.055692] CPU21 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.100486] CPU24 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.136824] CPU25 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.177529] CPU29 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.218420] CPU28 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.255219] CPU20 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.292152]
> [2019-07-05 00:37:19 UTC] (XEN) [24910.301667] ****************************************
> [2019-07-05 00:37:19 UTC] (XEN) [24910.327892] Panic on CPU 30:
> [2019-07-05 00:37:19 UTC] (XEN) [24910.344165] FATAL TRAP: vector = 2 (nmi)
> [2019-07-05 00:37:19 UTC] (XEN) [24910.365476] [error_code=0000]
> [2019-07-05 00:37:19 UTC] (XEN) [24910.382509] ****************************************
> [2019-07-05 00:37:19 UTC] (XEN) [24910.408547]
> [2019-07-05 00:37:19 UTC] (XEN) [24910.418129] Reboot in five seconds...

On a closer look, the second crash happens when you try to shut down
the host ("poweroff" in my case).

Thanks,
Sergey


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-15 14:08     ` Sergey Dyasli
@ 2019-07-18 14:48       ` Juergen Gross
  2019-07-18 15:14         ` Sergey Dyasli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-18 14:48 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel, Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
	Roger Pau Monné

On 15.07.19 16:08, Sergey Dyasli wrote:
> On 05/07/2019 14:56, Dario Faggioli wrote:
>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>> 1) This crash is quite likely to happen:
>>>
>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects
>>> that CPU2 is stuck!
>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-
>>> 8.0.6-d  x86_64  debug=y   Not tainted ]----
>>> [...]
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.517326]    [<ffff82d08023e25d>]
>>> vcpu_set_periodic_timer+0x27/0x73
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>>> [2019-07-04 18:22:47 UTC] (XEN) [
>>> 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>>
>> Mmm... vcpu_set_periodic_timer?
>>
>> What kernel is this and when does this crash happen?
> 
> Hi Dario,
> 
> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
> which has the following kernel:
> 
> # uname -a
> 
> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> 
> All I need to do is suspend and resume the VM.

Happens with a more recent kernel, too.

I can easily reproduce the issue with any PV guest with more than 1 vcpu
by doing "xl save" and then "xl restore" again.

With a reproducer now available, I'm diving into the issue...


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-18 14:48       ` Juergen Gross
@ 2019-07-18 15:14         ` Sergey Dyasli
  2019-07-18 16:04           ` Dario Faggioli
                             ` (2 more replies)
  0 siblings, 3 replies; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-18 15:14 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Sergey Dyasli,
	Roger Pau Monné

On 18/07/2019 15:48, Juergen Gross wrote:
> On 15.07.19 16:08, Sergey Dyasli wrote:
>> On 05/07/2019 14:56, Dario Faggioli wrote:
>>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>>> 1) This crash is quite likely to happen:
>>>>
>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
>>>> [...]
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>>>
>>> Mmm... vcpu_set_periodic_timer?
>>>
>>> What kernel is this and when does this crash happen?
>>
>> Hi Dario,
>>
>> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
>> which has the following kernel:
>>
>> # uname -a
>>
>> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>>
>> All I need to do is suspend and resume the VM.
> 
> Happens with a more recent kernel, too.
> 
> I can easily reproduce the issue with any PV guest with more than 1 vcpu
> by doing "xl save" and then "xl restore" again.
> 
> With the reproducer being available I'm now diving into the issue...

One further thing to add is that I was able to avoid the crash by reverting

	xen/sched: rework and rename vcpu_force_reschedule()

which is a part of the series. This made all tests with PV guests pass.

Another question I have is do you have a git branch with core-scheduling
patches rebased on top of current staging available somewhere?

Thanks,
Sergey


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-18 15:14         ` Sergey Dyasli
@ 2019-07-18 16:04           ` Dario Faggioli
  2019-07-19  5:41           ` Juergen Gross
  2019-07-19 13:57           ` Juergen Gross
  2 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-18 16:04 UTC (permalink / raw)
  To: Sergey Dyasli, Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich, Ian Jackson,
	Roger Pau Monné


On Thu, 2019-07-18 at 16:14 +0100, Sergey Dyasli wrote:
> On 18/07/2019 15:48, Juergen Gross wrote:
> > 
> > I can easily reproduce the issue with any PV guest with more than 1
> > vcpu
> > by doing "xl save" and then "xl restore" again.
> > 
> > With the reproducer being available I'm now diving into the
> > issue...
> 
> One further thing to add is that I was able to avoid the crash by
> reverting
> 
> 	xen/sched: rework and rename vcpu_force_reschedule()
> 
Ah, interesting!

> which is a part of the series. This made all tests with PV guests
> pass.
> 
That's good news. :-)

> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?
> 
For my benchmarks, I used this:

https://github.com/jgross1/xen/tree/sched-v1-rebase

I don't know if Juergen has another one which is even more updated.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 04/60] xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-18 17:44   ` Dario Faggioli
  2019-07-19  4:49     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-07-18 17:44 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3


On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> In order to prepare core- and socket-scheduling use a new struct
> sched_unit instead of struct vcpu for interfaces of the different
> schedulers.
> 
> Rename the per-scheduler functions insert_vcpu and remove_vcpu to
> insert_unit and remove_unit to reflect the change of the parameter.
> In the schedulers rename local functions switched to sched_unit, too.
> 
> For now this new struct will contain a vcpu pointer only and is
> allocated on the stack. This will be changed later.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
This looks good to me.

One thing that came to mind is that the various function parameters
and local variables called 'unit' could be called 'su'.

It's a contraction of 'sched_unit', like, e.g., 'v' or 'vc' were
contractions of 'vcpu', it's still quite descriptive, it's short, which
is always good, IMO, and might mean less line wrap reformatting
(considering that it's replacing 'v' or 'vc').
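
Purely to illustrate the two spellings side by side (an editorial sketch,
not code from the series):

    struct sched_unit *unit;   /* the naming the series uses */
    struct sched_unit *su;     /* the proposed contraction */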

Of course, this will likely mean changing all the other ~60 patches, so
I'll understand if you say that it would be too much.

Also...

> index 2201faca6b..72a17758a1 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -275,6 +275,10 @@ struct vcpu
>      struct arch_vcpu arch;
>  };
>  
> +struct sched_unit {
> +    struct vcpu           *vcpu;
> +};
> +
>
Is my understanding correct that this field is going to be renamed
vcpu_list, right from this patch?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 05/60] xen/sched: alloc struct sched_unit for each vcpu
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-18 17:57   ` Dario Faggioli
  2019-07-19  4:56     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-07-18 17:57 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Allocate a struct sched_unit for each vcpu. This removes the need to
> have it locally on the stack in schedule.c.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
And this patch looks ok as well.

However, I don't see much value in not doing what's done here in patch
4 already (it's pretty much only lines changed by patch 4 that are being
changed again here).

Is there a particular reason you think it's important to keep these two
patches separated?

Ah, my comment about 'unit' --> 'su' --in case you think it's feasible-
- applies to struct members as well, of course, e.g., here:

>@@ -160,6 +161,7 @@ struct vcpu
>  
>      struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
>  
> +    struct sched_unit *sched_unit;
>      void            *sched_priv;    /* scheduler-specific data */
>  
>      struct vcpu_runstate_info runstate;

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 06/60] xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-18 22:52   ` Dario Faggioli
  2019-07-19  5:03     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-07-18 22:52 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3


On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> This prepares making the different schedulers vcpu agnostic.
> 
Ok, but the scheduler private data is, actually, for all the
schedulers, per-vcpu scheduler data such as, for instance,
struct csched2_vcpu.

After this patch we have (sticking to Credit2 as an example)
csched2_vcpu allocated by a function called csched2_alloc_vdata() but
stored on a per-sched_unit basis.

Similarly, we have an accessor method called csched2_vcpu() which
returns the struct csched2_vcpu which was stored in the per-unit
private space.

But shouldn't then the struct be called csched2_unit, and cited
functions be called csched2_alloc_udata() and csched2_unit()?
Now, I know that these transformations happen later in the series.
And, this time, I'm not asking to consolidate the patches.
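
To make the temporary inconsistency concrete, here is a minimal sketch
(stub types only; the shapes are assumed from Credit2's conventions, not
copied from the series):

    struct sched_unit { void *priv; };
    struct csched2_vcpu { unsigned int weight; };   /* per-vcpu name kept */

    /* The accessor keeps the old per-vcpu name... */
    static inline struct csched2_vcpu *csched2_vcpu(const struct sched_unit *unit)
    {
        return unit->priv;   /* ...but the data now lives per unit. */
    }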

However:
- can the changelog of this patch be a little more explicit, not only 
  informing that this is a preparatory patch, but also explaining 
  briefly the temporary inconsistency
- could the patches that deal with this be grouped together, so that 
  they are close to each other in the series (e.g., this patch, the 
  renaming hunks of patch 10 and patches that are currently 20 to 24)?
  Or are there dependencies that I am not considering?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 07/60] xen/sched: build a linked list of struct sched_unit
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-19  0:01   ` Dario Faggioli
  2019-07-19  5:07     ` Juergen Gross
  -1 siblings, 1 reply; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19  0:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> In order to make it easy to iterate over sched_unit elements of a
> domain build a single linked list and add an iterator for it.
>
How about a ',' between domain and build?

> For completeness add another iterator for_each_sched_unit_vcpu()
> which
> will iterate over all vcpus if a sched_unit (right now only one).
> This

"over all vcpus of a sched_unit" ?

> will be needed later for larger scheduling granularity (e.g. cores).
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
One question:

> @@ -279,8 +279,16 @@ struct vcpu
>  struct sched_unit {
>      struct vcpu           *vcpu;
>      void                  *priv;      /* scheduler private data */
> +    struct sched_unit     *next_in_list;
>  };
>  
> +#define for_each_sched_unit(d, e)                                         \
> +    for ( (e) = (d)->sched_unit_list; (e) != NULL; (e) = (e)->next_in_list )
> +
> +#define for_each_sched_unit_vcpu(i, v)                                    \
> +    for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_unit == (i);         \
> +          (v) = (v)->next_in_list )
> +
>
So, here... sorry if it's me not seeing it, but why the 
(v)->sched_unit == (i) check is necessary?

Do we expect to put in the list of vcpus of a particular unit, vcpus
that are in another unit?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 04/60] xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  2019-07-18 17:44   ` Dario Faggioli
@ 2019-07-19  4:49     ` Juergen Gross
  2019-07-19 17:01       ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19  4:49 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3

On 18.07.19 19:44, Dario Faggioli wrote:
> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>> In order to prepare core- and socket-scheduling use a new struct
>> sched_unit instead of struct vcpu for interfaces of the different
>> schedulers.
>>
>> Rename the per-scheduler functions insert_vcpu and remove_vcpu to
>> insert_unit and remove_unit to reflect the change of the parameter.
>> In the schedulers rename local functions switched to sched_unit, too.
>>
>> For now this new struct will contain a vcpu pointer only and is
>> allocated on the stack. This will be changed later.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> This looks good to me.
> 
> One thing that came to mind is that the various function parameters
> and local variables called 'unit' could be called 'su'.
> 
> It's a contraction of 'sched_unit', like, e.g., 'v' or 'vc' were
> contractions of 'vcpu', it's still quite descriptive, it's short, which
> is always good, IMO, and might mean less line wrap reformatting
> (considering that it's replacing 'v' or 'vc').
> 
> Of course, this will likely mean changing all the other ~60 patches, so
> I'll understand if you say that it would be too much.

I prefer "unit", as it is more readable and the effort to do the change
would be quite large (replacing "item" by "unit" was doable via sed,
while this change would require more manual intervention).

> 
> Also...
> 
>> index 2201faca6b..72a17758a1 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -275,6 +275,10 @@ struct vcpu
>>       struct arch_vcpu arch;
>>   };
>>   
>> +struct sched_unit {
>> +    struct vcpu           *vcpu;
>> +};
>> +
>>
> Is my understanding correct that this field is going to be renamed
> vcpu_list, right from this patch?

Yes.


Juergen


* Re: [Xen-devel] [PATCH 05/60] xen/sched: alloc struct sched_unit for each vcpu
  2019-07-18 17:57   ` Dario Faggioli
@ 2019-07-19  4:56     ` Juergen Gross
  2019-07-19 17:04       ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19  4:56 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3

On 18.07.19 19:57, Dario Faggioli wrote:
> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>> Allocate a struct sched_unit for each vcpu. This removes the need to
>> have it locally on the stack in schedule.c.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> And this patch looks ok as well.
> 
> However, I don't see much value in not doing what's done here in patch
> 4 already (it's pretty much only lines changed by patch 4 that are being
> changed again here).
> 
> Is there a particular reason you think it's important to keep these two
> patches separated?

Not important, but I thought it would make it more clear.

If you like it better I can merge the two patches.

> 
> Ah, my comment about 'unit' --> 'su' --in case you think it's feasible-
> - applies to struct members as well, of course, e.g., here:
> 
>> @@ -160,6 +161,7 @@ struct vcpu
>>   
>>       struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
>>   
>> +    struct sched_unit *sched_unit;
>>       void            *sched_priv;    /* scheduler-specific data */
>>   
>>       struct vcpu_runstate_info runstate;

I have to disagree here: this is no scheduler private structure, so I
believe the struct member names should be descriptive. I guess this is
the reason why there are many more struct members named "vcpu" than "v".


Juergen


* Re: [Xen-devel] [PATCH 06/60] xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  2019-07-18 22:52   ` Dario Faggioli
@ 2019-07-19  5:03     ` Juergen Gross
  2019-07-19 17:10       ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19  5:03 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3

On 19.07.19 00:52, Dario Faggioli wrote:
> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>> This prepares making the different schedulers vcpu agnostic.
>>
> Ok, but the scheduler private data is, actually, for all the
> schedulers, per-vcpu scheduler data such as, for instance,
> struct csched2_vcpu.
> 
> After this patch we have (sticking to Credit2 as an example)
> csched2_vcpu allocated by a function called csched2_alloc_vdata() but
> stored on a per-sched_unit basis.
> 
> Similarly, we have an accessor method called csched2_vcpu() which
> returns the struct csched2_vcpu which was stored in the per-unit
> private space.
> 
> But shouldn't then the struct be called csched2_unit, and cited
> functions be called csched2_alloc_udata() and csched2_unit()?
> Now, I know that these transformations happen later in the series.
> And, this time, I'm not asking to consolidate the patches.
> 
> However:
> - can the changelog of this patch be a little more explicit, not only
>    informing that this is a preparatory patch, but also explaining
>    briefly the temporary inconsistency

Sure.

> - could the patches that deal with this be grouped together, so that
>    they are close to each other in the series (e.g., this patch, the
>    renaming hunks of patch 10 and patches that are currently 20 to 24)?
>    Or are there dependencies that I am not considering?

There are some patches which could be reordered, but I'm not sure the
needed work is worth it. By moving the patches you named closer to each
other there will be other closely related patches ripped apart from
each other.

In the end it will make no sense to apply only some patches of the main
series. The huge amount of patches is meant only to make review easier.


Juergen


* Re: [Xen-devel] [PATCH 07/60] xen/sched: build a linked list of struct sched_unit
  2019-07-19  0:01   ` Dario Faggioli
@ 2019-07-19  5:07     ` Juergen Gross
  2019-07-19 17:16       ` Dario Faggioli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19  5:07 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3

On 19.07.19 02:01, Dario Faggioli wrote:
> On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
>> In order to make it easy to iterate over sched_unit elements of a
>> domain build a single linked list and add an iterator for it.
>>
> How about a ',' between domain and build?

Okay.

> 
>> For completeness add another iterator for_each_sched_unit_vcpu()
>> which
>> will iterate over all vcpus if a sched_unit (right now only one).
>> This
> 
> "over all vcpus of a sched_unit" ?

Oh, of course!

> 
>> will be needed later for larger scheduling granularity (e.g. cores).
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> One question:
> 
>> @@ -279,8 +279,16 @@ struct vcpu
>>   struct sched_unit {
>>       struct vcpu           *vcpu;
>>       void                  *priv;      /* scheduler private data */
>> +    struct sched_unit     *next_in_list;
>>   };
>>   
>> +#define for_each_sched_unit(d, e)                                         \
>> +    for ( (e) = (d)->sched_unit_list; (e) != NULL; (e) = (e)->next_in_list )
>> +
>> +#define for_each_sched_unit_vcpu(i, v)                                    \
>> +    for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_unit == (i);         \
>> +          (v) = (v)->next_in_list )
>> +
>>
> So, here... sorry if it's me not seeing it, but why the
> (v)->sched_unit == (i) check is necessary?
> 
> Do we expect to put in the list of vcpus of a particular unit, vcpus
> that are in another unit?

Yes. I'm making use of the fact that all vcpus in a unit are consecutive
as I'm re-using the already existing list of vcpus in a domain:

dom->vcpu0->vcpu1->vcpu2->vcpu3
       ^             ^
       !             !
unit0--+             !
                     !
unit2----------------+
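
Spelled out as a compilable sketch (stand-in types, not the actual patch;
the macro itself is the one quoted above):

    struct sched_unit;
    struct vcpu {
        struct vcpu *next_in_list;        /* domain-wide vcpu list */
        struct sched_unit *sched_unit;    /* unit this vcpu belongs to */
    };
    struct sched_unit { struct vcpu *vcpu; };   /* first vcpu of the unit */

    #define for_each_sched_unit_vcpu(i, v)                                \
        for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_unit == (i);     \
              (v) = (v)->next_in_list )

    /* With units {vcpu0,vcpu1} and {vcpu2,vcpu3} as above, iterating
     * unit0 visits vcpu0 and vcpu1, then stops at vcpu2, whose
     * sched_unit pointer already refers to unit2. */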


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-18 15:14         ` Sergey Dyasli
  2019-07-18 16:04           ` Dario Faggioli
@ 2019-07-19  5:41           ` Juergen Gross
  2019-07-19 11:24             ` Juergen Gross
  2019-07-19 13:57           ` Juergen Gross
  2 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19  5:41 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Roger Pau Monné

On 18.07.19 17:14, Sergey Dyasli wrote:
> On 18/07/2019 15:48, Juergen Gross wrote:
>> On 15.07.19 16:08, Sergey Dyasli wrote:
>>> On 05/07/2019 14:56, Dario Faggioli wrote:
>>>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>>>> 1) This crash is quite likely to happen:
>>>>>
>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
>>>>> [...]
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>>>>
>>>> Mmm... vcpu_set_periodic_timer?
>>>>
>>>> What kernel is this and when does this crash happen?
>>>
>>> Hi Dario,
>>>
>>> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
>>> which has the following kernel:
>>>
>>> # uname -a
>>>
>>> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>>>
>>> All I need to do is suspend and resume the VM.
>>
>> Happens with a more recent kernel, too.
>>
>> I can easily reproduce the issue with any PV guest with more than 1 vcpu
>> by doing "xl save" and then "xl restore" again.
>>
>> With the reproducer being available I'm now diving into the issue...
> 
> One further thing to add is that I was able to avoid the crash by reverting
> 
> 	xen/sched: rework and rename vcpu_force_reschedule()
> 
> which is a part of the series. This made all tests with PV guests pass.

Yeah, but removing this patch is just papering over a general issue.
The main problem seems to be a vcpu trying to pause another vcpu of the
same sched_unit. I already have an idea what is really happening, I just
need to verify it.
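
A sketch of the suspected sequence (simplified; my working theory rather
than code from the tree):

    /* vcpu A and vcpu B are siblings in the same sched_unit. On A: */
    vcpu_pause(B);
        /* vcpu_sleep_sync() inside spins until B stops running: */
        while ( B->is_running )
            cpu_relax();
    /* With core scheduling, B only stops running when the whole unit
     * is descheduled, which requires A to go through the scheduler,
     * too. A is spinning in the hypervisor instead, so nothing makes
     * progress until the watchdog fires; that matches the
     * vcpu_sleep_sync() call trace above. */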

> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?

Only the one Dario already mentioned.


Juergen



* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-19  5:41           ` Juergen Gross
@ 2019-07-19 11:24             ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-07-19 11:24 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Ian Jackson, Roger Pau Monné

On 19.07.19 07:41, Juergen Gross wrote:
> On 18.07.19 17:14, Sergey Dyasli wrote:
>> On 18/07/2019 15:48, Juergen Gross wrote:
>>> On 15.07.19 16:08, Sergey Dyasli wrote:
>>>> On 05/07/2019 14:56, Dario Faggioli wrote:
>>>>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>>>>> 1) This crash is quite likely to happen:
>>>>>>
>>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
>>>>>> [...]
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>>>>>
>>>>> Mmm... vcpu_set_periodic_timer?
>>>>>
>>>>> What kernel is this and when does this crash happen?
>>>>
>>>> Hi Dario,
>>>>
>>>> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 
>>>> 2GB RAM)
>>>> which has the following kernel:
>>>>
>>>> # uname -a
>>>>
>>>> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>>>>
>>>> All I need to do is suspend and resume the VM.
>>>
>>> Happens with a more recent kernel, too.
>>>
>>> I can easily reproduce the issue with any PV guest with more than 1 vcpu
>>> by doing "xl save" and then "xl restore" again.
>>>
>>> With the reproducer being available I'm now diving into the issue...
>>
>> One further thing to add is that I was able to avoid the crash by 
>> reverting
>>
>>     xen/sched: rework and rename vcpu_force_reschedule()
>>
>> which is a part of the series. This made all tests with PV guests pass.
> 
> Yeah, but removing this patch is just papering over a general issue.
> The main problem seems to be a vcpu trying to pause another vcpu of the
> same sched_unit. I already have an idea what is really happening, I just
> need to verify it.

It was a different problem than I initially thought, but I've found it.

xl restore is working now. :-)


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-16 15:45   ` Sergey Dyasli
@ 2019-07-19 13:35     ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-07-19 13:35 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Roger Pau Monné

On 16.07.19 17:45, Sergey Dyasli wrote:
> On 05/07/2019 14:17, Sergey Dyasli wrote:
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.514180] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:    30
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP:    e008:[<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0000000000000202   CONTEXT: hypervisor
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0000000000000002   rbx: ffff83202782e880   rcx: 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: ffff83202782e904   rsi: ffff832027823000   rdi: ffff832027823000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: ffff83403cab7d20   rsp: ffff83403cab7d00   r8:  0000000000000000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9:  0000000000000000   r10: 0200200200200200   r11: 0100100100100100
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: ffff832027823000   r13: ffff832027823000   r14: ffff83202782e7b0
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: ffff83202782e880   cr0: 000000008005003b   cr4: 00000000000426e0
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: 00000000bd8a1000   cr2: 000000001851b798
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around <ffff82d0802406fc> (sched_context_switched+0xaf/0x101):
>> [2019-07-05 00:37:16 UTC] (XEN) [24907.990277]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=ffff83403cab7d00:
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.061298]    ffff832027823000 ffff832027823000 0000000000000000 ffff83202782e880
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.098529]    ffff83403cab7d60 ffff82d0802407c0 0000000000000082 ffff83202782e7c8
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.135622]    000000000000001e ffff83202782e7c8 000000000000001e ffff82d080602628
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.172671]    ffff83403cab7dc0 ffff82d080240d83 000000000000df99 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.210212]    ffff832027823000 000016a62dc8c6bc 000000fc00000000 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.247181]    ffff83202782e7c8 ffff82d080602628 ffff82d0805da460 000000000000001e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.284279]    ffff83403cab7e60 ffff82d080240ea4 00000002802aecc5 ffff832027823000
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.321128]    ffff83202782e7b0 ffff83202782e880 ffff83403cab7e10 ffff82d080273b4e
>> [2019-07-05 00:37:16 UTC] (XEN) [24908.358308]    ffff83403cab7e10 ffff82d080242f7f ffff83403cab7e60 ffff82d08024663a
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.395662]    ffff83403cab7ea0 ffff82d0802ec32a ffff8340000000ff ffff82d0805bc880
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.432376]    ffff82d0805bb980 ffffffffffffffff ffff83403cab7fff 000000000000001e
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.469812]    ffff83403cab7e90 ffff82d080242575 0000000000000f00 ffff82d0805bb980
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.508373]    000000000000001e ffff82d0806026f0 ffff83403cab7ea0 ffff82d0802425ca
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.549856]    ffff83403cab7ef0 ffff82d08027a601 ffff82d080242575 0000001e7ffde000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.588022]    ffff832027823000 ffff832027823000 ffff83127ffde000 ffff83203ffe5000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.625217]    000000000000001e ffff831204092000 ffff83403cab7d78 00000000ffffffed
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.662932]    ffffffff81800000 0000000000000000 ffffffff81800000 0000000000000000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.703246]    ffffffff818f4580 ffff880039118848 00000e6a3c4b2698 00000000148900db
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.743671]    0000000000000000 ffffffff8101e650 ffffffff8185c3e0 0000000000000000
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.781927]    0000000000000000 0000000000000000 0000beef0000beef ffffffff81054eb2
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.836789]    [<ffff82d0802406fc>] sched_context_switched+0xaf/0x101
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.869916]    [<ffff82d0802407c0>] schedule.c#sched_context_switch+0x72/0x151
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.907384]    [<ffff82d080240d83>] schedule.c#sched_slave+0x2a3/0x2b2
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.941241]    [<ffff82d080240ea4>] schedule.c#schedule+0x112/0x2a1
>> [2019-07-05 00:37:17 UTC] (XEN) [24908.973939]    [<ffff82d080242575>] softirq.c#__do_softirq+0x85/0x90
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.007101]    [<ffff82d0802425ca>] do_softirq+0x13/0x15
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.035971]    [<ffff82d08027a601>] domain.c#idle_loop+0xad/0xc0
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.070546]
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.080286] CPU0 @ e008:ffff82d0802431ba (stop_machine.c#stopmachine_wait_state+0x1a/0x24)
>> [2019-07-05 00:37:17 UTC] (XEN) [24909.122896] CPU1 @ e008:ffff82d0802406f8 (sched_context_switched+0xab/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.159518] CPU3 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.199607] CPU2 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.235773] CPU5 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.276039] CPU4 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.312371] CPU7 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.352930] CPU6 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.388928] CPU8 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.424664] CPU9 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.465376] CPU10 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.507449] CPU11 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.544703] CPU13 @ e008:ffff82d0802431f2 (stop_machine.c#stopmachine_action+0x2e/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.588884] CPU12 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.625781] CPU15 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.666649] CPU14 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.703396] CPU17 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.744089] CPU16 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.781117] CPU23 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.821692] CPU22 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.858139] CPU27 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:18 UTC] (XEN) [24909.898704] CPU26 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24909.936069] CPU19 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24909.977291] CPU18 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.014078] CPU31 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.055692] CPU21 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.100486] CPU24 @ e008:ffff82d0802406fa (sched_context_switched+0xad/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.136824] CPU25 @ e008:ffff82d0802431fa (stop_machine.c#stopmachine_action+0x36/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.177529] CPU29 @ e008:ffff82d0802431f4 (stop_machine.c#stopmachine_action+0x30/0xa0)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.218420] CPU28 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.255219] CPU20 @ e008:ffff82d0802406fc (sched_context_switched+0xaf/0x101)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.292152]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.301667] ****************************************
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.327892] Panic on CPU 30:
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.344165] FATAL TRAP: vector = 2 (nmi)
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.365476] [error_code=0000]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.382509] ****************************************
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.408547]
>> [2019-07-05 00:37:19 UTC] (XEN) [24910.418129] Reboot in five seconds...
> 
> On a closer look, the second crash happens when you try to shut down
> the host ("poweroff" in my case).

And that was just another bug: the scheduler is still active when
trying to enter ACPI deep sleep states. As non-boot cpus are being
taken down via tasklets, this results in syncing problems when one
cpu of a sched_resource is already down and the other is waiting
for it to finish scheduling...

Replacing the common scheduling softirq handler with one doing only
tasklet scheduling in that case makes it work again.
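
Roughly along these lines (a hedged sketch: the handler name and body are
assumptions of mine; open_softirq() and SCHEDULE_SOFTIRQ are the existing
interfaces):

    /* Installed while taking cpus down for ACPI sleep: serve pending
     * tasklet work only, skipping the real scheduler and thus any
     * synchronization with a possibly already-offline sibling. */
    static void sched_tasklet_only(void)    /* hypothetical name */
    {
        do_tasklet();
    }

    open_softirq(SCHEDULE_SOFTIRQ, sched_tasklet_only);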


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-18 15:14         ` Sergey Dyasli
  2019-07-18 16:04           ` Dario Faggioli
  2019-07-19  5:41           ` Juergen Gross
@ 2019-07-19 13:57           ` Juergen Gross
  2019-07-22 14:22             ` Sergey Dyasli
  2 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-19 13:57 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Roger Pau Monné

On 18.07.19 17:14, Sergey Dyasli wrote:
> On 18/07/2019 15:48, Juergen Gross wrote:
>> On 15.07.19 16:08, Sergey Dyasli wrote:
>>> On 05/07/2019 14:56, Dario Faggioli wrote:
>>>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>>>> 1) This crash is quite likely to happen:
>>>>>
>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>>>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
>>>>> [...]
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278]    [<ffff82d08023d578>] vcpu_sleep_sync+0x50/0x71
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518]    [<ffff82d080208370>] vcpu_pause+0x21/0x23
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326]    [<ffff82d08023e25d>] vcpu_set_periodic_timer+0x27/0x73
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258]    [<ffff82d080209682>] do_vcpu_op+0x2c9/0x668
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238]    [<ffff82d08024f970>] compat_vcpu_op+0x250/0x390
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566]    [<ffff82d080383964>] pv_hypercall+0x364/0x564
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719]    [<ffff82d080385644>] do_entry_int82+0x26/0x2d
>>>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876]    [<ffff82d08038839b>] entry_int82+0xbb/0xc0
>>>>>
>>>> Mmm... vcpu_set_periodic_timer?
>>>>
>>>> What kernel is this and when does this crash happen?
>>>
>>> Hi Dario,
>>>
>>> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
>>> which has the following kernel:
>>>
>>> # uname -a
>>>
>>> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>>>
>>> All I need to do is suspend and resume the VM.
>>
>> Happens with a more recent kernel, too.
>>
>> I can easily reproduce the issue with any PV guest with more than 1 vcpu
>> by doing "xl save" and then "xl restore" again.
>>
>> With the reproducer being available I'm now diving into the issue...
> 
> One further thing to add is that I was able to avoid the crash by reverting
> 
> 	xen/sched: rework and rename vcpu_force_reschedule()
> 
> which is a part of the series. This made all tests with PV guests pass.
> 
> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?

I now have a git branch available with the two problems corrected and
rebased to current staging:

github.com/jgross1/xen.git sched-v1b


Juergen


* Re: [Xen-devel] [PATCH 04/60] xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  2019-07-19  4:49     ` Juergen Gross
@ 2019-07-19 17:01       ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, andrew.cooper3,
	ian.jackson, robert.vanvossen, tim, julien.grall, josh.whitehead,
	mengxu, Jan Beulich


On Fri, 2019-07-19 at 06:49 +0200, Juergen Gross wrote:
> On 18.07.19 19:44, Dario Faggioli wrote:
> > On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> > One thing that came to mind is that the various function
> > parameters and local variables called 'unit' could be called 'su'.
> > 
> > It's a contraction of 'sched_unit', like, e.g., 'v' or 'vc' were
> > contractions of 'vcpu', it's still quite descriptive, it's short,
> > which
> > is always good, IMO, and might mean less line wrap reformatting
> > (considering that it's replacing 'v' or 'vc').
> > 
> > Of course, this will likely mean changing all the other ~60
> > patches, so
> > I'll understand if you say that it would be too much.
> 
> I prefer "unit", as it is more readable and the effort to do the
> change
> would be quite large (replacing "item" by "unit" was doable via sed,
> while this change would require more manual intervention).
> 
I guess what's more readable is, to a large extent, a matter of
personal taste. However, I trust you on the amount of work needed
for such a rework, so I won't insist on it.

Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

And this stands even with this:

> > > index 2201faca6b..72a17758a1 100644
> > > --- a/xen/include/xen/sched.h
> > > +++ b/xen/include/xen/sched.h
> > > @@ -275,6 +275,10 @@ struct vcpu
> > >       struct arch_vcpu arch;
> > >   };
> > >   
> > > +struct sched_unit {
> > > +    struct vcpu           *vcpu;
> > > +};
> > > +
> > > 
> > Is my understanding correct that this field is going to be renamed
> > vcpu_list, right from this patch?
> 
> Yes.
> 
Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 05/60] xen/sched: alloc struct sched_unit for each vcpu
  2019-07-19  4:56     ` Juergen Gross
@ 2019-07-19 17:04       ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:04 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


On Fri, 2019-07-19 at 06:56 +0200, Juergen Gross wrote:
> On 18.07.19 19:57, Dario Faggioli wrote:
> > On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> > > 
> > However, I don't see much value in not doing what's done here in patch
> > 4 already (it's pretty much only lines changed by patch 4 that are
> > being changed again here).
> > 
> > Is there a particular reason you think it's important to keep these
> > two
> > patches separated?
> 
> Not important, but I thought it would make it more clear.
> 
> If you like it better I can merge the two patches.
> 
Yes, I'd prefer them merged.

> > Ah, my comment about 'unit' --> 'su' --in case you think it's
> > feasible-
> > - applies to struct members as well, of course, e.g., here:
> > 
> I have to disagree here: this is no scheduler private structure, so I
> believe the struct member names should be descriptive. I guess this
> is
> the reason why there are many more struct members named "vcpu" than
> "v".
> 
Well, the same as in my last reply to patch 4. And in fact, in scheduling
code, the preference was for 'v'.

But again, let's settle on 'unit', on the basis that the amount of work
necessary to change this is not worth it.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 06/60] xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  2019-07-19  5:03     ` Juergen Gross
@ 2019-07-19 17:10       ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:10 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, andrew.cooper3,
	ian.jackson, robert.vanvossen, tim, julien.grall, josh.whitehead,
	mengxu, Jan Beulich


On Fri, 2019-07-19 at 07:03 +0200, Juergen Gross wrote:
> On 19.07.19 00:52, Dario Faggioli wrote:
> > On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> > > This prepares making the different schedulers vcpu agnostic.
>
> > But shouldn't then the struct be called csched2_unit, and cited
> > functions be called csched2_alloc_udata() and csched2_unit()?
> > Now, I know that these transformations happen later in the series.
> > And, this time, I'm not asking to consolidate the patches.
> > 
> > However:
> > - can the changelog of this patch be a little more explicit, not
> > only
> >    informing that this is a preparatory patch, but also explaining
> >    briefly the temporary inconsistency
> 
> Sure.
> 
Ok, thanks.

> > - could the patches that deal with this be grouped together, so
> > that
> >    they are close to each other in the series [...]
> 
> There are some patches which could be reordered, but I'm not sure the
> needed work is worth it. By moving the patches you named closer to
> each
> other there will be other closely related patches ripped apart from
> each other.
>
Yes, I appreciate that.
> 
> In the end it will make no sense to apply only some patches of the
> main
> series. The huge amount of patches is meant only to make review
> easier.
>
And yet, this happens sometimes, especially for big series, and was
exactly what I was thinking of when making this comment.

Anyway, let's go with an updated changelog for now... And I'll build a
better and more solid opinion after having looked at the rest of the
series.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 07/60] xen/sched: build a linked list of struct sched_unit
  2019-07-19  5:07     ` Juergen Gross
@ 2019-07-19 17:16       ` Dario Faggioli
  0 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:16 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3


On Fri, 2019-07-19 at 07:07 +0200, Juergen Gross wrote:
> On 19.07.19 02:01, Dario Faggioli wrote:
> > On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> > > 
> > How about a ',' between domain and build?
> 
> Okay.

> > "over all vcpus of a sched_unit" ?
> 
> Oh, of course!
> 
Thanks.

> > One question:
> > 
> > > @@ -279,8 +279,16 @@ struct vcpu
> > >   struct sched_unit {
> > >       struct vcpu           *vcpu;
> > >       void                  *priv;      /* scheduler private data */
> > > +    struct sched_unit     *next_in_list;
> > >   };
> > >   
> > > +#define for_each_sched_unit(d, e)                                         \
> > > +    for ( (e) = (d)->sched_unit_list; (e) != NULL; (e) = (e)->next_in_list )
> > > +
> > > +#define for_each_sched_unit_vcpu(i, v)                                    \
> > > +    for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_unit == (i);         \
> > > +          (v) = (v)->next_in_list )
> > > +
> > > 
> > So, here... sorry if it's me not seeing it, but why the
> > (v)->sched_unit == (i) check is necessary?
> > 
> > Do we expect to put in the list of vcpus of a particular unit,
> > vcpus
> > that are in another unit?
> 
> Yes. 
>
Ah!

> I'm making use of the fact that all vcpus in a unit are consecutive
> as I'm re-using the already existing list of vcpus in a domain:
> 
> dom->vcpu0->vcpu1->vcpu2->vcpu3
>        ^             ^
>        !             !
> unit0--+             !
>                      !
> unit2----------------+
> 
Ok, I see. Can you add a short comment, above the for_each_xxx
construct, about that?

"All vcpus from all sched units are kept in the same. Only iterate over
the ones from a particular unit"

Or something like this.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 08/60] xen/sched: introduce struct sched_resource
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-19 17:43   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:43 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu,
	Jan Beulich, Ian Jackson



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Add a scheduling abstraction layer between physical processors and the
> schedulers by introducing a struct sched_resource. Each scheduler unit
> running is active on such a scheduler resource. For the time being
> there is one struct sched_resource per cpu, but in future there might
> be one for each core or socket only.
> 
I only have two questions:

> @@ -1618,6 +1623,13 @@ static int cpu_schedule_up(unsigned int cpu)
>  {
>      struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>      void *sched_priv;
> +    struct sched_resource *res;
> +
> +    res = xzalloc(struct sched_resource);
> +    if ( res == NULL )
> +        return -ENOMEM;
> +    res->processor = cpu;
> +    set_sched_res(cpu, res);
>  
>      per_cpu(scheduler, cpu) = &ops;
>      spin_lock_init(&sd->_lock);
> @@ -1682,6 +1694,9 @@ static void cpu_schedule_down(unsigned int cpu)
>      sd->sched_priv = NULL;
>  
>      kill_timer(&sd->s_timer);
> +
> +    set_sched_res(cpu, NULL);
> +    xfree(sd);
>  }
> 
What's that xfree(sd) about?
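(If the intention is to undo what cpu_schedule_up() now allocates, I'd
have expected the sched_resource to be the thing freed, i.e., just a
sketch of what I mean:

    struct sched_resource *res = get_sched_res(cpu);

    set_sched_res(cpu, NULL);
    xfree(res);

but maybe I'm missing something.)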
 

> @@ -334,7 +349,10 @@ static inline void sched_migrate(const struct scheduler *s,
>      if ( s->migrate )
>          s->migrate(s, unit, cpu);
>      else
> +    {
>          unit->vcpu->processor = cpu;
> +        unit->res = per_cpu(sched_res, cpu);
> +    }
>
  unit->res = get_sched_res(cpu)

?
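I.e., the accessor paired with the set_sched_res() used above;
presumably it is (or could be) just a thin wrapper around the per_cpu
read, something like:

static inline struct sched_resource *get_sched_res(unsigned int cpu)
{
    return per_cpu(sched_res, cpu);
}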

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 08/60] xen/sched: introduce struct sched_resource
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
  (?)
@ 2019-07-19 17:49   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 17:49 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, ian.jackson, tim,
	julien.grall, mengxu, Jan Beulich, andrew.cooper3



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Add a scheduling abstraction layer between physical processors and the
> schedulers by introducing a struct sched_resource. Each scheduler unit
> running is active on such a scheduler resource. For the time being
> there is one struct sched_resource per cpu, but in future there might
> be one for each core or socket only.
> 
Ah, one more thing.

> +++ b/xen/common/schedule.c
> @@ -63,6 +63,7 @@ static void poll_timer_fn(void *data);
>  /* This is global for now so that private implementations can reach it */
>  DEFINE_PER_CPU(struct schedule_data, schedule_data);
>  DEFINE_PER_CPU(struct scheduler *, scheduler);
> +DEFINE_PER_CPU(struct sched_resource *, sched_res);
>  
I wouldn't expect this to change too much.

Therefore, does DEFINE_PER_CPU_READ_MOSTLY() make sense?
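I.e., assuming nothing else about the declaration needs to change:

DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res);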

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 09/60] xen/sched: let pick_cpu return a scheduler resource
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-19 18:06   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 18:06 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, tim, ian.jackson,
	robert.vanvossen, julien.grall, josh.whitehead, mengxu,
	Jan Beulich, andrew.cooper3



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Instead of returning a physical cpu number, let pick_cpu() return a
> scheduler resource. Rename pick_cpu() to pick_resource() to
> reflect that change.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

with:

> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -351,14 +351,14 @@ static inline void sched_migrate(const struct scheduler *s,
>      else
>      {
>          unit->vcpu->processor = cpu;
> -        unit->res = per_cpu(sched_res, cpu);
> +        unit->res = get_sched_res(cpu);
>      }
>  }
>  
this hunk moving to patch 8, as noted already while reviewing that one. :-)

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 60/60] xen/sched: add scheduling granularity enum
  2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
  (?)
  (?)
@ 2019-07-19 18:31   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-19 18:31 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sstabellini, wl, konrad.wilk, George.Dunlap, ian.jackson, tim,
	julien.grall, Jan Beulich, andrew.cooper3



On Tue, 2019-05-28 at 12:33 +0200, Juergen Gross wrote:
> Add a scheduling granularity enum ("cpu", "core", "socket") for
> specification of the scheduling granularity. Initially it is set to
> "cpu", this can be modified by the new boot parameter (x86 only)
> "sched-gran".
> 
> According to the selected granularity sched_granularity is set after
> all cpus are online.
> 
> A test is added for all sched resources holding the same number of
> cpus. Fall back to core- or cpu-scheduling in that case.
>
Might be me (non-native speaker), but this reads weird.

Perhaps the second sentence should have ended with "if that _is_not_
the case"?

As in, "if it is not the case that all sched resources hold the same
number of cpus, we fall back". Right now it seems to me to say that we
fall back in the case where the resources do hold the same number of
cpus...

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
With the above clarified:

Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-19 13:57           ` Juergen Gross
@ 2019-07-22 14:22             ` Sergey Dyasli
  2019-07-24  9:13               ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-22 14:22 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, sergey.dyasli@citrix.com >> Sergey Dyasli,
	Roger Pau Monné

On 19/07/2019 14:57, Juergen Gross wrote:

> I now have a git branch with the two problems corrected and rebased to
> current staging available:
> 
> github.com/jgross1/xen.git sched-v1b

Many thanks for the branch! As for the crashes, the vcpu_sleep_sync()
one seems to be fixed now. But I can still reproduce the shutdown one.
Interestingly, it now happens only if a host has running VMs (which
are automatically powered off via PV tools):

(XEN) [  332.981355] Preparing system for ACPI S5 state.
(XEN) [  332.981419] Disabling non-boot CPUs ...
(XEN) [  337.703896] Watchdog timer detects that CPU1 is stuck!
(XEN) [  337.709532] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
(XEN) [  337.716808] CPU:    1
(XEN) [  337.719582] RIP:    e008:[<ffff82d08024041c>] sched_context_switched+0xaf/0x101
(XEN) [  337.727384] RFLAGS: 0000000000000202   CONTEXT: hypervisor
(XEN) [  337.733364] rax: 0000000000000002   rbx: ffff83081cc615b0   rcx: 0000000000000001
(XEN) [  337.741338] rdx: ffff83081cc61634   rsi: ffff83081cc72000   rdi: ffff83081cc72000
(XEN) [  337.749312] rbp: ffff83081cc8fdc0   rsp: ffff83081cc8fda0   r8:  0000000000000000
(XEN) [  337.757284] r9:  0000000000000000   r10: 0000004d88fc535e   r11: 0000004df8675ce7
(XEN) [  337.765256] r12: ffff83081cc72000   r13: ffff83081cc72000   r14: ffff83081ccb0e80
(XEN) [  337.773232] r15: ffff83081cc615b0   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) [  337.781206] cr3: 00000000dd2a1000   cr2: ffff88809ed1fb80
(XEN) [  337.787100] fsb: 0000000000000000   gsb: ffff8880a38c0000   gss: 0000000000000000
(XEN) [  337.795072] ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [  337.802525] Xen code around <ffff82d08024041c> (sched_context_switched+0xaf/0x101):
(XEN) [  337.810672]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
(XEN) [  337.819080] Xen stack trace from rsp=ffff83081cc8fda0:
(XEN) [  337.824713]    ffff83081cc72000 ffff83081cc72000 0000000000000000 ffff83081cc615b0
(XEN) [  337.832772]    ffff83081cc8fe00 ffff82d0802404e0 0000000000000082 ffff83081ccb0e98
(XEN) [  337.840832]    0000000000000001 ffff83081ccb0e98 0000000000000001 ffff82d080602628
(XEN) [  337.848895]    ffff83081cc8fe60 ffff82d080240aca 0000004d873bd669 0000000000000001
(XEN) [  337.856952]    ffff83081cc72000 0000004d873bdc1c ffff8308000000ff ffff82d0805bba00
(XEN) [  337.865012]    ffff82d0805bb980 ffffffffffffffff ffff83081cc8ffff 0000000000000001
(XEN) [  337.873072]    ffff83081cc8fe90 ffff82d080242315 0000000000000080 ffff82d0805bb980
(XEN) [  337.881132]    0000000000000001 ffff82d0806026f0 ffff83081cc8fea0 ffff82d08024236a
(XEN) [  337.889196]    ffff83081cc8fef0 ffff82d08027a151 ffff82d080242315 000000010665f000
(XEN) [  337.897256]    ffff83081cc72000 ffff83081cc72000 ffff83080665f000 ffff83081cc63000
(XEN) [  337.905313]    0000000000000001 ffff830806684000 ffff83081cc8fd78 ffff88809ee08000
(XEN) [  337.913373]    ffff88809ee08000 0000000000000000 0000000000000000 0000000000000003
(XEN) [  337.921434]    ffff88809ee08000 0000000000000246 aaaaaaaaaaaaaaaa 0000000000000000
(XEN) [  337.929497]    0000000096968abe 0000000000000000 ffffffff810013aa ffffffff8203c190
(XEN) [  337.937554]    deadbeefdeadf00d deadbeefdeadf00d 0000010000000000 ffffffff810013aa
(XEN) [  337.945615]    000000000000e033 0000000000000246 ffffc900400afeb0 000000000000e02b
(XEN) [  337.953674]    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN) [  337.961736]    0000e01000000001 ffff83081cc72000 000000379c66db80 00000000001526e0
(XEN) [  337.969797]    0000000000000000 0000000000000000 0000060000000000 0000000000000000
(XEN) [  337.977856] Xen call trace:
(XEN) [  337.981152]    [<ffff82d08024041c>] sched_context_switched+0xaf/0x101
(XEN) [  337.988083]    [<ffff82d0802404e0>] schedule.c#sched_context_switch+0x72/0x151
(XEN) [  337.995796]    [<ffff82d080240aca>] schedule.c#sched_slave+0x2a3/0x2b2
(XEN) [  338.002817]    [<ffff82d080242315>] softirq.c#__do_softirq+0x85/0x90
(XEN) [  338.009664]    [<ffff82d08024236a>] do_softirq+0x13/0x15
(XEN) [  338.015471]    [<ffff82d08027a151>] domain.c#idle_loop+0xb2/0xc9
(XEN) [  338.021970]
(XEN) [  338.023965] CPU7 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
(XEN) [  338.032372] CPU5 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
(XEN) [  338.040776] CPU4 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
(XEN) [  338.049182] CPU2 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
(XEN) [  338.057591] CPU6 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
(XEN) [  338.065999] CPU3 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
(XEN) [  338.074406] CPU0 @ e008:ffff82d0802532d1 (ns16550.c#ns_read_reg+0x21/0x42)
(XEN) [  338.081773]
(XEN) [  338.083764] ****************************************
(XEN) [  338.089226] Panic on CPU 1:
(XEN) [  338.092521] FATAL TRAP: vector = 2 (nmi)
(XEN) [  338.096940] [error_code=0000]
(XEN) [  338.100491] ****************************************
(XEN) [  338.105951]
(XEN) [  338.107946] Reboot in five seconds...
(XEN) [  338.112105] Executing kexec image on cpu1
(XEN) [  338.117383] Shot down all CPUs

And since Igor managed to fix kdump, I can now post backtraces from
all CPUs as well: https://paste.debian.net/1092609/

Thanks,
Sergey


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-22 14:22             ` Sergey Dyasli
@ 2019-07-24  9:13               ` Juergen Gross
  2019-07-24 14:54                 ` Sergey Dyasli
  0 siblings, 1 reply; 202+ messages in thread
From: Juergen Gross @ 2019-07-24  9:13 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Roger Pau Monné

On 22.07.19 16:22, Sergey Dyasli wrote:
> On 19/07/2019 14:57, Juergen Gross wrote:
> 
>> I now have a git branch with the two problems corrected and rebased to
>> current staging available:
>>
>> github.com/jgross1/xen.git sched-v1b
> 
> Many thanks for the branch! As for the crashes, the vcpu_sleep_sync()
> one seems to be fixed now. But I can still reproduce the shutdown one.
> Interestingly, it now happens only if a host has running VMs (which
> are automatically powered off via PV tools):
> 
> (XEN) [  332.981355] Preparing system for ACPI S5 state.
> (XEN) [  332.981419] Disabling non-boot CPUs ...
> (XEN) [  337.703896] Watchdog timer detects that CPU1 is stuck!
> (XEN) [  337.709532] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
> (XEN) [  337.716808] CPU:    1
> (XEN) [  337.719582] RIP:    e008:[<ffff82d08024041c>] sched_context_switched+0xaf/0x101
> (XEN) [  337.727384] RFLAGS: 0000000000000202   CONTEXT: hypervisor
> (XEN) [  337.733364] rax: 0000000000000002   rbx: ffff83081cc615b0   rcx: 0000000000000001
> (XEN) [  337.741338] rdx: ffff83081cc61634   rsi: ffff83081cc72000   rdi: ffff83081cc72000
> (XEN) [  337.749312] rbp: ffff83081cc8fdc0   rsp: ffff83081cc8fda0   r8:  0000000000000000
> (XEN) [  337.757284] r9:  0000000000000000   r10: 0000004d88fc535e   r11: 0000004df8675ce7
> (XEN) [  337.765256] r12: ffff83081cc72000   r13: ffff83081cc72000   r14: ffff83081ccb0e80
> (XEN) [  337.773232] r15: ffff83081cc615b0   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) [  337.781206] cr3: 00000000dd2a1000   cr2: ffff88809ed1fb80
> (XEN) [  337.787100] fsb: 0000000000000000   gsb: ffff8880a38c0000   gss: 0000000000000000
> (XEN) [  337.795072] ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) [  337.802525] Xen code around <ffff82d08024041c> (sched_context_switched+0xaf/0x101):
> (XEN) [  337.810672]  00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
> (XEN) [  337.819080] Xen stack trace from rsp=ffff83081cc8fda0:
> (XEN) [  337.824713]    ffff83081cc72000 ffff83081cc72000 0000000000000000 ffff83081cc615b0
> (XEN) [  337.832772]    ffff83081cc8fe00 ffff82d0802404e0 0000000000000082 ffff83081ccb0e98
> (XEN) [  337.840832]    0000000000000001 ffff83081ccb0e98 0000000000000001 ffff82d080602628
> (XEN) [  337.848895]    ffff83081cc8fe60 ffff82d080240aca 0000004d873bd669 0000000000000001
> (XEN) [  337.856952]    ffff83081cc72000 0000004d873bdc1c ffff8308000000ff ffff82d0805bba00
> (XEN) [  337.865012]    ffff82d0805bb980 ffffffffffffffff ffff83081cc8ffff 0000000000000001
> (XEN) [  337.873072]    ffff83081cc8fe90 ffff82d080242315 0000000000000080 ffff82d0805bb980
> (XEN) [  337.881132]    0000000000000001 ffff82d0806026f0 ffff83081cc8fea0 ffff82d08024236a
> (XEN) [  337.889196]    ffff83081cc8fef0 ffff82d08027a151 ffff82d080242315 000000010665f000
> (XEN) [  337.897256]    ffff83081cc72000 ffff83081cc72000 ffff83080665f000 ffff83081cc63000
> (XEN) [  337.905313]    0000000000000001 ffff830806684000 ffff83081cc8fd78 ffff88809ee08000
> (XEN) [  337.913373]    ffff88809ee08000 0000000000000000 0000000000000000 0000000000000003
> (XEN) [  337.921434]    ffff88809ee08000 0000000000000246 aaaaaaaaaaaaaaaa 0000000000000000
> (XEN) [  337.929497]    0000000096968abe 0000000000000000 ffffffff810013aa ffffffff8203c190
> (XEN) [  337.937554]    deadbeefdeadf00d deadbeefdeadf00d 0000010000000000 ffffffff810013aa
> (XEN) [  337.945615]    000000000000e033 0000000000000246 ffffc900400afeb0 000000000000e02b
> (XEN) [  337.953674]    000000000000beef 000000000000beef 000000000000beef 000000000000beef
> (XEN) [  337.961736]    0000e01000000001 ffff83081cc72000 000000379c66db80 00000000001526e0
> (XEN) [  337.969797]    0000000000000000 0000000000000000 0000060000000000 0000000000000000
> (XEN) [  337.977856] Xen call trace:
> (XEN) [  337.981152]    [<ffff82d08024041c>] sched_context_switched+0xaf/0x101
> (XEN) [  337.988083]    [<ffff82d0802404e0>] schedule.c#sched_context_switch+0x72/0x151
> (XEN) [  337.995796]    [<ffff82d080240aca>] schedule.c#sched_slave+0x2a3/0x2b2
> (XEN) [  338.002817]    [<ffff82d080242315>] softirq.c#__do_softirq+0x85/0x90
> (XEN) [  338.009664]    [<ffff82d08024236a>] do_softirq+0x13/0x15
> (XEN) [  338.015471]    [<ffff82d08027a151>] domain.c#idle_loop+0xb2/0xc9
> (XEN) [  338.021970]
> (XEN) [  338.023965] CPU7 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
> (XEN) [  338.032372] CPU5 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
> (XEN) [  338.040776] CPU4 @ e008:ffff82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0)
> (XEN) [  338.049182] CPU2 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
> (XEN) [  338.057591] CPU6 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
> (XEN) [  338.065999] CPU3 @ e008:ffff82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0)
> (XEN) [  338.074406] CPU0 @ e008:ffff82d0802532d1 (ns16550.c#ns_read_reg+0x21/0x42)
> (XEN) [  338.081773]
> (XEN) [  338.083764] ****************************************
> (XEN) [  338.089226] Panic on CPU 1:
> (XEN) [  338.092521] FATAL TRAP: vector = 2 (nmi)
> (XEN) [  338.096940] [error_code=0000]
> (XEN) [  338.100491] ****************************************
> (XEN) [  338.105951]
> (XEN) [  338.107946] Reboot in five seconds...
> (XEN) [  338.112105] Executing kexec image on cpu1
> (XEN) [  338.117383] Shot down all CPUs
> 
> And since Igor managed to fix kdump, I can now post backtraces from
> all CPUs as well: https://paste.debian.net/1092609/

Thanks for the test (and report).

The fix is a one-liner. :-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index f0bc5b3161..da9efb147f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2207,6 +2207,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
          if ( unlikely(!scheduler_active) )
          {
              ASSERT(is_idle_unit(prev));
+            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
              prev->rendezvous_in_cnt = 0;
          }
      }
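(Presumably this is the counter CPU1 was spinning on in
sched_context_switched() above: with the scheduler no longer active
while the non-boot CPUs are going down, the sibling threads that would
normally decrement the next task's rendezvous_out_cnt never arrive, so
the idle unit has to reset it itself.)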


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-24  9:13               ` Juergen Gross
@ 2019-07-24 14:54                 ` Sergey Dyasli
  2019-07-24 15:11                   ` Juergen Gross
  0 siblings, 1 reply; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-24 14:54 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, sergey.dyasli@citrix.com >> Sergey Dyasli,
	Roger Pau Monné

On 24/07/2019 10:13, Juergen Gross wrote:
> The fix is a one-liner. :-)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index f0bc5b3161..da9efb147f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -2207,6 +2207,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
>          if ( unlikely(!scheduler_active) )
>          {
>              ASSERT(is_idle_unit(prev));
> +            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
>              prev->rendezvous_in_cnt = 0;
>          }
>      }

Even with that applied, I'm still seeing it :(

(XEN) [  311.223780] Watchdog timer detects that CPU1 is stuck!
(XEN) [  311.229413] ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) [  311.236002] CPU:    1
(XEN) [  311.238774] RIP:    e008:[<ffff82d0802408a8>] sched_context_switched+0x92/0x101
(XEN) [  311.246575] RFLAGS: 0000000000000202   CONTEXT: hypervisor
(XEN) [  311.252556] rax: 0000000000000002   rbx: ffff83081cc635b0   rcx: 0000000000000001
(XEN) [  311.260530] rdx: ffff83081cc63634   rsi: ffff83081cc8f000   rdi: ffff83081cc8f000
(XEN) [  311.268502] rbp: ffff83081cc87df0   rsp: ffff83081cc87dd0   r8:  0000000000000000
(XEN) [  311.276474] r9:  ffff83081cc62000   r10: ffff83081cc62000   r11: ffff83081cc6b000
(XEN) [  311.284448] r12: ffff83081cc8f000   r13: ffff83081cc8f000   r14: ffff83081cc61e80
(XEN) [  311.292422] r15: ffff82d0805e2260   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) [  311.300395] cr3: 00000000dd4ac000   cr2: 0000559b05a94048
(XEN) [  311.306288] fsb: 0000000000000000   gsb: ffff8880a3940000   gss: 0000000000000000
(XEN) [  311.314262] ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [  311.321716] Xen code around <ffff82d0802408a8> (sched_context_switched+0x92/0x101):
(XEN) [  311.329862]  85 c0 74 08 f3 90 8b 02 <85> c0 75 f8 49 8b 44 24 10 66 81 38 ff 7f 75 05
(XEN) [  311.338269] Xen stack trace from rsp=ffff83081cc87dd0:
(XEN) [  311.343904]    ffff83081cc8f000 ffff83081cc8f000 0000000000000000 ffff83081cc635b0
(XEN) [  311.351963]    ffff83081cc87e28 ffff82d080240996 ffff83081cc61e98 ffff82d08060a4a8
(XEN) [  311.360022]    ffff83081cc61e98 ffff82d08060a4a8 ffff83081cc635b0 ffff83081cc87e80
(XEN) [  311.368083]    ffff82d080240f7a 0000000000000001 ffff83081cc8f000 00000047588837ec
(XEN) [  311.376142]    000000011cc87ec0 ffff82d0805c3a00 ffff82d0805c3980 ffffffffffffffff
(XEN) [  311.384205]    ffff82d0805d3980 ffff82d0805e2260 ffff83081cc87eb0 ffff82d08024274a
(XEN) [  311.392263]    0000000000000001 ffff82d0805c3a00 0000000000000001 0000000000000001
(XEN) [  311.400324]    ffff83081cc87ec0 ffff82d0802427bf ffff83081cc87ef0 ffff82d080279a1d
(XEN) [  311.408385]    ffff83081cc8f000 ffff83081cc8f000 0000000000000001 ffff83081cc635b0
(XEN) [  311.416443]    ffff83081cc87df0 ffff88809ee1ba00 ffff88809ee1ba00 0000000000000000
(XEN) [  311.424504]    0000000000000000 0000000000000005 ffff88809ee1ba00 0000000000000246
(XEN) [  311.432563]    aaaaaaaaaaaaaaaa 0000000000000000 000000000001ca00 0000000000000000
(XEN) [  311.440625]    ffffffff810013aa ffffffff8203c190 deadbeefdeadf00d deadbeefdeadf00d
(XEN) [  311.448685]    0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246
(XEN) [  311.456747]    ffffc900400bfeb0 000000000000e02b 000000000000beef 000000000000beef
(XEN) [  311.464807]    000000000000beef 000000000000beef 0000e01000000001 ffff83081cc8f000
(XEN) [  311.472864]    000000379c665d00 00000000001526e0 0000000000000000 0000000000000000
(XEN) [  311.480926]    0000060000000000 0000000000000000
(XEN) [  311.486041] Xen call trace:
(XEN) [  311.489332]    [<ffff82d0802408a8>] sched_context_switched+0x92/0x101
(XEN) [  311.496266]    [<ffff82d080240996>] schedule.c#sched_context_switch+0x7f/0x160
(XEN) [  311.503980]    [<ffff82d080240f7a>] schedule.c#sched_slave+0x28f/0x2b5
(XEN) [  311.510999]    [<ffff82d08024274a>] softirq.c#__do_softirq+0x61/0x8c
(XEN) [  311.517846]    [<ffff82d0802427bf>] do_softirq+0x13/0x15
(XEN) [  311.523653]    [<ffff82d080279a1d>] domain.c#idle_loop+0x52/0xa7
(XEN) [  311.530152]
(XEN) [  311.532144] CPU0 @ e008:ffff82d08024334d (stop_machine.c#stopmachine_wait_state+0x19/0x24)
(XEN) [  311.540899] CPU5 @ e008:ffff82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [  311.549307] CPU3 @ e008:ffff82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [  311.557712] CPU4 @ e008:ffff82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [  311.566119] CPU7 @ e008:ffff82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [  311.574526] CPU2 @ e008:ffff82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [  311.582931] CPU6 @ e008:ffff82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [  311.591919]
(XEN) [  311.593914] ****************************************
(XEN) [  311.599374] Panic on CPU 1:
(XEN) [  311.602669] FATAL TRAP: vector = 2 (nmi)
(XEN) [  311.607088] [error_code=0000]
(XEN) [  311.610641] ****************************************
(XEN) [  311.616101]
(XEN) [  311.618095] Reboot in five seconds...
(XEN) [  311.622254] Executing kexec image on cpu1
(XEN) [  311.627534] Shot down all CPUs


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-24 14:54                 ` Sergey Dyasli
@ 2019-07-24 15:11                   ` Juergen Gross
  0 siblings, 0 replies; 202+ messages in thread
From: Juergen Gross @ 2019-07-24 15:11 UTC (permalink / raw)
  To: Sergey Dyasli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Ian Jackson, Roger Pau Monné

On 24.07.19 16:54, Sergey Dyasli wrote:
> On 24/07/2019 10:13, Juergen Gross wrote:
>> The fix is a one-liner. :-)
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index f0bc5b3161..da9efb147f 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -2207,6 +2207,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
>>           if ( unlikely(!scheduler_active) )
>>           {
>>               ASSERT(is_idle_unit(prev));
>> +            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
>>               prev->rendezvous_in_cnt = 0;
>>           }
>>       }
> 
> Even with that applied, I'm still seeing it :(

Interesting, for me it was gone.

Time for more tests and some debug code...


Juergen


* Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
  2019-07-05 13:17 ` [Xen-devel] [PATCH 00/60] xen: add core scheduling support Sergey Dyasli
                     ` (2 preceding siblings ...)
  2019-07-16 15:45   ` Sergey Dyasli
@ 2019-07-25 16:01   ` Sergey Dyasli
  3 siblings, 0 replies; 202+ messages in thread
From: Sergey Dyasli @ 2019-07-25 16:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: sergey.dyasli@citrix.com >> Sergey Dyasli,
	Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich, Ian Jackson, Roger Pau Monné

Hi Juergen,

I've found another regression that happens only with sched-gran=core.
A CentOS 5.11 kernel (PV, CPUs: 32; RAM: 6GB) hangs during a suspend attempt.
The last kernel messages are:

    CPU 1 offline: Remove Rx thread
    CPU 2 offline: Remove Rx thread

Kernel: Linux localhost 2.6.18-398.el5xen #1 SMP Tue Sep 16 21:31:50 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

xl top shows 100% CPU utilization for the hung VM. And here's its state:

(XEN) [ 1907.976356] *** Dumping CPU14 guest state (d1v0): ***
(XEN) [ 1907.982558] ----[ Xen-4.13.0-8.0.6-d  x86_64  debug=y   Not tainted ]----
(XEN) [ 1907.990704] CPU:    14
(XEN) [ 1907.993901] RIP:    e033:[<ffffffff80264a7b>]
(XEN) [ 1907.999333] RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d1v0)
(XEN) [ 1908.007282] rax: 0000000000000001   rbx: ffffffff80522b80   rcx: 0000000000000000
(XEN) [ 1908.016203] rdx: ffffffff80522b90   rsi: 0000000000000079   rdi: ffffffff8052a980
(XEN) [ 1908.025121] rbp: ffffffff80522980   rsp: ffff88017106dcf8   r8:  ffff88017106c000
(XEN) [ 1908.034040] r9:  0000000000000000   r10: 0000000000000000   r11: ffff880176fad8c0
(XEN) [ 1908.042962] r12: 0000000000000001   r13: 0000000000000000   r14: ffffffff80522980
(XEN) [ 1908.051881] r15: 0000000000000003   cr0: 000000008005003b   cr4: 0000000000142660
(XEN) [ 1908.060800] cr3: 000000801d8c0000   cr2: 00002b5400970000
(XEN) [ 1908.067393] fsb: 0000000000000000   gsb: ffffffff80639000   gss: 0000000000000000
(XEN) [ 1908.076311] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) [ 1908.084650] Guest stack trace from rsp=ffff88017106dcf8:
(XEN) [ 1908.091147]    ffffffff802c3dd4 0000000001168460 ffffffff80522b90 0000000000000079
(XEN) [ 1908.100164]    ffffffff80522b80 ffffffff80522980 0000000000000001 0000000000000000
(XEN) [ 1908.109179]    0000000000000003 0000000000000003 ffffffff802c4041 ffff88017d68b040
(XEN) [ 1908.118197]    0000000000000003 0000000000000003 0000000000000007 0000000000000003
(XEN) [ 1908.127213]    00000000ffffffea ffffffff8029f0ad ffffffff802c4092 ffffffff8050ff90
(XEN) [ 1908.136229]    ffffffff80268111 0000000000000003 0000000000000000 ffff88017d68b040
(XEN) [ 1908.145245]    ffffffff802a408b 0000000000000000 ffff8800011d3860 fffffffffffffff7
(XEN) [ 1908.154263]    ffffffffffffffff ffffffffffffffff 7fffffffffffffff 0000000000000001
(XEN) [ 1908.163278]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 1908.172296]    0000000000000003 0000000000000003 0000000000000000 ffff88017f5ebce0
(XEN) [ 1908.181312]    0000000000000000 ffff88017f5ebcd0 ffffffff803be1a6 0000000000000001
(XEN) [ 1908.190328]    0000000000000000 ffffffff803be9dd ffffffff803bea43 ffff88017106deb0
(XEN) [ 1908.199345]    ffffffff80289495 0000000300000000 ffff88017f5ebce0 ffff88017f5ebce8
(XEN) [ 1908.208362]    0000000000000000 ffff88017f5ebcd0 ffffffff8029f0ad ffff88017106dee0
(XEN) [ 1908.217379]    0000000000000000 ffff88017f5ebce0 0000000000000000 ffff88017f5ebcd0
(XEN) [ 1908.226396]    ffffffff8029f0ad 0000000000000000 ffffffff80233ee4 0000000000000000
(XEN) [ 1908.235413]    ffff88017d8687f0 ffffffffffffffff ffffffffffffffff ffffffffffffffff
(XEN) [ 1908.244427]    7fffffffffffffff ffffffffffffffff ffff8800011d3860 ffff88017f5ebcd0
(XEN) [ 1908.253444]    ffff88017f5ebc78 ffff88017f5ce6c0 ffffffff80260b2c 0000000000000000
(XEN) [ 1908.262461]    ffffffff8029f0ad ffff88017f5ebcd0 0000000000000000 ffff88017f5ce6c0



Thanks,
Sergey


* Re: [Xen-devel] [PATCH 10/60] xen/sched: switch schedule_data.curr to point at sched_unit
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
@ 2019-07-29 22:08   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-29 22:08 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich, Ian Jackson



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> In preparation for core scheduling, let the percpu pointer
> schedule_data.curr point to a struct sched_unit instead of the related
> vcpu. At the same time rename the per-vcpu scheduler specific structs
> to per-unit ones.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 11/60] xen/sched: move per cpu scheduler private data into struct sched_resource
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
  (?)
  (?)
@ 2019-07-29 22:22   ` Dario Faggioli
  -1 siblings, 0 replies; 202+ messages in thread
From: Dario Faggioli @ 2019-07-29 22:22 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
	Roger Pau Monné



On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> This prepares support of larger scheduling granularities, e.g. core
> scheduling.
> 
> While at it move sched_has_urgent_vcpu() from include/asm-x86/cpuidle.h
> into schedule.c, removing the need for including sched-if.h in
> cpuidle.h and multiple other C sources.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



end of thread, other threads:[~2019-07-31  0:23 UTC | newest]

Thread overview: 202+ messages (download: mbox.gz / follow: Atom feed)
2019-05-28 10:32 [PATCH 00/60] xen: add core scheduling support Juergen Gross
2019-05-28 10:32 ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 01/60] xen/sched: only allow schedulers with all mandatory functions available Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-06-11 16:03   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 02/60] xen/sched: add inline wrappers for calling per-scheduler functions Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-06-11 16:21   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 03/60] xen/sched: let sched_switch_sched() return new lock address Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-06-11 16:55   ` Dario Faggioli
2019-06-12  7:40     ` Jan Beulich
2019-06-12  8:06       ` Juergen Gross
2019-06-12  9:44         ` Dario Faggioli
2019-06-12  8:05   ` Andrew Cooper
2019-06-12  8:19     ` Juergen Gross
2019-06-12  9:32       ` Jan Beulich
     [not found]       ` <5D00C6960200007800237622@suse.com>
2019-06-12  9:56         ` Dario Faggioli
2019-06-12 10:14           ` Jan Beulich
2019-06-12 11:27             ` Andrew Cooper
2019-06-12 13:32               ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 04/60] xen/sched: use new sched_unit instead of vcpu in scheduler interfaces Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-18 17:44   ` Dario Faggioli
2019-07-19  4:49     ` Juergen Gross
2019-07-19 17:01       ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 05/60] xen/sched: alloc struct sched_unit for each vcpu Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-18 17:57   ` Dario Faggioli
2019-07-19  4:56     ` Juergen Gross
2019-07-19 17:04       ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 06/60] xen/sched: move per-vcpu scheduler private data pointer to sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-18 22:52   ` Dario Faggioli
2019-07-19  5:03     ` Juergen Gross
2019-07-19 17:10       ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 07/60] xen/sched: build a linked list of struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-19  0:01   ` Dario Faggioli
2019-07-19  5:07     ` Juergen Gross
2019-07-19 17:16       ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 08/60] xen/sched: introduce struct sched_resource Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-19 17:43   ` Dario Faggioli
2019-07-19 17:49   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 09/60] xen/sched: let pick_cpu return a scheduler resource Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-19 18:06   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 10/60] xen/sched: switch schedule_data.curr to point at sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-29 22:08   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 11/60] xen/sched: move per cpu scheduler private data into struct sched_resource Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 11:32   ` Jan Beulich
2019-05-28 11:32     ` [Xen-devel] " Jan Beulich
2019-07-29 22:22   ` Dario Faggioli
2019-05-28 10:32 ` [PATCH 12/60] xen/sched: switch vcpu_schedule_lock to unit_schedule_lock Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-06-13  7:18   ` Andrii Anisov
2019-06-13  7:29     ` Juergen Gross
2019-06-13  7:34       ` Andrii Anisov
2019-06-13  8:39         ` Juergen Gross
2019-06-13  8:49           ` Andrii Anisov
2019-05-28 10:32 ` [PATCH 14/60] xen/sched: add scheduler helpers hiding vcpu Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 15/60] xen/sched: add domain pointer to struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 16/60] xen/sched: add id " Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 17/60] xen/sched: rename scheduler related perf counters Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 18/60] xen/sched: switch struct task_slice from vcpu to sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 19/60] xen/sched: add is_running indicator to struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 20/60] xen/sched: make null scheduler vcpu agnostic Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 21/60] xen/sched: make rt " Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 22/60] xen/sched: make credit " Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 23/60] xen/sched: make credit2 " Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 24/60] xen/sched: make arinc653 " Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 25/60] xen: add sched_unit_pause_nosync() and sched_unit_unpause() Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 26/60] xen: let vcpu_create() select processor Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 27/60] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 28/60] xen/sched: switch schedule() from vcpus to sched_units Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 29/60] xen/sched: switch sched_move_irqs() to take sched_unit as parameter Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 30/60] xen: switch from for_each_vcpu() to for_each_sched_unit() Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 31/60] xen/sched: add runstate counters to struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 32/60] xen/sched: rework and rename vcpu_force_reschedule() Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 33/60] xen/sched: Change vcpu_migrate_*() to operate on schedule unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 34/60] xen/sched: move struct task_slice into struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 35/60] xen/sched: add code to sync scheduling of all vcpus of a sched unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 36/60] xen/sched: introduce unit_runnable_state() Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 37/60] xen/sched: add support for multiple vcpus per sched unit where missing Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 38/60] x86: make loading of GDT at context switch more modular Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-02 15:38   ` Andrew Cooper
2019-05-28 10:32 ` [PATCH 39/60] x86: optimize loading of GDT at context switch Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-07-02 16:09   ` Andrew Cooper
2019-07-03  6:30     ` Juergen Gross
2019-07-03 12:21       ` Andrew Cooper
2019-07-05  7:30         ` Juergen Gross
2019-05-28 10:32 ` [PATCH 40/60] xen/sched: modify cpupool_domain_cpumask() to be an unit mask Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 41/60] xen/sched: support allocating multiple vcpus into one sched unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 42/60] xen/sched: add a scheduler_percpu_init() function Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 43/60] xen/sched: add a percpu resource index Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 44/60] xen/sched: add fall back to idle vcpu when scheduling unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 45/60] xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:32 ` [PATCH 46/60] xen/sched: carve out freeing sched_unit memory into dedicated function Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 47/60] xen/sched: move per-cpu variable scheduler to struct sched_resource Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 48/60] xen/sched: move per-cpu variable cpupool " Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 49/60] xen/sched: reject switching smt on/off with core scheduling active Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 11:44   ` Jan Beulich
2019-05-28 11:44     ` [Xen-devel] " Jan Beulich
2019-05-28 11:52     ` Juergen Gross
2019-05-28 11:52       ` [Xen-devel] " Juergen Gross
2019-06-12  9:36       ` Dario Faggioli
2019-05-28 10:33 ` [PATCH 50/60] xen/sched: prepare per-cpupool scheduling granularity Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 51/60] xen/sched: use one schedule lock for all free cpus Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 52/60] xen/sched: populate cpupool0 only after all cpus are up Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 53/60] xen/sched: remove cpu from pool0 before removing it Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 11:47   ` Jan Beulich
2019-05-28 11:47     ` [Xen-devel] " Jan Beulich
2019-05-28 11:58     ` Juergen Gross
2019-05-28 11:58       ` [Xen-devel] " Juergen Gross
2019-05-31 14:15       ` Dario Faggioli
2019-05-31 14:15         ` [Xen-devel] " Dario Faggioli
2019-05-31 15:52   ` Dario Faggioli
2019-05-31 15:52     ` [Xen-devel] " Dario Faggioli
2019-05-31 16:44     ` Juergen Gross
2019-05-31 16:44       ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 55/60] xen/sched: split schedule_cpu_switch() Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 56/60] xen/sched: protect scheduling resource via rcu Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 57/60] xen/sched: support multiple cpus per scheduling resource Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 58/60] xen/sched: support differing granularity in schedule_cpu_[add/rm]() Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 59/60] xen/sched: support core scheduling for moving cpus to/from cpupools Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 10:33 ` [PATCH 60/60] xen/sched: add scheduling granularity enum Juergen Gross
2019-05-28 10:33   ` [Xen-devel] " Juergen Gross
2019-05-28 11:51   ` Jan Beulich
2019-05-28 11:51     ` [Xen-devel] " Jan Beulich
2019-05-28 12:02     ` Juergen Gross
2019-05-28 12:02       ` [Xen-devel] " Juergen Gross
2019-07-19 18:31   ` Dario Faggioli
2019-07-05 13:17 ` [Xen-devel] [PATCH 00/60] xen: add core scheduling support Sergey Dyasli
2019-07-05 13:22   ` Juergen Gross
2019-07-05 13:56   ` Dario Faggioli
2019-07-15 14:08     ` Sergey Dyasli
2019-07-18 14:48       ` Juergen Gross
2019-07-18 15:14         ` Sergey Dyasli
2019-07-18 16:04           ` Dario Faggioli
2019-07-19  5:41           ` Juergen Gross
2019-07-19 11:24             ` Juergen Gross
2019-07-19 13:57           ` Juergen Gross
2019-07-22 14:22             ` Sergey Dyasli
2019-07-24  9:13               ` Juergen Gross
2019-07-24 14:54                 ` Sergey Dyasli
2019-07-24 15:11                   ` Juergen Gross
2019-07-16 15:45   ` Sergey Dyasli
2019-07-19 13:35     ` Juergen Gross
2019-07-25 16:01   ` Sergey Dyasli
2019-07-11 13:40 ` Dario Faggioli
     [not found] <20190528103313.1343-1-jgross@suse.com>
     [not found] ` <20190528103313.1343-4-jgross@suse.com>
