* [PATCH RFC 00/49] xen: add core scheduling support
@ 2019-03-29 15:08 Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
                   ` (54 more replies)
  0 siblings, 55 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Kevin Tian, Stefano Stabellini,
	Wei Liu, Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Robert VanVossen, Dario Faggioli,
	Julien Grall, Paul Durrant, Josh Whitehead, Meng Xu, Jan Beulich,
	Roger Pau Monné

This series is very RFC!!!!

Add support for core- and socket-scheduling in the Xen hypervisor.

Via the boot parameter sched_granularity=core (or sched_granularity=socket)
it is possible to change the scheduling granularity from thread (the
default) to whole cores or even sockets.
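
For illustration only (not part of this series; the exact file and entry
depend on the distribution and bootloader), selecting core scheduling
means adding the parameter to the Xen command line, e.g. with GRUB2 on a
typical Linux dom0 setup:

    # /etc/default/grub -- illustrative; regenerate grub.cfg afterwards
    GRUB_CMDLINE_XEN_DEFAULT="... sched_granularity=core"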

All logical cpus (threads) of a core or socket are always scheduled
together. This means that on a given core only vcpus of the same domain
will ever be active, and those vcpus will always be scheduled at the same
time.

This is achieved by making the scheduler treat "schedule items" rather
than vcpus as the primary objects to schedule. Each schedule item consists
of as many vcpus as a core has threads on the current system. The
vcpu->item relation is fixed.
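
As a rough sketch of the concept (illustrative only; the structure
actually introduced by the series is built up incrementally by the
patches listed below and carries more state):

    /* Illustrative sketch, not the exact layout from the series. */
    struct sched_item {
        struct vcpu   *vcpu;     /* vcpu(s) belonging to this item */
        unsigned int   item_id;  /* cf. "add id to struct sched_item" */
        struct domain *domain;   /* cf. "add domain pointer to struct sched_item" */
        /* per-item scheduler state replacing former per-vcpu state */
    };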

I have done some very basic performance testing: on a 4 cpu system
(2 cores with 2 threads each) I did a "make -j 4" build of the Xen
hypervisor. This test has been run in dom0, once with no other guest
active and once with another guest with 4 vcpus running the same test.
The results are (always elapsed time, system time, user time):

sched_granularity=thread, no other guest: 116.10 177.65 207.84
sched_granularity=core,   no other guest: 114.04 175.47 207.45
sched_granularity=thread, other guest:    202.30 334.21 384.63
sched_granularity=core,   other guest:    207.24 293.04 371.37

All tests have been performed with credit2; the other schedulers are
untested so far.

Cpupools are not yet working, as moving cpus between cpupools needs
more work.

HVM domains do not work yet; there is a double fault in Xen at the
end of SeaBIOS. I'm currently investigating this issue.

This is x86-only for the moment. ARM doesn't even build with the
series applied. For full ARM support I might need some help with the
ARM specific context switch handling.

The first 7 patches have already been sent to xen-devel; I'm including
them here for convenience as they are prerequisites.

I'm especially looking for feedback regarding the overall idea and
design.


Juergen Gross (49):
  xen/sched: call cpu_disable_scheduler() via cpu notifier
  xen: add helper for calling notifier_call_chain() to common/cpu.c
  xen: add new cpu notifier action CPU_RESUME_FAILED
  xen: don't free percpu areas during suspend
  xen/cpupool: simplify suspend/resume handling
  xen/sched: don't disable scheduler on cpus during suspend
  xen/sched: fix credit2 smt idle handling
  xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  xen/sched: alloc struct sched_item for each vcpu
  xen/sched: move per-vcpu scheduler private data pointer to sched_item
  xen/sched: build a linked list of struct sched_item
  xen/sched: introduce struct sched_resource
  xen/sched: let pick_cpu return a scheduler resource
  xen/sched: switch schedule_data.curr to point at sched_item
  xen/sched: move per cpu scheduler private data into struct
    sched_resource
  xen/sched: switch vcpu_schedule_lock to item_schedule_lock
  xen/sched: move some per-vcpu items to struct sched_item
  xen/sched: add scheduler helpers hiding vcpu
  xen/sched: add domain pointer to struct sched_item
  xen/sched: add id to struct sched_item
  xen/sched: rename scheduler related perf counters
  xen/sched: switch struct task_slice from vcpu to sched_item
  xen/sched: move is_running indicator to struct sched_item
  xen/sched: make null scheduler vcpu agnostic.
  xen/sched: make rt scheduler vcpu agnostic.
  xen/sched: make credit scheduler vcpu agnostic.
  xen/sched: make credit2 scheduler vcpu agnostic.
  xen/sched: make arinc653 scheduler vcpu agnostic.
  xen: add sched_item_pause_nosync() and sched_item_unpause()
  xen: let vcpu_create() select processor
  xen/sched: use sched_resource cpu instead smp_processor_id in
    schedulers
  xen/sched: switch schedule() from vcpus to sched_items
  xen/sched: switch sched_move_irqs() to take sched_item as parameter
  xen: switch from for_each_vcpu() to for_each_sched_item()
  xen/sched: add runstate counters to struct sched_item
  xen/sched: rework and rename vcpu_force_reschedule()
  xen/sched: Change vcpu_migrate_*() to operate on schedule item
  xen/sched: move struct task_slice into struct sched_item
  xen/sched: add code to sync scheduling of all vcpus of a sched item
  xen/sched: add support for multiple vcpus per sched item where missing
  x86: make loading of GDT at context switch more modular
  xen/sched: add support for guest vcpu idle
  xen/sched: modify cpupool_domain_cpumask() to be an item mask
  xen: round up max vcpus to scheduling granularity
  xen/sched: support allocating multiple vcpus into one sched item
  xen/sched: add a scheduler_percpu_init() function
  xen/sched: support core scheduling in continue_running()
  xen/sched: make vcpu_wake() core scheduling aware
  xen/sched: add scheduling granularity enum

 xen/arch/arm/domain.c                |   14 +
 xen/arch/arm/domain_build.c          |   13 +-
 xen/arch/arm/smpboot.c               |    6 +-
 xen/arch/x86/dom0_build.c            |   11 +-
 xen/arch/x86/domain.c                |  243 +++++--
 xen/arch/x86/hvm/dom0_build.c        |    9 +-
 xen/arch/x86/hvm/hvm.c               |    7 +-
 xen/arch/x86/hvm/viridian/viridian.c |    1 +
 xen/arch/x86/hvm/vlapic.c            |    1 +
 xen/arch/x86/hvm/vmx/vmcs.c          |    6 +-
 xen/arch/x86/hvm/vmx/vmx.c           |    5 +-
 xen/arch/x86/mm.c                    |   10 +-
 xen/arch/x86/percpu.c                |    3 +-
 xen/arch/x86/pv/descriptor-tables.c  |    6 +-
 xen/arch/x86/pv/dom0_build.c         |   10 +-
 xen/arch/x86/pv/domain.c             |   19 +
 xen/arch/x86/pv/emul-priv-op.c       |    2 +
 xen/arch/x86/pv/shim.c               |    4 +-
 xen/arch/x86/pv/traps.c              |    6 +-
 xen/arch/x86/setup.c                 |    2 +
 xen/arch/x86/smpboot.c               |    5 +-
 xen/arch/x86/traps.c                 |   10 +-
 xen/common/cpu.c                     |   61 +-
 xen/common/cpupool.c                 |  161 ++---
 xen/common/domain.c                  |   36 +-
 xen/common/domctl.c                  |   28 +-
 xen/common/keyhandler.c              |    7 +-
 xen/common/sched_arinc653.c          |  258 ++++---
 xen/common/sched_credit.c            |  704 +++++++++---------
 xen/common/sched_credit2.c           | 1143 +++++++++++++++---------------
 xen/common/sched_null.c              |  423 +++++------
 xen/common/sched_rt.c                |  538 +++++++-------
 xen/common/schedule.c                | 1292 ++++++++++++++++++++++++----------
 xen/common/softirq.c                 |    6 +-
 xen/common/wait.c                    |    5 +-
 xen/include/asm-x86/cpuidle.h        |    2 +-
 xen/include/asm-x86/dom0_build.h     |    3 +-
 xen/include/asm-x86/domain.h         |    3 +
 xen/include/xen/cpu.h                |   29 +-
 xen/include/xen/domain.h             |    3 +-
 xen/include/xen/perfc_defn.h         |   32 +-
 xen/include/xen/sched-if.h           |  276 ++++++--
 xen/include/xen/sched.h              |   40 +-
 xen/include/xen/softirq.h            |    1 +
 44 files changed, 3175 insertions(+), 2269 deletions(-)

-- 
2.16.4


* [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-04-01  9:21   ` Julien Grall
  2019-03-29 15:08 ` [PATCH RFC 02/49] xen: add helper for calling notifier_call_chain() to common/cpu.c Juergen Gross
                   ` (53 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Dario Faggioli, Julien Grall, Jan Beulich,
	Roger Pau Monné

cpu_disable_scheduler() is being called from __cpu_disable() today.
There is no need to execute it on the cpu just being disabled, so use
the CPU_DEAD case of the cpu notifier chain. Moving the call out of
stop_machine() context is fine, as we just need to hold the domain RCU
lock and need the scheduler percpu data to be still allocated.

Add another hook for CPU_DOWN_PREPARE to bail out early in case
cpu_disable_scheduler() would fail. This avoids crashes in rare cases
of cpu hotplug or suspend.

While at it, remove a superfluous smp_mb() from the ARM incarnation of
__cpu_disable().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- add CPU_DOWN_PREPARE hook
- BUG() in case of cpu_disable_scheduler() failing in CPU_DEAD
  (Jan Beulich)
- modify ARM __cpu_disable(), too (Andrew Cooper)
---
 xen/arch/arm/smpboot.c |  4 ----
 xen/arch/x86/smpboot.c |  3 ---
 xen/common/schedule.c  | 42 +++++++++++++++++++++++++++++++++++-------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 25cd44549c..0728a9b505 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -386,10 +386,6 @@ void __cpu_disable(void)
     /* It's now safe to remove this processor from the online map */
     cpumask_clear_cpu(cpu, &cpu_online_map);
 
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
-    smp_mb();
-
     /* Return to caller; eventually the IPI mechanism will unwind and the 
      * scheduler will drop to the idle loop, which will call stop_cpu(). */
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7d1226d7bc..b7a0a4a419 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1221,9 +1221,6 @@ void __cpu_disable(void)
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs(&cpu_online_map, 1);
     fixup_eoi();
-
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
 }
 
 void __cpu_die(unsigned int cpu)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 60755a631e..5d2bbd5198 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -773,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
 }
 
 /*
- * This function is used by cpu_hotplug code from stop_machine context
+ * This function is used by cpu_hotplug code via cpu notifier chain
  * and from cpupools to switch schedulers on a cpu.
+ * Caller must get domlist_read_lock.
  */
 int cpu_disable_scheduler(unsigned int cpu)
 {
@@ -789,12 +790,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     if ( c == NULL )
         return ret;
 
-    /*
-     * We'd need the domain RCU lock, but:
-     *  - when we are called from cpupool code, it's acquired there already;
-     *  - when we are called for CPU teardown, we're in stop-machine context,
-     *    so that's not be a problem.
-     */
     for_each_domain_in_cpupool ( d, c )
     {
         for_each_vcpu ( d, v )
@@ -893,6 +888,30 @@ int cpu_disable_scheduler(unsigned int cpu)
     return ret;
 }
 
+static int cpu_disable_scheduler_check(unsigned int cpu)
+{
+    struct domain *d;
+    struct vcpu *v;
+    struct cpupool *c;
+
+    c = per_cpu(cpupool, cpu);
+    if ( c == NULL )
+        return 0;
+
+    for_each_domain_in_cpupool ( d, c )
+    {
+        for_each_vcpu ( d, v )
+        {
+            if ( v->affinity_broken )
+                return -EADDRINUSE;
+            if ( system_state != SYS_STATE_suspend && v->processor == cpu )
+                return -EAGAIN;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * In general, this must be called with the scheduler lock held, because the
  * adjust_affinity hook may want to modify the vCPU state. However, when the
@@ -1737,7 +1756,16 @@ static int cpu_schedule_callback(
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;
+    case CPU_DOWN_PREPARE:
+        rcu_read_lock(&domlist_read_lock);
+        rc = cpu_disable_scheduler_check(cpu);
+        rcu_read_unlock(&domlist_read_lock);
+        break;
     case CPU_DEAD:
+        rcu_read_lock(&domlist_read_lock);
+        rc = cpu_disable_scheduler(cpu);
+        BUG_ON(rc);
+        rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
         /* Fallthrough */
     case CPU_UP_CANCELED:
-- 
2.16.4


* [PATCH RFC 02/49] xen: add helper for calling notifier_call_chain() to common/cpu.c
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 03/49] xen: add new cpu notifier action CPU_RESUME_FAILED Juergen Gross
                   ` (52 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich

Add a helper cpu_notifier_call_chain() to call notifier_call_chain()
for a cpu with a specified action, returning an errno value.

This avoids coding the same pattern multiple times.

While at it, avoid side effects from the BUG_ON() macro by not passing
cpu_online(cpu) as its parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- add nofail parameter to cpu_notifier_call_chain()
- avoid side effects from using BUG_ON() macro (Andrew Cooper)
---
 xen/common/cpu.c | 56 ++++++++++++++++++++++++++------------------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 836c62f97f..8bf69600a6 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -71,11 +71,21 @@ void __init register_cpu_notifier(struct notifier_block *nb)
     spin_unlock(&cpu_add_remove_lock);
 }
 
+static int cpu_notifier_call_chain(unsigned int cpu, unsigned long action,
+                                   struct notifier_block **nb, bool nofail)
+{
+    void *hcpu = (void *)(long)cpu;
+    int notifier_rc = notifier_call_chain(&cpu_chain, action, hcpu, nb);
+    int ret = (notifier_rc == NOTIFY_DONE) ? 0 : notifier_to_errno(notifier_rc);
+
+    BUG_ON(ret && nofail);
+
+    return ret;
+}
+
 static void _take_cpu_down(void *unused)
 {
-    void *hcpu = (void *)(long)smp_processor_id();
-    int notifier_rc = notifier_call_chain(&cpu_chain, CPU_DYING, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(smp_processor_id(), CPU_DYING, NULL, true);
     __cpu_disable();
 }
 
@@ -87,8 +97,7 @@ static int take_cpu_down(void *arg)
 
 int cpu_down(unsigned int cpu)
 {
-    int err, notifier_rc;
-    void *hcpu = (void *)(long)cpu;
+    int err;
     struct notifier_block *nb = NULL;
 
     if ( !cpu_hotplug_begin() )
@@ -100,12 +109,9 @@ int cpu_down(unsigned int cpu)
         return -EINVAL;
     }
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE, hcpu, &nb);
-    if ( notifier_rc != NOTIFY_DONE )
-    {
-        err = notifier_to_errno(notifier_rc);
+    err = cpu_notifier_call_chain(cpu, CPU_DOWN_PREPARE, &nb, false);
+    if ( err )
         goto fail;
-    }
 
     if ( unlikely(system_state < SYS_STATE_active) )
         on_selected_cpus(cpumask_of(cpu), _take_cpu_down, NULL, true);
@@ -113,26 +119,24 @@ int cpu_down(unsigned int cpu)
         goto fail;
 
     __cpu_die(cpu);
-    BUG_ON(cpu_online(cpu));
+    err = cpu_online(cpu);
+    BUG_ON(err);
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
     cpu_hotplug_done();
     return 0;
 
  fail:
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, hcpu, &nb);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_DOWN_FAILED, &nb, true);
     cpu_hotplug_done();
     return err;
 }
 
 int cpu_up(unsigned int cpu)
 {
-    int notifier_rc, err = 0;
-    void *hcpu = (void *)(long)cpu;
+    int err;
     struct notifier_block *nb = NULL;
 
     if ( !cpu_hotplug_begin() )
@@ -144,19 +148,15 @@ int cpu_up(unsigned int cpu)
         return -EINVAL;
     }
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu, &nb);
-    if ( notifier_rc != NOTIFY_DONE )
-    {
-        err = notifier_to_errno(notifier_rc);
+    err = cpu_notifier_call_chain(cpu, CPU_UP_PREPARE, &nb, false);
+    if ( err )
         goto fail;
-    }
 
     err = __cpu_up(cpu);
     if ( err < 0 )
         goto fail;
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
 
@@ -164,18 +164,14 @@ int cpu_up(unsigned int cpu)
     return 0;
 
  fail:
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu, &nb);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_UP_CANCELED, &nb, true);
     cpu_hotplug_done();
     return err;
 }
 
 void notify_cpu_starting(unsigned int cpu)
 {
-    void *hcpu = (void *)(long)cpu;
-    int notifier_rc = notifier_call_chain(
-        &cpu_chain, CPU_STARTING, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_STARTING, NULL, true);
 }
 
 static cpumask_t frozen_cpus;
-- 
2.16.4


* [PATCH RFC 03/49] xen: add new cpu notifier action CPU_RESUME_FAILED
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 02/49] xen: add helper for calling notifier_call_chain() to common/cpu.c Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 04/49] xen: don't free percpu areas during suspend Juergen Gross
                   ` (51 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich

Add a new cpu notifier action CPU_RESUME_FAILED which is called for all
cpus that failed to come up on resume. The calls are made after all
other cpus are already up, so the handlers know which resources are
available at that point.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
V2:
- added comment in xen/include/xen/cpu.h (Dario Faggioli)
---
 xen/common/cpu.c      |  5 +++++
 xen/include/xen/cpu.h | 29 ++++++++++++++++++-----------
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 8bf69600a6..a6efc5e604 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -218,7 +218,12 @@ void enable_nonboot_cpus(void)
             printk("Error bringing CPU%d up: %d\n", cpu, error);
             BUG_ON(error == -EBUSY);
         }
+        else
+            __cpumask_clear_cpu(cpu, &frozen_cpus);
     }
 
+    for_each_cpu ( cpu, &frozen_cpus )
+        cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL, true);
+
     cpumask_clear(&frozen_cpus);
 }
diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
index 2fe3ec05d8..4638c509e2 100644
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -22,33 +22,40 @@ void register_cpu_notifier(struct notifier_block *nb);
  *  CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up
  *  CPU_DOWN_PREPARE -> CPU_DOWN_FAILED          -- failed CPU down
  *  CPU_DOWN_PREPARE -> CPU_DYING -> CPU_DEAD    -- successful CPU down
- * 
+ * in the resume case we have additionally:
+ *  CPU_UP_PREPARE -> CPU_UP_CANCELED -> CPU_RESUME_FAILED -- CPU not resumed
+ *  with the CPU_RESUME_FAILED handler called only after all CPUs have been
+ *  tried to put online again in order to know which CPUs did restart
+ *  successfully.
+ *
  * Hence note that only CPU_*_PREPARE handlers are allowed to fail. Also note
  * that once CPU_DYING is delivered, an offline action can no longer fail.
- * 
+ *
  * Notifiers are called highest-priority-first when:
  *  (a) A CPU is coming up; or (b) CPU_DOWN_FAILED
  * Notifiers are called lowest-priority-first when:
  *  (a) A CPU is going down; or (b) CPU_UP_CANCELED
  */
 /* CPU_UP_PREPARE: Preparing to bring CPU online. */
-#define CPU_UP_PREPARE   (0x0001 | NOTIFY_FORWARD)
+#define CPU_UP_PREPARE    (0x0001 | NOTIFY_FORWARD)
 /* CPU_UP_CANCELED: CPU is no longer being brought online. */
-#define CPU_UP_CANCELED  (0x0002 | NOTIFY_REVERSE)
+#define CPU_UP_CANCELED   (0x0002 | NOTIFY_REVERSE)
 /* CPU_STARTING: CPU nearly online. Runs on new CPU, irqs still disabled. */
-#define CPU_STARTING     (0x0003 | NOTIFY_FORWARD)
+#define CPU_STARTING      (0x0003 | NOTIFY_FORWARD)
 /* CPU_ONLINE: CPU is up. */
-#define CPU_ONLINE       (0x0004 | NOTIFY_FORWARD)
+#define CPU_ONLINE        (0x0004 | NOTIFY_FORWARD)
 /* CPU_DOWN_PREPARE: CPU is going down. */
-#define CPU_DOWN_PREPARE (0x0005 | NOTIFY_REVERSE)
+#define CPU_DOWN_PREPARE  (0x0005 | NOTIFY_REVERSE)
 /* CPU_DOWN_FAILED: CPU is no longer going down. */
-#define CPU_DOWN_FAILED  (0x0006 | NOTIFY_FORWARD)
+#define CPU_DOWN_FAILED   (0x0006 | NOTIFY_FORWARD)
 /* CPU_DYING: CPU is nearly dead (in stop_machine context). */
-#define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
+#define CPU_DYING         (0x0007 | NOTIFY_REVERSE)
 /* CPU_DEAD: CPU is dead. */
-#define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
+#define CPU_DEAD          (0x0008 | NOTIFY_REVERSE)
 /* CPU_REMOVE: CPU was removed. */
-#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
+#define CPU_REMOVE        (0x0009 | NOTIFY_REVERSE)
+/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */
+#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE)
 
 /* Perform CPU hotplug. May return -EAGAIN. */
 int cpu_down(unsigned int cpu);
-- 
2.16.4


* [PATCH RFC 04/49] xen: don't free percpu areas during suspend
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (2 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 03/49] xen: add new cpu notifier action CPU_RESUME_FAILED Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 05/49] xen/cpupool: simplify suspend/resume handling Juergen Gross
                   ` (50 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Instead of freeing percpu areas during suspend and allocating them
again when resuming keep them. Only free an area in case a cpu didn't
come up again when resuming.

It should be noted that there is a potential change in behaviour as
the percpu areas are no longer zeroed out during suspend/resume. While
I have checked that the cpu notifier hooks being called cope with that,
there might be some well-hidden dependency on the previous behaviour.
OTOH a component not registering itself for cpu down/up notifications
while expecting to see a zeroed percpu variable after suspend/resume is
kind of broken already. And the opposite case, where a component is not
registered to be called for cpu down/up and does not expect a percpu
variable to suddenly become zero due to suspend/resume, is much more
probable, especially as the suspend/resume functionality seems not to
be tested that often.
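
A component that does rely on fresh per-cpu state when a cpu comes (back)
up can (re-)initialise it explicitly via the cpu notifier chain. A minimal
sketch, assuming a hypothetical per-cpu variable foo_state (not part of
this patch):

    static DEFINE_PER_CPU(unsigned long, foo_state);

    static int foo_cpu_callback(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;

        /* Re-initialise explicitly instead of relying on implicit zeroing. */
        if ( action == CPU_UP_PREPARE )
            per_cpu(foo_state, cpu) = 0;

        return NOTIFY_DONE;
    }

    static struct notifier_block foo_cpu_nfb = {
        .notifier_call = foo_cpu_callback
    };

    /* register_cpu_notifier(&foo_cpu_nfb); from an init function */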

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
---
 xen/arch/x86/percpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/percpu.c b/xen/arch/x86/percpu.c
index 8be4ebddf4..5ea14b6ec3 100644
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -76,7 +76,8 @@ static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        if ( !park_offline_cpus )
+    case CPU_RESUME_FAILED:
+        if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
             free_percpu_area(cpu);
         break;
     case CPU_REMOVE:
-- 
2.16.4


* [PATCH RFC 05/49] xen/cpupool: simplify suspend/resume handling
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (3 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 04/49] xen: don't free percpu areas during suspend Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 06/49] xen/sched: don't disable scheduler on cpus during suspend Juergen Gross
                   ` (49 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Instead of temporarily removing cpus from cpupools during
suspend/resume, only remove those cpus permanently which didn't come
up again when resuming.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
---
V2:
- add comment (George Dunlap)
---
 xen/common/cpupool.c       | 131 ++++++++++++++++++---------------------------
 xen/include/xen/sched-if.h |   1 -
 2 files changed, 52 insertions(+), 80 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index e89bb67e71..31ac323e40 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -47,12 +47,6 @@ static struct cpupool *alloc_cpupool_struct(void)
         xfree(c);
         c = NULL;
     }
-    else if ( !zalloc_cpumask_var(&c->cpu_suspended) )
-    {
-        free_cpumask_var(c->cpu_valid);
-        xfree(c);
-        c = NULL;
-    }
 
     return c;
 }
@@ -60,10 +54,7 @@ static struct cpupool *alloc_cpupool_struct(void)
 static void free_cpupool_struct(struct cpupool *c)
 {
     if ( c )
-    {
-        free_cpumask_var(c->cpu_suspended);
         free_cpumask_var(c->cpu_valid);
-    }
     xfree(c);
 }
 
@@ -477,10 +468,6 @@ void cpupool_rm_domain(struct domain *d)
 /*
  * Called to add a cpu to a pool. CPUs being hot-plugged are added to pool0,
  * as they must have been in there when unplugged.
- *
- * If, on the other hand, we are adding CPUs because we are resuming (e.g.,
- * after ACPI S3) we put the cpu back in the pool where it was in prior when
- * we suspended.
  */
 static int cpupool_cpu_add(unsigned int cpu)
 {
@@ -490,42 +477,15 @@ static int cpupool_cpu_add(unsigned int cpu)
     cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
     cpumask_set_cpu(cpu, &cpupool_free_cpus);
 
-    if ( system_state == SYS_STATE_suspend || system_state == SYS_STATE_resume )
-    {
-        struct cpupool **c;
-
-        for_each_cpupool(c)
-        {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_suspended ) )
-            {
-                ret = cpupool_assign_cpu_locked(*c, cpu);
-                if ( ret )
-                    goto out;
-                cpumask_clear_cpu(cpu, (*c)->cpu_suspended);
-                break;
-            }
-        }
+    /*
+     * If we are not resuming, we are hot-plugging cpu, in which case
+     * we add it to pool0, as it certainly was there when hot-unplugged
+     * (or unplugging would have failed) and that is the default behavior
+     * anyway.
+     */
+    per_cpu(cpupool, cpu) = NULL;
+    ret = cpupool_assign_cpu_locked(cpupool0, cpu);
 
-        /*
-         * Either cpu has been found as suspended in a pool, and added back
-         * there, or it stayed free (if it did not belong to any pool when
-         * suspending), and we don't want to do anything.
-         */
-        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
-               cpumask_test_cpu(cpu, (*c)->cpu_valid));
-    }
-    else
-    {
-        /*
-         * If we are not resuming, we are hot-plugging cpu, and in which case
-         * we add it to pool0, as it certainly was there when hot-unplagged
-         * (or unplugging would have failed) and that is the default behavior
-         * anyway.
-         */
-        per_cpu(cpupool, cpu) = NULL;
-        ret = cpupool_assign_cpu_locked(cpupool0, cpu);
-    }
- out:
     spin_unlock(&cpupool_lock);
 
     return ret;
@@ -535,42 +495,14 @@ static int cpupool_cpu_add(unsigned int cpu)
  * Called to remove a CPU from a pool. The CPU is locked, to forbid removing
  * it from pool0. In fact, if we want to hot-unplug a CPU, it must belong to
  * pool0, or we fail.
- *
- * However, if we are suspending (e.g., to ACPI S3), we mark the CPU in such
- * a way that it can be put back in its pool when resuming.
  */
 static int cpupool_cpu_remove(unsigned int cpu)
 {
     int ret = -ENODEV;
 
     spin_lock(&cpupool_lock);
-    if ( system_state == SYS_STATE_suspend )
-    {
-        struct cpupool **c;
-
-        for_each_cpupool(c)
-        {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
-            {
-                cpumask_set_cpu(cpu, (*c)->cpu_suspended);
-                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
-                break;
-            }
-        }
 
-        /*
-         * Either we found cpu in a pool, or it must be free (if it has been
-         * hot-unplagged, then we must have found it in pool0). It is, of
-         * course, fine to suspend or shutdown with CPUs not assigned to a
-         * pool, and (in case of suspend) they will stay free when resuming.
-         */
-        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
-               cpumask_test_cpu(cpu, (*c)->cpu_suspended));
-        ASSERT(cpumask_test_cpu(cpu, &cpu_online_map) ||
-               cpumask_test_cpu(cpu, cpupool0->cpu_suspended));
-        ret = 0;
-    }
-    else if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
     {
         /*
          * If we are not suspending, we are hot-unplugging cpu, and that is
@@ -587,6 +519,41 @@ static int cpupool_cpu_remove(unsigned int cpu)
     return ret;
 }
 
+/*
+ * Called during resume for all cpus which didn't come up again. The cpu must
+ * be removed from the cpupool it is assigned to. In case a cpupool will be
+ * left without cpu we move all domains of that cpupool to cpupool0.
+ */
+static void cpupool_cpu_remove_forced(unsigned int cpu)
+{
+    struct cpupool **c;
+    struct domain *d;
+
+    spin_lock(&cpupool_lock);
+
+    if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+    else
+    {
+        for_each_cpupool(c)
+        {
+            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
+            {
+                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
+                if ( cpumask_weight((*c)->cpu_valid) == 0 )
+                {
+                    if ( *c == cpupool0 )
+                        panic("No cpu left in cpupool0\n");
+                    for_each_domain_in_cpupool(d, *c)
+                        cpupool_move_domain_locked(d, cpupool0);
+                }
+            }
+        }
+    }
+
+    spin_unlock(&cpupool_lock);
+}
+
 /*
  * do cpupool related sysctl operations
  */
@@ -774,10 +741,16 @@ static int cpu_callback(
     {
     case CPU_DOWN_FAILED:
     case CPU_ONLINE:
-        rc = cpupool_cpu_add(cpu);
+        if ( system_state <= SYS_STATE_active )
+            rc = cpupool_cpu_add(cpu);
         break;
     case CPU_DOWN_PREPARE:
-        rc = cpupool_cpu_remove(cpu);
+        /* Suspend/Resume don't change assignments of cpus to cpupools. */
+        if ( system_state <= SYS_STATE_active )
+            rc = cpupool_cpu_remove(cpu);
+        break;
+    case CPU_RESUME_FAILED:
+        cpupool_cpu_remove_forced(cpu);
         break;
     default:
         break;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9596eae1e2..92bc7a0365 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -214,7 +214,6 @@ struct cpupool
 {
     int              cpupool_id;
     cpumask_var_t    cpu_valid;      /* all cpus assigned to pool */
-    cpumask_var_t    cpu_suspended;  /* cpus in S3 that should be in this pool */
     struct cpupool   *next;
     unsigned int     n_dom;
     struct scheduler *sched;
-- 
2.16.4


* [PATCH RFC 06/49] xen/sched: don't disable scheduler on cpus during suspend
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (4 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 05/49] xen/cpupool: simplify suspend/resume handling Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling Juergen Gross
                   ` (48 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Today there is special handling in cpu_disable_scheduler() for suspend
by forcing all vcpus to the boot cpu. In fact there is no need for that
as during resume the vcpus are put on the correct cpus again.

So we can just omit the call of cpu_disable_scheduler() when offlining
a cpu due to suspend, and on resume we can omit taking the schedule
lock when selecting the new processor.

In restore_vcpu_affinity() we should be careful when applying affinity
as the cpu might not have come back to life. This in turn enables us
to even support affinity_broken across suspend/resume.

Avoid all the other scheduler dealloc/alloc dance when doing suspend
and resume, too. It is enough to react to cpus failing to come up
again on resume.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 161 ++++++++++++++++----------------------------------
 1 file changed, 52 insertions(+), 109 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5d2bbd5198..6b5d454630 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -560,33 +560,6 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
         v->processor = new_cpu;
 }
 
-/*
- * Move a vcpu from its current processor to a target new processor,
- * without asking the scheduler to do any placement. This is intended
- * for being called from special contexts, where things are quiet
- * enough that no contention is supposed to happen (i.e., during
- * shutdown or software suspend, like ACPI S3).
- */
-static void vcpu_move_nosched(struct vcpu *v, unsigned int new_cpu)
-{
-    unsigned long flags;
-    spinlock_t *lock, *new_lock;
-
-    ASSERT(system_state == SYS_STATE_suspend);
-    ASSERT(!vcpu_runnable(v) && (atomic_read(&v->pause_count) ||
-                                 atomic_read(&v->domain->pause_count)));
-
-    lock = per_cpu(schedule_data, v->processor).schedule_lock;
-    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
-
-    sched_spin_lock_double(lock, new_lock, &flags);
-    ASSERT(new_cpu != v->processor);
-    vcpu_move_locked(v, new_cpu);
-    sched_spin_unlock_double(lock, new_lock, flags);
-
-    sched_move_irqs(v);
-}
-
 /*
  * Initiating migration
  *
@@ -735,31 +708,36 @@ void restore_vcpu_affinity(struct domain *d)
 
         ASSERT(!vcpu_runnable(v));
 
-        lock = vcpu_schedule_lock_irq(v);
-
-        if ( v->affinity_broken )
-        {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
-
-        }
-
         /*
-         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
-         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
-         * allow the nonboot processors to have their data structure freed
-         * and go to sleep. But nothing guardantees that the BSP is a valid
-         * pCPU for a particular domain.
+         * Re-assign the initial processor as after resume we have no
+         * guarantee the old processor has come back to life again.
          *
          * Therefore, here, before actually unpausing the domains, we should
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool in which they are in.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
-                    cpupool_domain_cpumask(v->domain));
-        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+                    cpupool_domain_cpumask(d));
+        if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+        {
+            if ( v->affinity_broken )
+            {
+                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
+                v->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                            cpupool_domain_cpumask(d));
+            }
 
-        spin_unlock_irq(lock);
+            if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+            {
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                sched_set_affinity(v, &cpumask_all, NULL);
+                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                            cpupool_domain_cpumask(d));
+            }
+        }
+
+        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
@@ -783,7 +761,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
-    unsigned int new_cpu;
     int ret = 0;
 
     c = per_cpu(cpupool, cpu);
@@ -809,14 +786,7 @@ int cpu_disable_scheduler(unsigned int cpu)
                     break;
                 }
 
-                if (system_state == SYS_STATE_suspend)
-                {
-                    cpumask_copy(v->cpu_hard_affinity_saved,
-                                 v->cpu_hard_affinity);
-                    v->affinity_broken = 1;
-                }
-                else
-                    printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
 
                 sched_set_affinity(v, &cpumask_all, NULL);
             }
@@ -828,60 +798,26 @@ int cpu_disable_scheduler(unsigned int cpu)
                 continue;
             }
 
-            /* If it is on this cpu, we must send it away. */
-            if ( unlikely(system_state == SYS_STATE_suspend) )
-            {
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
-
-                /*
-                 * If we are doing a shutdown/suspend, it is not necessary to
-                 * ask the scheduler to chime in. In fact:
-                 *  * there is no reason for it: the end result we are after
-                 *    is just 'all the vcpus on the boot pcpu, and no vcpu
-                 *    anywhere else', so let's just go for it;
-                 *  * it's wrong, for cpupools with only non-boot pcpus, as
-                 *    the scheduler would always fail to send the vcpus away
-                 *    from the last online (non boot) pcpu!
-                 *
-                 * Therefore, in the shutdown/suspend case, we just pick up
-                 * one (still) online pcpu. Note that, at this stage, all
-                 * domains (including dom0) have been paused already, so we
-                 * do not expect any vcpu activity at all.
-                 */
-                cpumask_andnot(&online_affinity, &cpu_online_map,
-                               cpumask_of(cpu));
-                BUG_ON(cpumask_empty(&online_affinity));
-                /*
-                 * As boot cpu is, usually, pcpu #0, using cpumask_first()
-                 * will make us converge quicker.
-                 */
-                new_cpu = cpumask_first(&online_affinity);
-                vcpu_move_nosched(v, new_cpu);
-            }
-            else
-            {
-                /*
-                 * OTOH, if the system is still live, and we are here because
-                 * we are doing some cpupool manipulations:
-                 *  * we want to call the scheduler, and let it re-evaluation
-                 *    the placement of the vcpu, taking into account the new
-                 *    cpupool configuration;
-                 *  * the scheduler will always fine a suitable solution, or
-                 *    things would have failed before getting in here.
-                 */
-                vcpu_migrate_start(v);
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
+            /* If it is on this cpu, we must send it away.
+             * We are doing some cpupool manipulations:
+             *  * we want to call the scheduler, and let it re-evaluate
+             *    the placement of the vcpu, taking into account the new
+             *    cpupool configuration;
+             *  * the scheduler will always find a suitable solution, or
+             *    things would have failed before getting in here.
+             */
+            vcpu_migrate_start(v);
+            vcpu_schedule_unlock_irqrestore(lock, flags, v);
 
-                vcpu_migrate_finish(v);
+            vcpu_migrate_finish(v);
 
-                /*
-                 * The only caveat, in this case, is that if a vcpu active in
-                 * the hypervisor isn't migratable. In this case, the caller
-                 * should try again after releasing and reaquiring all locks.
-                 */
-                if ( v->processor == cpu )
-                    ret = -EAGAIN;
-            }
+            /*
+             * The only caveat, in this case, is that if a vcpu active in
+             * the hypervisor isn't migratable. In this case, the caller
+             * should try again after releasing and reacquiring all locks.
+             */
+            if ( v->processor == cpu )
+                ret = -EAGAIN;
         }
     }
 
@@ -1751,26 +1687,33 @@ static int cpu_schedule_callback(
     switch ( action )
     {
     case CPU_STARTING:
-        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+        if ( system_state != SYS_STATE_resume )
+            SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
         break;
     case CPU_UP_PREPARE:
-        rc = cpu_schedule_up(cpu);
+        if ( system_state != SYS_STATE_resume )
+            rc = cpu_schedule_up(cpu);
         break;
     case CPU_DOWN_PREPARE:
         rcu_read_lock(&domlist_read_lock);
         rc = cpu_disable_scheduler_check(cpu);
         rcu_read_unlock(&domlist_read_lock);
         break;
+    case CPU_RESUME_FAILED:
     case CPU_DEAD:
+        if ( system_state == SYS_STATE_suspend )
+            break;
         rcu_read_lock(&domlist_read_lock);
         rc = cpu_disable_scheduler(cpu);
         BUG_ON(rc);
         rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
-        /* Fallthrough */
-    case CPU_UP_CANCELED:
         cpu_schedule_down(cpu);
         break;
+    case CPU_UP_CANCELED:
+        if ( system_state != SYS_STATE_resume )
+            cpu_schedule_down(cpu);
+        break;
     default:
         break;
     }
-- 
2.16.4


* [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (5 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 06/49] xen/sched: don't disable scheduler on cpus during suspend Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 18:22   ` Dario Faggioli
  2019-03-29 15:08 ` [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces Juergen Gross
                   ` (47 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Credit2's smt_idle_mask_set() and smt_idle_mask_clear() are used to
identify idle cores to which vcpus can be moved. A core is considered
idle when all of its siblings are known to have the idle vcpu running
on them.

Unfortunately the information about which vcpu is running on a cpu is
kept per runqueue. So in case not all siblings are in the same runqueue
a core will never be regarded as idle, as a sibling outside the runqueue
is never known to run the idle vcpu.

Use a credit2-specific per-cpu mask of siblings in which only those cpus
are marked which are in the same runqueue as the cpu in question.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use credit2 per-cpu specific sibling mask
---
 xen/common/sched_credit2.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 543dc3664d..6958b265fc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -504,6 +504,7 @@ struct csched2_private {
  * Physical CPU
  */
 struct csched2_pcpu {
+    cpumask_t sibling_mask;            /* Siblings in the same runqueue      */
     int runq_id;
 };
 
@@ -656,7 +657,7 @@ static inline
 void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
                        cpumask_t *mask)
 {
-    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+    const cpumask_t *cpu_siblings = &csched2_pcpu(cpu)->sibling_mask;
 
     if ( cpumask_subset(cpu_siblings, idlers) )
         cpumask_or(mask, mask, cpu_siblings);
@@ -668,10 +669,10 @@ void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
 static inline
 void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 {
-    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+    const cpumask_t *cpu_siblings = &csched2_pcpu(cpu)->sibling_mask;
 
     if ( cpumask_subset(cpu_siblings, mask) )
-        cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
+        cpumask_andnot(mask, mask, cpu_siblings);
 }
 
 /*
@@ -3793,6 +3794,7 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
            unsigned int cpu)
 {
     struct csched2_runqueue_data *rqd;
+    unsigned int rcpu;
 
     ASSERT(rw_is_write_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
@@ -3810,12 +3812,23 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
         printk(XENLOG_INFO " First cpu on runqueue, activating\n");
         activate_runqueue(prv, spc->runq_id);
     }
-    
+
     __cpumask_set_cpu(cpu, &rqd->idle);
     __cpumask_set_cpu(cpu, &rqd->active);
     __cpumask_set_cpu(cpu, &prv->initialized);
     __cpumask_set_cpu(cpu, &rqd->smt_idle);
 
+    /* On the boot cpu we are called before cpu_sibling_mask has been set up. */
+    if ( cpu == 0 && system_state < SYS_STATE_active )
+        __cpumask_set_cpu(cpu, &csched2_pcpu(cpu)->sibling_mask);
+    else
+        for_each_cpu ( rcpu, per_cpu(cpu_sibling_mask, cpu) )
+            if ( cpumask_test_cpu(rcpu, &rqd->active) )
+            {
+                __cpumask_set_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
+                __cpumask_set_cpu(rcpu, &csched2_pcpu(cpu)->sibling_mask);
+            }
+
     if ( cpumask_weight(&rqd->active) == 1 )
         rqd->pick_bias = cpu;
 
@@ -3897,6 +3910,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     struct csched2_private *prv = csched2_priv(ops);
     struct csched2_runqueue_data *rqd;
     struct csched2_pcpu *spc = pcpu;
+    unsigned int rcpu;
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3923,6 +3937,9 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, spc->runq_id);
 
+    for_each_cpu ( rcpu, &rqd->active )
+        __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
+
     __cpumask_clear_cpu(cpu, &rqd->idle);
     __cpumask_clear_cpu(cpu, &rqd->smt_idle);
     __cpumask_clear_cpu(cpu, &rqd->active);
-- 
2.16.4


* [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (6 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 18:42   ` Andrew Cooper
  2019-03-29 15:08 ` [PATCH RFC 09/49] xen/sched: alloc struct sched_item for each vcpu Juergen Gross
                   ` (46 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In order to prepare for core- and socket-scheduling, use a new struct
sched_item instead of struct vcpu in the interfaces of the different
schedulers.

Rename the per-scheduler hooks insert_vcpu and remove_vcpu to
insert_item and remove_item to reflect the changed parameter. In the
schedulers, rename the local functions that were switched to sched_item
accordingly.

For now this new struct will contain a vcpu pointer only and is
allocated on the stack. This will be changed later.
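
For illustration, the transitional wrapper described above boils down to
something like this (a sketch, not necessarily the exact code added by
this patch):

    struct sched_item {
        struct vcpu *vcpu;
    };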

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 30 +++++++++++++++---------
 xen/common/sched_credit.c   | 41 +++++++++++++++++++-------------
 xen/common/sched_credit2.c  | 57 +++++++++++++++++++++++++++------------------
 xen/common/sched_null.c     | 39 ++++++++++++++++++++-----------
 xen/common/sched_rt.c       | 33 +++++++++++++++-----------
 xen/common/schedule.c       | 53 ++++++++++++++++++++++++++++-------------
 xen/include/xen/sched-if.h  | 40 ++++++++++++++++++++-----------
 7 files changed, 187 insertions(+), 106 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index a4c6d00b81..fffe23113e 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -376,13 +376,16 @@ a653sched_deinit(struct scheduler *ops)
  * This function allocates scheduler-specific data for a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
+ * @param item      Pointer to struct sched_item
  *
  * @return          Pointer to the allocated data
  */
 static void *
-a653sched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+a653sched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
+                      void *dd)
 {
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
+    struct vcpu *vc = item->vcpu;
     arinc653_vcpu_t *svc;
     unsigned int entry;
     unsigned long flags;
@@ -458,11 +461,13 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
  * Xen scheduler callback function to sleep a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param vc        Pointer to the VCPU structure for the current domain
+ * @param item      Pointer to struct sched_item
  */
 static void
-a653sched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+a653sched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
+
     if ( AVCPU(vc) != NULL )
         AVCPU(vc)->awake = 0;
 
@@ -478,11 +483,13 @@ a653sched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  * Xen scheduler callback function to wake up a VCPU
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param vc        Pointer to the VCPU structure for the current domain
+ * @param item      Pointer to struct sched_item
  */
 static void
-a653sched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+a653sched_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
+
     if ( AVCPU(vc) != NULL )
         AVCPU(vc)->awake = 1;
 
@@ -597,13 +604,14 @@ a653sched_do_schedule(
  * Xen scheduler callback function to select a CPU for the VCPU to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
- * @param v         Pointer to the VCPU structure for the current domain
+ * @param item      Pointer to struct sched_item
  *
  * @return          Number of selected physical CPU
  */
 static int
-a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
+a653sched_pick_cpu(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     cpumask_t *online;
     unsigned int cpu;
 
@@ -712,11 +720,11 @@ static const struct scheduler sched_arinc653_def = {
     .free_vdata     = a653sched_free_vdata,
     .alloc_vdata    = a653sched_alloc_vdata,
 
-    .insert_vcpu    = NULL,
-    .remove_vcpu    = NULL,
+    .insert_item    = NULL,
+    .remove_item    = NULL,
 
-    .sleep          = a653sched_vcpu_sleep,
-    .wake           = a653sched_vcpu_wake,
+    .sleep          = a653sched_item_sleep,
+    .wake           = a653sched_item_wake,
     .yield          = NULL,
     .context_saved  = NULL,
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3abe20def8..3735486b4c 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -868,15 +868,16 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 }
 
 static int
-csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+csched_cpu_pick(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(vc);
 
     /*
      * We have been called by vcpu_migrate() (in schedule.c), as part
      * of the process of seeing if vc can be migrated to another pcpu.
      * We make a note about this in svc->flags so that later, in
-     * csched_vcpu_wake() (still called from vcpu_migrate()) we won't
+     * csched_item_wake() (still called from vcpu_migrate()) we won't
      * get boosted, which we don't deserve as we are "only" migrating.
      */
     set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
@@ -1004,8 +1005,10 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 }
 
 static void *
-csched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+csched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
+                   void *dd)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -1025,8 +1028,9 @@ csched_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
 }
 
 static void
-csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu *svc = vc->sched_priv;
     spinlock_t *lock;
 
@@ -1035,7 +1039,7 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched_cpu_pick(ops, vc);
+    vc->processor = csched_cpu_pick(ops, item);
 
     spin_unlock_irq(lock);
 
@@ -1060,9 +1064,10 @@ csched_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 static void
-csched_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+csched_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched_private *prv = CSCHED_PRIV(ops);
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     struct csched_dom * const sdom = svc->sdom;
 
@@ -1087,8 +1092,9 @@ csched_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+csched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     unsigned int cpu = vc->processor;
 
@@ -1111,8 +1117,9 @@ csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     bool_t migrating;
 
@@ -1172,8 +1179,9 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_yield(const struct scheduler *ops, struct vcpu *vc)
+csched_item_yield(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
 
     /* Let the scheduler know that this vcpu is trying to yield */
@@ -1226,9 +1234,10 @@ csched_dom_cntl(
 }
 
 static void
-csched_aff_cntl(const struct scheduler *ops, struct vcpu *v,
+csched_aff_cntl(const struct scheduler *ops, struct sched_item *item,
                 const cpumask_t *hard, const cpumask_t *soft)
 {
+    struct vcpu *v = item->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(v);
 
     if ( !hard )
@@ -1756,7 +1765,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
                  *   have runnable vCPUs in its runqueue, but that's not a
                  *   problem because:
-                 *   + if racing with csched_vcpu_insert() or csched_vcpu_wake(),
+                 *   + if racing with csched_item_insert() or csched_item_wake(),
                  *     __runq_tickle() will be called afterwords, so the vCPU
                  *     won't get stuck in the runqueue for too long;
                  *   + if racing with csched_runq_steal(), it may be that a
@@ -2268,12 +2277,12 @@ static const struct scheduler sched_credit_def = {
 
     .global_init    = csched_global_init,
 
-    .insert_vcpu    = csched_vcpu_insert,
-    .remove_vcpu    = csched_vcpu_remove,
+    .insert_item    = csched_item_insert,
+    .remove_item    = csched_item_remove,
 
-    .sleep          = csched_vcpu_sleep,
-    .wake           = csched_vcpu_wake,
-    .yield          = csched_vcpu_yield,
+    .sleep          = csched_item_sleep,
+    .wake           = csched_item_wake,
+    .yield          = csched_item_yield,
 
     .adjust         = csched_dom_cntl,
     .adjust_affinity= csched_aff_cntl,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6958b265fc..f44286c2a5 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -273,7 +273,7 @@
  * CSFLAG_delayed_runq_add: Do we need to add this to the runqueue once it'd done
  * being context switched out?
  * + Set when scheduling out in csched2_schedule() if prev is runnable
- * + Set in csched2_vcpu_wake if it finds CSFLAG_scheduled set
+ * + Set in csched2_item_wake if it finds CSFLAG_scheduled set
  * + Read in csched2_context_saved().  If set, it adds prev to the runqueue and
  *   clears the bit.
  */
@@ -623,14 +623,14 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
  * This logic is entirely implemented in runq_tickle(), and that is enough.
  * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_vcpu_wake(), which calls
+ *  - when a vcpu wakes up, it calls csched2_item_wake(), which calls
  *    runq_tickle();
  *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
- *    csched2_vcpu_migrate() (which calls migrate()) and csched2_vcpu_wake().
+ *    csched2_item_migrate() (which calls migrate()) and csched2_item_wake().
  *    csched2_cpu_pick() looks for the least loaded runq and return just any
- *    of its processors. Then, csched2_vcpu_migrate() just moves the vcpu to
+ *    of its processors. Then, csched2_item_migrate() just moves the vcpu to
  *    the chosen runq, and it is again runq_tickle(), called by
- *    csched2_vcpu_wake() that actually decides what pcpu to use within the
+ *    csched2_item_wake() that actually decides what pcpu to use within the
  *    chosen runq;
  *  - when a migration is initiated in sched_credit2.c, by calling  migrate()
  *    directly, that again temporarily use a random pcpu from the new runq,
@@ -2026,8 +2026,10 @@ csched2_vcpu_check(struct vcpu *vc)
 #endif
 
 static void *
-csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
+                    void *dd)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -2069,8 +2071,9 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
 }
 
 static void
-csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+csched2_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
 
     ASSERT(!is_idle_vcpu(vc));
@@ -2091,8 +2094,9 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     unsigned int cpu = vc->processor;
     s_time_t now;
@@ -2146,16 +2150,18 @@ out:
 }
 
 static void
-csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
+csched2_item_yield(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(v);
 
     __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
 }
 
 static void
-csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
+csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
     s_time_t now = NOW();
@@ -2196,9 +2202,10 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 
 #define MAX_LOAD (STIME_MAX)
 static int
-csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+csched2_cpu_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched2_private *prv = csched2_priv(ops);
+    struct vcpu *vc = item->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
@@ -2733,9 +2740,10 @@ retry:
 }
 
 static void
-csched2_vcpu_migrate(
-    const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
+csched2_item_migrate(
+    const struct scheduler *ops, struct sched_item *item, unsigned int new_cpu)
 {
+    struct vcpu *vc = item->vcpu;
     struct domain *d = vc->domain;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     struct csched2_runqueue_data *trqd;
@@ -2996,9 +3004,10 @@ csched2_dom_cntl(
 }
 
 static void
-csched2_aff_cntl(const struct scheduler *ops, struct vcpu *v,
+csched2_aff_cntl(const struct scheduler *ops, struct sched_item *item,
                  const cpumask_t *hard, const cpumask_t *soft)
 {
+    struct vcpu *v = item->vcpu;
     struct csched2_vcpu *svc = csched2_vcpu(v);
 
     if ( !hard )
@@ -3096,8 +3105,9 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 }
 
 static void
-csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu *svc = vc->sched_priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -3108,7 +3118,7 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     /* csched2_cpu_pick() expects the pcpu lock to be held */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched2_cpu_pick(ops, vc);
+    vc->processor = csched2_cpu_pick(ops, item);
 
     spin_unlock_irq(lock);
 
@@ -3135,8 +3145,9 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 static void
-csched2_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct csched2_vcpu * const svc = csched2_vcpu(vc);
     spinlock_t *lock;
 
@@ -4084,19 +4095,19 @@ static const struct scheduler sched_credit2_def = {
 
     .global_init    = csched2_global_init,
 
-    .insert_vcpu    = csched2_vcpu_insert,
-    .remove_vcpu    = csched2_vcpu_remove,
+    .insert_item    = csched2_item_insert,
+    .remove_item    = csched2_item_remove,
 
-    .sleep          = csched2_vcpu_sleep,
-    .wake           = csched2_vcpu_wake,
-    .yield          = csched2_vcpu_yield,
+    .sleep          = csched2_item_sleep,
+    .wake           = csched2_item_wake,
+    .yield          = csched2_item_yield,
 
     .adjust         = csched2_dom_cntl,
     .adjust_affinity= csched2_aff_cntl,
     .adjust_global  = csched2_sys_cntl,
 
     .pick_cpu       = csched2_cpu_pick,
-    .migrate        = csched2_vcpu_migrate,
+    .migrate        = csched2_item_migrate,
     .do_schedule    = csched2_schedule,
     .context_saved  = csched2_context_saved,
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index a59dbb2692..7b508f35a4 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -194,8 +194,9 @@ static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static void *null_alloc_vdata(const struct scheduler *ops,
-                              struct vcpu *v, void *dd)
+                              struct sched_item *item, void *dd)
 {
+    struct vcpu *v = item->vcpu;
     struct null_vcpu *nvc;
 
     nvc = xzalloc(struct null_vcpu);
@@ -413,8 +414,10 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     sd->schedule_lock = &sd->_lock;
 }
 
-static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
+static void null_item_insert(const struct scheduler *ops,
+                             struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
     unsigned int cpu;
@@ -505,8 +508,10 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     spin_unlock(&prv->waitq_lock);
 }
 
-static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
+static void null_item_remove(const struct scheduler *ops,
+                             struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
     spinlock_t *lock;
@@ -536,8 +541,11 @@ static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
     SCHED_STAT_CRANK(vcpu_remove);
 }
 
-static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
+static void null_item_wake(const struct scheduler *ops,
+                           struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
+
     ASSERT(!is_idle_vcpu(v));
 
     if ( unlikely(curr_on_cpu(v->processor) == v) )
@@ -562,8 +570,11 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
     cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 }
 
-static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
+static void null_item_sleep(const struct scheduler *ops,
+                            struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
+
     ASSERT(!is_idle_vcpu(v));
 
     /* If v is not assigned to a pCPU, or is not running, no need to bother */
@@ -573,15 +584,17 @@ static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
     SCHED_STAT_CRANK(vcpu_sleep);
 }
 
-static int null_cpu_pick(const struct scheduler *ops, struct vcpu *v)
+static int null_cpu_pick(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *v = item->vcpu;
     ASSERT(!is_idle_vcpu(v));
     return pick_cpu(null_priv(ops), v);
 }
 
-static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
-                              unsigned int new_cpu)
+static void null_item_migrate(const struct scheduler *ops,
+                              struct sched_item *item, unsigned int new_cpu)
 {
+    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_vcpu *nvc = null_vcpu(v);
 
@@ -888,13 +901,13 @@ const struct scheduler sched_null_def = {
     .alloc_domdata  = null_alloc_domdata,
     .free_domdata   = null_free_domdata,
 
-    .insert_vcpu    = null_vcpu_insert,
-    .remove_vcpu    = null_vcpu_remove,
+    .insert_item    = null_item_insert,
+    .remove_item    = null_item_remove,
 
-    .wake           = null_vcpu_wake,
-    .sleep          = null_vcpu_sleep,
+    .wake           = null_item_wake,
+    .sleep          = null_item_sleep,
     .pick_cpu       = null_cpu_pick,
-    .migrate        = null_vcpu_migrate,
+    .migrate        = null_item_migrate,
     .do_schedule    = null_schedule,
 
     .dump_cpu_state = null_dump_pcpu,
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index f1b81f0373..ab8fa02306 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -136,7 +136,7 @@
  * RTDS_delayed_runq_add: Do we need to add this to the RunQ/DepletedQ
  * once it's done being context switching out?
  * + Set when scheduling out in rt_schedule() if prev is runable
- * + Set in rt_vcpu_wake if it finds RTDS_scheduled set
+ * + Set in rt_item_wake if it finds RTDS_scheduled set
  * + Read in rt_context_saved(). If set, it adds prev to the Runqueue/DepletedQ
  *   and clears the bit.
  */
@@ -637,8 +637,9 @@ replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
  * and available cpus
  */
 static int
-rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
+rt_cpu_pick(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     cpumask_t cpus;
     cpumask_t *online;
     int cpu;
@@ -846,8 +847,9 @@ rt_free_domdata(const struct scheduler *ops, void *data)
 }
 
 static void *
-rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
+rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu *svc;
 
     /* Allocate per-VCPU info */
@@ -889,8 +891,9 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
  * dest. cpupool.
  */
 static void
-rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu *svc = rt_vcpu(vc);
     s_time_t now;
     spinlock_t *lock;
@@ -898,7 +901,7 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* This is safe because vc isn't yet being scheduled */
-    vc->processor = rt_cpu_pick(ops, vc);
+    vc->processor = rt_cpu_pick(ops, item);
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -922,8 +925,9 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
  * Remove rt_vcpu svc from the old scheduler in source cpupool.
  */
 static void
-rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
+rt_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -1142,8 +1146,9 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
  * The lock is already grabbed in schedule.c, no need to lock here
  */
 static void
-rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
+rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
 
     BUG_ON( is_idle_vcpu(vc) );
@@ -1257,8 +1262,9 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
  * TODO: what if these two vcpus belongs to the same domain?
  */
 static void
-rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+rt_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now;
     bool_t missed;
@@ -1327,8 +1333,9 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
  * and then pick the highest priority vcpu from runq to run
  */
 static void
-rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+rt_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
+    struct vcpu *vc = item->vcpu;
     struct rt_vcpu *svc = rt_vcpu(vc);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
@@ -1557,15 +1564,15 @@ static const struct scheduler sched_rtds_def = {
     .free_domdata   = rt_free_domdata,
     .alloc_vdata    = rt_alloc_vdata,
     .free_vdata     = rt_free_vdata,
-    .insert_vcpu    = rt_vcpu_insert,
-    .remove_vcpu    = rt_vcpu_remove,
+    .insert_item    = rt_item_insert,
+    .remove_item    = rt_item_remove,
 
     .adjust         = rt_dom_cntl,
 
     .pick_cpu       = rt_cpu_pick,
     .do_schedule    = rt_schedule,
-    .sleep          = rt_vcpu_sleep,
-    .wake           = rt_vcpu_wake,
+    .sleep          = rt_item_sleep,
+    .wake           = rt_item_wake,
     .context_saved  = rt_context_saved,
 };
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 6b5d454630..d1a958143a 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -256,6 +256,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
+    struct sched_item item = { .vcpu = v };
 
     v->processor = processor;
 
@@ -267,7 +268,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
+    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, &item,
                      d->sched_priv);
     if ( v->sched_priv == NULL )
         return 1;
@@ -289,7 +290,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        SCHED_OP(dom_scheduler(d), insert_vcpu, v);
+        SCHED_OP(dom_scheduler(d), insert_item, &item);
     }
 
     return 0;
@@ -310,6 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     void *vcpudata;
     struct scheduler *old_ops;
     void *old_domdata;
+    struct sched_item item;
 
     for_each_vcpu ( d, v )
     {
@@ -330,7 +332,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata, v, domdata);
+        item.vcpu = v;
+        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata, &item, domdata);
         if ( vcpu_priv[v->vcpu_id] == NULL )
         {
             for_each_vcpu ( d, v )
@@ -348,7 +351,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        SCHED_OP(old_ops, remove_vcpu, v);
+        item.vcpu = v;
+        SCHED_OP(old_ops, remove_item, &item);
     }
 
     d->cpupool = c;
@@ -359,6 +363,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
+        item.vcpu = v;
         vcpudata = v->sched_priv;
 
         migrate_timer(&v->periodic_timer, new_p);
@@ -383,7 +388,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
-        SCHED_OP(c->sched, insert_vcpu, v);
+        SCHED_OP(c->sched, insert_item, &item);
 
         SCHED_OP(old_ops, free_vdata, vcpudata);
     }
@@ -401,12 +406,14 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 void sched_destroy_vcpu(struct vcpu *v)
 {
+    struct sched_item item = { .vcpu = v };
+
     kill_timer(&v->periodic_timer);
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    SCHED_OP(vcpu_scheduler(v), remove_vcpu, v);
+    SCHED_OP(vcpu_scheduler(v), remove_item, &item);
     SCHED_OP(vcpu_scheduler(v), free_vdata, v->sched_priv);
 }
 
@@ -451,6 +458,8 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
+    struct sched_item item = { .vcpu = v };
+
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
@@ -458,7 +467,7 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        SCHED_OP(vcpu_scheduler(v), sleep, v);
+        SCHED_OP(vcpu_scheduler(v), sleep, &item);
     }
 }
 
@@ -490,6 +499,7 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
+    struct sched_item item = { .vcpu = v };
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
@@ -499,7 +509,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(vcpu_scheduler(v), wake, v);
+        SCHED_OP(vcpu_scheduler(v), wake, &item);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -538,6 +548,7 @@ void vcpu_unblock(struct vcpu *v)
 static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 {
     unsigned int old_cpu = v->processor;
+    struct sched_item item = { .vcpu = v };
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
@@ -555,7 +566,7 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      * pointer cant' change while the current lock is held.
      */
     if ( vcpu_scheduler(v)->migrate )
-        SCHED_OP(vcpu_scheduler(v), migrate, v, new_cpu);
+        SCHED_OP(vcpu_scheduler(v), migrate, &item, new_cpu);
     else
         v->processor = new_cpu;
 }
@@ -599,6 +610,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
+    struct sched_item item = { .vcpu = v };
 
     /*
      * If the vcpu is currently running, this will be handled by
@@ -635,7 +647,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
+            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, &item);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -705,6 +717,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_item item = { .vcpu = v };
 
         ASSERT(!vcpu_runnable(v));
 
@@ -740,7 +753,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
+        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, &item);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -858,7 +871,9 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, v, hard, soft);
+    struct sched_item item = { .vcpu = v };
+
+    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, &item, hard, soft);
 
     if ( hard )
         cpumask_copy(v->cpu_hard_affinity, hard);
@@ -1034,9 +1049,10 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
+    struct sched_item item = { .vcpu = v };
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    SCHED_OP(vcpu_scheduler(v), yield, v);
+    SCHED_OP(vcpu_scheduler(v), yield, &item);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1531,6 +1547,8 @@ static void schedule(void)
 
 void context_saved(struct vcpu *prev)
 {
+    struct sched_item item = { .vcpu = prev };
+
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
@@ -1539,7 +1557,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
+    SCHED_OP(vcpu_scheduler(prev), context_saved, &item);
 
     vcpu_migrate_finish(prev);
 }
@@ -1595,6 +1613,7 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
+        struct sched_item item = { .vcpu = idle };
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1608,7 +1627,7 @@ static int cpu_schedule_up(unsigned int cpu)
          */
         ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, &item,
                                     idle->domain->sched_priv);
         if ( idle->sched_priv == NULL )
             return -ENOMEM;
@@ -1801,6 +1820,7 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
+    struct sched_item item;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1836,10 +1856,11 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      *    sched_priv field of the per-vCPU info of the idle domain.
      */
     idle = idle_vcpu[cpu];
+    item.vcpu = idle;
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    vpriv = SCHED_OP(new_ops, alloc_vdata, &item, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
         SCHED_OP(new_ops, free_pdata, ppriv, cpu);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 92bc7a0365..a9916f35b8 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -48,6 +48,10 @@ DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 
+struct sched_item {
+    struct vcpu           *vcpu;
+};
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
@@ -141,8 +145,8 @@ struct scheduler {
     void         (*deinit)         (struct scheduler *);
 
     void         (*free_vdata)     (const struct scheduler *, void *);
-    void *       (*alloc_vdata)    (const struct scheduler *, struct vcpu *,
-                                    void *);
+    void *       (*alloc_vdata)    (const struct scheduler *,
+                                    struct sched_item *, void *);
     void         (*free_pdata)     (const struct scheduler *, void *, int);
     void *       (*alloc_pdata)    (const struct scheduler *, int);
     void         (*init_pdata)     (const struct scheduler *, void *, int);
@@ -156,24 +160,32 @@ struct scheduler {
     void         (*switch_sched)   (struct scheduler *, unsigned int,
                                     void *, void *);
 
-    /* Activate / deactivate vcpus in a cpu pool */
-    void         (*insert_vcpu)    (const struct scheduler *, struct vcpu *);
-    void         (*remove_vcpu)    (const struct scheduler *, struct vcpu *);
-
-    void         (*sleep)          (const struct scheduler *, struct vcpu *);
-    void         (*wake)           (const struct scheduler *, struct vcpu *);
-    void         (*yield)          (const struct scheduler *, struct vcpu *);
-    void         (*context_saved)  (const struct scheduler *, struct vcpu *);
+    /* Activate / deactivate items in a cpu pool */
+    void         (*insert_item)    (const struct scheduler *,
+                                    struct sched_item *);
+    void         (*remove_item)    (const struct scheduler *,
+                                    struct sched_item *);
+
+    void         (*sleep)          (const struct scheduler *,
+                                    struct sched_item *);
+    void         (*wake)           (const struct scheduler *,
+                                    struct sched_item *);
+    void         (*yield)          (const struct scheduler *,
+                                    struct sched_item *);
+    void         (*context_saved)  (const struct scheduler *,
+                                    struct sched_item *);
 
     struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
                                       bool_t tasklet_work_scheduled);
 
-    int          (*pick_cpu)       (const struct scheduler *, struct vcpu *);
-    void         (*migrate)        (const struct scheduler *, struct vcpu *,
-                                    unsigned int);
+    int          (*pick_cpu)       (const struct scheduler *,
+                                    struct sched_item *);
+    void         (*migrate)        (const struct scheduler *,
+                                    struct sched_item *, unsigned int);
     int          (*adjust)         (const struct scheduler *, struct domain *,
                                     struct xen_domctl_scheduler_op *);
-    void         (*adjust_affinity)(const struct scheduler *, struct vcpu *,
+    void         (*adjust_affinity)(const struct scheduler *,
+                                    struct sched_item *,
                                     const struct cpumask *,
                                     const struct cpumask *);
     int          (*adjust_global)  (const struct scheduler *,
-- 
2.16.4



* [PATCH RFC 09/49] xen/sched: alloc struct sched_item for each vcpu
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (7 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 10/49] xen/sched: move per-vcpu scheduler private data pointer to sched_item Juergen Gross
                   ` (45 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Allocate a struct sched_item for each vcpu. This removes the need to
construct one locally on the stack in schedule.c.
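
The effect on the callers is roughly the following (vcpu_wake() shown
as an illustrative before/after sketch, not as an extra hunk of this
patch):

    /* Before (previous patch): wrapper item built on the stack. */
    struct sched_item item = { .vcpu = v };
    SCHED_OP(vcpu_scheduler(v), wake, &item);

    /* After: the persistent per-vcpu object is passed directly. */
    SCHED_OP(vcpu_scheduler(v), wake, v->sched_item);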

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c   | 68 +++++++++++++++++++++++--------------------------
 xen/include/xen/sched.h |  2 ++
 2 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d1a958143a..2b7d62ede7 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -256,10 +256,15 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
-    struct sched_item item = { .vcpu = v };
+    struct sched_item *item;
 
     v->processor = processor;
 
+    if ( (item = xzalloc(struct sched_item)) == NULL )
+        return 1;
+    v->sched_item = item;
+    item->vcpu = v;
+
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -268,10 +273,14 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, &item,
+    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, item,
                      d->sched_priv);
     if ( v->sched_priv == NULL )
+    {
+        v->sched_item = NULL;
+        xfree(item);
         return 1;
+    }
 
     /*
      * Initialize affinity settings. The idler, and potentially
@@ -290,7 +299,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     }
     else
     {
-        SCHED_OP(dom_scheduler(d), insert_item, &item);
+        SCHED_OP(dom_scheduler(d), insert_item, item);
     }
 
     return 0;
@@ -311,7 +320,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     void *vcpudata;
     struct scheduler *old_ops;
     void *old_domdata;
-    struct sched_item item;
 
     for_each_vcpu ( d, v )
     {
@@ -332,8 +340,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        item.vcpu = v;
-        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata, &item, domdata);
+        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata,
+                                         v->sched_item, domdata);
         if ( vcpu_priv[v->vcpu_id] == NULL )
         {
             for_each_vcpu ( d, v )
@@ -351,8 +359,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        item.vcpu = v;
-        SCHED_OP(old_ops, remove_item, &item);
+        SCHED_OP(old_ops, remove_item, v->sched_item);
     }
 
     d->cpupool = c;
@@ -363,7 +370,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
-        item.vcpu = v;
         vcpudata = v->sched_priv;
 
         migrate_timer(&v->periodic_timer, new_p);
@@ -388,7 +394,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
-        SCHED_OP(c->sched, insert_item, &item);
+        SCHED_OP(c->sched, insert_item, v->sched_item);
 
         SCHED_OP(old_ops, free_vdata, vcpudata);
     }
@@ -406,15 +412,17 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 void sched_destroy_vcpu(struct vcpu *v)
 {
-    struct sched_item item = { .vcpu = v };
+    struct sched_item *item = v->sched_item;
 
     kill_timer(&v->periodic_timer);
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
-    SCHED_OP(vcpu_scheduler(v), remove_item, &item);
+    SCHED_OP(vcpu_scheduler(v), remove_item, item);
     SCHED_OP(vcpu_scheduler(v), free_vdata, v->sched_priv);
+    xfree(item);
+    v->sched_item = NULL;
 }
 
 int sched_init_domain(struct domain *d, int poolid)
@@ -458,8 +466,6 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
-    struct sched_item item = { .vcpu = v };
-
     ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
@@ -467,7 +473,7 @@ void vcpu_sleep_nosync_locked(struct vcpu *v)
         if ( v->runstate.state == RUNSTATE_runnable )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
 
-        SCHED_OP(vcpu_scheduler(v), sleep, &item);
+        SCHED_OP(vcpu_scheduler(v), sleep, v->sched_item);
     }
 }
 
@@ -499,7 +505,6 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
-    struct sched_item item = { .vcpu = v };
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
@@ -509,7 +514,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(vcpu_scheduler(v), wake, &item);
+        SCHED_OP(vcpu_scheduler(v), wake, v->sched_item);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -548,7 +553,6 @@ void vcpu_unblock(struct vcpu *v)
 static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
 {
     unsigned int old_cpu = v->processor;
-    struct sched_item item = { .vcpu = v };
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
@@ -566,7 +570,7 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      * pointer cant' change while the current lock is held.
      */
     if ( vcpu_scheduler(v)->migrate )
-        SCHED_OP(vcpu_scheduler(v), migrate, &item, new_cpu);
+        SCHED_OP(vcpu_scheduler(v), migrate, v->sched_item, new_cpu);
     else
         v->processor = new_cpu;
 }
@@ -610,7 +614,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
-    struct sched_item item = { .vcpu = v };
 
     /*
      * If the vcpu is currently running, this will be handled by
@@ -647,7 +650,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, &item);
+            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -717,7 +720,6 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
-        struct sched_item item = { .vcpu = v };
 
         ASSERT(!vcpu_runnable(v));
 
@@ -753,7 +755,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, &item);
+        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -871,9 +873,8 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct sched_item item = { .vcpu = v };
-
-    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, &item, hard, soft);
+    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, v->sched_item,
+             hard, soft);
 
     if ( hard )
         cpumask_copy(v->cpu_hard_affinity, hard);
@@ -1049,10 +1050,9 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
-    struct sched_item item = { .vcpu = v };
     spinlock_t *lock = vcpu_schedule_lock_irq(v);
 
-    SCHED_OP(vcpu_scheduler(v), yield, &item);
+    SCHED_OP(vcpu_scheduler(v), yield, v->sched_item);
     vcpu_schedule_unlock_irq(lock, v);
 
     SCHED_STAT_CRANK(vcpu_yield);
@@ -1547,8 +1547,6 @@ static void schedule(void)
 
 void context_saved(struct vcpu *prev)
 {
-    struct sched_item item = { .vcpu = prev };
-
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
@@ -1557,7 +1555,7 @@ void context_saved(struct vcpu *prev)
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    SCHED_OP(vcpu_scheduler(prev), context_saved, &item);
+    SCHED_OP(vcpu_scheduler(prev), context_saved, prev->sched_item);
 
     vcpu_migrate_finish(prev);
 }
@@ -1613,7 +1611,6 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
-        struct sched_item item = { .vcpu = idle };
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1627,7 +1624,7 @@ static int cpu_schedule_up(unsigned int cpu)
          */
         ASSERT(idle->sched_priv == NULL);
 
-        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, &item,
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle->sched_item,
                                     idle->domain->sched_priv);
         if ( idle->sched_priv == NULL )
             return -ENOMEM;
@@ -1820,7 +1817,6 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
-    struct sched_item item;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1856,11 +1852,11 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      *    sched_priv field of the per-vCPU info of the idle domain.
      */
     idle = idle_vcpu[cpu];
-    item.vcpu = idle;
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    vpriv = SCHED_OP(new_ops, alloc_vdata, &item, idle->domain->sched_priv);
+    vpriv = SCHED_OP(new_ops, alloc_vdata, idle->sched_item,
+                     idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
         SCHED_OP(new_ops, free_pdata, ppriv, cpu);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index edee52dfe4..c8aa2915c4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -140,6 +140,7 @@ void evtchn_destroy(struct domain *d); /* from domain_kill */
 void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;
+struct sched_item;
 
 struct vcpu
 {
@@ -160,6 +161,7 @@ struct vcpu
 
     struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
 
+    struct sched_item *sched_item;
     void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
-- 
2.16.4



* [PATCH RFC 10/49] xen/sched: move per-vcpu scheduler private data pointer to sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (8 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 09/49] xen/sched: alloc struct sched_item for each vcpu Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 11/49] xen/sched: build a linked list of struct sched_item Juergen Gross
                   ` (44 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

This prepares for making the different schedulers vcpu-agnostic.
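
As an illustration (sketch only, mirroring the credit2 accessor touched
below), scheduler private data is now reached via the sched_item rather
than via the vcpu:

    /* Before: per-vcpu scheduler data hangs off the vcpu. */
    return v->sched_priv;

    /* After: it hangs off the vcpu's sched_item. */
    return v->sched_item->priv;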

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  4 ++--
 xen/common/sched_credit.c   |  6 +++---
 xen/common/sched_credit2.c  | 10 +++++-----
 xen/common/sched_null.c     |  4 ++--
 xen/common/sched_rt.c       |  4 ++--
 xen/common/schedule.c       | 25 ++++++++++++-------------
 xen/include/xen/sched-if.h  |  1 +
 xen/include/xen/sched.h     |  1 -
 8 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index fffe23113e..f5af8b972d 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -53,7 +53,7 @@
  * Return a pointer to the ARINC 653-specific scheduler data information
  * associated with the given VCPU (vc)
  */
-#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_priv)
+#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_item->priv)
 
 /**
  * Return the global scheduler private data given the scheduler ops pointer
@@ -647,7 +647,7 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_item->priv = vdata;
 
     per_cpu(scheduler, cpu) = new_ops;
     per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3735486b4c..cb8e167fc9 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -83,7 +83,7 @@
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
     ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
-#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
+#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_item->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
 
@@ -641,7 +641,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     ASSERT(svc && is_idle_vcpu(svc->vcpu));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_item->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -1031,7 +1031,7 @@ static void
 csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu *svc = vc->sched_priv;
+    struct csched_vcpu *svc = item->priv;
     spinlock_t *lock;
 
     BUG_ON( is_idle_vcpu(vc) );
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index f44286c2a5..9c052c24a7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -572,7 +572,7 @@ static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
 
 static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
 {
-    return v->sched_priv;
+    return v->sched_item->priv;
 }
 
 static inline struct csched2_dom *csched2_dom(const struct domain *d)
@@ -970,7 +970,7 @@ _runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
 static void
 runq_assign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = vc->sched_item->priv;
 
     ASSERT(svc->rqd == NULL);
 
@@ -997,7 +997,7 @@ _runq_deassign(struct csched2_vcpu *svc)
 static void
 runq_deassign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = vc->sched_item->priv;
 
     ASSERT(svc->rqd == c2rqd(ops, vc->processor));
 
@@ -3108,7 +3108,7 @@ static void
 csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu *svc = vc->sched_priv;
+    struct csched2_vcpu *svc = item->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -3888,7 +3888,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_item->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 7b508f35a4..eb51ddbccb 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -117,7 +117,7 @@ static inline struct null_private *null_priv(const struct scheduler *ops)
 
 static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
 {
-    return v->sched_priv;
+    return v->sched_item->priv;
 }
 
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
@@ -391,7 +391,7 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_item->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index ab8fa02306..c830aac92f 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -235,7 +235,7 @@ static inline struct rt_private *rt_priv(const struct scheduler *ops)
 
 static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
 {
-    return vcpu->sched_priv;
+    return vcpu->sched_item->priv;
 }
 
 static inline struct list_head *rt_runq(const struct scheduler *ops)
@@ -761,7 +761,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
         dprintk(XENLOG_DEBUG, "RTDS: timer initialized on cpu %u\n", cpu);
     }
 
-    idle_vcpu[cpu]->sched_priv = vdata;
+    idle_vcpu[cpu]->sched_item->priv = vdata;
     per_cpu(scheduler, cpu) = new_ops;
     per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 2b7d62ede7..819a78b646 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -273,9 +273,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, item,
-                     d->sched_priv);
-    if ( v->sched_priv == NULL )
+    item->priv = SCHED_OP(dom_scheduler(d), alloc_vdata, item, d->sched_priv);
+    if ( item->priv == NULL )
     {
         v->sched_item = NULL;
         xfree(item);
@@ -370,7 +369,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     {
         spinlock_t *lock;
 
-        vcpudata = v->sched_priv;
+        vcpudata = v->sched_item->priv;
 
         migrate_timer(&v->periodic_timer, new_p);
         migrate_timer(&v->singleshot_timer, new_p);
@@ -388,7 +387,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        v->sched_priv = vcpu_priv[v->vcpu_id];
+        v->sched_item->priv = vcpu_priv[v->vcpu_id];
         if ( !d->is_dying )
             sched_move_irqs(v);
 
@@ -420,7 +419,7 @@ void sched_destroy_vcpu(struct vcpu *v)
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
     SCHED_OP(vcpu_scheduler(v), remove_item, item);
-    SCHED_OP(vcpu_scheduler(v), free_vdata, v->sched_priv);
+    SCHED_OP(vcpu_scheduler(v), free_vdata, item->priv);
     xfree(item);
     v->sched_item = NULL;
 }
@@ -1611,6 +1610,7 @@ static int cpu_schedule_up(unsigned int cpu)
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
+        struct sched_item *item = idle->sched_item;
 
         /*
          * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
@@ -1622,11 +1622,10 @@ static int cpu_schedule_up(unsigned int cpu)
          * with a different scheduler, it is schedule_cpu_switch(), invoked
          * later, that will set things up as appropriate.
          */
-        ASSERT(idle->sched_priv == NULL);
+        ASSERT(item->priv == NULL);
 
-        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle->sched_item,
-                                    idle->domain->sched_priv);
-        if ( idle->sched_priv == NULL )
+        item->priv = SCHED_OP(&ops, alloc_vdata, item, idle->domain->sched_priv);
+        if ( item->priv == NULL )
             return -ENOMEM;
     }
     if ( idle_vcpu[cpu] == NULL )
@@ -1652,9 +1651,9 @@ static void cpu_schedule_down(unsigned int cpu)
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
     SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
-    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
+    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_item->priv);
 
-    idle_vcpu[cpu]->sched_priv = NULL;
+    idle_vcpu[cpu]->sched_item->priv = NULL;
     sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
@@ -1879,7 +1878,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      */
     old_lock = pcpu_schedule_lock_irq(cpu);
 
-    vpriv_old = idle->sched_priv;
+    vpriv_old = idle->sched_item->priv;
     ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
     SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index a9916f35b8..1fe87a73b4 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -50,6 +50,7 @@ DECLARE_PER_CPU(struct cpupool *, cpupool);
 
 struct sched_item {
     struct vcpu           *vcpu;
+    void                  *priv;      /* scheduler private data */
 };
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c8aa2915c4..6acdc0f5be 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -162,7 +162,6 @@ struct vcpu
     struct timer     poll_timer;    /* timeout for SCHEDOP_poll */
 
     struct sched_item *sched_item;
-    void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
 #ifndef CONFIG_COMPAT
-- 
2.16.4



* [PATCH RFC 11/49] xen/sched: build a linked list of struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (9 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 10/49] xen/sched: move per-vcpu scheduler private data pointer to sched_item Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 12/49] xen/sched: introduce struct sched_resource Juergen Gross
                   ` (43 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

In order to make it easy to iterate over the sched_item elements of a
domain, build a singly linked list and add an iterator for it. The new
list is guarded by the same mechanisms as the vcpu linked list, as it
is modified only via vcpu_create() or vcpu_destroy().

For completeness add another iterator, for_each_sched_item_vcpu(),
which iterates over all vcpus of a sched_item (right now only one).
This will be needed later for larger scheduling granularity (e.g.
cores).
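
As a rough, self-contained C sketch of the resulting list layout and of
how for_each_sched_item() walks it (structures reduced to a bare id and
the allocation simplified, so this models the idea rather than the Xen
code itself):

    #include <stdio.h>
    #include <stdlib.h>

    /* Reduced stand-ins for the Xen structures, enough for the iterator. */
    struct sched_item {
        int                vcpu_id;       /* stands in for item->vcpu->vcpu_id */
        struct sched_item *next_in_list;
    };

    struct domain {
        struct sched_item *sched_item_list;
    };

    /* Same shape as the for_each_sched_item() iterator added above. */
    #define for_each_sched_item(d, e) \
        for ( (e) = (d)->sched_item_list; (e) != NULL; (e) = (e)->next_in_list )

    int main(void)
    {
        struct domain d = { NULL };
        struct sched_item *item, **prev;
        int id;

        /* Insert items sorted by vcpu_id, as sched_alloc_item() keeps them. */
        for ( id = 2; id >= 0; id-- )
        {
            item = calloc(1, sizeof(*item));
            if ( item == NULL )
                return 1;
            item->vcpu_id = id;
            for ( prev = &d.sched_item_list; *prev; prev = &(*prev)->next_in_list )
                if ( (*prev)->vcpu_id > id )
                    break;
            item->next_in_list = *prev;
            *prev = item;
        }

        for_each_sched_item(&d, item)
            printf("sched_item for vcpu %d\n", item->vcpu_id);

        return 0;
    }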

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c      | 56 ++++++++++++++++++++++++++++++++++++++++------
 xen/include/xen/sched-if.h |  8 +++++++
 xen/include/xen/sched.h    |  1 +
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 819a78b646..e9d91d29cc 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -253,6 +253,52 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
+static void sched_free_item(struct sched_item *item)
+{
+    struct sched_item *prev_item;
+    struct domain *d = item->vcpu->domain;
+
+    if ( d->sched_item_list == item )
+        d->sched_item_list = item->next_in_list;
+    else
+    {
+        for_each_sched_item(d, prev_item)
+        {
+            if ( prev_item->next_in_list == item )
+            {
+                prev_item->next_in_list = item->next_in_list;
+                break;
+            }
+        }
+    }
+
+    item->vcpu->sched_item = NULL;
+    xfree(item);
+}
+
+static struct sched_item *sched_alloc_item(struct vcpu *v)
+{
+    struct sched_item *item, **prev_item;
+    struct domain *d = v->domain;
+
+    if ( (item = xzalloc(struct sched_item)) == NULL )
+        return NULL;
+
+    v->sched_item = item;
+    item->vcpu = v;
+
+    for ( prev_item = &d->sched_item_list; *prev_item;
+          prev_item = &(*prev_item)->next_in_list )
+        if ( (*prev_item)->next_in_list &&
+             (*prev_item)->next_in_list->vcpu->vcpu_id > v->vcpu_id )
+            break;
+
+    item->next_in_list = *prev_item;
+    *prev_item = item;
+
+    return item;
+}
+
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 {
     struct domain *d = v->domain;
@@ -260,10 +306,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
 
     v->processor = processor;
 
-    if ( (item = xzalloc(struct sched_item)) == NULL )
+    if ( (item = sched_alloc_item(v)) == NULL )
         return 1;
-    v->sched_item = item;
-    item->vcpu = v;
 
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
@@ -276,8 +320,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     item->priv = SCHED_OP(dom_scheduler(d), alloc_vdata, item, d->sched_priv);
     if ( item->priv == NULL )
     {
-        v->sched_item = NULL;
-        xfree(item);
+        sched_free_item(item);
         return 1;
     }
 
@@ -420,8 +463,7 @@ void sched_destroy_vcpu(struct vcpu *v)
         atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
     SCHED_OP(vcpu_scheduler(v), remove_item, item);
     SCHED_OP(vcpu_scheduler(v), free_vdata, item->priv);
-    xfree(item);
-    v->sched_item = NULL;
+    sched_free_item(item);
 }
 
 int sched_init_domain(struct domain *d, int poolid)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 1fe87a73b4..4caade5b8b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -51,8 +51,16 @@ DECLARE_PER_CPU(struct cpupool *, cpupool);
 struct sched_item {
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
+    struct sched_item     *next_in_list;
 };
 
+#define for_each_sched_item(d, e)                                         \
+    for ( (e) = (d)->sched_item_list; (e) != NULL; (e) = (e)->next_in_list )
+
+#define for_each_sched_item_vcpu(i, v)                                    \
+    for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_item == (i);         \
+          (v) = (v)->next_in_list )
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 6acdc0f5be..2e9ced29a8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -334,6 +334,7 @@ struct domain
 
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
+    struct sched_item *sched_item_list;
     struct cpupool  *cpupool;
 
     struct domain   *next_in_list;
-- 
2.16.4



* [PATCH RFC 12/49] xen/sched: introduce struct sched_resource
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (10 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 11/49] xen/sched: build a linked list of struct sched_item Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 13/49] xen/sched: let pick_cpu return a scheduler resource Juergen Gross
                   ` (42 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Add a scheduling abstraction layer between physical processors and the
schedulers by introducing a struct sched_resource. Each running
scheduler item is active on such a scheduler resource. For the time
being there is one struct sched_resource per cpu, but in the future
there might be only one per core or per socket.
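
A minimal, self-contained C sketch of the abstraction (per_cpu() is
modelled as a plain array and cpu_schedule_up()'s allocation is
simplified; not the Xen code itself):

    #include <stdio.h>

    #define NR_CPUS 4

    /* Reduced model: one scheduler resource per cpu, as in this series. */
    struct sched_resource {
        unsigned int processor;
    };

    struct sched_item {
        struct sched_resource *res;   /* resource the item currently runs on */
    };

    /* Stands in for the per-cpu variable sched_res introduced below. */
    static struct sched_resource resources[NR_CPUS];
    static struct sched_resource *sched_res[NR_CPUS];

    int main(void)
    {
        unsigned int cpu;
        struct sched_item item;

        /* Roughly what cpu_schedule_up() does: set up a resource per cpu. */
        for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        {
            resources[cpu].processor = cpu;
            sched_res[cpu] = &resources[cpu];
        }

        /* Roughly what the schedulers do on insert/migrate: bind the item. */
        item.res = sched_res[2];
        printf("item bound to the resource of cpu %u\n", item.res->processor);

        return 0;
    }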

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |  2 ++
 xen/common/sched_credit2.c |  7 +++++++
 xen/common/sched_null.c    |  3 +++
 xen/common/sched_rt.c      |  2 ++
 xen/common/schedule.c      | 18 ++++++++++++++++++
 xen/include/xen/sched-if.h |  6 ++++++
 6 files changed, 38 insertions(+)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index cb8e167fc9..fc068a1c5f 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1040,6 +1040,7 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
     lock = vcpu_schedule_lock_irq(vc);
 
     vc->processor = csched_cpu_pick(ops, item);
+    item->res = per_cpu(sched_res, vc->processor);
 
     spin_unlock_irq(lock);
 
@@ -1675,6 +1676,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
             WARN_ON(vc->is_urgent);
             runq_remove(speer);
             vc->processor = cpu;
+            vc->sched_item->res = per_cpu(sched_res, cpu);
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 9c052c24a7..614d71d948 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2519,6 +2519,7 @@ static void migrate(const struct scheduler *ops,
                     &trqd->active);
         svc->vcpu->processor = cpumask_cycle(trqd->pick_bias,
                                              cpumask_scratch_cpu(cpu));
+        svc->vcpu->sched_item->res = per_cpu(sched_res, svc->vcpu->processor);
         trqd->pick_bias = svc->vcpu->processor;
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
 
@@ -2774,6 +2775,7 @@ csched2_item_migrate(
         }
         _runq_deassign(svc);
         vc->processor = new_cpu;
+        item->res = per_cpu(sched_res, new_cpu);
         return;
     }
 
@@ -2794,7 +2796,10 @@ csched2_item_migrate(
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, now);
     else
+    {
         vc->processor = new_cpu;
+        item->res = per_cpu(sched_res, new_cpu);
+    }
 }
 
 static int
@@ -3119,6 +3124,7 @@ csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
     lock = vcpu_schedule_lock_irq(vc);
 
     vc->processor = csched2_cpu_pick(ops, item);
+    item->res = per_cpu(sched_res, vc->processor);
 
     spin_unlock_irq(lock);
 
@@ -3596,6 +3602,7 @@ csched2_schedule(
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             snext->vcpu->processor = cpu;
+            snext->vcpu->sched_item->res = per_cpu(sched_res, cpu);
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index eb51ddbccb..114b32e2e1 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -343,6 +343,7 @@ static void vcpu_assign(struct null_private *prv, struct vcpu *v,
 {
     per_cpu(npc, cpu).vcpu = v;
     v->processor = cpu;
+    v->sched_item->res = per_cpu(sched_res, cpu);
     cpumask_clear_cpu(cpu, &prv->cpus_free);
 
     dprintk(XENLOG_G_INFO, "%d <-- %pv\n", cpu, v);
@@ -429,6 +430,7 @@ static void null_item_insert(const struct scheduler *ops,
  retry:
 
     cpu = v->processor = pick_cpu(prv, v);
+    item->res = per_cpu(sched_res, cpu);
 
     spin_unlock(lock);
 
@@ -675,6 +677,7 @@ static void null_item_migrate(const struct scheduler *ops,
      * by this, will be fixed-up during resume.
      */
     v->processor = new_cpu;
+    item->res = per_cpu(sched_res, new_cpu);
 }
 
 #ifndef NDEBUG
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index c830aac92f..44b86fc08d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -902,6 +902,7 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 
     /* This is safe because vc isn't yet being scheduled */
     vc->processor = rt_cpu_pick(ops, item);
+    item->res = per_cpu(sched_res, vc->processor);
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -1132,6 +1133,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         if ( snext->vcpu->processor != cpu )
         {
             snext->vcpu->processor = cpu;
+            snext->vcpu->sched_item->res = per_cpu(sched_res, cpu);
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9d91d29cc..db297f6144 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -63,6 +63,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
+DEFINE_PER_CPU(struct sched_resource *, sched_res);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -309,6 +310,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     if ( (item = sched_alloc_item(v)) == NULL )
         return 1;
 
+    item->res = per_cpu(sched_res, processor);
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -423,6 +425,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
         v->processor = new_p;
+        v->sched_item->res = per_cpu(sched_res, new_p);
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -613,7 +616,10 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
     if ( vcpu_scheduler(v)->migrate )
         SCHED_OP(vcpu_scheduler(v), migrate, v->sched_item, new_cpu);
     else
+    {
         v->processor = new_cpu;
+        v->sched_item->res = per_cpu(sched_res, new_cpu);
+    }
 }
 
 /*
@@ -794,9 +800,11 @@ void restore_vcpu_affinity(struct domain *d)
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+        v->sched_item->res = per_cpu(sched_res, v->processor);
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);
+        v->sched_item->res = per_cpu(sched_res, v->processor);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -1635,6 +1643,13 @@ static int cpu_schedule_up(unsigned int cpu)
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     void *sched_priv;
+    struct sched_resource *res;
+
+    res = xmalloc(struct sched_resource);
+    if ( res == NULL )
+        return -ENOMEM;
+    res->processor = cpu;
+    per_cpu(sched_res, cpu) = res;
 
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
@@ -1699,6 +1714,9 @@ static void cpu_schedule_down(unsigned int cpu)
     sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
+
+    xfree(per_cpu(sched_res, cpu));
+    per_cpu(sched_res, cpu) = NULL;
 }
 
 static int cpu_schedule_callback(
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4caade5b8b..43235951a3 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -44,14 +44,20 @@ struct schedule_data {
 
 #define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
 
+struct sched_resource {
+    unsigned     processor;
+};
+
 DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
+DECLARE_PER_CPU(struct sched_resource *, sched_res);
 
 struct sched_item {
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
     struct sched_item     *next_in_list;
+    struct sched_resource *res;
 };
 
 #define for_each_sched_item(d, e)                                         \
-- 
2.16.4



* [PATCH RFC 13/49] xen/sched: let pick_cpu return a scheduler resource
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (11 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 12/49] xen/sched: introduce struct sched_resource Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:08 ` [PATCH RFC 14/49] xen/sched: switch schedule_data.curr to point at sched_item Juergen Gross
                   ` (41 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Instead of returning a physical cpu number, let pick_cpu() return a
scheduler resource. Rename pick_cpu() to pick_resource() to reflect
that change.
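
The shape of the interface change, gathered from the hunks below (hook
declaration from sched-if.h, caller abridged from vcpu_migrate_finish()
in schedule.c):

    /* Old hook: returns a raw cpu number. */
    int          (*pick_cpu)       (const struct scheduler *,
                                    struct sched_item *);
    new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);

    /* New hook: returns a scheduler resource; callers that still need a
     * cpu number read it from the resource. */
    struct sched_resource * (*pick_resource) (const struct scheduler *,
                                              struct sched_item *);
    new_cpu = SCHED_OP(vcpu_scheduler(v), pick_resource,
                       v->sched_item)->processor;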

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c  | 12 ++++++------
 xen/common/sched_credit.c    | 16 ++++++++--------
 xen/common/sched_credit2.c   | 22 +++++++++++-----------
 xen/common/sched_null.c      | 20 +++++++++++---------
 xen/common/sched_rt.c        | 18 +++++++++---------
 xen/common/schedule.c        |  8 +++++---
 xen/include/xen/perfc_defn.h |  2 +-
 xen/include/xen/sched-if.h   |  4 ++--
 8 files changed, 53 insertions(+), 49 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index f5af8b972d..a775be4cbc 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -601,15 +601,15 @@ a653sched_do_schedule(
 }
 
 /**
- * Xen scheduler callback function to select a CPU for the VCPU to run on
+ * Xen scheduler callback function to select a resource for the VCPU to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param item      Pointer to struct sched_item
  *
- * @return          Number of selected physical CPU
+ * @return          Scheduler resource to run on
  */
-static int
-a653sched_pick_cpu(const struct scheduler *ops, struct sched_item *item)
+static struct sched_resource *
+a653sched_pick_resource(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
     cpumask_t *online;
@@ -627,7 +627,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct sched_item *item)
          || (cpu >= nr_cpu_ids) )
         cpu = vc->processor;
 
-    return cpu;
+    return per_cpu(sched_res, cpu);
 }
 
 /**
@@ -730,7 +730,7 @@ static const struct scheduler sched_arinc653_def = {
 
     .do_schedule    = a653sched_do_schedule,
 
-    .pick_cpu       = a653sched_pick_cpu,
+    .pick_resource  = a653sched_pick_resource,
 
     .switch_sched   = a653_switch_sched,
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index fc068a1c5f..14b749dc1a 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,8 +867,8 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
-static int
-csched_cpu_pick(const struct scheduler *ops, struct sched_item *item)
+static struct sched_resource *
+csched_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
     struct csched_vcpu *svc = CSCHED_VCPU(vc);
@@ -881,7 +881,7 @@ csched_cpu_pick(const struct scheduler *ops, struct sched_item *item)
      * get boosted, which we don't deserve as we are "only" migrating.
      */
     set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
-    return _csched_cpu_pick(ops, vc, 1);
+    return per_cpu(sched_res, _csched_cpu_pick(ops, vc, 1));
 }
 
 static inline void
@@ -981,7 +981,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
         /*
          * If it's been active a while, check if we'd be better off
          * migrating it to run elsewhere (see multi-core and multi-thread
-         * support in csched_cpu_pick()).
+         * support in csched_res_pick()).
          */
         new_cpu = _csched_cpu_pick(ops, current, 0);
 
@@ -1036,11 +1036,11 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
+    /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched_cpu_pick(ops, item);
-    item->res = per_cpu(sched_res, vc->processor);
+    item->res = csched_res_pick(ops, item);
+    vc->processor = item->res->processor;
 
     spin_unlock_irq(lock);
 
@@ -2290,7 +2290,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_affinity= csched_aff_cntl,
     .adjust_global  = csched_sys_cntl,
 
-    .pick_cpu       = csched_cpu_pick,
+    .pick_resource  = csched_res_pick,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 614d71d948..c8ae585272 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -625,9 +625,9 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
  * runq, _always_ happens by means of tickling:
  *  - when a vcpu wakes up, it calls csched2_item_wake(), which calls
  *    runq_tickle();
- *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
+ *  - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *    csched2_item_migrate() (which calls migrate()) and csched2_item_wake().
- *    csched2_cpu_pick() looks for the least loaded runq and return just any
+ *    csched2_res_pick() looks for the least loaded runq and return just any
  *    of its processors. Then, csched2_item_migrate() just moves the vcpu to
  *    the chosen runq, and it is again runq_tickle(), called by
  *    csched2_item_wake() that actually decides what pcpu to use within the
@@ -676,7 +676,7 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * In csched2_res_pick(), it may not be possible to actually look at remote
  * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
  * we pick, in order of decreasing preference:
  *  1) svc's current pcpu, if it is part of svc's soft affinity;
@@ -2201,8 +2201,8 @@ csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
 }
 
 #define MAX_LOAD (STIME_MAX)
-static int
-csched2_cpu_pick(const struct scheduler *ops, struct sched_item *item)
+static struct sched_resource *
+csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched2_private *prv = csched2_priv(ops);
     struct vcpu *vc = item->vcpu;
@@ -2214,7 +2214,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct sched_item *item)
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
-    SCHED_STAT_CRANK(pick_cpu);
+    SCHED_STAT_CRANK(pick_resource);
 
     /* Locking:
      * - Runqueue lock of vc->processor is already locked
@@ -2423,7 +2423,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct sched_item *item)
                     (unsigned char *)&d);
     }
 
-    return new_cpu;
+    return per_cpu(sched_res, new_cpu);
 }
 
 /* Working state of the load-balancing algorithm */
@@ -3120,11 +3120,11 @@ csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
     ASSERT(!is_idle_vcpu(vc));
     ASSERT(list_empty(&svc->runq_elem));
 
-    /* csched2_cpu_pick() expects the pcpu lock to be held */
+    /* csched2_res_pick() expects the pcpu lock to be held */
     lock = vcpu_schedule_lock_irq(vc);
 
-    vc->processor = csched2_cpu_pick(ops, item);
-    item->res = per_cpu(sched_res, vc->processor);
+    item->res = csched2_res_pick(ops, item);
+    vc->processor = item->res->processor;
 
     spin_unlock_irq(lock);
 
@@ -4113,7 +4113,7 @@ static const struct scheduler sched_credit2_def = {
     .adjust_affinity= csched2_aff_cntl,
     .adjust_global  = csched2_sys_cntl,
 
-    .pick_cpu       = csched2_cpu_pick,
+    .pick_resource  = csched2_res_pick,
     .migrate        = csched2_item_migrate,
     .do_schedule    = csched2_schedule,
     .context_saved  = csched2_context_saved,
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 114b32e2e1..a08f23993c 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -269,9 +269,11 @@ static void null_free_domdata(const struct scheduler *ops, void *data)
  *
  * So this is not part of any hot path.
  */
-static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
+static struct sched_resource *
+pick_res(struct null_private *prv, struct sched_item *item)
 {
     unsigned int bs;
+    struct vcpu *v = item->vcpu;
     unsigned int cpu = v->processor, new_cpu;
     cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
 
@@ -335,7 +337,7 @@ static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
         __trace_var(TRC_SNULL_PICKED_CPU, 1, sizeof(d), &d);
     }
 
-    return new_cpu;
+    return per_cpu(sched_res, new_cpu);
 }
 
 static void vcpu_assign(struct null_private *prv, struct vcpu *v,
@@ -429,8 +431,8 @@ static void null_item_insert(const struct scheduler *ops,
     lock = vcpu_schedule_lock_irq(v);
  retry:
 
-    cpu = v->processor = pick_cpu(prv, v);
-    item->res = per_cpu(sched_res, cpu);
+    item->res = pick_res(prv, item);
+    cpu = v->processor = item->res->processor;
 
     spin_unlock(lock);
 
@@ -586,11 +588,11 @@ static void null_item_sleep(const struct scheduler *ops,
     SCHED_STAT_CRANK(vcpu_sleep);
 }
 
-static int null_cpu_pick(const struct scheduler *ops, struct sched_item *item)
+static struct sched_resource *
+null_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
-    ASSERT(!is_idle_vcpu(v));
-    return pick_cpu(null_priv(ops), v);
+    ASSERT(!is_idle_vcpu(item->vcpu));
+    return pick_res(null_priv(ops), item);
 }
 
 static void null_item_migrate(const struct scheduler *ops,
@@ -909,7 +911,7 @@ const struct scheduler sched_null_def = {
 
     .wake           = null_item_wake,
     .sleep          = null_item_sleep,
-    .pick_cpu       = null_cpu_pick,
+    .pick_resource  = null_res_pick,
     .migrate        = null_item_migrate,
     .do_schedule    = null_schedule,
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 44b86fc08d..2bd4637592 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -632,12 +632,12 @@ replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
 }
 
 /*
- * Pick a valid CPU for the vcpu vc
- * Valid CPU of a vcpu is intesection of vcpu's affinity
- * and available cpus
+ * Pick a valid resource for the vcpu vc
+ * Valid resource of a vcpu is the intersection of the vcpu's affinity
+ * and available resources
  */
-static int
-rt_cpu_pick(const struct scheduler *ops, struct sched_item *item)
+static struct sched_resource *
+rt_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
     cpumask_t cpus;
@@ -652,7 +652,7 @@ rt_cpu_pick(const struct scheduler *ops, struct sched_item *item)
             : cpumask_cycle(vc->processor, &cpus);
     ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
 
-    return cpu;
+    return per_cpu(sched_res, cpu);
 }
 
 /*
@@ -901,8 +901,8 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* This is safe because vc isn't yet being scheduled */
-    vc->processor = rt_cpu_pick(ops, item);
-    item->res = per_cpu(sched_res, vc->processor);
+    item->res = rt_res_pick(ops, item);
+    vc->processor = item->res->processor;
 
     lock = vcpu_schedule_lock_irq(vc);
 
@@ -1571,7 +1571,7 @@ static const struct scheduler sched_rtds_def = {
 
     .adjust         = rt_dom_cntl,
 
-    .pick_cpu       = rt_cpu_pick,
+    .pick_resource  = rt_res_pick,
     .do_schedule    = rt_schedule,
     .sleep          = rt_item_sleep,
     .wake           = rt_item_wake,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index db297f6144..62490454ea 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -697,7 +697,8 @@ static void vcpu_migrate_finish(struct vcpu *v)
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);
+            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_resource,
+                               v->sched_item)->processor;
             if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -803,8 +804,9 @@ void restore_vcpu_affinity(struct domain *d)
         v->sched_item->res = per_cpu(sched_res, v->processor);
 
         lock = vcpu_schedule_lock_irq(v);
-        v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v->sched_item);
-        v->sched_item->res = per_cpu(sched_res, v->processor);
+        v->sched_item->res = SCHED_OP(vcpu_scheduler(v), pick_resource,
+                                      v->sched_item);
+        v->processor = v->sched_item->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index ef6f86b91e..1ad4384080 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -69,7 +69,7 @@ PERFCOUNTER(migrate_on_runq,        "csched2: migrate_on_runq")
 PERFCOUNTER(migrate_no_runq,        "csched2: migrate_no_runq")
 PERFCOUNTER(runtime_min_timer,      "csched2: runtime_min_timer")
 PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
-PERFCOUNTER(pick_cpu,               "csched2: pick_cpu")
+PERFCOUNTER(pick_resource,          "csched2: pick_resource")
 PERFCOUNTER(need_fallback_cpu,      "csched2: need_fallback_cpu")
 PERFCOUNTER(migrated,               "csched2: migrated")
 PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 43235951a3..10a97a5dc2 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -193,8 +193,8 @@ struct scheduler {
     struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
                                       bool_t tasklet_work_scheduled);
 
-    int          (*pick_cpu)       (const struct scheduler *,
-                                    struct sched_item *);
+    struct sched_resource * (*pick_resource) (const struct scheduler *,
+                                              struct sched_item *);
     void         (*migrate)        (const struct scheduler *,
                                     struct sched_item *, unsigned int);
     int          (*adjust)         (const struct scheduler *, struct domain *,
-- 
2.16.4



* [PATCH RFC 14/49] xen/sched: switch schedule_data.curr to point at sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (12 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 13/49] xen/sched: let pick_cpu return a scheduler resource Juergen Gross
@ 2019-03-29 15:08 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 15/49] xen/sched: move per cpu scheduler private data into struct sched_resource Juergen Gross
                   ` (40 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In preparation for core scheduling, let the per-cpu pointer
schedule_data.curr point to a struct sched_item instead of the related
vcpu. At the same time rename the per-vcpu scheduler-specific structs
to per-item ones.
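
Condensed from the hunks below: curr_on_cpu() now yields the current
sched_item rather than the current vcpu, so comparisons and idle checks
change as follows:

    /* Old: curr_on_cpu(cpu) was the currently running vcpu. */
    if ( curr_on_cpu(cpu) == vc )
        ...
    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));

    /* New: curr_on_cpu(cpu) is the currently running sched_item; the
     * vcpu is one dereference away. */
    if ( curr_on_cpu(cpu) == item )
        ...
    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));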

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |   2 +-
 xen/common/sched_credit.c   | 101 +++++++++++++-------------
 xen/common/sched_credit2.c  | 168 ++++++++++++++++++++++----------------------
 xen/common/sched_null.c     |  44 ++++++------
 xen/common/sched_rt.c       | 118 +++++++++++++++----------------
 xen/common/schedule.c       |   8 ++-
 xen/include/xen/sched-if.h  |   2 +-
 7 files changed, 220 insertions(+), 223 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index a775be4cbc..5701baf337 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_item_sleep(const struct scheduler *ops, struct sched_item *item)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == vc )
+    if ( per_cpu(schedule_data, vc->processor).curr == item )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 14b749dc1a..6552d4c087 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -83,7 +83,7 @@
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
     ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
-#define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_item->priv)
+#define CSCHED_ITEM(item)   ((struct csched_item *) (item)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
 
@@ -160,7 +160,7 @@ struct csched_pcpu {
 /*
  * Virtual CPU
  */
-struct csched_vcpu {
+struct csched_item {
     struct list_head runq_elem;
     struct list_head active_vcpu_elem;
 
@@ -231,15 +231,15 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_vcpu *svc)
+__vcpu_on_runq(struct csched_item *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched_vcpu *
+static inline struct csched_item *
 __runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched_vcpu, runq_elem);
+    return list_entry(elem, struct csched_item, runq_elem);
 }
 
 /* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
@@ -271,7 +271,7 @@ dec_nr_runnable(unsigned int cpu)
 }
 
 static inline void
-__runq_insert(struct csched_vcpu *svc)
+__runq_insert(struct csched_item *svc)
 {
     unsigned int cpu = svc->vcpu->processor;
     const struct list_head * const runq = RUNQ(cpu);
@@ -281,7 +281,7 @@ __runq_insert(struct csched_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        const struct csched_vcpu * const iter_svc = __runq_elem(iter);
+        const struct csched_item * const iter_svc = __runq_elem(iter);
         if ( svc->pri > iter_svc->pri )
             break;
     }
@@ -302,34 +302,34 @@ __runq_insert(struct csched_vcpu *svc)
 }
 
 static inline void
-runq_insert(struct csched_vcpu *svc)
+runq_insert(struct csched_item *svc)
 {
     __runq_insert(svc);
     inc_nr_runnable(svc->vcpu->processor);
 }
 
 static inline void
-__runq_remove(struct csched_vcpu *svc)
+__runq_remove(struct csched_item *svc)
 {
     BUG_ON( !__vcpu_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
-runq_remove(struct csched_vcpu *svc)
+runq_remove(struct csched_item *svc)
 {
     dec_nr_runnable(svc->vcpu->processor);
     __runq_remove(svc);
 }
 
-static void burn_credits(struct csched_vcpu *svc, s_time_t now)
+static void burn_credits(struct csched_item *svc, s_time_t now)
 {
     s_time_t delta;
     uint64_t val;
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_VCPU(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_ITEM(curr_on_cpu(svc->vcpu->processor)) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -347,10 +347,10 @@ boolean_param("tickle_one_idle_cpu", opt_tickle_one_idle);
 
 DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
-static inline void __runq_tickle(struct csched_vcpu *new)
+static inline void __runq_tickle(struct csched_item *new)
 {
     unsigned int cpu = new->vcpu->processor;
-    struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
+    struct csched_item * const cur = CSCHED_ITEM(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
     int balance_step, idlers_empty;
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
+    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -637,7 +637,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
-    struct csched_vcpu *svc = vdata;
+    struct csched_item *svc = vdata;
 
     ASSERT(svc && is_idle_vcpu(svc->vcpu));
 
@@ -669,7 +669,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 static inline void
 __csched_vcpu_check(struct vcpu *vc)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_item * const svc = CSCHED_ITEM(vc->sched_item);
     struct csched_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -871,7 +871,7 @@ static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(vc);
+    struct csched_item *svc = CSCHED_ITEM(item);
 
     /*
      * We have been called by vcpu_migrate() (in schedule.c), as part
@@ -885,7 +885,7 @@ csched_res_pick(const struct scheduler *ops, struct sched_item *item)
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
+__csched_vcpu_acct_start(struct csched_private *prv, struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
@@ -915,7 +915,7 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
 
 static inline void
 __csched_vcpu_acct_stop_locked(struct csched_private *prv,
-    struct csched_vcpu *svc)
+    struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -940,7 +940,7 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
 static void
 csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 {
-    struct csched_vcpu * const svc = CSCHED_VCPU(current);
+    struct csched_item * const svc = CSCHED_ITEM(current->sched_item);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
     ASSERT( current->processor == cpu );
@@ -1009,10 +1009,10 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
                    void *dd)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu *svc;
+    struct csched_item *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched_vcpu);
+    svc = xzalloc(struct csched_item);
     if ( svc == NULL )
         return NULL;
 
@@ -1031,7 +1031,7 @@ static void
 csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu *svc = item->priv;
+    struct csched_item *svc = item->priv;
     spinlock_t *lock;
 
     BUG_ON( is_idle_vcpu(vc) );
@@ -1057,7 +1057,7 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 static void
 csched_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched_vcpu *svc = priv;
+    struct csched_item *svc = priv;
 
     BUG_ON( !list_empty(&svc->runq_elem) );
 
@@ -1068,8 +1068,7 @@ static void
 csched_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct vcpu *vc = item->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_item * const svc = CSCHED_ITEM(item);
     struct csched_dom * const sdom = svc->sdom;
 
     SCHED_STAT_CRANK(vcpu_remove);
@@ -1096,14 +1095,14 @@ static void
 csched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_item * const svc = CSCHED_ITEM(item);
     unsigned int cpu = vc->processor;
 
     SCHED_STAT_CRANK(vcpu_sleep);
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( curr_on_cpu(cpu) == vc )
+    if ( curr_on_cpu(cpu) == item )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
@@ -1121,12 +1120,12 @@ static void
 csched_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_item * const svc = CSCHED_ITEM(item);
     bool_t migrating;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == item) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1182,8 +1181,7 @@ csched_item_wake(const struct scheduler *ops, struct sched_item *item)
 static void
 csched_item_yield(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
-    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+    struct csched_item * const svc = CSCHED_ITEM(item);
 
     /* Let the scheduler know that this vcpu is trying to yield */
     set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
@@ -1238,8 +1236,7 @@ static void
 csched_aff_cntl(const struct scheduler *ops, struct sched_item *item,
                 const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = item->vcpu;
-    struct csched_vcpu *svc = CSCHED_VCPU(v);
+    struct csched_item *svc = CSCHED_ITEM(item);
 
     if ( !hard )
         return;
@@ -1342,7 +1339,7 @@ csched_runq_sort(struct csched_private *prv, unsigned int cpu)
 {
     struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
     struct list_head *runq, *elem, *next, *last_under;
-    struct csched_vcpu *svc_elem;
+    struct csched_item *svc_elem;
     spinlock_t *lock;
     unsigned long flags;
     int sort_epoch;
@@ -1388,7 +1385,7 @@ csched_acct(void* dummy)
     unsigned long flags;
     struct list_head *iter_vcpu, *next_vcpu;
     struct list_head *iter_sdom, *next_sdom;
-    struct csched_vcpu *svc;
+    struct csched_item *svc;
     struct csched_dom *sdom;
     uint32_t credit_total;
     uint32_t weight_total;
@@ -1511,7 +1508,7 @@ csched_acct(void* dummy)
 
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
-            svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_vcpu, struct csched_item, active_vcpu_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1614,12 +1611,12 @@ csched_tick(void *_cpu)
     set_timer(&spc->ticker, NOW() + MICROSECS(prv->tick_period_us) );
 }
 
-static struct csched_vcpu *
+static struct csched_item *
 csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 {
     const struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
-    struct csched_vcpu *speer;
+    struct csched_item *speer;
     struct list_head *iter;
     struct vcpu *vc;
 
@@ -1629,7 +1626,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu))) )
+    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1691,12 +1688,12 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     return NULL;
 }
 
-static struct csched_vcpu *
+static struct csched_item *
 csched_load_balance(struct csched_private *prv, int cpu,
-    struct csched_vcpu *snext, bool_t *stolen)
+    struct csched_item *snext, bool_t *stolen)
 {
     struct cpupool *c = per_cpu(cpupool, cpu);
-    struct csched_vcpu *speer;
+    struct csched_item *speer;
     cpumask_t workers;
     cpumask_t *online;
     int peer_cpu, first_cpu, peer_node, bstep;
@@ -1845,9 +1842,9 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_vcpu * const scurr = CSCHED_VCPU(current);
+    struct csched_item * const scurr = CSCHED_ITEM(current->sched_item);
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct csched_vcpu *snext;
+    struct csched_item *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
@@ -1963,7 +1960,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_VCPU(idle_vcpu[cpu]);
+        snext = CSCHED_ITEM(idle_vcpu[cpu]->sched_item);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -2015,7 +2012,7 @@ out:
 }
 
 static void
-csched_dump_vcpu(struct csched_vcpu *svc)
+csched_dump_vcpu(struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
@@ -2051,7 +2048,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     struct list_head *runq, *iter;
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc;
-    struct csched_vcpu *svc;
+    struct csched_item *svc;
     spinlock_t *lock;
     unsigned long flags;
     int loop;
@@ -2075,7 +2072,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = CSCHED_VCPU(curr_on_cpu(cpu));
+    svc = CSCHED_ITEM(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -2144,10 +2141,10 @@ csched_dump(const struct scheduler *ops)
 
         list_for_each( iter_svc, &sdom->active_vcpu )
         {
-            struct csched_vcpu *svc;
+            struct csched_item *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
+            svc = list_entry(iter_svc, struct csched_item, active_vcpu_elem);
             lock = vcpu_schedule_lock(svc->vcpu);
 
             printk("\t%3d: ", ++loop);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c8ae585272..5a3a0babab 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -176,7 +176,7 @@
  *     load balancing;
  *  + serializes runqueue operations (removing and inserting vcpus);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_vcpu for the vcpu in the
+ *  + protects vcpu parameters in csched2_item for the vcpu in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -511,7 +511,7 @@ struct csched2_pcpu {
 /*
  * Virtual CPU
  */
-struct csched2_vcpu {
+struct csched2_item {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
     struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
@@ -570,9 +570,9 @@ static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
     return per_cpu(schedule_data, cpu).sched_priv;
 }
 
-static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
+static inline struct csched2_item *csched2_item(const struct sched_item *item)
 {
-    return v->sched_item->priv;
+    return item->priv;
 }
 
 static inline struct csched2_dom *csched2_dom(const struct domain *d)
@@ -594,7 +594,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
 }
 
 /* Does the domain of this vCPU have a cap? */
-static inline bool has_cap(const struct csched2_vcpu *svc)
+static inline bool has_cap(const struct csched2_item *svc)
 {
     return svc->budget != STIME_MAX;
 }
@@ -688,7 +688,7 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
  * note that at least 5 is guaranteed to _always_ return at least one pcpu.
  */
-static int get_fallback_cpu(struct csched2_vcpu *svc)
+static int get_fallback_cpu(struct csched2_item *svc)
 {
     struct vcpu *v = svc->vcpu;
     unsigned int bs;
@@ -773,7 +773,7 @@ static int get_fallback_cpu(struct csched2_vcpu *svc)
  * FIXME: Do pre-calculated division?
  */
 static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
-                          struct csched2_vcpu *svc)
+                          struct csched2_item *svc)
 {
     uint64_t val = time * rqd->max_weight + svc->residual;
 
@@ -781,7 +781,7 @@ static void t2c_update(struct csched2_runqueue_data *rqd, s_time_t time,
     svc->credit -= val;
 }
 
-static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_vcpu *svc)
+static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_item *svc)
 {
     return credit * svc->weight / rqd->max_weight;
 }
@@ -790,14 +790,14 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_vcpu *svc)
+static inline int vcpu_on_runq(struct csched2_item *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
 
-static inline struct csched2_vcpu * runq_elem(struct list_head *elem)
+static inline struct csched2_item * runq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct csched2_vcpu, runq_elem);
+    return list_entry(elem, struct csched2_item, runq_elem);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
@@ -915,7 +915,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
         list_for_each( iter, &rqd->svc )
         {
-            struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_item * svc = list_entry(iter, struct csched2_item, rqd_elem);
 
             if ( svc->weight > max_weight )
                 max_weight = svc->weight;
@@ -940,7 +940,7 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
 
 /* Add and remove from runqueue assignment (not active run queue) */
 static void
-_runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
+_runq_assign(struct csched2_item *svc, struct csched2_runqueue_data *rqd)
 {
 
     svc->rqd = rqd;
@@ -970,7 +970,7 @@ _runq_assign(struct csched2_vcpu *svc, struct csched2_runqueue_data *rqd)
 static void
 runq_assign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_item->priv;
+    struct csched2_item *svc = vc->sched_item->priv;
 
     ASSERT(svc->rqd == NULL);
 
@@ -978,7 +978,7 @@ runq_assign(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-_runq_deassign(struct csched2_vcpu *svc)
+_runq_deassign(struct csched2_item *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
@@ -997,7 +997,7 @@ _runq_deassign(struct csched2_vcpu *svc)
 static void
 runq_deassign(const struct scheduler *ops, struct vcpu *vc)
 {
-    struct csched2_vcpu *svc = vc->sched_item->priv;
+    struct csched2_item *svc = vc->sched_item->priv;
 
     ASSERT(svc->rqd == c2rqd(ops, vc->processor));
 
@@ -1199,7 +1199,7 @@ update_runq_load(const struct scheduler *ops,
 
 static void
 update_svc_load(const struct scheduler *ops,
-                struct csched2_vcpu *svc, int change, s_time_t now)
+                struct csched2_item *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t delta, vcpu_load;
@@ -1259,7 +1259,7 @@ update_svc_load(const struct scheduler *ops,
 static void
 update_load(const struct scheduler *ops,
             struct csched2_runqueue_data *rqd,
-            struct csched2_vcpu *svc, int change, s_time_t now)
+            struct csched2_item *svc, int change, s_time_t now)
 {
     trace_var(TRC_CSCHED2_UPDATE_LOAD, 1, 0,  NULL);
 
@@ -1269,7 +1269,7 @@ update_load(const struct scheduler *ops,
 }
 
 static void
-runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct csched2_item *svc)
 {
     struct list_head *iter;
     unsigned int cpu = svc->vcpu->processor;
@@ -1288,7 +1288,7 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 
     list_for_each( iter, runq )
     {
-        struct csched2_vcpu * iter_svc = runq_elem(iter);
+        struct csched2_item * iter_svc = runq_elem(iter);
 
         if ( svc->credit > iter_svc->credit )
             break;
@@ -1312,13 +1312,13 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
     }
 }
 
-static inline void runq_remove(struct csched2_vcpu *svc)
+static inline void runq_remove(struct csched2_item *svc)
 {
     ASSERT(vcpu_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
 
-void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_item *, s_time_t);
 
 static inline void
 tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
@@ -1334,7 +1334,7 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  * whether or not it already run for more than the ratelimit, to which we
  * apply some tolerance).
  */
-static inline bool is_preemptable(const struct csched2_vcpu *svc,
+static inline bool is_preemptable(const struct csched2_item *svc,
                                     s_time_t now, s_time_t ratelimit)
 {
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
@@ -1360,10 +1360,10 @@ static inline bool is_preemptable(const struct csched2_vcpu *svc,
  * Within the same class, the highest difference of credit.
  */
 static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
-                             struct csched2_vcpu *new, unsigned int cpu)
+                             struct csched2_item *new, unsigned int cpu)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_item * cur = csched2_item(curr_on_cpu(cpu));
     struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
@@ -1432,7 +1432,7 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
  * pick up some work, so it would be wrong to consider it idle.
  */
 static void
-runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
+runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
@@ -1587,7 +1587,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)));
+    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
@@ -1614,7 +1614,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
  * Credit-related code
  */
 static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
-                         struct csched2_vcpu *snext)
+                         struct csched2_item *snext)
 {
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct list_head *iter;
@@ -1644,10 +1644,10 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
     list_for_each( iter, &rqd->svc )
     {
         unsigned int svc_cpu;
-        struct csched2_vcpu * svc;
+        struct csched2_item * svc;
         int start_credit;
 
-        svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+        svc = list_entry(iter, struct csched2_item, rqd_elem);
         svc_cpu = svc->vcpu->processor;
 
         ASSERT(!is_idle_vcpu(svc->vcpu));
@@ -1657,7 +1657,7 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1709,11 +1709,11 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 }
 
 void burn_credits(struct csched2_runqueue_data *rqd,
-                  struct csched2_vcpu *svc, s_time_t now)
+                  struct csched2_item *svc, s_time_t now)
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_vcpu(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_item(curr_on_cpu(svc->vcpu->processor)));
 
     if ( unlikely(is_idle_vcpu(svc->vcpu)) )
     {
@@ -1763,7 +1763,7 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_vcpu *svc)
+static void park_vcpu(struct csched2_item *svc)
 {
     struct vcpu *v = svc->vcpu;
 
@@ -1792,7 +1792,7 @@ static void park_vcpu(struct csched2_vcpu *svc)
     list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
 }
 
-static bool vcpu_grab_budget(struct csched2_vcpu *svc)
+static bool vcpu_grab_budget(struct csched2_item *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1839,7 +1839,7 @@ static bool vcpu_grab_budget(struct csched2_vcpu *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
+vcpu_return_budget(struct csched2_item *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
@@ -1882,7 +1882,7 @@ vcpu_return_budget(struct csched2_vcpu *svc, struct list_head *parked)
 static void
 unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
 {
-    struct csched2_vcpu *svc, *tmp;
+    struct csched2_item *svc, *tmp;
     spinlock_t *lock;
 
     list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
@@ -2004,7 +2004,7 @@ static void replenish_domain_budget(void* data)
 static inline void
 csched2_vcpu_check(struct vcpu *vc)
 {
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(vc->sched_item);
     struct csched2_dom * const sdom = svc->sdom;
 
     BUG_ON( svc->vcpu != vc );
@@ -2030,10 +2030,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
                     void *dd)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu *svc;
+    struct csched2_item *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct csched2_vcpu);
+    svc = xzalloc(struct csched2_item);
     if ( svc == NULL )
         return NULL;
 
@@ -2074,12 +2074,12 @@ static void
 csched2_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(item);
 
     ASSERT(!is_idle_vcpu(vc));
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == item )
     {
         tickle_cpu(vc->processor, svc->rqd);
     }
@@ -2097,7 +2097,7 @@ static void
 csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(item);
     unsigned int cpu = vc->processor;
     s_time_t now;
 
@@ -2105,7 +2105,7 @@ csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 
     ASSERT(!is_idle_vcpu(vc));
 
-    if ( unlikely(curr_on_cpu(cpu) == vc) )
+    if ( unlikely(curr_on_cpu(cpu) == item) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
@@ -2152,8 +2152,7 @@ out:
 static void
 csched2_item_yield(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(v);
+    struct csched2_item * const svc = csched2_item(item);
 
     __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
 }
@@ -2162,7 +2161,7 @@ static void
 csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(item);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
@@ -2208,7 +2207,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
     struct vcpu *vc = item->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = vc->processor;
-    struct csched2_vcpu *svc = csched2_vcpu(vc);
+    struct csched2_item *svc = csched2_item(item);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
 
@@ -2430,15 +2429,15 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
 typedef struct {
     /* NB: Modified by consider() */
     s_time_t load_delta;
-    struct csched2_vcpu * best_push_svc, *best_pull_svc;
+    struct csched2_item * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
     struct csched2_runqueue_data *orqd;                  
 } balance_state_t;
 
 static void consider(balance_state_t *st, 
-                     struct csched2_vcpu *push_svc,
-                     struct csched2_vcpu *pull_svc)
+                     struct csched2_item *push_svc,
+                     struct csched2_item *pull_svc)
 {
     s_time_t l_load, o_load, delta;
 
@@ -2471,8 +2470,8 @@ static void consider(balance_state_t *st,
 
 
 static void migrate(const struct scheduler *ops,
-                    struct csched2_vcpu *svc, 
-                    struct csched2_runqueue_data *trqd, 
+                    struct csched2_item *svc,
+                    struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
@@ -2541,7 +2540,7 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_vcpu *svc,
+static bool vcpu_is_migrateable(struct csched2_item *svc,
                                   struct csched2_runqueue_data *rqd)
 {
     struct vcpu *v = svc->vcpu;
@@ -2691,7 +2690,7 @@ retry:
     /* Reuse load delta (as we're trying to minimize it) */
     list_for_each( push_iter, &st.lrqd->svc )
     {
-        struct csched2_vcpu * push_svc = list_entry(push_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_item * push_svc = list_entry(push_iter, struct csched2_item, rqd_elem);
 
         update_svc_load(ops, push_svc, 0, now);
 
@@ -2700,7 +2699,7 @@ retry:
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
-            struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+            struct csched2_item * pull_svc = list_entry(pull_iter, struct csched2_item, rqd_elem);
             
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
@@ -2719,7 +2718,7 @@ retry:
 
     list_for_each( pull_iter, &st.orqd->svc )
     {
-        struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
+        struct csched2_item * pull_svc = list_entry(pull_iter, struct csched2_item, rqd_elem);
         
         if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
             continue;
@@ -2746,7 +2745,7 @@ csched2_item_migrate(
 {
     struct vcpu *vc = item->vcpu;
     struct domain *d = vc->domain;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(item);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
 
@@ -2847,7 +2846,7 @@ csched2_dom_cntl(
             /* Update weights for vcpus, and max_weight for runqueues on which they reside */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_item *svc = csched2_item(v->sched_item);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
@@ -2861,7 +2860,7 @@ csched2_dom_cntl(
         /* Cap */
         if ( op->u.credit2.cap != 0 )
         {
-            struct csched2_vcpu *svc;
+            struct csched2_item *svc;
             spinlock_t *lock;
 
             /* Cap is only valid if it's below 100 * nr_of_vCPUS */
@@ -2885,7 +2884,7 @@ csched2_dom_cntl(
              */
             for_each_vcpu ( d, v )
             {
-                svc = csched2_vcpu(v);
+                svc = csched2_item(v->sched_item);
                 lock = vcpu_schedule_lock(svc->vcpu);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
@@ -2928,14 +2927,14 @@ csched2_dom_cntl(
                  */
                 for_each_vcpu ( d, v )
                 {
-                    svc = csched2_vcpu(v);
+                    svc = csched2_item(v->sched_item);
                     lock = vcpu_schedule_lock(svc->vcpu);
                     if ( v->is_running )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu) == v);
+                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
 
                         /*
                          * We are triggering a reschedule on the vCPU's
@@ -2975,7 +2974,7 @@ csched2_dom_cntl(
             /* Disable budget accounting for all the vCPUs. */
             for_each_vcpu ( d, v )
             {
-                struct csched2_vcpu *svc = csched2_vcpu(v);
+                struct csched2_item *svc = csched2_item(v->sched_item);
                 spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 svc->budget = STIME_MAX;
@@ -3012,8 +3011,7 @@ static void
 csched2_aff_cntl(const struct scheduler *ops, struct sched_item *item,
                  const cpumask_t *hard, const cpumask_t *soft)
 {
-    struct vcpu *v = item->vcpu;
-    struct csched2_vcpu *svc = csched2_vcpu(v);
+    struct csched2_item *svc = csched2_item(item);
 
     if ( !hard )
         return;
@@ -3113,7 +3111,7 @@ static void
 csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu *svc = item->priv;
+    struct csched2_item *svc = item->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -3145,7 +3143,7 @@ csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 static void
 csched2_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct csched2_vcpu *svc = priv;
+    struct csched2_item *svc = priv;
 
     xfree(svc);
 }
@@ -3154,7 +3152,7 @@ static void
 csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct csched2_vcpu * const svc = csched2_vcpu(vc);
+    struct csched2_item * const svc = csched2_item(item);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(vc));
@@ -3175,7 +3173,7 @@ csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
 /* How long should we let this vcpu run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
-                struct csched2_vcpu *snext, s_time_t now)
+                struct csched2_item *snext, s_time_t now)
 {
     s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
@@ -3220,7 +3218,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      */
     if ( ! list_empty(runq) )
     {
-        struct csched2_vcpu *swait = runq_elem(runq->next);
+        struct csched2_item *swait = runq_elem(runq->next);
 
         if ( ! is_idle_vcpu(swait->vcpu)
              && swait->credit > 0 )
@@ -3271,14 +3269,14 @@ csched2_runtime(const struct scheduler *ops, int cpu,
 /*
  * Find a candidate.
  */
-static struct csched2_vcpu *
+static struct csched2_item *
 runq_candidate(struct csched2_runqueue_data *rqd,
-               struct csched2_vcpu *scurr,
+               struct csched2_item *scurr,
                int cpu, s_time_t now,
                unsigned int *skipped)
 {
     struct list_head *iter, *temp;
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_item *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = false, soft_aff_preempt = false;
 
@@ -3359,12 +3357,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_item(idle_vcpu[cpu]->sched_item);
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
     {
-        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+        struct csched2_item * svc = list_entry(iter, struct csched2_item, runq_elem);
 
         if ( unlikely(tb_init_done) )
         {
@@ -3463,8 +3461,8 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_vcpu * const scurr = csched2_vcpu(current);
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_item * const scurr = csched2_item(current->sched_item);
+    struct csched2_item *snext = NULL;
     unsigned int skipped_vcpus = 0;
     struct task_slice ret;
     bool tickled;
@@ -3540,7 +3538,7 @@ csched2_schedule(
     {
         __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_vcpu(idle_vcpu[cpu]);
+        snext = csched2_item(idle_vcpu[cpu]->sched_item);
     }
     else
         snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
@@ -3643,7 +3641,7 @@ csched2_schedule(
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_vcpu *svc)
+csched2_dump_vcpu(struct csched2_private *prv, struct csched2_item *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
             svc->vcpu->domain->domain_id,
@@ -3667,7 +3665,7 @@ static inline void
 dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct csched2_vcpu *svc;
+    struct csched2_item *svc;
 
     printk("CPU[%02d] runq=%d, sibling=%*pb, core=%*pb\n",
            cpu, c2r(cpu),
@@ -3675,7 +3673,7 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    svc = csched2_vcpu(curr_on_cpu(cpu));
+    svc = csched2_item(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         printk("\trun: ");
@@ -3748,7 +3746,7 @@ csched2_dump(const struct scheduler *ops)
 
         for_each_vcpu( sdom->dom, v )
         {
-            struct csched2_vcpu * const svc = csched2_vcpu(v);
+            struct csched2_item * const svc = csched2_item(v->sched_item);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(svc->vcpu);
@@ -3777,7 +3775,7 @@ csched2_dump(const struct scheduler *ops)
         printk("RUNQ:\n");
         list_for_each( iter, runq )
         {
-            struct csched2_vcpu *svc = runq_elem(iter);
+            struct csched2_item *svc = runq_elem(iter);
 
             if ( svc )
             {
@@ -3879,7 +3877,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                      void *pdata, void *vdata)
 {
     struct csched2_private *prv = csched2_priv(new_ops);
-    struct csched2_vcpu *svc = vdata;
+    struct csched2_item *svc = vdata;
     unsigned rqi;
 
     ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index a08f23993c..f7a2650c48 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -94,7 +94,7 @@ DEFINE_PER_CPU(struct null_pcpu, npc);
 /*
  * Virtual CPU
  */
-struct null_vcpu {
+struct null_item {
     struct list_head waitq_elem;
     struct vcpu *vcpu;
 };
@@ -115,9 +115,9 @@ static inline struct null_private *null_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
+static inline struct null_item *null_item(const struct sched_item *item)
 {
-    return v->sched_item->priv;
+    return item->priv;
 }
 
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
@@ -197,9 +197,9 @@ static void *null_alloc_vdata(const struct scheduler *ops,
                               struct sched_item *item, void *dd)
 {
     struct vcpu *v = item->vcpu;
-    struct null_vcpu *nvc;
+    struct null_item *nvc;
 
-    nvc = xzalloc(struct null_vcpu);
+    nvc = xzalloc(struct null_item);
     if ( nvc == NULL )
         return NULL;
 
@@ -213,7 +213,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
 
 static void null_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct null_vcpu *nvc = priv;
+    struct null_item *nvc = priv;
 
     xfree(nvc);
 }
@@ -390,7 +390,7 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct null_private *prv = null_priv(new_ops);
-    struct null_vcpu *nvc = vdata;
+    struct null_item *nvc = vdata;
 
     ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
 
@@ -422,7 +422,7 @@ static void null_item_insert(const struct scheduler *ops,
 {
     struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_item *nvc = null_item(item);
     unsigned int cpu;
     spinlock_t *lock;
 
@@ -479,9 +479,9 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 {
     unsigned int bs;
     unsigned int cpu = v->processor;
-    struct null_vcpu *wvc;
+    struct null_item *wvc;
 
-    ASSERT(list_empty(&null_vcpu(v)->waitq_elem));
+    ASSERT(list_empty(&null_item(v->sched_item)->waitq_elem));
 
     vcpu_deassign(prv, v, cpu);
 
@@ -517,7 +517,7 @@ static void null_item_remove(const struct scheduler *ops,
 {
     struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_item *nvc = null_item(item);
     spinlock_t *lock;
 
     ASSERT(!is_idle_vcpu(v));
@@ -552,13 +552,13 @@ static void null_item_wake(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    if ( unlikely(curr_on_cpu(v->processor) == v) )
+    if ( unlikely(curr_on_cpu(v->processor) == item) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
     }
 
-    if ( unlikely(!list_empty(&null_vcpu(v)->waitq_elem)) )
+    if ( unlikely(!list_empty(&null_item(item)->waitq_elem)) )
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
@@ -582,7 +582,7 @@ static void null_item_sleep(const struct scheduler *ops,
     ASSERT(!is_idle_vcpu(v));
 
     /* If v is not assigned to a pCPU, or is not running, no need to bother */
-    if ( curr_on_cpu(v->processor) == v )
+    if ( curr_on_cpu(v->processor) == item )
         cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 
     SCHED_STAT_CRANK(vcpu_sleep);
@@ -600,7 +600,7 @@ static void null_item_migrate(const struct scheduler *ops,
 {
     struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc = null_vcpu(v);
+    struct null_item *nvc = null_item(item);
 
     ASSERT(!is_idle_vcpu(v));
 
@@ -685,7 +685,7 @@ static void null_item_migrate(const struct scheduler *ops,
 #ifndef NDEBUG
 static inline void null_vcpu_check(struct vcpu *v)
 {
-    struct null_vcpu * const nvc = null_vcpu(v);
+    struct null_item * const nvc = null_item(v->sched_item);
     struct null_dom * const ndom = v->domain->sched_priv;
 
     BUG_ON(nvc->vcpu != v);
@@ -715,7 +715,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *wvc;
+    struct null_item *wvc;
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
@@ -798,7 +798,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     return ret;
 }
 
-static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
+static inline void dump_vcpu(struct null_private *prv, struct null_item *nvc)
 {
     printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
             nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
@@ -808,7 +808,7 @@ static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
 static void null_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct null_vcpu *nvc;
+    struct null_item *nvc;
     spinlock_t *lock;
     unsigned long flags;
 
@@ -823,7 +823,7 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
     printk("\n");
 
     /* current VCPU (nothing to say if that's the idle vcpu) */
-    nvc = null_vcpu(curr_on_cpu(cpu));
+    nvc = null_item(curr_on_cpu(cpu));
     if ( nvc && !is_idle_vcpu(nvc->vcpu) )
     {
         printk("\trun: ");
@@ -857,7 +857,7 @@ static void null_dump(const struct scheduler *ops)
         printk("\tDomain: %d\n", ndom->dom->domain_id);
         for_each_vcpu( ndom->dom, v )
         {
-            struct null_vcpu * const nvc = null_vcpu(v);
+            struct null_item * const nvc = null_item(v->sched_item);
             spinlock_t *lock;
 
             lock = vcpu_schedule_lock(nvc->vcpu);
@@ -875,7 +875,7 @@ static void null_dump(const struct scheduler *ops)
     spin_lock(&prv->waitq_lock);
     list_for_each( iter, &prv->waitq )
     {
-        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
+        struct null_item *nvc = list_entry(iter, struct null_item, waitq_elem);
 
         if ( loop++ != 0 )
             printk(", ");
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 2bd4637592..a3cd00f765 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -195,7 +195,7 @@ struct rt_private {
 /*
  * Virtual CPU
  */
-struct rt_vcpu {
+struct rt_item {
     struct list_head q_elem;     /* on the runq/depletedq list */
     struct list_head replq_elem; /* on the replenishment events list */
 
@@ -233,9 +233,9 @@ static inline struct rt_private *rt_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
-static inline struct rt_vcpu *rt_vcpu(const struct vcpu *vcpu)
+static inline struct rt_item *rt_item(const struct sched_item *item)
 {
-    return vcpu->sched_item->priv;
+    return item->priv;
 }
 
 static inline struct list_head *rt_runq(const struct scheduler *ops)
@@ -253,7 +253,7 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
-static inline bool has_extratime(const struct rt_vcpu *svc)
+static inline bool has_extratime(const struct rt_item *svc)
 {
     return svc->flags & RTDS_extratime;
 }
@@ -263,25 +263,25 @@ static inline bool has_extratime(const struct rt_vcpu *svc)
  * and the replenishment events queue.
  */
 static int
-vcpu_on_q(const struct rt_vcpu *svc)
+vcpu_on_q(const struct rt_item *svc)
 {
    return !list_empty(&svc->q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_item *
 q_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, q_elem);
+    return list_entry(elem, struct rt_item, q_elem);
 }
 
-static struct rt_vcpu *
+static struct rt_item *
 replq_elem(struct list_head *elem)
 {
-    return list_entry(elem, struct rt_vcpu, replq_elem);
+    return list_entry(elem, struct rt_item, replq_elem);
 }
 
 static int
-vcpu_on_replq(const struct rt_vcpu *svc)
+vcpu_on_replq(const struct rt_item *svc)
 {
     return !list_empty(&svc->replq_elem);
 }
@@ -291,7 +291,7 @@ vcpu_on_replq(const struct rt_vcpu *svc)
  * Otherwise, return value < 0
  */
 static s_time_t
-compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+compare_vcpu_priority(const struct rt_item *v1, const struct rt_item *v2)
 {
     int prio = v2->priority_level - v1->priority_level;
 
@@ -305,7 +305,7 @@ compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
  * Debug related code, dump vcpu/cpu information
  */
 static void
-rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
+rt_dump_vcpu(const struct scheduler *ops, const struct rt_item *svc)
 {
     cpumask_t *cpupool_mask, *mask;
 
@@ -352,13 +352,13 @@ static void
 rt_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_item *svc;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
     printk("CPU[%02d]\n", cpu);
     /* current VCPU (nothing to say if that's the idle vcpu). */
-    svc = rt_vcpu(curr_on_cpu(cpu));
+    svc = rt_item(curr_on_cpu(cpu));
     if ( svc && !is_idle_vcpu(svc->vcpu) )
     {
         rt_dump_vcpu(ops, svc);
@@ -371,7 +371,7 @@ rt_dump(const struct scheduler *ops)
 {
     struct list_head *runq, *depletedq, *replq, *iter;
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_item *svc;
     struct rt_dom *sdom;
     unsigned long flags;
 
@@ -415,7 +415,7 @@ rt_dump(const struct scheduler *ops)
 
         for_each_vcpu ( sdom->dom, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_item(v->sched_item);
             rt_dump_vcpu(ops, svc);
         }
     }
@@ -429,7 +429,7 @@ rt_dump(const struct scheduler *ops)
  * it needs to be updated to the deadline of the current period
  */
 static void
-rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
+rt_update_deadline(s_time_t now, struct rt_item *svc)
 {
     ASSERT(now >= svc->cur_deadline);
     ASSERT(svc->period != 0);
@@ -500,8 +500,8 @@ deadline_queue_remove(struct list_head *queue, struct list_head *elem)
 }
 
 static inline bool
-deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
-                      struct rt_vcpu *svc, struct list_head *elem,
+deadline_queue_insert(struct rt_item * (*qelem)(struct list_head *),
+                      struct rt_item *svc, struct list_head *elem,
                       struct list_head *queue)
 {
     struct list_head *iter;
@@ -509,7 +509,7 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
 
     list_for_each ( iter, queue )
     {
-        struct rt_vcpu * iter_svc = (*qelem)(iter);
+        struct rt_item * iter_svc = (*qelem)(iter);
         if ( compare_vcpu_priority(svc, iter_svc) > 0 )
             break;
         pos++;
@@ -523,14 +523,14 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
   deadline_queue_insert(&replq_elem, ##__VA_ARGS__)
 
 static inline void
-q_remove(struct rt_vcpu *svc)
+q_remove(struct rt_item *svc)
 {
     ASSERT( vcpu_on_q(svc) );
     list_del_init(&svc->q_elem);
 }
 
 static inline void
-replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_remove(const struct scheduler *ops, struct rt_item *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *replq = rt_replq(ops);
@@ -547,7 +547,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
          */
         if ( !list_empty(replq) )
         {
-            struct rt_vcpu *svc_next = replq_elem(replq->next);
+            struct rt_item *svc_next = replq_elem(replq->next);
             set_timer(&prv->repl_timer, svc_next->cur_deadline);
         }
         else
@@ -561,7 +561,7 @@ replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
  * Insert svc without budget in DepletedQ unsorted;
  */
 static void
-runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+runq_insert(const struct scheduler *ops, struct rt_item *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *runq = rt_runq(ops);
@@ -579,7 +579,7 @@ runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 }
 
 static void
-replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_insert(const struct scheduler *ops, struct rt_item *svc)
 {
     struct list_head *replq = rt_replq(ops);
     struct rt_private *prv = rt_priv(ops);
@@ -601,10 +601,10 @@ replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
  * changed.
  */
 static void
-replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
+replq_reinsert(const struct scheduler *ops, struct rt_item *svc)
 {
     struct list_head *replq = rt_replq(ops);
-    struct rt_vcpu *rearm_svc = svc;
+    struct rt_item *rearm_svc = svc;
     bool_t rearm = 0;
 
     ASSERT( vcpu_on_replq(svc) );
@@ -735,7 +735,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                 void *pdata, void *vdata)
 {
     struct rt_private *prv = rt_priv(new_ops);
-    struct rt_vcpu *svc = vdata;
+    struct rt_item *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
 
@@ -850,10 +850,10 @@ static void *
 rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu *svc;
+    struct rt_item *svc;
 
     /* Allocate per-VCPU info */
-    svc = xzalloc(struct rt_vcpu);
+    svc = xzalloc(struct rt_item);
     if ( svc == NULL )
         return NULL;
 
@@ -878,7 +878,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
 static void
 rt_free_vdata(const struct scheduler *ops, void *priv)
 {
-    struct rt_vcpu *svc = priv;
+    struct rt_item *svc = priv;
 
     xfree(svc);
 }
@@ -894,7 +894,7 @@ static void
 rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_item *svc = rt_item(item);
     s_time_t now;
     spinlock_t *lock;
 
@@ -923,13 +923,13 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 }
 
 /*
- * Remove rt_vcpu svc from the old scheduler in source cpupool.
+ * Remove rt_item svc from the old scheduler in source cpupool.
  */
 static void
 rt_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_item * const svc = rt_item(item);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
@@ -951,7 +951,7 @@ rt_item_remove(const struct scheduler *ops, struct sched_item *item)
  * Burn budget in nanosecond granularity
  */
 static void
-burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
+burn_budget(const struct scheduler *ops, struct rt_item *svc, s_time_t now)
 {
     s_time_t delta;
 
@@ -1015,13 +1015,13 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
  * RunQ is sorted. Pick first one within cpumask. If no one, return NULL
  * lock is grabbed before calling this function
  */
-static struct rt_vcpu *
+static struct rt_item *
 runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 {
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter;
-    struct rt_vcpu *svc = NULL;
-    struct rt_vcpu *iter_svc = NULL;
+    struct rt_item *svc = NULL;
+    struct rt_item *iter_svc = NULL;
     cpumask_t cpu_common;
     cpumask_t *online;
 
@@ -1072,8 +1072,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
 {
     const int cpu = smp_processor_id();
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *const scurr = rt_vcpu(current);
-    struct rt_vcpu *snext = NULL;
+    struct rt_item *const scurr = rt_item(current->sched_item);
+    struct rt_item *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
 
     /* TRACE */
@@ -1099,13 +1099,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_vcpu(idle_vcpu[cpu]);
+        snext = rt_item(idle_vcpu[cpu]->sched_item);
     }
     else
     {
         snext = runq_pick(ops, cpumask_of(cpu));
         if ( snext == NULL )
-            snext = rt_vcpu(idle_vcpu[cpu]);
+            snext = rt_item(idle_vcpu[cpu]->sched_item);
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_vcpu(current) &&
@@ -1151,12 +1151,12 @@ static void
 rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_item * const svc = rt_item(item);
 
     BUG_ON( is_idle_vcpu(vc) );
     SCHED_STAT_CRANK(vcpu_sleep);
 
-    if ( curr_on_cpu(vc->processor) == vc )
+    if ( curr_on_cpu(vc->processor) == item )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( vcpu_on_q(svc) )
     {
@@ -1186,11 +1186,11 @@ rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
  * lock is grabbed before calling this function
  */
 static void
-runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
+runq_tickle(const struct scheduler *ops, struct rt_item *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *latest_deadline_vcpu = NULL; /* lowest priority */
-    struct rt_vcpu *iter_svc;
+    struct rt_item *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_item *iter_svc;
     struct vcpu *iter_vc;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
@@ -1211,14 +1211,14 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
     cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu);
+        iter_vc = curr_on_cpu(cpu)->vcpu;
         if ( is_idle_vcpu(iter_vc) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_vcpu(iter_vc);
+        iter_svc = rt_item(iter_vc->sched_item);
         if ( latest_deadline_vcpu == NULL ||
              compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
             latest_deadline_vcpu = iter_svc;
@@ -1267,13 +1267,13 @@ static void
 rt_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu * const svc = rt_vcpu(vc);
+    struct rt_item * const svc = rt_item(item);
     s_time_t now;
     bool_t missed;
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(vc->processor) == item) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1338,7 +1338,7 @@ static void
 rt_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
-    struct rt_vcpu *svc = rt_vcpu(vc);
+    struct rt_item *svc = rt_item(item);
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
@@ -1369,7 +1369,7 @@ rt_dom_cntl(
     struct xen_domctl_scheduler_op *op)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *svc;
+    struct rt_item *svc;
     struct vcpu *v;
     unsigned long flags;
     int rc = 0;
@@ -1393,7 +1393,7 @@ rt_dom_cntl(
         spin_lock_irqsave(&prv->lock, flags);
         for_each_vcpu ( d, v )
         {
-            svc = rt_vcpu(v);
+            svc = rt_item(v->sched_item);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1419,7 +1419,7 @@ rt_dom_cntl(
             if ( op->cmd == XEN_DOMCTL_SCHEDOP_getvcpuinfo )
             {
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_item(d->vcpu[local_sched.vcpuid]->sched_item);
                 local_sched.u.rtds.budget = svc->budget / MICROSECS(1);
                 local_sched.u.rtds.period = svc->period / MICROSECS(1);
                 if ( has_extratime(svc) )
@@ -1447,7 +1447,7 @@ rt_dom_cntl(
                 }
 
                 spin_lock_irqsave(&prv->lock, flags);
-                svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+                svc = rt_item(d->vcpu[local_sched.vcpuid]->sched_item);
                 svc->period = period;
                 svc->budget = budget;
                 if ( local_sched.u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra )
@@ -1480,7 +1480,7 @@ static void repl_timer_handler(void *data){
     struct list_head *replq = rt_replq(ops);
     struct list_head *runq = rt_runq(ops);
     struct list_head *iter, *tmp;
-    struct rt_vcpu *svc;
+    struct rt_item *svc;
     LIST_HEAD(tmp_replq);
 
     spin_lock_irq(&prv->lock);
@@ -1522,10 +1522,10 @@ static void repl_timer_handler(void *data){
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu &&
+        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_item &&
              !list_empty(runq) )
         {
-            struct rt_vcpu *next_on_runq = q_elem(runq->next);
+            struct rt_item *next_on_runq = q_elem(runq->next);
 
             if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 62490454ea..90eb915e4e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -338,7 +338,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = v;
+        per_cpu(schedule_data, v->processor).curr = item;
         v->is_running = 1;
     }
     else
@@ -1533,7 +1533,7 @@ static void schedule(void)
 
     next = next_slice.task;
 
-    sd->curr = next;
+    sd->curr = next->sched_item;
 
     if ( next_slice.time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + next_slice.time);
@@ -1656,7 +1656,6 @@ static int cpu_schedule_up(unsigned int cpu)
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sd->_lock;
-    sd->curr = idle_vcpu[cpu];
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
 
@@ -1690,6 +1689,8 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
+    sd->curr = idle_vcpu[cpu]->sched_item;
+
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
      * (e.g., inside free_pdata, from cpu_schedule_down() called
@@ -1859,6 +1860,7 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
+    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_item;
     this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
     BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
     SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 10a97a5dc2..85b77dafdc 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -36,7 +36,7 @@ extern int sched_ratelimit_us;
 struct schedule_data {
     spinlock_t         *schedule_lock,
                        _lock;
-    struct vcpu        *curr;           /* current task                    */
+    struct sched_item  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
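
As an illustration of the new access pattern (a standalone sketch, not
Xen code; the structures and the curr_on_cpu() helper below are
simplified stand-ins): with curr now pointing at a sched_item, callers
either compare sched_item pointers directly or dereference ->vcpu
first, as the hunks above do.

/*
 * Illustrative, standalone sketch (layouts are simplified assumptions):
 * schedule_data.curr now refers to a sched_item, so "is this vcpu
 * currently running?" checks compare items or go through ->vcpu.
 */
#include <stdbool.h>
#include <stdio.h>

struct vcpu;

struct sched_item {
    struct vcpu *vcpu;             /* vcpu backing this schedule item */
};

struct vcpu {
    int                vcpu_id;
    unsigned int       processor;
    struct sched_item *sched_item;
};

struct schedule_data {
    struct sched_item *curr;       /* was: struct vcpu *curr */
};

static struct schedule_data sd[1];

/* mimics curr_on_cpu(cpu), which now yields the running sched_item */
static struct sched_item *curr_on_cpu(unsigned int cpu)
{
    return sd[cpu].curr;
}

static bool vcpu_is_current(const struct vcpu *v)
{
    /* old check: curr_on_cpu(v->processor) == v */
    return curr_on_cpu(v->processor) == v->sched_item;
}

int main(void)
{
    struct vcpu v = { .vcpu_id = 0, .processor = 0 };
    struct sched_item item = { .vcpu = &v };

    v.sched_item = &item;
    sd[0].curr = &item;

    printf("item compare: %d\n", vcpu_is_current(&v));
    printf("via ->vcpu:   %d\n", curr_on_cpu(0)->vcpu == &v);
    return 0;
}
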
-- 
2.16.4



* [PATCH RFC 15/49] xen/sched: move per cpu scheduler private data into struct sched_resource
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (13 preceding siblings ...)
  2019-03-29 15:08 ` [PATCH RFC 14/49] xen/sched: switch schedule_data.curr to point at sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 16/49] xen/sched: switch vcpu_schedule_lock to item_schedule_lock Juergen Gross
                   ` (39 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Roger Pau Monné

This prepares for supporting larger scheduling granularities, e.g.
core scheduling.
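
A minimal standalone sketch of the resulting layout (illustration
only, not the actual Xen header; field types are simplified and the
field list is inferred from the accesses in the hunks below): the
state that used to sit in the static per-CPU struct schedule_data is
carried by the dynamically allocated struct sched_resource, reached
via per_cpu(sched_res, cpu).

/*
 * Illustrative sketch only -- not the real sched-if.h.  Types are
 * simplified stand-ins; the field list is inferred from the accesses
 * in the hunks below (curr, sched_priv, schedule_lock/_lock,
 * urgent_count, processor).
 */
#include <stdio.h>
#include <stdlib.h>

typedef int spinlock_t;                /* stand-in for Xen's spinlock_t */
struct sched_item;

struct sched_resource {
    spinlock_t        *schedule_lock;  /* may be rerouted to a runq lock  */
    spinlock_t         _lock;
    struct sched_item *curr;           /* currently running sched_item    */
    void              *sched_priv;     /* per-pCPU scheduler private data */
    int                urgent_count;   /* atomic_t in Xen                 */
    unsigned int       processor;
};

#define NR_CPUS 4
static struct sched_resource *sched_res[NR_CPUS]; /* per_cpu(sched_res, cpu) */

static void spin_lock_init(spinlock_t *lock) { *lock = 0; }

/* rough analogue of cpu_schedule_up(): allocate the resource and route
 * the scheduler lock to the private per-CPU lock by default */
static int cpu_schedule_up(unsigned int cpu)
{
    struct sched_resource *sd = calloc(1, sizeof(*sd));

    if ( sd == NULL )
        return -1;
    sd->processor = cpu;
    spin_lock_init(&sd->_lock);
    sd->schedule_lock = &sd->_lock;
    sched_res[cpu] = sd;
    return 0;
}

int main(void)
{
    if ( cpu_schedule_up(0) )
        return 1;
    printf("cpu%u: lock routed to private lock: %d\n",
           sched_res[0]->processor,
           sched_res[0]->schedule_lock == &sched_res[0]->_lock);
    free(sched_res[0]);
    return 0;
}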

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c   |  6 ++---
 xen/common/sched_credit.c     | 14 +++++------
 xen/common/sched_credit2.c    | 24 +++++++++----------
 xen/common/sched_null.c       |  8 +++----
 xen/common/sched_rt.c         | 12 +++++-----
 xen/common/schedule.c         | 56 +++++++++++++++++++++----------------------
 xen/include/asm-x86/cpuidle.h |  2 +-
 xen/include/xen/sched-if.h    | 20 +++++++---------
 8 files changed, 68 insertions(+), 74 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 5701baf337..9dc1ff6a73 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -475,7 +475,7 @@ a653sched_item_sleep(const struct scheduler *ops, struct sched_item *item)
      * If the VCPU being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(schedule_data, vc->processor).curr == item )
+    if ( per_cpu(sched_res, vc->processor)->curr == item )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
 }
 
@@ -642,7 +642,7 @@ static void
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
     arinc653_vcpu_t *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
@@ -650,7 +650,7 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     idle_vcpu[cpu]->sched_item->priv = vdata;
 
     per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
+    per_cpu(sched_res, cpu)->sched_priv = NULL; /* no pdata */
 
     /*
      * (Re?)route the lock to its default location. We actually do not use
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 6552d4c087..e8369b3648 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -82,7 +82,7 @@
 #define CSCHED_PRIV(_ops)   \
     ((struct csched_private *)((_ops)->sched_data))
 #define CSCHED_PCPU(_c)     \
-    ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
+    ((struct csched_pcpu *)per_cpu(sched_res, _c)->sched_priv)
 #define CSCHED_ITEM(item)   ((struct csched_item *) (item)->priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
@@ -248,7 +248,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     /*
      * We're peeking at cpu's runq, we must hold the proper lock.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
            is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
@@ -257,7 +257,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
 static inline void
 inc_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
     CSCHED_PCPU(cpu)->nr_runnable++;
 
 }
@@ -265,7 +265,7 @@ inc_nr_runnable(unsigned int cpu)
 static inline void
 dec_nr_runnable(unsigned int cpu)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
     ASSERT(CSCHED_PCPU(cpu)->nr_runnable >= 1);
     CSCHED_PCPU(cpu)->nr_runnable--;
 }
@@ -615,7 +615,7 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
 
     /*
      * This is called either during during boot, resume or hotplug, in
@@ -635,7 +635,7 @@ static void
 csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                     void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_item *svc = vdata;
 
@@ -654,7 +654,7 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     spin_unlock(&prv->lock);
 
     per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
+    per_cpu(sched_res, cpu)->sched_priv = pdata;
 
     /*
      * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 5a3a0babab..df0e7282ce 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -567,7 +567,7 @@ static inline struct csched2_private *csched2_priv(const struct scheduler *ops)
 
 static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
 {
-    return per_cpu(schedule_data, cpu).sched_priv;
+    return per_cpu(sched_res, cpu)->sched_priv;
 }
 
 static inline struct csched2_item *csched2_item(const struct sched_item *item)
@@ -1276,7 +1276,7 @@ runq_insert(const struct scheduler *ops, struct csched2_item *svc)
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     ASSERT(!vcpu_on_runq(svc));
     ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
@@ -1797,7 +1797,7 @@ static bool vcpu_grab_budget(struct csched2_item *svc)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     if ( svc->budget > 0 )
         return true;
@@ -1844,7 +1844,7 @@ vcpu_return_budget(struct csched2_item *svc, struct list_head *parked)
     struct csched2_dom *sdom = svc->sdom;
     unsigned int cpu = svc->vcpu->processor;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
     ASSERT(list_empty(parked));
 
     /* budget_lock nests inside runqueue lock. */
@@ -2101,7 +2101,7 @@ csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
     unsigned int cpu = vc->processor;
     s_time_t now;
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     ASSERT(!is_idle_vcpu(vc));
 
@@ -2229,7 +2229,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
      * just grab the prv lock.  Instead, we'll have to trylock, and
      * do something else reasonable if we fail.
      */
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     if ( !read_trylock(&prv->lock) )
     {
@@ -2569,7 +2569,7 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
      * on either side may be empty).
      */
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
     st.lrqd = c2rqd(ops, cpu);
 
     update_runq_load(ops, st.lrqd, 0, now);
@@ -3475,7 +3475,7 @@ csched2_schedule(
     rqd = c2rqd(ops, cpu);
     BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
 
@@ -3864,7 +3864,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 
     rqi = init_pdata(prv, pdata, cpu);
     /* Move the scheduler lock to the new runq lock. */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    per_cpu(sched_res, cpu)->schedule_lock = &prv->rqd[rqi].lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock(old_lock);
@@ -3903,10 +3903,10 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * this scheduler, and so it's safe to have taken it /before/ our
      * private global lock.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
+    ASSERT(per_cpu(sched_res, cpu)->schedule_lock != &prv->rqd[rqi].lock);
 
     per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
+    per_cpu(sched_res, cpu)->sched_priv = pdata;
 
     /*
      * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
@@ -3914,7 +3914,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * taking it, find all the initializations we've done above in place.
      */
     smp_mb();
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    per_cpu(sched_res, cpu)->schedule_lock = &prv->rqd[rqi].lock;
 
     write_unlock(&prv->lock);
 }
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index f7a2650c48..a9cfa163b9 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -168,7 +168,7 @@ static void init_pdata(struct null_private *prv, unsigned int cpu)
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct null_private *prv = null_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
 
     /* alloc_pdata is not implemented, so we want this to be NULL. */
     ASSERT(!pdata);
@@ -277,7 +277,7 @@ pick_res(struct null_private *prv, struct sched_item *item)
     unsigned int cpu = v->processor, new_cpu;
     cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
 
-    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     for_each_affinity_balance_step( bs )
     {
@@ -388,7 +388,7 @@ static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
 static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                               void *pdata, void *vdata)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct null_private *prv = null_priv(new_ops);
     struct null_item *nvc = vdata;
 
@@ -406,7 +406,7 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     init_pdata(prv, cpu);
 
     per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = pdata;
+    per_cpu(sched_res, cpu)->sched_priv = pdata;
 
     /*
      * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index a3cd00f765..0019646b52 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -75,7 +75,7 @@
 /*
  * Locking:
  * A global system lock is used to protect the RunQ and DepletedQ.
- * The global lock is referenced by schedule_data.schedule_lock
+ * The global lock is referenced by sched_res->schedule_lock
  * from all physical cpus.
  *
  * The lock is already grabbed when calling wake/sleep/schedule/ functions
@@ -176,7 +176,7 @@ static void repl_timer_handler(void *data);
 
 /*
  * System-wide private data, include global RunQueue/DepletedQ
- * Global lock is referenced by schedule_data.schedule_lock from all
+ * Global lock is referenced by sched_res->schedule_lock from all
  * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
  */
 struct rt_private {
@@ -723,7 +723,7 @@ rt_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     }
 
     /* Move the scheduler lock to our global runqueue lock.  */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+    per_cpu(sched_res, cpu)->schedule_lock = &prv->lock;
 
     /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -745,7 +745,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * another scheduler, but that is how things need to be, for
      * preventing races.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->lock);
+    ASSERT(per_cpu(sched_res, cpu)->schedule_lock != &prv->lock);
 
     /*
      * If we are the absolute first cpu being switched toward this
@@ -763,7 +763,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     idle_vcpu[cpu]->sched_item->priv = vdata;
     per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
+    per_cpu(sched_res, cpu)->sched_priv = NULL; /* no pdata */
 
     /*
      * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
@@ -771,7 +771,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * taking it, find all the initializations we've done above in place.
      */
     smp_mb();
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
+    per_cpu(sched_res, cpu)->schedule_lock = &prv->lock;
 }
 
 static void
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 90eb915e4e..a9a9f2b691 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -61,7 +61,6 @@ static void vcpu_singleshot_timer_fn(void *data);
 static void poll_timer_fn(void *data);
 
 /* This is global for now so that private implementations can reach it */
-DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
 
@@ -161,7 +160,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              !test_bit(v->vcpu_id, v->domain->poll_mask) )
         {
             v->is_urgent = 0;
-            atomic_dec(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_dec(&per_cpu(sched_res, v->processor)->urgent_count);
         }
     }
     else
@@ -170,7 +169,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
              unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
         {
             v->is_urgent = 1;
-            atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
+            atomic_inc(&per_cpu(sched_res, v->processor)->urgent_count);
         }
     }
 }
@@ -181,7 +180,7 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
 
     ASSERT(v->runstate.state != new_state);
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, v->processor)->schedule_lock));
 
     vcpu_urgent_count_update(v);
 
@@ -338,7 +337,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
-        per_cpu(schedule_data, v->processor).curr = item;
+        per_cpu(sched_res, v->processor)->curr = item;
         v->is_running = 1;
     }
     else
@@ -463,7 +462,7 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->singleshot_timer);
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
-        atomic_dec(&per_cpu(schedule_data, v->processor).urgent_count);
+        atomic_dec(&per_cpu(sched_res, v->processor)->urgent_count);
     SCHED_OP(vcpu_scheduler(v), remove_item, item);
     SCHED_OP(vcpu_scheduler(v), free_vdata, item->priv);
     sched_free_item(item);
@@ -510,7 +509,7 @@ void sched_destroy_domain(struct domain *d)
 
 void vcpu_sleep_nosync_locked(struct vcpu *v)
 {
-    ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, v->processor)->schedule_lock));
 
     if ( likely(!vcpu_runnable(v)) )
     {
@@ -605,8 +604,8 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
      */
     if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
     {
-        atomic_inc(&per_cpu(schedule_data, new_cpu).urgent_count);
-        atomic_dec(&per_cpu(schedule_data, old_cpu).urgent_count);
+        atomic_inc(&per_cpu(sched_res, new_cpu)->urgent_count);
+        atomic_dec(&per_cpu(sched_res, old_cpu)->urgent_count);
     }
 
     /*
@@ -678,20 +677,20 @@ static void vcpu_migrate_finish(struct vcpu *v)
          * are not correct any longer after evaluating old and new cpu holding
          * the locks.
          */
-        old_lock = per_cpu(schedule_data, old_cpu).schedule_lock;
-        new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
+        old_lock = per_cpu(sched_res, old_cpu)->schedule_lock;
+        new_lock = per_cpu(sched_res, new_cpu)->schedule_lock;
 
         sched_spin_lock_double(old_lock, new_lock, &flags);
 
         old_cpu = v->processor;
-        if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
+        if ( old_lock == per_cpu(sched_res, old_cpu)->schedule_lock )
         {
             /*
              * If we selected a CPU on the previosu iteration, check if it
              * remains suitable for running this vCPU.
              */
             if ( pick_called &&
-                 (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+                 (new_lock == per_cpu(sched_res, new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
@@ -699,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
             /* Select a new CPU. */
             new_cpu = SCHED_OP(vcpu_scheduler(v), pick_resource,
                                v->sched_item)->processor;
-            if ( (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
+            if ( (new_lock == per_cpu(sched_res, new_cpu)->schedule_lock) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
             pick_called = 1;
@@ -1492,7 +1491,7 @@ static void schedule(void)
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
     bool_t                tasklet_work_scheduled = 0;
-    struct schedule_data *sd;
+    struct sched_resource *sd;
     spinlock_t           *lock;
     struct task_slice     next_slice;
     int cpu = smp_processor_id();
@@ -1501,7 +1500,7 @@ static void schedule(void)
 
     SCHED_STAT_CRANK(sched_run);
 
-    sd = &this_cpu(schedule_data);
+    sd = this_cpu(sched_res);
 
     /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
@@ -1643,15 +1642,14 @@ static void poll_timer_fn(void *data)
 
 static int cpu_schedule_up(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd;
     void *sched_priv;
-    struct sched_resource *res;
 
-    res = xmalloc(struct sched_resource);
-    if ( res == NULL )
+    sd = xmalloc(struct sched_resource);
+    if ( sd == NULL )
         return -ENOMEM;
-    res->processor = cpu;
-    per_cpu(sched_res, cpu) = res;
+    sd->processor = cpu;
+    per_cpu(sched_res, cpu) = sd;
 
     per_cpu(scheduler, cpu) = &ops;
     spin_lock_init(&sd->_lock);
@@ -1707,7 +1705,7 @@ static int cpu_schedule_up(unsigned int cpu)
 
 static void cpu_schedule_down(unsigned int cpu)
 {
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
     SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
@@ -1727,7 +1725,7 @@ static int cpu_schedule_callback(
 {
     unsigned int cpu = (unsigned long)hcpu;
     struct scheduler *sched = per_cpu(scheduler, cpu);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
     int rc = 0;
 
     /*
@@ -1860,10 +1858,10 @@ void __init scheduler_init(void)
     idle_domain->max_vcpus = nr_cpu_ids;
     if ( vcpu_create(idle_domain, 0, 0) == NULL )
         BUG();
-    this_cpu(schedule_data).curr = idle_vcpu[0]->sched_item;
-    this_cpu(schedule_data).sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
-    BUG_ON(IS_ERR(this_cpu(schedule_data).sched_priv));
-    SCHED_OP(&ops, init_pdata, this_cpu(schedule_data).sched_priv, 0);
+    this_cpu(sched_res)->curr = idle_vcpu[0]->sched_item;
+    this_cpu(sched_res)->sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
+    BUG_ON(IS_ERR(this_cpu(sched_res)->sched_priv));
+    SCHED_OP(&ops, init_pdata, this_cpu(sched_res)->sched_priv, 0);
 }
 
 /*
@@ -1943,7 +1941,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     old_lock = pcpu_schedule_lock_irq(cpu);
 
     vpriv_old = idle->sched_item->priv;
-    ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
+    ppriv_old = per_cpu(sched_res, cpu)->sched_priv;
     SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 08da01803f..f520145752 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -33,7 +33,7 @@ void update_last_cx_stat(struct acpi_processor_power *,
  */
 static inline int sched_has_urgent_vcpu(void)
 {
-    return atomic_read(&this_cpu(schedule_data).urgent_count);
+    return atomic_read(&this_cpu(sched_res)->urgent_count);
 }
 
 #endif /* __X86_ASM_CPUIDLE_H__ */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 85b77dafdc..4bc053e9f7 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -33,22 +33,18 @@ extern int sched_ratelimit_us;
  * For cache betterness, keep the actual lock in the same cache area
  * as the rest of the struct.  Just have the scheduler point to the
  * one it wants (This may be the one right in front of it).*/
-struct schedule_data {
+struct sched_resource {
     spinlock_t         *schedule_lock,
                        _lock;
     struct sched_item  *curr;           /* current task                    */
     void               *sched_priv;
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
+    unsigned            processor;
 };
 
-#define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
-
-struct sched_resource {
-    unsigned     processor;
-};
+#define curr_on_cpu(c)    (per_cpu(sched_res, c)->curr)
 
-DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
@@ -83,7 +79,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
-        spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock; \
+        spinlock_t *lock = per_cpu(sched_res, cpu)->schedule_lock; \
         /* \
          * v->processor may change when grabbing the lock; but \
          * per_cpu(v->processor) may also change, if changing cpu pool \
@@ -93,7 +89,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * lock may be the same; this will succeed in that case. \
          */ \
         spin_lock##irq(lock, ## arg); \
-        if ( likely(lock == per_cpu(schedule_data, cpu).schedule_lock) ) \
+        if ( likely(lock == per_cpu(sched_res, cpu)->schedule_lock) ) \
             return lock; \
         spin_unlock##irq(lock, ## arg); \
     } \
@@ -103,7 +99,7 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 static inline void kind##_schedule_unlock##irq(spinlock_t *lock \
                                                EXTRA_TYPE(arg), param) \
 { \
-    ASSERT(lock == per_cpu(schedule_data, cpu).schedule_lock); \
+    ASSERT(lock == per_cpu(sched_res, cpu)->schedule_lock); \
     spin_unlock##irq(lock, ## arg); \
 }
 
@@ -132,11 +128,11 @@ sched_unlock(vcpu, const struct vcpu *v, v->processor, _irqrestore, flags)
 
 static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
 {
-    spinlock_t *lock = per_cpu(schedule_data, cpu).schedule_lock;
+    spinlock_t *lock = per_cpu(sched_res, cpu)->schedule_lock;
 
     if ( !spin_trylock(lock) )
         return NULL;
-    if ( lock == per_cpu(schedule_data, cpu).schedule_lock )
+    if ( lock == per_cpu(sched_res, cpu)->schedule_lock )
         return lock;
     spin_unlock(lock);
     return NULL;
-- 
2.16.4



* [PATCH RFC 16/49] xen/sched: switch vcpu_schedule_lock to item_schedule_lock
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (14 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 15/49] xen/sched: move per cpu scheduler private data into struct sched_resource Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item Juergen Gross
                   ` (38 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Rename vcpu_schedule_[un]lock[_irq]() to item_schedule_[un]lock[_irq]()
and let them take a sched_item pointer instead of a vcpu pointer as
parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
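
A minimal sketch of the resulting calling pattern (illustration only: the
caller below is made up, but the item_schedule_lock_irq() /
item_schedule_unlock_irq() helpers and the item->res->processor keying
match the sched-if.h changes further down):

/* Made-up caller, only to show the renamed locking interface. */
static void example_touch_item(struct vcpu *v)
{
    struct sched_item *item = v->sched_item;
    spinlock_t *lock;

    /* Was: lock = vcpu_schedule_lock_irq(v);  -- keyed on v->processor. */
    lock = item_schedule_lock_irq(item);

    /* ... manipulate per-item scheduler state under the lock ... */

    /* Was: vcpu_schedule_unlock_irq(lock, v); */
    item_schedule_unlock_irq(lock, item);
}
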
 xen/common/sched_credit.c  | 17 +++++++++--------
 xen/common/sched_credit2.c | 40 +++++++++++++++++++--------------------
 xen/common/sched_null.c    | 14 +++++++-------
 xen/common/sched_rt.c      | 15 +++++++--------
 xen/common/schedule.c      | 47 +++++++++++++++++++++++-----------------------
 xen/include/xen/sched-if.h | 12 ++++++------
 6 files changed, 73 insertions(+), 72 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index e8369b3648..de4face2bc 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -940,7 +940,8 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
 static void
 csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
 {
-    struct csched_item * const svc = CSCHED_ITEM(current->sched_item);
+    struct sched_item *curritem = current->sched_item;
+    struct csched_item * const svc = CSCHED_ITEM(curritem);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
     ASSERT( current->processor == cpu );
@@ -976,7 +977,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
     {
         unsigned int new_cpu;
         unsigned long flags;
-        spinlock_t *lock = vcpu_schedule_lock_irqsave(current, &flags);
+        spinlock_t *lock = item_schedule_lock_irqsave(curritem, &flags);
 
         /*
          * If it's been active a while, check if we'd be better off
@@ -985,7 +986,7 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
          */
         new_cpu = _csched_cpu_pick(ops, current, 0);
 
-        vcpu_schedule_unlock_irqrestore(lock, flags, current);
+        item_schedule_unlock_irqrestore(lock, flags, curritem);
 
         if ( new_cpu != cpu )
         {
@@ -1037,19 +1038,19 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
     BUG_ON( is_idle_vcpu(vc) );
 
     /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     item->res = csched_res_pick(ops, item);
     vc->processor = item->res->processor;
 
     spin_unlock_irq(lock);
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
         runq_insert(svc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 
     SCHED_STAT_CRANK(vcpu_insert);
 }
@@ -2145,12 +2146,12 @@ csched_dump(const struct scheduler *ops)
             spinlock_t *lock;
 
             svc = list_entry(iter_svc, struct csched_item, active_vcpu_elem);
-            lock = vcpu_schedule_lock(svc->vcpu);
+            lock = item_schedule_lock(svc->vcpu->sched_item);
 
             printk("\t%3d: ", ++loop);
             csched_dump_vcpu(svc);
 
-            vcpu_schedule_unlock(lock, svc->vcpu);
+            item_schedule_unlock(lock, svc->vcpu->sched_item);
         }
     }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index df0e7282ce..6106293b3f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -171,7 +171,7 @@
  * - runqueue lock
  *  + it is per-runqueue, so:
  *   * cpus in a runqueue take the runqueue lock, when using
- *     pcpu_schedule_lock() / vcpu_schedule_lock() (and friends),
+ *     pcpu_schedule_lock() / item_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
  *  + serializes runqueue operations (removing and inserting vcpus);
@@ -1890,7 +1890,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         unsigned long flags;
         s_time_t now;
 
-        lock = vcpu_schedule_lock_irqsave(svc->vcpu, &flags);
+        lock = item_schedule_lock_irqsave(svc->vcpu->sched_item, &flags);
 
         __clear_bit(_VPF_parked, &svc->vcpu->pause_flags);
         if ( unlikely(svc->flags & CSFLAG_scheduled) )
@@ -1923,7 +1923,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         }
         list_del_init(&svc->parked_elem);
 
-        vcpu_schedule_unlock_irqrestore(lock, flags, svc->vcpu);
+        item_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_item);
     }
 }
 
@@ -2162,7 +2162,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
     struct csched2_item * const svc = csched2_item(item);
-    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+    spinlock_t *lock = item_schedule_lock_irq(item);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
 
@@ -2194,7 +2194,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
     else if ( !is_idle_vcpu(vc) )
         update_load(ops, svc->rqd, svc, -1, now);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 
     unpark_parked_vcpus(ops, &were_parked);
 }
@@ -2847,14 +2847,14 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 struct csched2_item *svc = csched2_item(v->sched_item);
-                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+                spinlock_t *lock = item_schedule_lock(svc->vcpu->sched_item);
 
                 ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                item_schedule_unlock(lock, svc->vcpu->sched_item);
             }
         }
         /* Cap */
@@ -2885,7 +2885,7 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 svc = csched2_item(v->sched_item);
-                lock = vcpu_schedule_lock(svc->vcpu);
+                lock = item_schedule_lock(svc->vcpu->sched_item);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
                  * which then won't happen because, in csched2_runtime(),
@@ -2893,7 +2893,7 @@ csched2_dom_cntl(
                  */
                 svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
                                         CSCHED2_MIN_TIMER);
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                item_schedule_unlock(lock, svc->vcpu->sched_item);
             }
 
             if ( sdom->cap == 0 )
@@ -2928,7 +2928,7 @@ csched2_dom_cntl(
                 for_each_vcpu ( d, v )
                 {
                     svc = csched2_item(v->sched_item);
-                    lock = vcpu_schedule_lock(svc->vcpu);
+                    lock = item_schedule_lock(svc->vcpu->sched_item);
                     if ( v->is_running )
                     {
                         unsigned int cpu = v->processor;
@@ -2959,7 +2959,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    vcpu_schedule_unlock(lock, svc->vcpu);
+                    item_schedule_unlock(lock, svc->vcpu->sched_item);
                 }
             }
 
@@ -2975,12 +2975,12 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 struct csched2_item *svc = csched2_item(v->sched_item);
-                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+                spinlock_t *lock = item_schedule_lock(svc->vcpu->sched_item);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                vcpu_schedule_unlock(lock, svc->vcpu);
+                item_schedule_unlock(lock, svc->vcpu->sched_item);
             }
             sdom->cap = 0;
             /*
@@ -3119,19 +3119,19 @@ csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     item->res = csched2_res_pick(ops, item);
     vc->processor = item->res->processor;
 
     spin_unlock_irq(lock);
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     /* Add vcpu to runqueue of initial processor */
     runq_assign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 
     sdom->nr_vcpus++;
 
@@ -3161,11 +3161,11 @@ csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
     SCHED_STAT_CRANK(vcpu_remove);
 
     /* Remove from runqueue */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     runq_deassign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 
     svc->sdom->nr_vcpus--;
 }
@@ -3749,12 +3749,12 @@ csched2_dump(const struct scheduler *ops)
             struct csched2_item * const svc = csched2_item(v->sched_item);
             spinlock_t *lock;
 
-            lock = vcpu_schedule_lock(svc->vcpu);
+            lock = item_schedule_lock(svc->vcpu->sched_item);
 
             printk("\t%3d: ", ++loop);
             csched2_dump_vcpu(prv, svc);
 
-            vcpu_schedule_unlock(lock, svc->vcpu);
+            item_schedule_unlock(lock, svc->vcpu->sched_item);
         }
     }
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index a9cfa163b9..620925e8ce 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -317,7 +317,7 @@ pick_res(struct null_private *prv, struct sched_item *item)
      * all the pCPUs are busy.
      *
      * In fact, there must always be something sane in v->processor, or
-     * vcpu_schedule_lock() and friends won't work. This is not a problem,
+     * item_schedule_lock() and friends won't work. This is not a problem,
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
@@ -428,7 +428,7 @@ static void null_item_insert(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = item_schedule_lock_irq(item);
  retry:
 
     item->res = pick_res(prv, item);
@@ -436,7 +436,7 @@ static void null_item_insert(const struct scheduler *ops,
 
     spin_unlock(lock);
 
-    lock = vcpu_schedule_lock(v);
+    lock = item_schedule_lock(item);
 
     cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
@@ -522,7 +522,7 @@ static void null_item_remove(const struct scheduler *ops,
 
     ASSERT(!is_idle_vcpu(v));
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = item_schedule_lock_irq(item);
 
     /* If v is in waitqueue, just get it out of there and bail */
     if ( unlikely(!list_empty(&nvc->waitq_elem)) )
@@ -540,7 +540,7 @@ static void null_item_remove(const struct scheduler *ops,
     _vcpu_remove(prv, v);
 
  out:
-    vcpu_schedule_unlock_irq(lock, v);
+    item_schedule_unlock_irq(lock, item);
 
     SCHED_STAT_CRANK(vcpu_remove);
 }
@@ -860,13 +860,13 @@ static void null_dump(const struct scheduler *ops)
             struct null_item * const nvc = null_item(v->sched_item);
             spinlock_t *lock;
 
-            lock = vcpu_schedule_lock(nvc->vcpu);
+            lock = item_schedule_lock(nvc->vcpu->sched_item);
 
             printk("\t%3d: ", ++loop);
             dump_vcpu(prv, nvc);
             printk("\n");
 
-            vcpu_schedule_unlock(lock, nvc->vcpu);
+            item_schedule_unlock(lock, nvc->vcpu->sched_item);
         }
     }
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 0019646b52..a604a0d5a6 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -177,7 +177,7 @@ static void repl_timer_handler(void *data);
 /*
  * System-wide private data, include global RunQueue/DepletedQ
  * Global lock is referenced by sched_res->schedule_lock from all
- * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
+ * physical cpus. It can be grabbed via item_schedule_lock_irq()
  */
 struct rt_private {
     spinlock_t lock;            /* the global coarse-grained lock */
@@ -904,7 +904,7 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
     item->res = rt_res_pick(ops, item);
     vc->processor = item->res->processor;
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
 
     now = NOW();
     if ( now >= svc->cur_deadline )
@@ -917,7 +917,7 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
         if ( !vc->is_running )
             runq_insert(ops, svc);
     }
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 
     SCHED_STAT_CRANK(vcpu_insert);
 }
@@ -928,7 +928,6 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 static void
 rt_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item * const svc = rt_item(item);
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
@@ -937,14 +936,14 @@ rt_item_remove(const struct scheduler *ops, struct sched_item *item)
 
     BUG_ON( sdom == NULL );
 
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = item_schedule_lock_irq(item);
     if ( vcpu_on_q(svc) )
         q_remove(svc);
 
     if ( vcpu_on_replq(svc) )
         replq_remove(ops,svc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 }
 
 /*
@@ -1339,7 +1338,7 @@ rt_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
     struct vcpu *vc = item->vcpu;
     struct rt_item *svc = rt_item(item);
-    spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+    spinlock_t *lock = item_schedule_lock_irq(item);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
     /* not insert idle vcpu to runq */
@@ -1356,7 +1355,7 @@ rt_context_saved(const struct scheduler *ops, struct sched_item *item)
         replq_remove(ops, svc);
 
 out:
-    vcpu_schedule_unlock_irq(lock, vc);
+    item_schedule_unlock_irq(lock, item);
 }
 
 /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a9a9f2b691..a8382d9812 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -198,7 +198,8 @@ static inline void vcpu_runstate_change(
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
-    spinlock_t *lock = likely(v == current) ? NULL : vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = likely(v == current)
+                       ? NULL : item_schedule_lock_irq(v->sched_item);
     s_time_t delta;
 
     memcpy(runstate, &v->runstate, sizeof(*runstate));
@@ -207,7 +208,7 @@ void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
         runstate->time[runstate->state] += delta;
 
     if ( unlikely(lock != NULL) )
-        vcpu_schedule_unlock_irq(lock, v);
+        item_schedule_unlock_irq(lock, v->sched_item);
 }
 
 uint64_t get_cpu_idle_time(unsigned int cpu)
@@ -419,7 +420,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->singleshot_timer, new_p);
         migrate_timer(&v->poll_timer, new_p);
 
-        lock = vcpu_schedule_lock_irq(v);
+        lock = item_schedule_lock_irq(v->sched_item);
 
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
@@ -428,7 +429,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
-         * - use vcpu_schedule_unlock_irq().
+         * - use item_schedule_unlock_irq().
          */
         spin_unlock_irq(lock);
 
@@ -527,11 +528,11 @@ void vcpu_sleep_nosync(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_SLEEP, v->domain->domain_id, v->vcpu_id);
 
-    lock = vcpu_schedule_lock_irqsave(v, &flags);
+    lock = item_schedule_lock_irqsave(v->sched_item, &flags);
 
     vcpu_sleep_nosync_locked(v);
 
-    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+    item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
 }
 
 void vcpu_sleep_sync(struct vcpu *v)
@@ -551,7 +552,7 @@ void vcpu_wake(struct vcpu *v)
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
-    lock = vcpu_schedule_lock_irqsave(v, &flags);
+    lock = item_schedule_lock_irqsave(v->sched_item, &flags);
 
     if ( likely(vcpu_runnable(v)) )
     {
@@ -565,7 +566,7 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
     }
 
-    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+    item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
 }
 
 void vcpu_unblock(struct vcpu *v)
@@ -639,9 +640,9 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
  * These steps are encapsulated in the following two functions; they
  * should be called like this:
  *
- *     lock = vcpu_schedule_lock_irq(v);
+ *     lock = item_schedule_lock_irq(item);
  *     vcpu_migrate_start(v);
- *     vcpu_schedule_unlock_irq(lock, v)
+ *     item_schedule_unlock_irq(lock, item)
  *     vcpu_migrate_finish(v);
  *
  * vcpu_migrate_finish() will do the work now if it can, or simply
@@ -746,12 +747,12 @@ static void vcpu_migrate_finish(struct vcpu *v)
  */
 void vcpu_force_reschedule(struct vcpu *v)
 {
-    spinlock_t *lock = vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = item_schedule_lock_irq(v->sched_item);
 
     if ( v->is_running )
         vcpu_migrate_start(v);
 
-    vcpu_schedule_unlock_irq(lock, v);
+    item_schedule_unlock_irq(lock, v->sched_item);
 
     vcpu_migrate_finish(v);
 }
@@ -802,7 +803,7 @@ void restore_vcpu_affinity(struct domain *d)
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         v->sched_item->res = per_cpu(sched_res, v->processor);
 
-        lock = vcpu_schedule_lock_irq(v);
+        lock = item_schedule_lock_irq(v->sched_item);
         v->sched_item->res = SCHED_OP(vcpu_scheduler(v), pick_resource,
                                       v->sched_item);
         v->processor = v->sched_item->res->processor;
@@ -837,7 +838,7 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
+            spinlock_t *lock = item_schedule_lock_irqsave(v->sched_item, &flags);
 
             cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
@@ -846,7 +847,7 @@ int cpu_disable_scheduler(unsigned int cpu)
                 if ( v->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                    item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -859,7 +860,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
                 continue;
             }
 
@@ -872,7 +873,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            vcpu_schedule_unlock_irqrestore(lock, flags, v);
+            item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
 
             vcpu_migrate_finish(v);
 
@@ -943,7 +944,7 @@ static int vcpu_set_affinity(
     spinlock_t *lock;
     int ret = 0;
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = item_schedule_lock_irq(v->sched_item);
 
     if ( v->affinity_broken )
         ret = -EBUSY;
@@ -965,7 +966,7 @@ static int vcpu_set_affinity(
         vcpu_migrate_start(v);
     }
 
-    vcpu_schedule_unlock_irq(lock, v);
+    item_schedule_unlock_irq(lock, v->sched_item);
 
     domain_update_node_affinity(v->domain);
 
@@ -1100,10 +1101,10 @@ static long do_poll(struct sched_poll *sched_poll)
 long vcpu_yield(void)
 {
     struct vcpu * v=current;
-    spinlock_t *lock = vcpu_schedule_lock_irq(v);
+    spinlock_t *lock = item_schedule_lock_irq(v->sched_item);
 
     SCHED_OP(vcpu_scheduler(v), yield, v->sched_item);
-    vcpu_schedule_unlock_irq(lock, v);
+    item_schedule_unlock_irq(lock, v->sched_item);
 
     SCHED_STAT_CRANK(vcpu_yield);
 
@@ -1189,7 +1190,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = vcpu_schedule_lock_irq(v);
+    lock = item_schedule_lock_irq(v->sched_item);
 
     if ( cpu < 0 )
     {
@@ -1216,7 +1217,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    vcpu_schedule_unlock_irq(lock, v);
+    item_schedule_unlock_irq(lock, v->sched_item);
 
     domain_update_node_affinity(v->domain);
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4bc053e9f7..d549ef696e 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -105,22 +105,22 @@ static inline void kind##_schedule_unlock##irq(spinlock_t *lock \
 
 #define EXTRA_TYPE(arg)
 sched_lock(pcpu, unsigned int cpu,     cpu, )
-sched_lock(vcpu, const struct vcpu *v, v->processor, )
+sched_lock(item, const struct sched_item *i, i->res->processor, )
 sched_lock(pcpu, unsigned int cpu,     cpu,          _irq)
-sched_lock(vcpu, const struct vcpu *v, v->processor, _irq)
+sched_lock(item, const struct sched_item *i, i->res->processor, _irq)
 sched_unlock(pcpu, unsigned int cpu,     cpu, )
-sched_unlock(vcpu, const struct vcpu *v, v->processor, )
+sched_unlock(item, const struct sched_item *i, i->res->processor, )
 sched_unlock(pcpu, unsigned int cpu,     cpu,          _irq)
-sched_unlock(vcpu, const struct vcpu *v, v->processor, _irq)
+sched_unlock(item, const struct sched_item *i, i->res->processor, _irq)
 #undef EXTRA_TYPE
 
 #define EXTRA_TYPE(arg) , unsigned long arg
 #define spin_unlock_irqsave spin_unlock_irqrestore
 sched_lock(pcpu, unsigned int cpu,     cpu,          _irqsave, *flags)
-sched_lock(vcpu, const struct vcpu *v, v->processor, _irqsave, *flags)
+sched_lock(item, const struct sched_item *i, i->res->processor, _irqsave, *flags)
 #undef spin_unlock_irqsave
 sched_unlock(pcpu, unsigned int cpu,     cpu,          _irqrestore, flags)
-sched_unlock(vcpu, const struct vcpu *v, v->processor, _irqrestore, flags)
+sched_unlock(item, const struct sched_item *i, i->res->processor, _irqrestore, flags)
 #undef EXTRA_TYPE
 
 #undef sched_unlock
-- 
2.16.4



* [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (15 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 16/49] xen/sched: switch vcpu_schedule_lock to item_schedule_lock Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 21:33   ` Andrew Cooper
  2019-03-29 15:09 ` [PATCH RFC 18/49] xen/sched: add scheduler helpers hiding vcpu Juergen Gross
                   ` (37 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Affinities are scheduler-specific attributes, so they should be per
scheduling item. Move all affinity-related fields from struct vcpu to
struct sched_item. While at it, switch the affinity-related functions in
sched-if.h to take a pointer to a sched_item instead of a vcpu as
parameter.

vcpu->last_run_time is primarily used by sched_credit, so move it to
struct sched_item, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
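
A rough sketch of where the moved fields end up (member names as used in
the hunks below; other sched_item members elided):

/* Sketch only: affinity related fields after the move from struct vcpu. */
struct sched_item {
    /* ... vcpu, res, next_in_list, priv, ... */

    /* Bitmask of CPUs on which this item may run. */
    cpumask_var_t      cpu_hard_affinity;
    /* Used to change hard affinity temporarily. */
    cpumask_var_t      cpu_hard_affinity_tmp;
    /* Used to restore hard affinity after a temporary override. */
    cpumask_var_t      cpu_hard_affinity_saved;
    /* Bitmask of CPUs on which this item prefers to run. */
    cpumask_var_t      cpu_soft_affinity;

    /* last_run_time moves here from struct vcpu as well (see above). */
};

Helpers such as has_soft_affinity() and affinity_balance_cpumask() then
take the sched_item directly, as the hunks below show.
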
 xen/arch/x86/domain.c          |   1 +
 xen/arch/x86/pv/emul-priv-op.c |   2 +
 xen/arch/x86/pv/traps.c        |   6 ++-
 xen/arch/x86/traps.c           |  10 ++--
 xen/common/domain.c            |  19 ++-----
 xen/common/domctl.c            |  13 +++--
 xen/common/keyhandler.c        |   5 +-
 xen/common/sched_credit.c      |  20 ++++----
 xen/common/sched_credit2.c     |  42 ++++++++--------
 xen/common/sched_null.c        |  16 +++---
 xen/common/sched_rt.c          |   9 ++--
 xen/common/schedule.c          | 110 ++++++++++++++++++++++++-----------------
 xen/common/wait.c              |   5 +-
 xen/include/xen/sched-if.h     |  33 ++++++++++---
 xen/include/xen/sched.h        |  20 +-------
 15 files changed, 168 insertions(+), 143 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8d579e2cf9..5d8f3255cb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -15,6 +15,7 @@
 #include <xen/lib.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/domain.h>
 #include <xen/smp.h>
 #include <xen/delay.h>
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 3746e2ad54..f7d98c28f1 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -23,6 +23,8 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/iocap.h>
+#include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/spinlock.h>
 #include <xen/trace.h>
 
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 1740784ff2..f586d486fc 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -22,6 +22,8 @@
 #include <xen/event.h>
 #include <xen/hypercall.h>
 #include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/trace.h>
 #include <xen/softirq.h>
 
@@ -155,8 +157,8 @@ static void nmi_mce_softirq(void)
      * Set the tmp value unconditionally, so that the check in the iret
      * hypercall works.
      */
-    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
-                 st->vcpu->cpu_hard_affinity);
+    cpumask_copy(st->vcpu->sched_item->cpu_hard_affinity_tmp,
+                 st->vcpu->sched_item->cpu_hard_affinity);
 
     if ( (cpu != st->processor) ||
          (st->processor != st->vcpu->processor) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 05ddc39bfe..481d0b1c37 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -26,6 +26,7 @@
 
 #include <xen/init.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/lib.h>
 #include <xen/err.h>
 #include <xen/errno.h>
@@ -1594,16 +1595,17 @@ static void pci_serr_softirq(void)
 void async_exception_cleanup(struct vcpu *curr)
 {
     int trap;
+    struct sched_item *item = curr->sched_item;
 
     if ( !curr->async_exception_mask )
         return;
 
     /* Restore affinity.  */
-    if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
-         !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
+    if ( !cpumask_empty(item->cpu_hard_affinity_tmp) &&
+         !cpumask_equal(item->cpu_hard_affinity_tmp, item->cpu_hard_affinity) )
     {
-        vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp);
-        cpumask_clear(curr->cpu_hard_affinity_tmp);
+        vcpu_set_hard_affinity(curr, item->cpu_hard_affinity_tmp);
+        cpumask_clear(item->cpu_hard_affinity_tmp);
     }
 
     if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3b18f11f12..2045e762ac 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -126,11 +126,6 @@ static void vcpu_info_reset(struct vcpu *v)
 
 static void vcpu_destroy(struct vcpu *v)
 {
-    free_cpumask_var(v->cpu_hard_affinity);
-    free_cpumask_var(v->cpu_hard_affinity_tmp);
-    free_cpumask_var(v->cpu_hard_affinity_saved);
-    free_cpumask_var(v->cpu_soft_affinity);
-
     free_vcpu_struct(v);
 }
 
@@ -154,12 +149,6 @@ struct vcpu *vcpu_create(
 
     grant_table_init_vcpu(v);
 
-    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
-         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
-        goto fail;
-
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
@@ -199,7 +188,6 @@ struct vcpu *vcpu_create(
     sched_destroy_vcpu(v);
  fail_wq:
     destroy_waitqueue_vcpu(v);
- fail:
     vcpu_destroy(v);
 
     return NULL;
@@ -559,9 +547,10 @@ void domain_update_node_affinity(struct domain *d)
          */
         for_each_vcpu ( d, v )
         {
-            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask,
+                       v->sched_item->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->cpu_soft_affinity);
+                       v->sched_item->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
@@ -1230,7 +1219,7 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    cpumask_clear(v->cpu_hard_affinity_tmp);
+    cpumask_clear(v->sched_item->cpu_hard_affinity_tmp);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index bade9a63b1..8464713d2b 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_getvcpuaffinity:
     {
         struct vcpu *v;
+        struct sched_item *item;
         struct xen_domctl_vcpuaffinity *vcpuaff = &op->u.vcpuaffinity;
 
         ret = -EINVAL;
@@ -624,6 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL )
             break;
 
+        item = v->sched_item;
         ret = -EINVAL;
         if ( vcpuaffinity_params_invalid(vcpuaff) )
             break;
@@ -643,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 ret = -ENOMEM;
                 break;
             }
-            cpumask_copy(old_affinity, v->cpu_hard_affinity);
+            cpumask_copy(old_affinity, item->cpu_hard_affinity);
 
             if ( !alloc_cpumask_var(&new_affinity) )
             {
@@ -676,7 +678,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * For hard affinity, what we return is the intersection of
                  * cpupool's online mask and the new hard affinity.
                  */
-                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, online, item->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
                                                new_affinity);
             }
@@ -705,7 +707,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * hard affinity.
                  */
                 cpumask_and(new_affinity, new_affinity, online);
-                cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, new_affinity,
+                            item->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
                                                new_affinity);
             }
@@ -718,10 +721,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         {
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
-                                               v->cpu_hard_affinity);
+                                               item->cpu_hard_affinity);
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
-                                               v->cpu_soft_affinity);
+                                               item->cpu_soft_affinity);
         }
         break;
     }
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 4f4a660b0c..f50df5841d 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -9,6 +9,7 @@
 #include <xen/console.h>
 #include <xen/serial.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/tasklet.h>
 #include <xen/domain.h>
 #include <xen/rangeset.h>
@@ -312,8 +313,8 @@ static void dump_domains(unsigned char key)
                 printk("dirty_cpu=%u", v->dirty_cpu);
             printk("\n");
             printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
-                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
-                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
+                   nr_cpu_ids, cpumask_bits(v->sched_item->cpu_hard_affinity),
+                   nr_cpu_ids, cpumask_bits(v->sched_item->cpu_soft_affinity));
             printk("    pause_count=%d pause_flags=%lx\n",
                    atomic_read(&v->pause_count), v->pause_flags);
             arch_dump_vcpu_info(v);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index de4face2bc..9e7c849b94 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -350,6 +350,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 static inline void __runq_tickle(struct csched_item *new)
 {
     unsigned int cpu = new->vcpu->processor;
+    struct sched_item *item = new->vcpu->sched_item;
     struct csched_item * const cur = CSCHED_ITEM(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -375,7 +376,7 @@ static inline void __runq_tickle(struct csched_item *new)
     if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, item->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         __cpumask_set_cpu(cpu, &mask);
         goto tickle;
@@ -410,11 +411,11 @@ static inline void __runq_tickle(struct csched_item *new)
             int new_idlers_empty;
 
             if ( balance_step == BALANCE_SOFT_AFFINITY
-                 && !has_soft_affinity(new->vcpu) )
+                 && !has_soft_affinity(item) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
-            affinity_balance_cpumask(new->vcpu, balance_step,
+            affinity_balance_cpumask(item, balance_step,
                                      cpumask_scratch_cpu(cpu));
             cpumask_and(cpumask_scratch_cpu(cpu),
                         cpumask_scratch_cpu(cpu), &idle_mask);
@@ -443,8 +444,7 @@ static inline void __runq_tickle(struct csched_item *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
-                if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity,
-                                        &idle_mask) )
+                if ( cpumask_intersects(item->cpu_hard_affinity, &idle_mask) )
                 {
                     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
                     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
@@ -704,7 +704,7 @@ static inline bool
 __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 {
     bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->last_run_time) < prv->vcpu_migr_delay;
+               (NOW() - v->sched_item->last_run_time) < prv->vcpu_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(vcpu_hot);
@@ -742,7 +742,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc, balance_step, cpus);
+        affinity_balance_cpumask(vc->sched_item, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -761,7 +761,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(vc->sched_item) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer vc's current processor */
@@ -1660,10 +1660,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * or counter.
          */
         if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc)) )
+                                !has_soft_affinity(vc->sched_item)) )
             continue;
 
-        affinity_balance_cpumask(vc, balance_step, cpumask_scratch);
+        affinity_balance_cpumask(vc->sched_item, balance_step, cpumask_scratch);
         if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6106293b3f..5c1794db61 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -699,10 +699,10 @@ static int get_fallback_cpu(struct csched2_item *svc)
     {
         int cpu = v->processor;
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_item) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(v->sched_item, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     cpupool_domain_cpumask(v->domain));
 
@@ -1390,10 +1390,10 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->vcpu->sched_item->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_item->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
@@ -1436,6 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
+    struct sched_item *item = new->vcpu->sched_item;
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
@@ -1473,7 +1474,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
                   cpumask_test_cpu(cpu, &rqd->idle) &&
                   !cpumask_test_cpu(cpu, &rqd->tickled)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, item->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         ipid = cpu;
         goto tickle;
@@ -1482,10 +1483,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
     for_each_affinity_balance_step( bs )
     {
         /* Just skip first step, if we don't have a soft affinity */
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(new->vcpu) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(item) )
             continue;
 
-        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(item, bs, cpumask_scratch_cpu(cpu));
 
         /*
          * First of all, consider idle cpus, checking if we can just
@@ -1557,7 +1558,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
             ipid = cpu;
 
             /* If this is in new's soft affinity, just take it */
-            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            if ( cpumask_test_cpu(cpu, item->cpu_soft_affinity) )
             {
                 SCHED_STAT_CRANK(tickled_busy_cpu);
                 goto tickle;
@@ -2243,7 +2244,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
         goto out;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                 cpupool_domain_cpumask(vc->domain));
 
     /*
@@ -2288,7 +2289,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
      *
      * Find both runqueues in one pass.
      */
-    has_soft = has_soft_affinity(vc);
+    has_soft = has_soft_affinity(item);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -2335,7 +2336,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
             cpumask_t mask;
 
             cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
-            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            if ( cpumask_intersects(&mask, item->cpu_soft_affinity) )
             {
                 min_s_avgload = rqd_avgload;
                 min_s_rqi = i;
@@ -2357,9 +2358,9 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
          * Note that, to obtain the soft-affinity mask, we "just" put what we
          * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
          * ok because:
-         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         * - we know that item->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
-         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         * - we have item->cpu_hard_affinity & cpupool_domain_cpumask() already
          *   in cpumask_scratch, we do save a lot doing like this.
          *
          * It's kind of like open coding affinity_balance_cpumask() but, in
@@ -2367,7 +2368,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
          * cpumask operations.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    vc->cpu_soft_affinity);
+                    item->cpu_soft_affinity);
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_s_rqi].active);
     }
@@ -2475,6 +2476,7 @@ static void migrate(const struct scheduler *ops,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
+    struct sched_item *item = svc->vcpu->sched_item;
 
     if ( unlikely(tb_init_done) )
     {
@@ -2512,7 +2514,7 @@ static void migrate(const struct scheduler *ops,
         }
         _runq_deassign(svc);
 
-        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                     cpupool_domain_cpumask(svc->vcpu->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
@@ -2546,7 +2548,7 @@ static bool vcpu_is_migrateable(struct csched2_item *svc,
     struct vcpu *v = svc->vcpu;
     int cpu = svc->vcpu->processor;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_item->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
@@ -2780,7 +2782,7 @@ csched2_item_migrate(
 
     /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
     ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
-    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
+    ASSERT(cpumask_test_cpu(new_cpu, item->cpu_hard_affinity));
 
     trqd = c2rqd(ops, new_cpu);
 
@@ -3320,9 +3322,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu) )
+    if ( has_soft_affinity(scurr->vcpu->sched_item) )
     {
-        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->vcpu->sched_item, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
@@ -3377,7 +3379,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }
 
         /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_item->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 620925e8ce..c45af9f8ee 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -123,7 +123,8 @@ static inline struct null_item *null_item(const struct sched_item *item)
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v, balance_step, cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(v->sched_item, balance_step,
+                             cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 cpupool_domain_cpumask(v->domain));
 
@@ -281,10 +282,10 @@ pick_res(struct null_private *prv, struct sched_item *item)
 
     for_each_affinity_balance_step( bs )
     {
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(item) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(item, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpus);
 
         /*
@@ -321,7 +322,7 @@ pick_res(struct null_private *prv, struct sched_item *item)
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
-    cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity);
+    cpumask_and(cpumask_scratch_cpu(cpu), cpus, item->cpu_hard_affinity);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
 
  out:
@@ -438,7 +439,7 @@ static void null_item_insert(const struct scheduler *ops,
 
     lock = item_schedule_lock(item);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     /* If the pCPU is free, we assign v to it */
@@ -496,7 +497,8 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->vcpu) )
+            if ( bs == BALANCE_SOFT_AFFINITY &&
+                 !has_soft_affinity(wvc->vcpu->sched_item) )
                 continue;
 
             if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
@@ -775,7 +777,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu) )
+                     !has_soft_affinity(wvc->vcpu->sched_item) )
                     continue;
 
                 if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index a604a0d5a6..58560d086b 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -327,7 +327,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_item *svc)
     mask = cpumask_scratch_cpu(svc->vcpu->processor);
 
     cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
+    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_item->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
@@ -645,7 +645,7 @@ rt_res_pick(const struct scheduler *ops, struct sched_item *item)
     int cpu;
 
     online = cpupool_domain_cpumask(vc->domain);
-    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+    cpumask_and(&cpus, online, item->cpu_hard_affinity);
 
     cpu = cpumask_test_cpu(vc->processor, &cpus)
             ? vc->processor
@@ -1030,7 +1030,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 
         /* mask cpu_hard_affinity & cpupool & mask */
         online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, online,
+                    iter_svc->vcpu->sched_item->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1199,7 +1200,7 @@ runq_tickle(const struct scheduler *ops, struct rt_item *new)
         return;
 
     online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&not_tickled, online, new->vcpu->sched_item->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a8382d9812..be85fb8000 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -274,6 +274,12 @@ static void sched_free_item(struct sched_item *item)
     }
 
     item->vcpu->sched_item = NULL;
+
+    free_cpumask_var(item->cpu_hard_affinity);
+    free_cpumask_var(item->cpu_hard_affinity_tmp);
+    free_cpumask_var(item->cpu_hard_affinity_saved);
+    free_cpumask_var(item->cpu_soft_affinity);
+
     xfree(item);
 }
 
@@ -297,7 +303,17 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
     item->next_in_list = *prev_item;
     *prev_item = item;
 
+    if ( !zalloc_cpumask_var(&item->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&item->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&item->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&item->cpu_soft_affinity) )
+        goto fail;
+
     return item;
+
+ fail:
+    sched_free_item(item);
+    return NULL;
 }
 
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
@@ -367,7 +383,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        if ( v->affinity_broken )
+        if ( v->sched_item->affinity_broken )
             return -EBUSY;
     }
 
@@ -692,7 +708,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == per_cpu(sched_res, new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, v->sched_item->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
 
@@ -768,6 +784,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_item *item = v->sched_item;
 
         ASSERT(!vcpu_runnable(v));
 
@@ -779,15 +796,15 @@ void restore_vcpu_affinity(struct domain *d)
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool in which they are in.
          */
-        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                     cpupool_domain_cpumask(d));
         if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
         {
-            if ( v->affinity_broken )
+            if ( item->affinity_broken )
             {
-                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-                v->affinity_broken = 0;
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                sched_set_affinity(v, item->cpu_hard_affinity_saved, NULL);
+                item->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
 
@@ -795,18 +812,17 @@ void restore_vcpu_affinity(struct domain *d)
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
                 sched_set_affinity(v, &cpumask_all, NULL);
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        v->sched_item->res = per_cpu(sched_res, v->processor);
+        item->res = per_cpu(sched_res, v->processor);
 
-        lock = item_schedule_lock_irq(v->sched_item);
-        v->sched_item->res = SCHED_OP(vcpu_scheduler(v), pick_resource,
-                                      v->sched_item);
-        v->processor = v->sched_item->res->processor;
+        lock = item_schedule_lock_irq(item);
+        item->res = SCHED_OP(vcpu_scheduler(v), pick_resource, item);
+        v->processor = item->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -838,16 +854,17 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = item_schedule_lock_irqsave(v->sched_item, &flags);
+            struct sched_item *item = v->sched_item;
+            spinlock_t *lock = item_schedule_lock_irqsave(item, &flags);
 
-            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, item->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
+                 cpumask_test_cpu(cpu, item->cpu_hard_affinity) )
             {
-                if ( v->affinity_broken )
+                if ( item->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
+                    item_schedule_unlock_irqrestore(lock, flags, item);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -860,7 +877,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
+                item_schedule_unlock_irqrestore(lock, flags, item);
                 continue;
             }
 
@@ -873,7 +890,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
+            item_schedule_unlock_irqrestore(lock, flags, item);
 
             vcpu_migrate_finish(v);
 
@@ -904,7 +921,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
     {
         for_each_vcpu ( d, v )
         {
-            if ( v->affinity_broken )
+            if ( v->sched_item->affinity_broken )
                 return -EADDRINUSE;
             if ( system_state != SYS_STATE_suspend && v->processor == cpu )
                 return -EAGAIN;
@@ -924,29 +941,30 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, v->sched_item,
-             hard, soft);
+    struct sched_item *item = v->sched_item;
+    SCHED_OP(dom_scheduler(v->domain), adjust_affinity, item, hard, soft);
 
     if ( hard )
-        cpumask_copy(v->cpu_hard_affinity, hard);
+        cpumask_copy(item->cpu_hard_affinity, hard);
     if ( soft )
-        cpumask_copy(v->cpu_soft_affinity, soft);
+        cpumask_copy(item->cpu_soft_affinity, soft);
 
-    v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
-                                            v->cpu_soft_affinity) &&
-                            cpumask_intersects(v->cpu_soft_affinity,
-                                               v->cpu_hard_affinity);
+    item->soft_aff_effective = !cpumask_subset(item->cpu_hard_affinity,
+                                               item->cpu_soft_affinity) &&
+                               cpumask_intersects(item->cpu_soft_affinity,
+                                                  item->cpu_hard_affinity);
 }
 
 static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which)
 {
+    struct sched_item *item = v->sched_item;
     spinlock_t *lock;
     int ret = 0;
 
-    lock = item_schedule_lock_irq(v->sched_item);
+    lock = item_schedule_lock_irq(item);
 
-    if ( v->affinity_broken )
+    if ( item->affinity_broken )
         ret = -EBUSY;
     else
     {
@@ -954,19 +972,19 @@ static int vcpu_set_affinity(
          * Tell the scheduler we changes something about affinity,
          * and ask to re-evaluate vcpu placement.
          */
-        if ( which == v->cpu_hard_affinity )
+        if ( which == item->cpu_hard_affinity )
         {
             sched_set_affinity(v, affinity, NULL);
         }
         else
         {
-            ASSERT(which == v->cpu_soft_affinity);
+            ASSERT(which == item->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
         vcpu_migrate_start(v);
     }
 
-    item_schedule_unlock_irq(lock, v->sched_item);
+    item_schedule_unlock_irq(lock, item);
 
     domain_update_node_affinity(v->domain);
 
@@ -988,12 +1006,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
     if ( cpumask_empty(&online_affinity) )
         return -EINVAL;
 
-    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_item->cpu_hard_affinity);
 }
 
 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
-    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_item->cpu_soft_affinity);
 }
 
 /* Block the currently-executing domain until a pertinent event occurs. */
@@ -1187,28 +1205,30 @@ void watchdog_domain_destroy(struct domain *d)
 
 int vcpu_pin_override(struct vcpu *v, int cpu)
 {
+    struct sched_item *item = v->sched_item;
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = item_schedule_lock_irq(v->sched_item);
+    lock = item_schedule_lock_irq(item);
 
     if ( cpu < 0 )
     {
-        if ( v->affinity_broken )
+        if ( item->affinity_broken )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
+            sched_set_affinity(v, item->cpu_hard_affinity_saved, NULL);
+            item->affinity_broken = 0;
             ret = 0;
         }
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( item->affinity_broken )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
+            cpumask_copy(item->cpu_hard_affinity_saved,
+                         item->cpu_hard_affinity);
+            item->affinity_broken = 1;
             sched_set_affinity(v, cpumask_of(cpu), NULL);
             ret = 0;
         }
@@ -1217,7 +1237,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    item_schedule_unlock_irq(lock, v->sched_item);
+    item_schedule_unlock_irq(lock, item);
 
     domain_update_node_affinity(v->domain);
 
@@ -1569,7 +1589,7 @@ static void schedule(void)
         ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
          (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
         now);
-    prev->last_run_time = now;
+    prev->sched_item->last_run_time = now;
 
     ASSERT(next->runstate.state != RUNSTATE_running);
     vcpu_runstate_change(next, RUNSTATE_running, now);
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..6b91092c71 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -20,6 +20,7 @@
  */
 
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/softirq.h>
 #include <xen/wait.h>
 #include <xen/errno.h>
@@ -132,7 +133,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->sched_item->cpu_hard_affinity);
     if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -199,7 +200,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->sched_item->cpu_hard_affinity);
         if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index d549ef696e..577015b868 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -54,6 +54,22 @@ struct sched_item {
     void                  *priv;      /* scheduler private data */
     struct sched_item     *next_in_list;
     struct sched_resource *res;
+
+    /* Last time when item has been scheduled out. */
+    uint64_t               last_run_time;
+
+    /* Item needs affinity restored. */
+    bool                   affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool                   soft_aff_effective;
+    /* Bitmask of CPUs on which this VCPU may run. */
+    cpumask_var_t          cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t          cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t          cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this VCPU prefers to run. */
+    cpumask_var_t          cpu_soft_affinity;
 };
 
 #define for_each_sched_item(d, e)                                         \
@@ -290,11 +306,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  * * The hard affinity is not a subset of soft affinity
  * * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_item *item)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return item->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(item->vcpu->domain),
+                           item->cpu_soft_affinity);
 }
 
 /*
@@ -304,17 +320,18 @@ static inline int has_soft_affinity(const struct vcpu *v)
  * to avoid running a vcpu where it would like, but is not allowed to!
  */
 static inline void
-affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+affinity_balance_cpumask(const struct sched_item *item, int step,
+                         cpumask_t *mask)
 {
     if ( step == BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+        cpumask_and(mask, item->cpu_soft_affinity, item->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, v->cpu_hard_affinity);
+            cpumask_copy(mask, item->cpu_hard_affinity);
     }
     else /* step == BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, v->cpu_hard_affinity);
+        cpumask_copy(mask, item->cpu_hard_affinity);
 }
 
 #endif /* __XEN_SCHED_IF_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2e9ced29a8..4b59de42da 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -175,9 +175,6 @@ struct vcpu
     } runstate_guest; /* guest address */
 #endif
 
-    /* last time when vCPU is scheduled out */
-    uint64_t last_run_time;
-
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
     /* Has the FPU been used since it was last saved? */
@@ -203,8 +200,6 @@ struct vcpu
     bool             defer_shutdown;
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
-    /* VCPU need affinity restored */
-    bool             affinity_broken;
 
     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -213,9 +208,6 @@ struct vcpu
     bool             hcall_compat;
 #endif
 
-    /* Does soft affinity actually play a role (given hard affinity)? */
-    bool             soft_aff_effective;
-
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
@@ -247,16 +239,6 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
-    /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
-    /* Used to restore affinity across S3. */
-    cpumask_var_t    cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t    cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -964,7 +946,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 }
 
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
-                           cpumask_weight((v)->cpu_hard_affinity) == 1)
+             cpumask_weight((v)->sched_item->cpu_hard_affinity) == 1)
 #ifdef CONFIG_HAS_PASSTHROUGH
 #define has_iommu_pt(d) (dom_iommu(d)->status != IOMMU_STATUS_disabled)
 #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
-- 
2.16.4



* [PATCH RFC 18/49] xen/sched: add scheduler helpers hiding vcpu
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (16 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 19/49] xen/sched: add domain pointer to struct sched_item Juergen Gross
                   ` (36 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add the following helpers using a sched_item as input instead of a
vcpu:

- is_idle_item() similar to is_idle_vcpu()
- item_runnable() like vcpu_runnable()
- sched_set_res() to set the current processor of an item
- sched_item_cpu() to get the current processor of an item
- sched_{set|clear}_pause_flags[_atomic]() to modify pause_flags of the
  associated vcpu(s)
- sched_idle_item() to get the sched_item pointer of the idle vcpu of a
  specific physical cpu
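
As an illustration (not part of the patch itself), a typical call site
shrinks from updating both fields by hand to a single helper call; the
snippet below is lifted from the sched_credit.c hunk further down:

    /* before: keep vcpu->processor and item->res in sync manually */
    vc->processor = cpu;
    vc->sched_item->res = per_cpu(sched_res, cpu);

    /* after: one helper updates both consistently */
    sched_set_res(vc->sched_item, per_cpu(sched_res, cpu));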

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |  3 +--
 xen/common/schedule.c      | 24 +++++++++-------------
 xen/include/xen/sched-if.h | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9e7c849b94..8cfe54ec36 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1673,8 +1673,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
             SCHED_STAT_CRANK(migrate_queued);
             WARN_ON(vc->is_urgent);
             runq_remove(speer);
-            vc->processor = cpu;
-            vc->sched_item->res = per_cpu(sched_res, cpu);
+            sched_set_res(vc->sched_item, per_cpu(sched_res, cpu));
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index be85fb8000..99660cee67 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -321,12 +321,11 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     struct domain *d = v->domain;
     struct sched_item *item;
 
-    v->processor = processor;
-
     if ( (item = sched_alloc_item(v)) == NULL )
         return 1;
 
-    item->res = per_cpu(sched_res, processor);
+    sched_set_res(item, per_cpu(sched_res, processor));
+
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -440,8 +439,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         sched_set_affinity(v, &cpumask_all, &cpumask_all);
 
-        v->processor = new_p;
-	v->sched_item->res = per_cpu(sched_res, new_p);
+        sched_set_res(v->sched_item, per_cpu(sched_res, new_p));
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -632,10 +630,7 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
     if ( vcpu_scheduler(v)->migrate )
         SCHED_OP(vcpu_scheduler(v), migrate, v->sched_item, new_cpu);
     else
-    {
-        v->processor = new_cpu;
-        v->sched_item->res = per_cpu(sched_res, new_cpu);
-    }
+        sched_set_res(v->sched_item, per_cpu(sched_res, new_cpu));
 }
 
 /*
@@ -785,8 +780,9 @@ void restore_vcpu_affinity(struct domain *d)
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
         struct sched_item *item = v->sched_item;
+        struct sched_resource *res;
 
-        ASSERT(!vcpu_runnable(v));
+        ASSERT(!item_runnable(item));
 
         /*
          * Re-assign the initial processor as after resume we have no
@@ -817,12 +813,12 @@ void restore_vcpu_affinity(struct domain *d)
             }
         }
 
-        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        item->res = per_cpu(sched_res, v->processor);
+        res = per_cpu(sched_res, cpumask_any(cpumask_scratch_cpu(cpu)));
+        sched_set_res(item, res);
 
         lock = item_schedule_lock_irq(item);
-        item->res = SCHED_OP(vcpu_scheduler(v), pick_resource, item);
-        v->processor = item->res->processor;
+        res = SCHED_OP(vcpu_scheduler(v), pick_resource, item);
+        sched_set_res(item, res);
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 577015b868..b5cbbbbcb1 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -79,6 +79,57 @@ struct sched_item {
     for ( (v) = (i)->vcpu; (v) != NULL && (v)->sched_item == (i);         \
           (v) = (v)->next_in_list )
 
+static inline bool is_idle_item(const struct sched_item *item)
+{
+    return is_idle_vcpu(item->vcpu);
+}
+
+static inline bool item_runnable(const struct sched_item *item)
+{
+    return vcpu_runnable(item->vcpu);
+}
+
+static inline void sched_set_res(struct sched_item *item,
+                                 struct sched_resource *res)
+{
+    item->vcpu->processor = res->processor;
+    item->res = res;
+}
+
+static inline unsigned int sched_item_cpu(struct sched_item *item)
+{
+    return item->res->processor;
+}
+
+static inline void sched_set_pause_flags(struct sched_item *item,
+                                         unsigned int bit)
+{
+    __set_bit(bit, &item->vcpu->pause_flags);
+}
+
+static inline void sched_clear_pause_flags(struct sched_item *item,
+                                           unsigned int bit)
+{
+    __clear_bit(bit, &item->vcpu->pause_flags);
+}
+
+static inline void sched_set_pause_flags_atomic(struct sched_item *item,
+                                                unsigned int bit)
+{
+    set_bit(bit, &item->vcpu->pause_flags);
+}
+
+static inline void sched_clear_pause_flags_atomic(struct sched_item *item,
+                                                  unsigned int bit)
+{
+    clear_bit(bit, &item->vcpu->pause_flags);
+}
+
+static inline struct sched_item *sched_idle_item(unsigned int cpu)
+{
+    return idle_vcpu[cpu]->sched_item;
+}
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
-- 
2.16.4



* [PATCH RFC 19/49] xen/sched: add domain pointer to struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (17 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 18/49] xen/sched: add scheduler helpers hiding vcpu Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 20/49] xen/sched: add id " Juergen Gross
                   ` (35 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add a domain pointer to struct sched_item in order to avoid having to
dereference the item's vcpu pointer whenever the related domain is
needed.
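
The effect at each call site is tiny but sits on scheduling hot paths;
as a two-line recap of the schedule.c hunk below:

    struct domain *d = item->vcpu->domain;   /* before */
    struct domain *d = item->domain;         /* after  */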

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c      | 3 ++-
 xen/include/xen/sched-if.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 99660cee67..625d6287c2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -257,7 +257,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
 static void sched_free_item(struct sched_item *item)
 {
     struct sched_item *prev_item;
-    struct domain *d = item->vcpu->domain;
+    struct domain *d = item->domain;
 
     if ( d->sched_item_list == item )
         d->sched_item_list = item->next_in_list;
@@ -293,6 +293,7 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
 
     v->sched_item = item;
     item->vcpu = v;
+    item->domain = d;
 
     for ( prev_item = &d->sched_item_list; *prev_item;
           prev_item = &(*prev_item)->next_in_list )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index b5cbbbbcb1..9a524014d0 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -50,6 +50,7 @@ DECLARE_PER_CPU(struct cpupool *, cpupool);
 DECLARE_PER_CPU(struct sched_resource *, sched_res);
 
 struct sched_item {
+    struct domain         *domain;
     struct vcpu           *vcpu;
     void                  *priv;      /* scheduler private data */
     struct sched_item     *next_in_list;
-- 
2.16.4



* [PATCH RFC 20/49] xen/sched: add id to struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (18 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 19/49] xen/sched: add domain pointer to struct sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 21/49] xen/sched: rename scheduler related perf counters Juergen Gross
                   ` (34 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add an identifier to sched_item. For now it will be the same as the
related vcpu_id.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c      | 3 ++-
 xen/include/xen/sched-if.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 625d6287c2..7a7ec56402 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -293,12 +293,13 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
 
     v->sched_item = item;
     item->vcpu = v;
+    item->item_id = v->vcpu_id;
     item->domain = d;
 
     for ( prev_item = &d->sched_item_list; *prev_item;
           prev_item = &(*prev_item)->next_in_list )
         if ( (*prev_item)->next_in_list &&
-             (*prev_item)->next_in_list->vcpu->vcpu_id > v->vcpu_id )
+             (*prev_item)->next_in_list->item_id > item->item_id )
             break;
 
     item->next_in_list = *prev_item;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9a524014d0..1e4a7e1e64 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -55,6 +55,7 @@ struct sched_item {
     void                  *priv;      /* scheduler private data */
     struct sched_item     *next_in_list;
     struct sched_resource *res;
+    int                    item_id;
 
     /* Last time when item has been scheduled out. */
     uint64_t               last_run_time;
-- 
2.16.4



* [PATCH RFC 21/49] xen/sched: rename scheduler related perf counters
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (19 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 20/49] xen/sched: add id " Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 22/49] xen/sched: switch struct task_slice from vcpu to sched_item Juergen Gross
                   ` (33 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Meng Xu, Jan Beulich

Rename the scheduler-related perf counters from vcpu* to item* where
appropriate.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c    | 32 ++++++++++++++++----------------
 xen/common/sched_credit2.c   | 18 +++++++++---------
 xen/common/sched_null.c      | 18 +++++++++---------
 xen/common/sched_rt.c        | 16 ++++++++--------
 xen/include/xen/perfc_defn.h | 30 +++++++++++++++---------------
 5 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 8cfe54ec36..3ea0d40afb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -684,7 +684,7 @@ __csched_vcpu_check(struct vcpu *vc)
         BUG_ON( !is_idle_vcpu(vc) );
     }
 
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(item_check);
 }
 #define CSCHED_VCPU_CHECK(_vc)  (__csched_vcpu_check(_vc))
 #else
@@ -707,7 +707,7 @@ __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
                (NOW() - v->sched_item->last_run_time) < prv->vcpu_migr_delay;
 
     if ( hot )
-        SCHED_STAT_CRANK(vcpu_hot);
+        SCHED_STAT_CRANK(item_hot);
 
     return hot;
 }
@@ -895,7 +895,7 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_item *svc)
     if ( list_empty(&svc->active_vcpu_elem) )
     {
         SCHED_VCPU_STAT_CRANK(svc, state_active);
-        SCHED_STAT_CRANK(acct_vcpu_active);
+        SCHED_STAT_CRANK(acct_item_active);
 
         sdom->active_vcpu_count++;
         list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
@@ -922,7 +922,7 @@ __csched_vcpu_acct_stop_locked(struct csched_private *prv,
     BUG_ON( list_empty(&svc->active_vcpu_elem) );
 
     SCHED_VCPU_STAT_CRANK(svc, state_idle);
-    SCHED_STAT_CRANK(acct_vcpu_idle);
+    SCHED_STAT_CRANK(acct_item_idle);
 
     BUG_ON( prv->weight < sdom->weight );
     sdom->active_vcpu_count--;
@@ -1024,7 +1024,7 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
     svc->pri = is_idle_domain(vc->domain) ?
         CSCHED_PRI_IDLE : CSCHED_PRI_TS_UNDER;
     SCHED_VCPU_STATS_RESET(svc);
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(item_alloc);
     return svc;
 }
 
@@ -1052,7 +1052,7 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 
     item_schedule_unlock_irq(lock, item);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(item_insert);
 }
 
 static void
@@ -1072,13 +1072,13 @@ csched_item_remove(const struct scheduler *ops, struct sched_item *item)
     struct csched_item * const svc = CSCHED_ITEM(item);
     struct csched_dom * const sdom = svc->sdom;
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(item_remove);
 
     ASSERT(!__vcpu_on_runq(svc));
 
     if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
     {
-        SCHED_STAT_CRANK(vcpu_unpark);
+        SCHED_STAT_CRANK(item_unpark);
         vcpu_unpause(svc->vcpu);
     }
 
@@ -1099,7 +1099,7 @@ csched_item_sleep(const struct scheduler *ops, struct sched_item *item)
     struct csched_item * const svc = CSCHED_ITEM(item);
     unsigned int cpu = vc->processor;
 
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(item_sleep);
 
     BUG_ON( is_idle_vcpu(vc) );
 
@@ -1128,19 +1128,19 @@ csched_item_wake(const struct scheduler *ops, struct sched_item *item)
 
     if ( unlikely(curr_on_cpu(vc->processor) == item) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(item_wake_running);
         return;
     }
     if ( unlikely(__vcpu_on_runq(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(item_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(item_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(item_wake_not_runnable);
 
     /*
      * We temporarly boost the priority of awaking VCPUs!
@@ -1170,7 +1170,7 @@ csched_item_wake(const struct scheduler *ops, struct sched_item *item)
          !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
     {
         TRACE_2D(TRC_CSCHED_BOOST_START, vc->domain->domain_id, vc->vcpu_id);
-        SCHED_STAT_CRANK(vcpu_boost);
+        SCHED_STAT_CRANK(item_boost);
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -1529,7 +1529,7 @@ csched_acct(void* dummy)
                      credit < -credit_cap &&
                      !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
                 {
-                    SCHED_STAT_CRANK(vcpu_park);
+                    SCHED_STAT_CRANK(item_park);
                     vcpu_pause_nosync(svc->vcpu);
                 }
 
@@ -1553,7 +1553,7 @@ csched_acct(void* dummy)
                      * call to make sure the VCPU's priority is not boosted
                      * if it is woken up here.
                      */
-                    SCHED_STAT_CRANK(vcpu_unpark);
+                    SCHED_STAT_CRANK(item_unpark);
                     vcpu_unpause(svc->vcpu);
                 }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 5c1794db61..8bee7cb9a2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2019,7 +2019,7 @@ csched2_vcpu_check(struct vcpu *vc)
     {
         BUG_ON( !is_idle_vcpu(vc) );
     }
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(item_check);
 }
 #define CSCHED2_VCPU_CHECK(_vc)  (csched2_vcpu_check(_vc))
 #else
@@ -2066,7 +2066,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
     svc->budget_quota = 0;
     INIT_LIST_HEAD(&svc->parked_elem);
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(item_alloc);
 
     return svc;
 }
@@ -2078,7 +2078,7 @@ csched2_item_sleep(const struct scheduler *ops, struct sched_item *item)
     struct csched2_item * const svc = csched2_item(item);
 
     ASSERT(!is_idle_vcpu(vc));
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(item_sleep);
 
     if ( curr_on_cpu(vc->processor) == item )
     {
@@ -2108,20 +2108,20 @@ csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 
     if ( unlikely(curr_on_cpu(cpu) == item) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(item_wake_running);
         goto out;
     }
 
     if ( unlikely(vcpu_on_runq(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(item_wake_onrunq);
         goto out;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(item_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(item_wake_not_runnable);
 
     /* If the context hasn't been saved for this vcpu yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
@@ -3137,7 +3137,7 @@ csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 
     sdom->nr_vcpus++;
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(item_insert);
 
     CSCHED2_VCPU_CHECK(vc);
 }
@@ -3160,7 +3160,7 @@ csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
     ASSERT(!is_idle_vcpu(vc));
     ASSERT(list_empty(&svc->runq_elem));
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(item_remove);
 
     /* Remove from runqueue */
     lock = item_schedule_lock_irq(item);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index c45af9f8ee..5570cc1a8c 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -207,7 +207,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
     INIT_LIST_HEAD(&nvc->waitq_elem);
     nvc->vcpu = v;
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(item_alloc);
 
     return nvc;
 }
@@ -473,7 +473,7 @@ static void null_item_insert(const struct scheduler *ops,
     }
     spin_unlock_irq(lock);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(item_insert);
 }
 
 static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
@@ -544,7 +544,7 @@ static void null_item_remove(const struct scheduler *ops,
  out:
     item_schedule_unlock_irq(lock, item);
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(item_remove);
 }
 
 static void null_item_wake(const struct scheduler *ops,
@@ -556,21 +556,21 @@ static void null_item_wake(const struct scheduler *ops,
 
     if ( unlikely(curr_on_cpu(v->processor) == item) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(item_wake_running);
         return;
     }
 
     if ( unlikely(!list_empty(&null_item(item)->waitq_elem)) )
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(item_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(v)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(item_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(item_wake_not_runnable);
 
     /* Note that we get here only for vCPUs assigned to a pCPU */
     cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
@@ -587,7 +587,7 @@ static void null_item_sleep(const struct scheduler *ops,
     if ( curr_on_cpu(v->processor) == item )
         cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
 
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(item_sleep);
 }
 
 static struct sched_resource *
@@ -697,7 +697,7 @@ static inline void null_vcpu_check(struct vcpu *v)
     else
         BUG_ON(!is_idle_vcpu(v));
 
-    SCHED_STAT_CRANK(vcpu_check);
+    SCHED_STAT_CRANK(item_check);
 }
 #define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
 #else
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 58560d086b..0639cdce0a 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -870,7 +870,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
-    SCHED_STAT_CRANK(vcpu_alloc);
+    SCHED_STAT_CRANK(item_alloc);
 
     return svc;
 }
@@ -919,7 +919,7 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
     }
     item_schedule_unlock_irq(lock, item);
 
-    SCHED_STAT_CRANK(vcpu_insert);
+    SCHED_STAT_CRANK(item_insert);
 }
 
 /*
@@ -932,7 +932,7 @@ rt_item_remove(const struct scheduler *ops, struct sched_item *item)
     struct rt_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    SCHED_STAT_CRANK(vcpu_remove);
+    SCHED_STAT_CRANK(item_remove);
 
     BUG_ON( sdom == NULL );
 
@@ -1154,7 +1154,7 @@ rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
     struct rt_item * const svc = rt_item(item);
 
     BUG_ON( is_idle_vcpu(vc) );
-    SCHED_STAT_CRANK(vcpu_sleep);
+    SCHED_STAT_CRANK(item_sleep);
 
     if ( curr_on_cpu(vc->processor) == item )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
@@ -1275,21 +1275,21 @@ rt_item_wake(const struct scheduler *ops, struct sched_item *item)
 
     if ( unlikely(curr_on_cpu(vc->processor) == item) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_running);
+        SCHED_STAT_CRANK(item_wake_running);
         return;
     }
 
     /* on RunQ/DepletedQ, just update info is ok */
     if ( unlikely(vcpu_on_q(svc)) )
     {
-        SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        SCHED_STAT_CRANK(item_wake_onrunq);
         return;
     }
 
     if ( likely(vcpu_runnable(vc)) )
-        SCHED_STAT_CRANK(vcpu_wake_runnable);
+        SCHED_STAT_CRANK(item_wake_runnable);
     else
-        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
+        SCHED_STAT_CRANK(item_wake_not_runnable);
 
     /*
      * If a deadline passed while svc was asleep/blocked, we need new
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 1ad4384080..25af4dbd12 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -21,20 +21,20 @@ PERFCOUNTER(sched_ctx,              "sched: context switches")
 PERFCOUNTER(schedule,               "sched: specific scheduler")
 PERFCOUNTER(dom_init,               "sched: dom_init")
 PERFCOUNTER(dom_destroy,            "sched: dom_destroy")
-PERFCOUNTER(vcpu_alloc,             "sched: vcpu_alloc")
-PERFCOUNTER(vcpu_insert,            "sched: vcpu_insert")
-PERFCOUNTER(vcpu_remove,            "sched: vcpu_remove")
-PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
 PERFCOUNTER(vcpu_yield,             "sched: vcpu_yield")
-PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
-PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
-PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
-PERFCOUNTER(vcpu_wake_not_runnable, "sched: vcpu_wake_not_runnable")
+PERFCOUNTER(item_alloc,             "sched: item_alloc")
+PERFCOUNTER(item_insert,            "sched: item_insert")
+PERFCOUNTER(item_remove,            "sched: item_remove")
+PERFCOUNTER(item_sleep,             "sched: item_sleep")
+PERFCOUNTER(item_wake_running,      "sched: item_wake_running")
+PERFCOUNTER(item_wake_onrunq,       "sched: item_wake_onrunq")
+PERFCOUNTER(item_wake_runnable,     "sched: item_wake_runnable")
+PERFCOUNTER(item_wake_not_runnable, "sched: item_wake_not_runnable")
 PERFCOUNTER(tickled_no_cpu,         "sched: tickled_no_cpu")
 PERFCOUNTER(tickled_idle_cpu,       "sched: tickled_idle_cpu")
 PERFCOUNTER(tickled_idle_cpu_excl,  "sched: tickled_idle_cpu_exclusive")
 PERFCOUNTER(tickled_busy_cpu,       "sched: tickled_busy_cpu")
-PERFCOUNTER(vcpu_check,             "sched: vcpu_check")
+PERFCOUNTER(item_check,             "sched: item_check")
 
 /* credit specific counters */
 PERFCOUNTER(delay_ms,               "csched: delay")
@@ -43,11 +43,11 @@ PERFCOUNTER(acct_no_work,           "csched: acct_no_work")
 PERFCOUNTER(acct_balance,           "csched: acct_balance")
 PERFCOUNTER(acct_reorder,           "csched: acct_reorder")
 PERFCOUNTER(acct_min_credit,        "csched: acct_min_credit")
-PERFCOUNTER(acct_vcpu_active,       "csched: acct_vcpu_active")
-PERFCOUNTER(acct_vcpu_idle,         "csched: acct_vcpu_idle")
-PERFCOUNTER(vcpu_boost,             "csched: vcpu_boost")
-PERFCOUNTER(vcpu_park,              "csched: vcpu_park")
-PERFCOUNTER(vcpu_unpark,            "csched: vcpu_unpark")
+PERFCOUNTER(acct_item_active,       "csched: acct_item_active")
+PERFCOUNTER(acct_item_idle,         "csched: acct_item_idle")
+PERFCOUNTER(item_boost,             "csched: item_boost")
+PERFCOUNTER(item_park,              "csched: item_park")
+PERFCOUNTER(item_unpark,            "csched: item_unpark")
 PERFCOUNTER(load_balance_idle,      "csched: load_balance_idle")
 PERFCOUNTER(load_balance_over,      "csched: load_balance_over")
 PERFCOUNTER(load_balance_other,     "csched: load_balance_other")
@@ -57,7 +57,7 @@ PERFCOUNTER(steal_peer_idle,        "csched: steal_peer_idle")
 PERFCOUNTER(migrate_queued,         "csched: migrate_queued")
 PERFCOUNTER(migrate_running,        "csched: migrate_running")
 PERFCOUNTER(migrate_kicked_away,    "csched: migrate_kicked_away")
-PERFCOUNTER(vcpu_hot,               "csched: vcpu_hot")
+PERFCOUNTER(item_hot,               "csched: item_hot")
 
 /* credit2 specific counters */
 PERFCOUNTER(burn_credits_t2c,       "csched2: burn_credits_t2c")
-- 
2.16.4



* [PATCH RFC 22/49] xen/sched: switch struct task_slice from vcpu to sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (20 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 21/49] xen/sched: rename scheduler related perf counters Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 23/49] xen/sched: move is_running indicator to struct sched_item Juergen Gross
                   ` (32 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Let the schedulers put a sched_item pointer into struct task_slice
instead of a vcpu pointer.
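
For orientation (recapping the hunks below rather than adding anything
new), the struct now looks like

    struct task_slice {
        struct sched_item *task;
        s_time_t           time;
        bool_t             migrated;
    };

so a scheduler fills in e.g. ret.task = snext->vcpu->sched_item; and the
generic schedule() dereferences back via next = next_slice.task->vcpu;.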

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  8 ++++----
 xen/common/sched_credit.c   |  4 ++--
 xen/common/sched_credit2.c  |  4 ++--
 xen/common/sched_null.c     | 12 ++++++------
 xen/common/sched_rt.c       |  2 +-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  |  6 +++---
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 9dc1ff6a73..5733a2a6b8 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -554,9 +554,9 @@ a653sched_do_schedule(
 
     /*
      * If there are more domains to run in the current major frame, set
-     * new_task equal to the address of next domain's VCPU structure.
-     * Otherwise, set new_task equal to the address of the idle task's VCPU
-     * structure.
+     * new_task equal to the address of next domain's sched_item structure.
+     * Otherwise, set new_task equal to the address of the idle task's
+     * sched_item structure.
      */
     new_task = (sched_index < sched_priv->num_schedule_entries)
         ? sched_priv->schedule[sched_index].vc
@@ -592,7 +592,7 @@ a653sched_do_schedule(
      * of the selected task's VCPU structure.
      */
     ret.time = next_switch_time - now;
-    ret.task = new_task;
+    ret.task = new_task->sched_item;
     ret.migrated = 0;
 
     BUG_ON(ret.time <= 0);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3ea0d40afb..29076e362b 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -2005,9 +2005,9 @@ out:
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
                 -1 : tslice);
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_item;
 
-    CSCHED_VCPU_CHECK(ret.task);
+    CSCHED_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 8bee7cb9a2..9bf045d20f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3636,9 +3636,9 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_item;
 
-    CSCHED2_VCPU_CHECK(ret.task);
+    CSCHED2_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 5570cc1a8c..62c51e2c83 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -746,10 +746,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = idle_vcpu[cpu];
+        ret.task = idle_vcpu[cpu]->sched_item;
     }
     else
-        ret.task = per_cpu(npc, cpu).vcpu;
+        ret.task = per_cpu(npc, cpu).vcpu->sched_item;
     ret.migrated = 0;
     ret.time = -1;
 
@@ -784,7 +784,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                 {
                     vcpu_assign(prv, wvc->vcpu, cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->vcpu;
+                    ret.task = wvc->vcpu->sched_item;
                     goto unlock;
                 }
             }
@@ -793,10 +793,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(ret.task == NULL || !vcpu_runnable(ret.task)) )
-        ret.task = idle_vcpu[cpu];
+    if ( unlikely(ret.task == NULL || !item_runnable(ret.task)) )
+        ret.task = idle_vcpu[cpu]->sched_item;
 
-    NULL_VCPU_CHECK(ret.task);
+    NULL_VCPU_CHECK(ret.task->vcpu);
     return ret;
 }
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 0639cdce0a..374a9d2383 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1138,7 +1138,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
     }
-    ret.task = snext->vcpu;
+    ret.task = snext->vcpu->sched_item;
 
     return ret;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7a7ec56402..b295b0b81e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1549,7 +1549,7 @@ static void schedule(void)
     sched = this_cpu(scheduler);
     next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
 
-    next = next_slice.task;
+    next = next_slice.task->vcpu;
 
     sd->curr = next->sched_item;
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 1e4a7e1e64..3dcf1dca19 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -208,9 +208,9 @@ static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
 }
 
 struct task_slice {
-    struct vcpu *task;
-    s_time_t     time;
-    bool_t       migrated;
+    struct sched_item *task;
+    s_time_t           time;
+    bool_t             migrated;
 };
 
 struct scheduler {
-- 
2.16.4



* [PATCH RFC 23/49] xen/sched: move is_running indicator to struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (21 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 22/49] xen/sched: switch struct task_slice from vcpu to sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 24/49] xen/sched: make null scheduler vcpu agnostic Juergen Gross
                   ` (31 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Wei Liu,
	Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Paul Durrant, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Move the is_running indicator from struct vcpu to struct sched_item.
For non-scheduler code introduce a vcpu_running() accessor for
obtaining the related value.

At the same time introduce a state_entry_time field in struct
sched_item which is updated whenever the is_running indicator changes.
Use that new field in the schedulers instead of the similar vcpu field.
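
The accessor itself is expected to be a thin wrapper; a sketch of the
shape implied by the description above (an assumption here; the
authoritative definition is the sched-if.h change of this patch):

    /* Sketch: the indicator now lives in the item, not the vcpu. */
    static inline bool vcpu_running(struct vcpu *v)
    {
        return v->sched_item->is_running;
    }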

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c                |  2 +-
 xen/arch/x86/hvm/hvm.c               |  3 ++-
 xen/arch/x86/hvm/viridian/viridian.c |  1 +
 xen/arch/x86/hvm/vmx/vmcs.c          |  6 ++++--
 xen/arch/x86/hvm/vmx/vmx.c           |  5 +++--
 xen/common/domctl.c                  |  4 ++--
 xen/common/keyhandler.c              |  2 +-
 xen/common/sched_credit.c            | 10 +++++-----
 xen/common/sched_credit2.c           | 18 +++++++++---------
 xen/common/sched_rt.c                |  2 +-
 xen/common/schedule.c                | 19 +++++++++++--------
 xen/include/xen/sched-if.h           | 11 ++++++++++-
 xen/include/xen/sched.h              |  2 --
 13 files changed, 50 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 5d8f3255cb..53b8fa1c9d 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2137,7 +2137,7 @@ void vcpu_kick(struct vcpu *v)
      * NB2. We save the running flag across the unblock to avoid a needless
      * IPI for domains that we IPI'd to unblock.
      */
-    bool running = v->is_running;
+    bool running = vcpu_running(v);
 
     vcpu_unblock(v);
     if ( running && (in_irq() || (v != current)) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8adbb61b57..f184136f81 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -23,6 +23,7 @@
 #include <xen/lib.h>
 #include <xen/trace.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <xen/softirq.h>
 #include <xen/domain.h>
@@ -3984,7 +3985,7 @@ bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     /* Now that all VCPUs are signalled to deschedule, we wait... */
     for_each_vcpu ( d, v )
         if ( v != current && flush_vcpu(ctxt, v) )
-            while ( !vcpu_runnable(v) && v->is_running )
+            while ( !vcpu_runnable(v) && vcpu_running(v) )
                 cpu_relax();
 
     /* All other vcpus are paused, safe to unlock now. */
diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index 425af56856..5779efc81f 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -6,6 +6,7 @@
  */
 
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/version.h>
 #include <xen/hypercall.h>
 #include <xen/domain_page.h>
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 74f2a08cfd..257fb00528 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -23,6 +23,8 @@
 #include <xen/event.h>
 #include <xen/kernel.h>
 #include <xen/keyhandler.h>
+#include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/vm_event.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
@@ -562,7 +564,7 @@ void vmx_vmcs_reload(struct vcpu *v)
      * v->arch.hvm.vmx.vmcs_lock here. However, with interrupts disabled
      * the VMCS can't be taken away from us anymore if we still own it.
      */
-    ASSERT(v->is_running || !local_irq_is_enabled());
+    ASSERT(vcpu_running(v) || !local_irq_is_enabled());
     if ( v->arch.hvm.vmx.vmcs_pa == this_cpu(current_vmcs) )
         return;
 
@@ -1576,7 +1578,7 @@ void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
     uint64_t *pml_buf;
     unsigned long pml_idx;
 
-    ASSERT((v == current) || (!vcpu_runnable(v) && !v->is_running));
+    ASSERT((v == current) || (!vcpu_runnable(v) && !vcpu_running(v)));
     ASSERT(vmx_vcpu_pml_enabled(v));
 
     vmx_vmcs_enter(v);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 725dd88c13..0056fd0191 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -19,6 +19,7 @@
 #include <xen/lib.h>
 #include <xen/trace.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/irq.h>
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
@@ -907,7 +908,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
-    if ( !v->is_running )
+    if ( !vcpu_running(v) )
     {
         /*
          * When this vCPU isn't marked as running anymore, a remote pCPU's
@@ -2004,7 +2005,7 @@ static void vmx_process_isr(int isr, struct vcpu *v)
 
 static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
-    bool_t running = v->is_running;
+    bool_t running = vcpu_running(v);
 
     vcpu_unblock(v);
     /*
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 8464713d2b..6a9a54130d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -173,7 +173,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
         {
             if ( !(v->pause_flags & VPF_blocked) )
                 flags &= ~XEN_DOMINF_blocked;
-            if ( v->is_running )
+            if ( vcpu_running(v) )
                 flags |= XEN_DOMINF_running;
             info->nr_online_vcpus++;
         }
@@ -841,7 +841,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
         op->u.getvcpuinfo.online   = !(v->pause_flags & VPF_down);
         op->u.getvcpuinfo.blocked  = !!(v->pause_flags & VPF_blocked);
-        op->u.getvcpuinfo.running  = v->is_running;
+        op->u.getvcpuinfo.running  = vcpu_running(v);
         op->u.getvcpuinfo.cpu_time = runstate.time[RUNSTATE_running];
         op->u.getvcpuinfo.cpu      = v->processor;
         ret = 0;
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index f50df5841d..0d312ff953 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -306,7 +306,7 @@ static void dump_domains(unsigned char key)
             printk("    VCPU%d: CPU%d [has=%c] poll=%d "
                    "upcall_pend=%02x upcall_mask=%02x ",
                    v->vcpu_id, v->processor,
-                   v->is_running ? 'T':'F', v->poll_evtchn,
+                   vcpu_running(v) ? 'T':'F', v->poll_evtchn,
                    vcpu_info(v, evtchn_upcall_pending),
                    !vcpu_event_delivery_is_enabled(v));
             if ( vcpu_cpu_dirty(v) )
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 29076e362b..6d0639109a 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -723,7 +723,7 @@ __csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
      * The caller is supposed to have already checked that vc is also
      * not running.
      */
-    ASSERT(!vc->is_running);
+    ASSERT(!vcpu_running(vc));
 
     return !__csched_vcpu_is_cache_hot(prv, vc) &&
            cpumask_test_cpu(dest_cpu, mask);
@@ -1047,7 +1047,7 @@ csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 
     lock = item_schedule_lock_irq(item);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
+    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vcpu_running(vc) )
         runq_insert(svc);
 
     item_schedule_unlock_irq(lock, item);
@@ -1659,8 +1659,8 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * vCPUs with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc->sched_item)) )
+        if ( vcpu_running(vc) || (balance_step == BALANCE_SOFT_AFFINITY &&
+                                  !has_soft_affinity(vc->sched_item)) )
             continue;
 
         affinity_balance_cpumask(vc->sched_item, balance_step, cpumask_scratch);
@@ -1868,7 +1868,7 @@ csched_schedule(
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->runstate.state_entry_time;
+    runtime = now - current->sched_item->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 9bf045d20f..5aa819b2c5 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1283,7 +1283,7 @@ runq_insert(const struct scheduler *ops, struct csched2_item *svc)
 
     ASSERT(&svc->rqd->runq == runq);
     ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->is_running);
+    ASSERT(!vcpu_running(svc->vcpu));
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_item *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->is_running);
-    return now - svc->vcpu->runstate.state_entry_time >
+    ASSERT(vcpu_running(svc->vcpu));
+    return now - svc->vcpu->sched_item->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -2931,7 +2931,7 @@ csched2_dom_cntl(
                 {
                     svc = csched2_item(v->sched_item);
                     lock = item_schedule_lock(svc->vcpu->sched_item);
-                    if ( v->is_running )
+                    if ( vcpu_running(v) )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
@@ -3204,8 +3204,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->is_running )
-            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+        if ( vcpu_running(snext->vcpu) )
+            ratelimit_min = snext->vcpu->sched_item->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3302,7 +3302,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * no point forcing it to do so until rate limiting expires.
      */
     if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
+         (now - scurr->vcpu->sched_item->state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3313,7 +3313,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             } d;
             d.dom = scurr->vcpu->domain->domain_id;
             d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            d.runtime = now - scurr->vcpu->sched_item->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3561,7 +3561,7 @@ csched2_schedule(
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->is_running);
+            ASSERT(!vcpu_running(snext->vcpu));
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 374a9d2383..9efe807230 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -914,7 +914,7 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
     {
         replq_insert(ops, svc);
 
-        if ( !vc->is_running )
+        if ( !vcpu_running(vc) )
             runq_insert(ops, svc);
     }
     item_schedule_unlock_irq(lock, item);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index b295b0b81e..ae2a6d0323 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -356,7 +356,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     if ( is_idle_domain(d) )
     {
         per_cpu(sched_res, v->processor)->curr = item;
-        v->is_running = 1;
+        item->is_running = 1;
+        item->state_entry_time = NOW();
     }
     else
     {
@@ -555,7 +556,7 @@ void vcpu_sleep_sync(struct vcpu *v)
 {
     vcpu_sleep_nosync(v);
 
-    while ( !vcpu_runnable(v) && v->is_running )
+    while ( !vcpu_runnable(v) && vcpu_running(v) )
         cpu_relax();
 
     sync_vcpu_execstate(v);
@@ -680,7 +681,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( v->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( vcpu_running(v) || !test_bit(_VPF_migrating, &v->pause_flags) )
         return;
 
     old_cpu = new_cpu = v->processor;
@@ -734,7 +735,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( v->is_running ||
+    if ( vcpu_running(v) ||
          !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
     {
         sched_spin_unlock_double(old_lock, new_lock, flags);
@@ -762,7 +763,7 @@ void vcpu_force_reschedule(struct vcpu *v)
 {
     spinlock_t *lock = item_schedule_lock_irq(v->sched_item);
 
-    if ( v->is_running )
+    if ( vcpu_running(v) )
         vcpu_migrate_start(v);
 
     item_schedule_unlock_irq(lock, v->sched_item);
@@ -1597,8 +1598,9 @@ static void schedule(void)
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!next->is_running);
-    next->is_running = 1;
+    ASSERT(!vcpu_running(next));
+    next->sched_item->is_running = 1;
+    next->sched_item->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
@@ -1619,7 +1621,8 @@ void context_saved(struct vcpu *prev)
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
-    prev->is_running = 0;
+    prev->sched_item->is_running = 0;
+    prev->sched_item->state_entry_time = NOW();
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 3dcf1dca19..5cacede473 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -59,8 +59,12 @@ struct sched_item {
 
     /* Last time when item has been scheduled out. */
     uint64_t               last_run_time;
+    /* Last time item got (de-)scheduled. */
+    uint64_t               state_entry_time;
 
-    /* Item needs affinity restored. */
+    /* Currently running on a CPU? */
+    bool                   is_running;
+    /* Item needs affinity restored. */
     bool                   affinity_broken;
     /* Does soft affinity actually play a role (given hard affinity)? */
     bool                   soft_aff_effective;
@@ -132,6 +136,11 @@ static inline struct sched_item *sched_idle_item(unsigned int cpu)
     return idle_vcpu[cpu]->sched_item;
 }
 
+static inline bool vcpu_running(struct vcpu *v)
+{
+    return v->sched_item->is_running;
+}
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 4b59de42da..21a7fa14ce 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -181,8 +181,6 @@ struct vcpu
     bool             fpu_dirtied;
     /* Initialization completed for this VCPU? */
     bool             is_initialised;
-    /* Currently running on a CPU? */
-    bool             is_running;
     /* VCPU should wake fast (do not deep sleep the CPU). */
     bool             is_urgent;
 
-- 
2.16.4



* [PATCH RFC 24/49] xen/sched: make null scheduler vcpu agnostic.
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (22 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 23/49] xen/sched: move is_running indicator to struct sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 25/49] xen/sched: make rt " Juergen Gross
                   ` (30 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch null scheduler completely from vcpu to sched_item usage.
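
As a rough standalone illustration of the resulting model (plain C, not
the actual null scheduler: the per-pCPU npc slot and the waitqueue are
reduced to an array and a -1 return value, and the helper names merely
mirror the ones used below):

#include <stddef.h>
#include <stdio.h>

#define NR_PCPUS 4

struct sched_item {
    int domain_id;
    int item_id;
};

/* stands in for per_cpu(npc, cpu).item: the one item of a pCPU, or NULL */
static struct sched_item *pcpu_item[NR_PCPUS];

/* assign the item to the first free pCPU; -1 means "park it on the waitq" */
static int item_assign(struct sched_item *item)
{
    int cpu;

    for ( cpu = 0; cpu < NR_PCPUS; cpu++ )
    {
        if ( pcpu_item[cpu] == NULL )
        {
            pcpu_item[cpu] = item;
            return cpu;
        }
    }

    return -1;
}

/* free the pCPU again, e.g. when the item is removed or migrated away */
static void item_deassign(int cpu)
{
    pcpu_item[cpu] = NULL;
}

int main(void)
{
    struct sched_item it = { .domain_id = 1, .item_id = 0 };
    int cpu = item_assign(&it);

    printf("d%dv%d -> pcpu %d\n", it.domain_id, it.item_id, cpu);
    if ( cpu >= 0 )
        item_deassign(cpu);

    return 0;
}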

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_null.c | 304 ++++++++++++++++++++++++------------------------
 1 file changed, 149 insertions(+), 155 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 62c51e2c83..ceb026c8af 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -18,10 +18,10 @@
 
 /*
  * The 'null' scheduler always choose to run, on each pCPU, either nothing
- * (i.e., the pCPU stays idle) or always the same vCPU.
+ * (i.e., the pCPU stays idle) or always the same Item.
  *
  * It is aimed at supporting static scenarios, where there always are
- * less vCPUs than pCPUs (and the vCPUs don't need to move among pCPUs
+ * fewer Items than pCPUs (and the Items don't need to move among pCPUs
  * for any reason) with the least possible overhead.
  *
  * Typical usecase are embedded applications, but also HPC, especially
@@ -38,8 +38,8 @@
  * null tracing events. Check include/public/trace.h for more details.
  */
 #define TRC_SNULL_PICKED_CPU    TRC_SCHED_CLASS_EVT(SNULL, 1)
-#define TRC_SNULL_VCPU_ASSIGN   TRC_SCHED_CLASS_EVT(SNULL, 2)
-#define TRC_SNULL_VCPU_DEASSIGN TRC_SCHED_CLASS_EVT(SNULL, 3)
+#define TRC_SNULL_ITEM_ASSIGN   TRC_SCHED_CLASS_EVT(SNULL, 2)
+#define TRC_SNULL_ITEM_DEASSIGN TRC_SCHED_CLASS_EVT(SNULL, 3)
 #define TRC_SNULL_MIGRATE       TRC_SCHED_CLASS_EVT(SNULL, 4)
 #define TRC_SNULL_SCHEDULE      TRC_SCHED_CLASS_EVT(SNULL, 5)
 #define TRC_SNULL_TASKLET       TRC_SCHED_CLASS_EVT(SNULL, 6)
@@ -48,13 +48,13 @@
  * Locking:
  * - Scheduler-lock (a.k.a. runqueue lock):
  *  + is per-pCPU;
- *  + serializes assignment and deassignment of vCPUs to a pCPU.
+ *  + serializes assignment and deassignment of Items to a pCPU.
  * - Private data lock (a.k.a. private scheduler lock):
  *  + is scheduler-wide;
  *  + serializes accesses to the list of domains in this scheduler.
  * - Waitqueue lock:
  *  + is scheduler-wide;
- *  + serialize accesses to the list of vCPUs waiting to be assigned
+ *  + serialize accesses to the list of Items waiting to be assigned
  *    to pCPUs.
  *
  * Ordering is: private lock, runqueue lock, waitqueue lock. Or, OTOH,
@@ -78,25 +78,25 @@
 struct null_private {
     spinlock_t lock;        /* scheduler lock; nests inside cpupool_lock */
     struct list_head ndom;  /* Domains of this scheduler                 */
-    struct list_head waitq; /* vCPUs not assigned to any pCPU            */
+    struct list_head waitq; /* Items not assigned to any pCPU            */
     spinlock_t waitq_lock;  /* serializes waitq; nests inside runq locks */
-    cpumask_t cpus_free;    /* CPUs without a vCPU associated to them    */
+    cpumask_t cpus_free;    /* CPUs without an Item associated to them   */
 };
 
 /*
  * Physical CPU
  */
 struct null_pcpu {
-    struct vcpu *vcpu;
+    struct sched_item *item;
 };
 DEFINE_PER_CPU(struct null_pcpu, npc);
 
 /*
- * Virtual CPU
+ * Schedule Item
  */
 struct null_item {
     struct list_head waitq_elem;
-    struct vcpu *vcpu;
+    struct sched_item *item;
 };
 
 /*
@@ -120,13 +120,13 @@ static inline struct null_item *null_item(const struct sched_item *item)
     return item->priv;
 }
 
-static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
+static inline bool item_check_affinity(struct sched_item *item,
+                                       unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v->sched_item, balance_step,
-                             cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(item, balance_step, cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                cpupool_domain_cpumask(v->domain));
+                cpupool_domain_cpumask(item->domain));
 
     return cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu));
 }
@@ -161,9 +161,9 @@ static void null_deinit(struct scheduler *ops)
 
 static void init_pdata(struct null_private *prv, unsigned int cpu)
 {
-    /* Mark the pCPU as free, and with no vCPU assigned */
+    /* Mark the pCPU as free, and with no item assigned */
     cpumask_set_cpu(cpu, &prv->cpus_free);
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).item = NULL;
 }
 
 static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
@@ -191,13 +191,12 @@ static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     ASSERT(!pcpu);
 
     cpumask_clear_cpu(cpu, &prv->cpus_free);
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).item = NULL;
 }
 
 static void *null_alloc_vdata(const struct scheduler *ops,
                               struct sched_item *item, void *dd)
 {
-    struct vcpu *v = item->vcpu;
     struct null_item *nvc;
 
     nvc = xzalloc(struct null_item);
@@ -205,7 +204,7 @@ static void *null_alloc_vdata(const struct scheduler *ops,
         return NULL;
 
     INIT_LIST_HEAD(&nvc->waitq_elem);
-    nvc->vcpu = v;
+    nvc->item = item;
 
     SCHED_STAT_CRANK(item_alloc);
 
@@ -257,15 +256,15 @@ static void null_free_domdata(const struct scheduler *ops, void *data)
 }
 
 /*
- * vCPU to pCPU assignment and placement. This _only_ happens:
+ * item to pCPU assignment and placement. This _only_ happens:
  *  - on insert,
  *  - on migrate.
  *
- * Insert occurs when a vCPU joins this scheduler for the first time
+ * Insert occurs when an item joins this scheduler for the first time
  * (e.g., when the domain it's part of is moved to the scheduler's
  * cpupool).
  *
- * Migration may be necessary if a pCPU (with a vCPU assigned to it)
+ * Migration may be necessary if a pCPU (with an item assigned to it)
  * is removed from the scheduler's cpupool.
  *
  * So this is not part of any hot path.
@@ -274,9 +273,8 @@ static struct sched_resource *
 pick_res(struct null_private *prv, struct sched_item *item)
 {
     unsigned int bs;
-    struct vcpu *v = item->vcpu;
-    unsigned int cpu = v->processor, new_cpu;
-    cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
+    unsigned int cpu = sched_item_cpu(item), new_cpu;
+    cpumask_t *cpus = cpupool_domain_cpumask(item->domain);
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
@@ -291,11 +289,12 @@ pick_res(struct null_private *prv, struct sched_item *item)
         /*
          * If our processor is free, or we are assigned to it, and it is also
          * still valid and part of our affinity, just go for it.
-         * (Note that we may call vcpu_check_affinity(), but we deliberately
+         * (Note that we may call item_check_affinity(), but we deliberately
          * don't, so we get to keep in the scratch cpumask what we have just
          * put in it.)
          */
-        if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
+        if ( likely((per_cpu(npc, cpu).item == NULL ||
+                     per_cpu(npc, cpu).item == item)
                     && cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
         {
             new_cpu = cpu;
@@ -313,13 +312,13 @@ pick_res(struct null_private *prv, struct sched_item *item)
 
     /*
      * If we didn't find any free pCPU, just pick any valid pcpu, even if
-     * it has another vCPU assigned. This will happen during shutdown and
+     * it has another Item assigned. This will happen during shutdown and
      * suspend/resume, but it may also happen during "normal operation", if
      * all the pCPUs are busy.
      *
      * In fact, there must always be something sane in v->processor, or
      * item_schedule_lock() and friends won't work. This is not a problem,
-     * as we will actually assign the vCPU to the pCPU we return from here,
+     * as we will actually assign the Item to the pCPU we return from here,
      * only if the pCPU is free.
      */
     cpumask_and(cpumask_scratch_cpu(cpu), cpus, item->cpu_hard_affinity);
@@ -329,11 +328,11 @@ pick_res(struct null_private *prv, struct sched_item *item)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t item, dom;
             uint32_t new_cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
         d.new_cpu = new_cpu;
         __trace_var(TRC_SNULL_PICKED_CPU, 1, sizeof(d), &d);
     }
@@ -341,47 +340,47 @@ pick_res(struct null_private *prv, struct sched_item *item)
     return per_cpu(sched_res, new_cpu);
 }
 
-static void vcpu_assign(struct null_private *prv, struct vcpu *v,
+static void item_assign(struct null_private *prv, struct sched_item *item,
                         unsigned int cpu)
 {
-    per_cpu(npc, cpu).vcpu = v;
-    v->processor = cpu;
-    v->sched_item->res = per_cpu(sched_res, cpu);
+    per_cpu(npc, cpu).item = item;
+    sched_set_res(item, per_cpu(sched_res, cpu));
     cpumask_clear_cpu(cpu, &prv->cpus_free);
 
-    dprintk(XENLOG_G_INFO, "%d <-- %pv\n", cpu, v);
+    dprintk(XENLOG_G_INFO, "%d <-- %pdv%d\n", cpu, item->domain, item->item_id);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t item, dom;
             uint32_t cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
         d.cpu = cpu;
-        __trace_var(TRC_SNULL_VCPU_ASSIGN, 1, sizeof(d), &d);
+        __trace_var(TRC_SNULL_ITEM_ASSIGN, 1, sizeof(d), &d);
     }
 }
 
-static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
+static void item_deassign(struct null_private *prv, struct sched_item *item,
                           unsigned int cpu)
 {
-    per_cpu(npc, cpu).vcpu = NULL;
+    per_cpu(npc, cpu).item = NULL;
     cpumask_set_cpu(cpu, &prv->cpus_free);
 
-    dprintk(XENLOG_G_INFO, "%d <-- NULL (%pv)\n", cpu, v);
+    dprintk(XENLOG_G_INFO, "%d <-- NULL (%pdv%d)\n", cpu, item->domain,
+            item->item_id);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t item, dom;
             uint32_t cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
         d.cpu = cpu;
-        __trace_var(TRC_SNULL_VCPU_DEASSIGN, 1, sizeof(d), &d);
+        __trace_var(TRC_SNULL_ITEM_DEASSIGN, 1, sizeof(d), &d);
     }
 }
 
@@ -393,9 +392,9 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct null_private *prv = null_priv(new_ops);
     struct null_item *nvc = vdata;
 
-    ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
+    ASSERT(nvc && is_idle_item(nvc->item));
 
-    idle_vcpu[cpu]->sched_item->priv = vdata;
+    sched_idle_item(cpu)->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -421,35 +420,34 @@ static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 static void null_item_insert(const struct scheduler *ops,
                              struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_item *nvc = null_item(item);
     unsigned int cpu;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_item(item));
 
     lock = item_schedule_lock_irq(item);
  retry:
 
-    item->res = pick_res(prv, item);
-    cpu = v->processor = item->res->processor;
+    sched_set_res(item, pick_res(prv, item));
+    cpu = sched_item_cpu(item);
 
     spin_unlock(lock);
 
     lock = item_schedule_lock(item);
 
     cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+                cpupool_domain_cpumask(item->domain));
 
-    /* If the pCPU is free, we assign v to it */
-    if ( likely(per_cpu(npc, cpu).vcpu == NULL) )
+    /* If the pCPU is free, we assign item to it */
+    if ( likely(per_cpu(npc, cpu).item == NULL) )
     {
         /*
          * Insert is followed by vcpu_wake(), so there's no need to poke
          * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
          */
-        vcpu_assign(prv, v, cpu);
+        item_assign(prv, item, cpu);
     }
     else if ( cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
     {
@@ -468,7 +466,8 @@ static void null_item_insert(const struct scheduler *ops,
          */
         spin_lock(&prv->waitq_lock);
         list_add_tail(&nvc->waitq_elem, &prv->waitq);
-        dprintk(XENLOG_G_WARNING, "WARNING: %pv not assigned to any CPU!\n", v);
+        dprintk(XENLOG_G_WARNING, "WARNING: %pdv%d not assigned to any CPU!\n",
+                item->domain, item->item_id);
         spin_unlock(&prv->waitq_lock);
     }
     spin_unlock_irq(lock);
@@ -476,35 +475,34 @@ static void null_item_insert(const struct scheduler *ops,
     SCHED_STAT_CRANK(item_insert);
 }
 
-static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
+static void _item_remove(struct null_private *prv, struct sched_item *item)
 {
     unsigned int bs;
-    unsigned int cpu = v->processor;
+    unsigned int cpu = sched_item_cpu(item);
     struct null_item *wvc;
 
-    ASSERT(list_empty(&null_item(v->sched_item)->waitq_elem));
+    ASSERT(list_empty(&null_item(item)->waitq_elem));
 
-    vcpu_deassign(prv, v, cpu);
+    item_deassign(prv, item, cpu);
 
     spin_lock(&prv->waitq_lock);
 
     /*
-     * If v is assigned to a pCPU, let's see if there is someone waiting,
-     * suitable to be assigned to it (prioritizing vcpus that have
+     * If item is assigned to a pCPU, let's see if there is someone waiting,
+     * suitable to be assigned to it (prioritizing items that have
      * soft-affinity with cpu).
      */
     for_each_affinity_balance_step( bs )
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY &&
-                 !has_soft_affinity(wvc->vcpu->sched_item) )
+            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->item) )
                 continue;
 
-            if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
+            if ( item_check_affinity(wvc->item, cpu, bs) )
             {
                 list_del_init(&wvc->waitq_elem);
-                vcpu_assign(prv, wvc->vcpu, cpu);
+                item_assign(prv, wvc->item, cpu);
                 cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                 spin_unlock(&prv->waitq_lock);
                 return;
@@ -517,16 +515,15 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
 static void null_item_remove(const struct scheduler *ops,
                              struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_item *nvc = null_item(item);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_item(item));
 
     lock = item_schedule_lock_irq(item);
 
-    /* If v is in waitqueue, just get it out of there and bail */
+    /* If item is in waitqueue, just get it out of there and bail */
     if ( unlikely(!list_empty(&nvc->waitq_elem)) )
     {
         spin_lock(&prv->waitq_lock);
@@ -536,10 +533,10 @@ static void null_item_remove(const struct scheduler *ops,
         goto out;
     }
 
-    ASSERT(per_cpu(npc, v->processor).vcpu == v);
-    ASSERT(!cpumask_test_cpu(v->processor, &prv->cpus_free));
+    ASSERT(per_cpu(npc, sched_item_cpu(item)).item == item);
+    ASSERT(!cpumask_test_cpu(sched_item_cpu(item), &prv->cpus_free));
 
-    _vcpu_remove(prv, v);
+    _item_remove(prv, item);
 
  out:
     item_schedule_unlock_irq(lock, item);
@@ -550,11 +547,9 @@ static void null_item_remove(const struct scheduler *ops,
 static void null_item_wake(const struct scheduler *ops,
                            struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
+    ASSERT(!is_idle_item(item));
 
-    ASSERT(!is_idle_vcpu(v));
-
-    if ( unlikely(curr_on_cpu(v->processor) == item) )
+    if ( unlikely(curr_on_cpu(sched_item_cpu(item)) == item) )
     {
         SCHED_STAT_CRANK(item_wake_running);
         return;
@@ -567,25 +562,23 @@ static void null_item_wake(const struct scheduler *ops,
         return;
     }
 
-    if ( likely(vcpu_runnable(v)) )
+    if ( likely(item_runnable(item)) )
         SCHED_STAT_CRANK(item_wake_runnable);
     else
         SCHED_STAT_CRANK(item_wake_not_runnable);
 
-    /* Note that we get here only for vCPUs assigned to a pCPU */
-    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+    /* Note that we get here only for items assigned to a pCPU */
+    cpu_raise_softirq(sched_item_cpu(item), SCHEDULE_SOFTIRQ);
 }
 
 static void null_item_sleep(const struct scheduler *ops,
                             struct sched_item *item)
 {
-    struct vcpu *v = item->vcpu;
-
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_item(item));
 
-    /* If v is not assigned to a pCPU, or is not running, no need to bother */
-    if ( curr_on_cpu(v->processor) == item )
-        cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
+    /* If item isn't assigned to a pCPU, or isn't running, no need to bother */
+    if ( curr_on_cpu(sched_item_cpu(item)) == item )
+        cpu_raise_softirq(sched_item_cpu(item), SCHEDULE_SOFTIRQ);
 
     SCHED_STAT_CRANK(item_sleep);
 }
@@ -593,37 +586,36 @@ static void null_item_sleep(const struct scheduler *ops,
 static struct sched_resource *
 null_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
-    ASSERT(!is_idle_vcpu(item->vcpu));
+    ASSERT(!is_idle_item(item));
     return pick_res(null_priv(ops), item);
 }
 
 static void null_item_migrate(const struct scheduler *ops,
                               struct sched_item *item, unsigned int new_cpu)
 {
-    struct vcpu *v = item->vcpu;
     struct null_private *prv = null_priv(ops);
     struct null_item *nvc = null_item(item);
 
-    ASSERT(!is_idle_vcpu(v));
+    ASSERT(!is_idle_item(item));
 
-    if ( v->processor == new_cpu )
+    if ( sched_item_cpu(item) == new_cpu )
         return;
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            uint16_t vcpu, dom;
+            uint16_t item, dom;
             uint16_t cpu, new_cpu;
         } d;
-        d.dom = v->domain->domain_id;
-        d.vcpu = v->vcpu_id;
-        d.cpu = v->processor;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
+        d.cpu = sched_item_cpu(item);
         d.new_cpu = new_cpu;
         __trace_var(TRC_SNULL_MIGRATE, 1, sizeof(d), &d);
     }
 
     /*
-     * v is either assigned to a pCPU, or in the waitqueue.
+     * item is either assigned to a pCPU, or in the waitqueue.
      *
      * In the former case, the pCPU to which it was assigned would
      * become free, and we, therefore, should check whether there is
@@ -633,7 +625,7 @@ static void null_item_migrate(const struct scheduler *ops,
      */
     if ( likely(list_empty(&nvc->waitq_elem)) )
     {
-        _vcpu_remove(prv, v);
+        _item_remove(prv, item);
         SCHED_STAT_CRANK(migrate_running);
     }
     else
@@ -642,32 +634,34 @@ static void null_item_migrate(const struct scheduler *ops,
     SCHED_STAT_CRANK(migrated);
 
     /*
-     * Let's now consider new_cpu, which is where v is being sent. It can be
-     * either free, or have a vCPU already assigned to it.
+     * Let's now consider new_cpu, which is where item is being sent. It can be
+     * either free, or have an item already assigned to it.
      *
-     * In the former case, we should assign v to it, and try to get it to run,
+     * In the former case we should assign item to it, and try to get it to run,
      * if possible, according to affinity.
      *
-     * In latter, all we can do is to park v in the waitqueue.
+     * In the latter, all we can do is park the item in the waitqueue.
      */
-    if ( per_cpu(npc, new_cpu).vcpu == NULL &&
-         vcpu_check_affinity(v, new_cpu, BALANCE_HARD_AFFINITY) )
+    if ( per_cpu(npc, new_cpu).item == NULL &&
+         item_check_affinity(item, new_cpu, BALANCE_HARD_AFFINITY) )
     {
-        /* v might have been in the waitqueue, so remove it */
+        /* item might have been in the waitqueue, so remove it */
         spin_lock(&prv->waitq_lock);
         list_del_init(&nvc->waitq_elem);
         spin_unlock(&prv->waitq_lock);
 
-        vcpu_assign(prv, v, new_cpu);
+        item_assign(prv, item, new_cpu);
     }
     else
     {
-        /* Put v in the waitqueue, if it wasn't there already */
+        /* Put item in the waitqueue, if it wasn't there already */
         spin_lock(&prv->waitq_lock);
         if ( list_empty(&nvc->waitq_elem) )
         {
             list_add_tail(&nvc->waitq_elem, &prv->waitq);
-            dprintk(XENLOG_G_WARNING, "WARNING: %pv not assigned to any CPU!\n", v);
+            dprintk(XENLOG_G_WARNING,
+                    "WARNING: %pdv%d not assigned to any CPU!\n", item->domain,
+                    item->item_id);
         }
         spin_unlock(&prv->waitq_lock);
     }
@@ -680,35 +674,34 @@ static void null_item_migrate(const struct scheduler *ops,
      * at least. In case of suspend, any temporary inconsistency caused
      * by this, will be fixed-up during resume.
      */
-    v->processor = new_cpu;
-    item->res = per_cpu(sched_res, new_cpu);
+    sched_set_res(item, per_cpu(sched_res, new_cpu));
 }
 
 #ifndef NDEBUG
-static inline void null_vcpu_check(struct vcpu *v)
+static inline void null_item_check(struct sched_item *item)
 {
-    struct null_item * const nvc = null_item(v->sched_item);
-    struct null_dom * const ndom = v->domain->sched_priv;
+    struct null_item * const nvc = null_item(item);
+    struct null_dom * const ndom = item->domain->sched_priv;
 
-    BUG_ON(nvc->vcpu != v);
+    BUG_ON(nvc->item != item);
 
     if ( ndom )
-        BUG_ON(is_idle_vcpu(v));
+        BUG_ON(is_idle_item(item));
     else
-        BUG_ON(!is_idle_vcpu(v));
+        BUG_ON(!is_idle_item(item));
 
     SCHED_STAT_CRANK(item_check);
 }
-#define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
+#define NULL_ITEM_CHECK(item)  (null_item_check(item))
 #else
-#define NULL_VCPU_CHECK(v)
+#define NULL_ITEM_CHECK(item)
 #endif
 
 
 /*
  * The most simple scheduling function of all times! We either return:
- *  - the vCPU assigned to the pCPU, if there's one and it can run;
- *  - the idle vCPU, otherwise.
+ *  - the item assigned to the pCPU, if there's one and it can run;
+ *  - the idle item, otherwise.
  */
 static struct task_slice null_schedule(const struct scheduler *ops,
                                        s_time_t now,
@@ -721,24 +714,24 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
-    NULL_VCPU_CHECK(current);
+    NULL_ITEM_CHECK(current->sched_item);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
             uint16_t tasklet, cpu;
-            int16_t vcpu, dom;
+            int16_t item, dom;
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        if ( per_cpu(npc, cpu).vcpu == NULL )
+        if ( per_cpu(npc, cpu).item == NULL )
         {
-            d.vcpu = d.dom = -1;
+            d.item = d.dom = -1;
         }
         else
         {
-            d.vcpu = per_cpu(npc, cpu).vcpu->vcpu_id;
-            d.dom = per_cpu(npc, cpu).vcpu->domain->domain_id;
+            d.item = per_cpu(npc, cpu).item->item_id;
+            d.dom = per_cpu(npc, cpu).item->domain->domain_id;
         }
         __trace_var(TRC_SNULL_SCHEDULE, 1, sizeof(d), &d);
     }
@@ -746,16 +739,16 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = idle_vcpu[cpu]->sched_item;
+        ret.task = sched_idle_item(cpu);
     }
     else
-        ret.task = per_cpu(npc, cpu).vcpu->sched_item;
+        ret.task = per_cpu(npc, cpu).item;
     ret.migrated = 0;
     ret.time = -1;
 
     /*
      * We may be new in the cpupool, or just coming back online. In which
-     * case, there may be vCPUs in the waitqueue that we can assign to us
+     * case, there may be items in the waitqueue that we can assign to us
      * and run.
      */
     if ( unlikely(ret.task == NULL) )
@@ -766,10 +759,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             goto unlock;
 
         /*
-         * We scan the waitqueue twice, for prioritizing vcpus that have
+         * We scan the waitqueue twice, for prioritizing items that have
          * soft-affinity with cpu. This may look like something expensive to
-         * do here in null_schedule(), but it's actually fine, beceuse we do
-         * it only in cases where a pcpu has no vcpu associated (e.g., as
+         * do here in null_schedule(), but it's actually fine, because we do
+         * it only in cases where a pcpu has no item associated (e.g., as
          * said above, the cpu has just joined a cpupool).
          */
         for_each_affinity_balance_step( bs )
@@ -777,14 +770,14 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu->sched_item) )
+                     !has_soft_affinity(wvc->item) )
                     continue;
 
-                if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
+                if ( item_check_affinity(wvc->item, cpu, bs) )
                 {
-                    vcpu_assign(prv, wvc->vcpu, cpu);
+                    item_assign(prv, wvc->item, cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->vcpu->sched_item;
+                    ret.task = wvc->item;
                     goto unlock;
                 }
             }
@@ -794,17 +787,17 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     }
 
     if ( unlikely(ret.task == NULL || !item_runnable(ret.task)) )
-        ret.task = idle_vcpu[cpu]->sched_item;
+        ret.task = sched_idle_item(cpu);
 
-    NULL_VCPU_CHECK(ret.task->vcpu);
+    NULL_ITEM_CHECK(ret.task);
     return ret;
 }
 
-static inline void dump_vcpu(struct null_private *prv, struct null_item *nvc)
+static inline void dump_item(struct null_private *prv, struct null_item *nvc)
 {
-    printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
-            nvc->vcpu->vcpu_id, list_empty(&nvc->waitq_elem) ?
-                                nvc->vcpu->processor : -1);
+    printk("[%i.%i] pcpu=%d", nvc->item->domain->domain_id,
+            nvc->item->item_id, list_empty(&nvc->waitq_elem) ?
+                                sched_item_cpu(nvc->item) : -1);
 }
 
 static void null_dump_pcpu(const struct scheduler *ops, int cpu)
@@ -820,16 +813,17 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
            cpu,
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
-    if ( per_cpu(npc, cpu).vcpu != NULL )
-        printk(", vcpu=%pv", per_cpu(npc, cpu).vcpu);
+    if ( per_cpu(npc, cpu).item != NULL )
+        printk(", item=%pdv%d", per_cpu(npc, cpu).item->domain,
+               per_cpu(npc, cpu).item->item_id);
     printk("\n");
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current item (nothing to say if that's the idle item) */
     nvc = null_item(curr_on_cpu(cpu));
-    if ( nvc && !is_idle_vcpu(nvc->vcpu) )
+    if ( nvc && !is_idle_item(nvc->item) )
     {
         printk("\trun: ");
-        dump_vcpu(prv, nvc);
+        dump_item(prv, nvc);
         printk("\n");
     }
 
@@ -852,23 +846,23 @@ static void null_dump(const struct scheduler *ops)
     list_for_each( iter, &prv->ndom )
     {
         struct null_dom *ndom;
-        struct vcpu *v;
+        struct sched_item *item;
 
         ndom = list_entry(iter, struct null_dom, ndom_elem);
 
         printk("\tDomain: %d\n", ndom->dom->domain_id);
-        for_each_vcpu( ndom->dom, v )
+        for_each_sched_item( ndom->dom, item )
         {
-            struct null_item * const nvc = null_item(v->sched_item);
+            struct null_item * const nvc = null_item(item);
             spinlock_t *lock;
 
-            lock = item_schedule_lock(nvc->vcpu->sched_item);
+            lock = item_schedule_lock(item);
 
             printk("\t%3d: ", ++loop);
-            dump_vcpu(prv, nvc);
+            dump_item(prv, nvc);
             printk("\n");
 
-            item_schedule_unlock(lock, nvc->vcpu->sched_item);
+            item_schedule_unlock(lock, item);
         }
     }
 
@@ -883,7 +877,7 @@ static void null_dump(const struct scheduler *ops)
             printk(", ");
         if ( loop % 24 == 0 )
             printk("\n\t");
-        printk("%pv", nvc->vcpu);
+        printk("%pdv%d", nvc->item->domain, nvc->item->item_id);
     }
     printk("\n");
     spin_unlock(&prv->waitq_lock);
-- 
2.16.4



* [PATCH RFC 25/49] xen/sched: make rt scheduler vcpu agnostic.
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (23 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 24/49] xen/sched: make null scheduler vcpu agnostic Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 26/49] xen/sched: make credit " Juergen Gross
                   ` (29 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Meng Xu, Dario Faggioli

Switch rt scheduler completely from vcpu to sched_item usage.
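
For reference, a standalone sketch of the EDF ordering the queues below
are kept in (plain C, not Xen code; the deadline tie-break is filled in
as an assumption, since only the first line of compare_item_priority()
is visible in the hunk, and the field types are simplified):

#include <stdint.h>
#include <stdio.h>

typedef int64_t s_time_t;

struct rt_item {
    int      priority_level;   /* bumped when running on extratime */
    s_time_t cur_deadline;     /* end of the current period */
};

/* > 0 if v1 should come first: lower priority_level wins, then the
 * earlier deadline (classic EDF). */
static s_time_t compare_item_priority(const struct rt_item *v1,
                                      const struct rt_item *v2)
{
    int prio = v2->priority_level - v1->priority_level;

    if ( prio )
        return prio;

    return v2->cur_deadline - v1->cur_deadline;
}

int main(void)
{
    struct rt_item a = { .priority_level = 0, .cur_deadline = 3000 };
    struct rt_item b = { .priority_level = 0, .cur_deadline = 5000 };

    printf("a runs before b: %s\n",
           compare_item_priority(&a, &b) > 0 ? "yes" : "no");

    return 0;
}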

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_rt.c | 358 ++++++++++++++++++++++++--------------------------
 1 file changed, 175 insertions(+), 183 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 9efe807230..730aa292d4 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -36,7 +36,7 @@
  *
  * Migration compensation and resist like credit2 to better use cache;
  * Lock Holder Problem, using yield?
- * Self switch problem: VCPUs of the same domain may preempt each other;
+ * Self switch problem: ITEMs of the same domain may preempt each other;
  */
 
 /*
@@ -44,30 +44,30 @@
  *
  * This scheduler follows the Preemptive Global Earliest Deadline First (EDF)
  * theory in real-time field.
- * At any scheduling point, the VCPU with earlier deadline has higher priority.
- * The scheduler always picks highest priority VCPU to run on a feasible PCPU.
- * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
- * has a lower-priority VCPU running on it.)
+ * At any scheduling point, the ITEM with earlier deadline has higher priority.
+ * The scheduler always picks highest priority ITEM to run on a feasible PCPU.
+ * A PCPU is feasible if the ITEM can run on this PCPU and (the PCPU is idle or
+ * has a lower-priority ITEM running on it.)
  *
- * Each VCPU has a dedicated period, budget and a extratime flag
- * The deadline of a VCPU is at the end of each period;
- * A VCPU has its budget replenished at the beginning of each period;
- * While scheduled, a VCPU burns its budget.
- * The VCPU needs to finish its budget before its deadline in each period;
- * The VCPU discards its unused budget at the end of each period.
- * When a VCPU runs out of budget in a period, if its extratime flag is set,
- * the VCPU increases its priority_level by 1 and refills its budget; otherwise,
+ * Each ITEM has a dedicated period, budget and an extratime flag
+ * The deadline of an ITEM is at the end of each period;
+ * An ITEM has its budget replenished at the beginning of each period;
+ * While scheduled, an ITEM burns its budget.
+ * The ITEM needs to finish its budget before its deadline in each period;
+ * The ITEM discards its unused budget at the end of each period.
+ * When an ITEM runs out of budget in a period, if its extratime flag is set,
+ * the ITEM increases its priority_level by 1 and refills its budget; otherwise,
  * it has to wait until next period.
  *
- * Each VCPU is implemented as a deferable server.
- * When a VCPU has a task running on it, its budget is continuously burned;
- * When a VCPU has no task but with budget left, its budget is preserved.
+ * Each ITEM is implemented as a deferable server.
+ * When an ITEM has a task running on it, its budget is continuously burned;
+ * When an ITEM has no task but with budget left, its budget is preserved.
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget,
+ * The runqueue holds all runnable ITEMs with budget,
  * sorted by priority_level and deadline;
- * The depletedqueue holds all VCPUs without budget, unsorted;
+ * The depletedqueue holds all ITEMs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
  */
@@ -82,7 +82,7 @@
  * in schedule.c
  *
  * The functions involes RunQ and needs to grab locks are:
- *    vcpu_insert, vcpu_remove, context_saved, runq_insert
+ *    item_insert, item_remove, context_saved, runq_insert
  */
 
 
@@ -95,7 +95,7 @@
 
 /*
  * Max period: max delta of time type, because period is added to the time
- * a vcpu activates, so this must not overflow.
+ * an item activates, so this must not overflow.
  * Min period: 10 us, considering the scheduling overhead (when period is
  * too low, scheduling is invoked too frequently, causing high overhead).
  */
@@ -121,12 +121,12 @@
  * Flags
  */
 /*
- * RTDS_scheduled: Is this vcpu either running on, or context-switching off,
+ * RTDS_scheduled: Is this item either running on, or context-switching off,
  * a phyiscal cpu?
  * + Accessed only with global lock held.
  * + Set when chosen as next in rt_schedule().
  * + Cleared after context switch has been saved in rt_context_saved()
- * + Checked in vcpu_wake to see if we can add to the Runqueue, or if we should
+ * + Checked in item_wake to see if we can add to the Runqueue, or if we should
  *   set RTDS_delayed_runq_add
  * + Checked to be false in runq_insert.
  */
@@ -146,15 +146,15 @@
 /*
  * RTDS_depleted: Does this vcp run out of budget?
  * This flag is
- * + set in burn_budget() if a vcpu has zero budget left;
+ * + set in burn_budget() if an item has zero budget left;
  * + cleared and checked in the repenishment handler,
- *   for the vcpus that are being replenished.
+ *   for the items that are being replenished.
  */
 #define __RTDS_depleted     3
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
- * RTDS_extratime: Can the vcpu run in the time that is
+ * RTDS_extratime: Can the item run in the time that is
  * not part of any real-time reservation, and would therefore
  * be otherwise left idle?
  */
@@ -183,11 +183,11 @@ struct rt_private {
     spinlock_t lock;            /* the global coarse-grained lock */
     struct list_head sdom;      /* list of availalbe domains, used for dump */
 
-    struct list_head runq;      /* ordered list of runnable vcpus */
-    struct list_head depletedq; /* unordered list of depleted vcpus */
+    struct list_head runq;      /* ordered list of runnable items */
+    struct list_head depletedq; /* unordered list of depleted items */
 
     struct timer repl_timer;    /* replenishment timer */
-    struct list_head replq;     /* ordered list of vcpus that need replenishment */
+    struct list_head replq;     /* ordered list of items that need replenishment */
 
     cpumask_t tickled;          /* cpus been tickled */
 };
@@ -199,18 +199,18 @@ struct rt_item {
     struct list_head q_elem;     /* on the runq/depletedq list */
     struct list_head replq_elem; /* on the replenishment events list */
 
-    /* VCPU parameters, in nanoseconds */
+    /* ITEM parameters, in nanoseconds */
     s_time_t period;
     s_time_t budget;
 
-    /* VCPU current information in nanosecond */
+    /* ITEM current information in nanosecond */
     s_time_t cur_budget;         /* current budget */
     s_time_t last_start;         /* last start time */
     s_time_t cur_deadline;       /* current deadline for EDF */
 
     /* Up-pointers */
     struct rt_dom *sdom;
-    struct vcpu *vcpu;
+    struct sched_item *item;
 
     unsigned priority_level;
 
@@ -263,7 +263,7 @@ static inline bool has_extratime(const struct rt_item *svc)
  * and the replenishment events queue.
  */
 static int
-vcpu_on_q(const struct rt_item *svc)
+item_on_q(const struct rt_item *svc)
 {
    return !list_empty(&svc->q_elem);
 }
@@ -281,7 +281,7 @@ replq_elem(struct list_head *elem)
 }
 
 static int
-vcpu_on_replq(const struct rt_item *svc)
+item_on_replq(const struct rt_item *svc)
 {
     return !list_empty(&svc->replq_elem);
 }
@@ -291,7 +291,7 @@ vcpu_on_replq(const struct rt_item *svc)
  * Otherwise, return value < 0
  */
 static s_time_t
-compare_vcpu_priority(const struct rt_item *v1, const struct rt_item *v2)
+compare_item_priority(const struct rt_item *v1, const struct rt_item *v2)
 {
     int prio = v2->priority_level - v1->priority_level;
 
@@ -302,15 +302,15 @@ compare_vcpu_priority(const struct rt_item *v1, const struct rt_item *v2)
 }
 
 /*
- * Debug related code, dump vcpu/cpu information
+ * Debug related code, dump item/cpu information
  */
 static void
-rt_dump_vcpu(const struct scheduler *ops, const struct rt_item *svc)
+rt_dump_item(const struct scheduler *ops, const struct rt_item *svc)
 {
     cpumask_t *cpupool_mask, *mask;
 
     ASSERT(svc != NULL);
-    /* idle vcpu */
+    /* idle item */
     if( svc->sdom == NULL )
     {
         printk("\n");
@@ -321,20 +321,20 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_item *svc)
      * We can't just use 'cpumask_scratch' because the dumping can
      * happen from a pCPU outside of this scheduler's cpupool, and
      * hence it's not right to use its pCPU's scratch mask.
-     * On the other hand, it is safe to use svc->vcpu->processor's
+     * On the other hand, it is safe to use sched_item_cpu(svc->item)'s
      * own scratch space, since we hold the runqueue lock.
      */
-    mask = cpumask_scratch_cpu(svc->vcpu->processor);
+    mask = cpumask_scratch_cpu(sched_item_cpu(svc->item));
 
-    cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_item->cpu_hard_affinity);
+    cpupool_mask = cpupool_domain_cpumask(svc->item->domain);
+    cpumask_and(mask, cpupool_mask, svc->item->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
            " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%*pbl\n",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
-            svc->vcpu->processor,
+            svc->item->domain->domain_id,
+            svc->item->item_id,
+            sched_item_cpu(svc->item),
             svc->period,
             svc->budget,
             svc->cur_budget,
@@ -342,8 +342,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_item *svc)
             svc->last_start,
             svc->priority_level,
             has_extratime(svc),
-            vcpu_on_q(svc),
-            vcpu_runnable(svc->vcpu),
+            item_on_q(svc),
+            item_runnable(svc->item),
             svc->flags,
             nr_cpu_ids, cpumask_bits(mask));
 }
@@ -357,11 +357,11 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
 
     spin_lock_irqsave(&prv->lock, flags);
     printk("CPU[%02d]\n", cpu);
-    /* current VCPU (nothing to say if that's the idle vcpu). */
+    /* current ITEM (nothing to say if that's the idle item). */
     svc = rt_item(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_item(svc->item) )
     {
-        rt_dump_vcpu(ops, svc);
+        rt_dump_item(ops, svc);
     }
     spin_unlock_irqrestore(&prv->lock, flags);
 }
@@ -388,35 +388,35 @@ rt_dump(const struct scheduler *ops)
     list_for_each ( iter, runq )
     {
         svc = q_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_item(ops, svc);
     }
 
     printk("Global DepletedQueue info:\n");
     list_for_each ( iter, depletedq )
     {
         svc = q_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_item(ops, svc);
     }
 
     printk("Global Replenishment Events info:\n");
     list_for_each ( iter, replq )
     {
         svc = replq_elem(iter);
-        rt_dump_vcpu(ops, svc);
+        rt_dump_item(ops, svc);
     }
 
     printk("Domain info:\n");
     list_for_each ( iter, &prv->sdom )
     {
-        struct vcpu *v;
+        struct sched_item *item;
 
         sdom = list_entry(iter, struct rt_dom, sdom_elem);
         printk("\tdomain: %d\n", sdom->dom->domain_id);
 
-        for_each_vcpu ( sdom->dom, v )
+        for_each_sched_item ( sdom->dom, item )
         {
-            svc = rt_item(v->sched_item);
-            rt_dump_vcpu(ops, svc);
+            svc = rt_item(item);
+            rt_dump_item(ops, svc);
         }
     }
 
@@ -458,12 +458,12 @@ rt_update_deadline(s_time_t now, struct rt_item *svc)
     /* TRACE */
     {
         struct __packed {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned priority_level;
             uint64_t cur_deadline, cur_budget;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.priority_level = svc->priority_level;
         d.cur_deadline = (uint64_t) svc->cur_deadline;
         d.cur_budget = (uint64_t) svc->cur_budget;
@@ -476,15 +476,15 @@ rt_update_deadline(s_time_t now, struct rt_item *svc)
 }
 
 /*
- * Helpers for removing and inserting a vcpu in a queue
- * that is being kept ordered by the vcpus' deadlines (as EDF
+ * Helpers for removing and inserting an item in a queue
+ * that is being kept ordered by the items' deadlines (as EDF
  * mandates).
  *
- * For callers' convenience, the vcpu removing helper returns
- * true if the vcpu removed was the one at the front of the
+ * For callers' convenience, the item removing helper returns
+ * true if the item removed was the one at the front of the
  * queue; similarly, the inserting helper returns true if the
  * inserted ended at the front of the queue (i.e., in both
- * cases, if the vcpu with the earliest deadline is what we
+ * cases, if the item with the earliest deadline is what we
  * are dealing with).
  */
 static inline bool
@@ -510,7 +510,7 @@ deadline_queue_insert(struct rt_item * (*qelem)(struct list_head *),
     list_for_each ( iter, queue )
     {
         struct rt_item * iter_svc = (*qelem)(iter);
-        if ( compare_vcpu_priority(svc, iter_svc) > 0 )
+        if ( compare_item_priority(svc, iter_svc) > 0 )
             break;
         pos++;
     }
@@ -525,7 +525,7 @@ deadline_queue_insert(struct rt_item * (*qelem)(struct list_head *),
 static inline void
 q_remove(struct rt_item *svc)
 {
-    ASSERT( vcpu_on_q(svc) );
+    ASSERT( item_on_q(svc) );
     list_del_init(&svc->q_elem);
 }
 
@@ -535,14 +535,14 @@ replq_remove(const struct scheduler *ops, struct rt_item *svc)
     struct rt_private *prv = rt_priv(ops);
     struct list_head *replq = rt_replq(ops);
 
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( item_on_replq(svc) );
 
     if ( deadline_queue_remove(replq, &svc->replq_elem) )
     {
         /*
          * The replenishment timer needs to be set to fire when a
-         * replenishment for the vcpu at the front of the replenishment
-         * queue is due. If it is such vcpu that we just removed, we may
+         * replenishment for the item at the front of the replenishment
+         * queue is due. If that is the item we just removed, we may
          * need to reprogram the timer.
          */
         if ( !list_empty(replq) )
@@ -557,7 +557,7 @@ replq_remove(const struct scheduler *ops, struct rt_item *svc)
 
 /*
  * Insert svc with budget in RunQ according to EDF:
- * vcpus with smaller deadlines go first.
+ * items with smaller deadlines go first.
  * Insert svc without budget in DepletedQ unsorted;
  */
 static void
@@ -567,8 +567,8 @@ runq_insert(const struct scheduler *ops, struct rt_item *svc)
     struct list_head *runq = rt_runq(ops);
 
     ASSERT( spin_is_locked(&prv->lock) );
-    ASSERT( !vcpu_on_q(svc) );
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( !item_on_q(svc) );
+    ASSERT( item_on_replq(svc) );
 
     /* add svc to runq if svc still has budget or its extratime is set */
     if ( svc->cur_budget > 0 ||
@@ -584,7 +584,7 @@ replq_insert(const struct scheduler *ops, struct rt_item *svc)
     struct list_head *replq = rt_replq(ops);
     struct rt_private *prv = rt_priv(ops);
 
-    ASSERT( !vcpu_on_replq(svc) );
+    ASSERT( !item_on_replq(svc) );
 
     /*
      * The timer may be re-programmed if svc is inserted
@@ -607,12 +607,12 @@ replq_reinsert(const struct scheduler *ops, struct rt_item *svc)
     struct rt_item *rearm_svc = svc;
     bool_t rearm = 0;
 
-    ASSERT( vcpu_on_replq(svc) );
+    ASSERT( item_on_replq(svc) );
 
     /*
      * If svc was at the front of the replenishment queue, we certainly
      * need to re-program the timer, and we want to use the deadline of
-     * the vcpu which is now at the front of the queue (which may still
+     * the item which is now at the front of the queue (which may still
      * be svc or not).
      *
      * We may also need to re-program, if svc has been put at the front
@@ -632,24 +632,23 @@ replq_reinsert(const struct scheduler *ops, struct rt_item *svc)
 }
 
 /*
- * Pick a valid resource for the vcpu vc
- * Valid resource of a vcpu is intesection of vcpu's affinity
+ * Pick a valid resource for the item
+ * Valid resource of an item is the intersection of the item's affinity
  * and available resources
  */
 static struct sched_resource *
 rt_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     cpumask_t cpus;
     cpumask_t *online;
     int cpu;
 
-    online = cpupool_domain_cpumask(vc->domain);
+    online = cpupool_domain_cpumask(item->domain);
     cpumask_and(&cpus, online, item->cpu_hard_affinity);
 
-    cpu = cpumask_test_cpu(vc->processor, &cpus)
-            ? vc->processor
-            : cpumask_cycle(vc->processor, &cpus);
+    cpu = cpumask_test_cpu(sched_item_cpu(item), &cpus)
+            ? sched_item_cpu(item)
+            : cpumask_cycle(sched_item_cpu(item), &cpus);
     ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
 
     return per_cpu(sched_res, cpu);
@@ -737,7 +736,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct rt_private *prv = rt_priv(new_ops);
     struct rt_item *svc = vdata;
 
-    ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(!pdata && svc && is_idle_item(svc->item));
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -761,7 +760,7 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
         dprintk(XENLOG_DEBUG, "RTDS: timer initialized on cpu %u\n", cpu);
     }
 
-    idle_vcpu[cpu]->sched_item->priv = vdata;
+    sched_idle_item(cpu)->priv = vdata;
     per_cpu(scheduler, cpu) = new_ops;
     per_cpu(sched_res, cpu)->sched_priv = NULL; /* no pdata */
 
@@ -849,10 +848,9 @@ rt_free_domdata(const struct scheduler *ops, void *data)
 static void *
 rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-ITEM info */
     svc = xzalloc(struct rt_item);
     if ( svc == NULL )
         return NULL;
@@ -861,13 +859,13 @@ rt_alloc_vdata(const struct scheduler *ops, struct sched_item *item, void *dd)
     INIT_LIST_HEAD(&svc->replq_elem);
     svc->flags = 0U;
     svc->sdom = dd;
-    svc->vcpu = vc;
+    svc->item = item;
     svc->last_start = 0;
 
     __set_bit(__RTDS_extratime, &svc->flags);
     svc->priority_level = 0;
     svc->period = RTDS_DEFAULT_PERIOD;
-    if ( !is_idle_vcpu(vc) )
+    if ( !is_idle_item(item) )
         svc->budget = RTDS_DEFAULT_BUDGET;
 
     SCHED_STAT_CRANK(item_alloc);
@@ -887,22 +885,20 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
  * It is called in sched_move_domain() and sched_init_vcpu
  * in schedule.c.
  * When move a domain to a new cpupool.
- * It inserts vcpus of moving domain to the scheduler's RunQ in
+ * It inserts the items of the moving domain into the scheduler's RunQ in
  * dest. cpupool.
  */
 static void
 rt_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item *svc = rt_item(item);
     s_time_t now;
     spinlock_t *lock;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
 
-    /* This is safe because vc isn't yet being scheduled */
-    item->res = rt_res_pick(ops, item);
-    vc->processor = item->res->processor;
+    /* This is safe because item isn't yet being scheduled */
+    sched_set_res(item, rt_res_pick(ops, item));
 
     lock = item_schedule_lock_irq(item);
 
@@ -910,11 +906,11 @@ rt_item_insert(const struct scheduler *ops, struct sched_item *item)
     if ( now >= svc->cur_deadline )
         rt_update_deadline(now, svc);
 
-    if ( !vcpu_on_q(svc) && vcpu_runnable(vc) )
+    if ( !item_on_q(svc) && item_runnable(item) )
     {
         replq_insert(ops, svc);
 
-        if ( !vcpu_running(vc) )
+        if ( !item->is_running )
             runq_insert(ops, svc);
     }
     item_schedule_unlock_irq(lock, item);
@@ -937,10 +933,10 @@ rt_item_remove(const struct scheduler *ops, struct sched_item *item)
     BUG_ON( sdom == NULL );
 
     lock = item_schedule_lock_irq(item);
-    if ( vcpu_on_q(svc) )
+    if ( item_on_q(svc) )
         q_remove(svc);
 
-    if ( vcpu_on_replq(svc) )
+    if ( item_on_replq(svc) )
         replq_remove(ops,svc);
 
     item_schedule_unlock_irq(lock, item);
@@ -954,8 +950,8 @@ burn_budget(const struct scheduler *ops, struct rt_item *svc, s_time_t now)
 {
     s_time_t delta;
 
-    /* don't burn budget for idle VCPU */
-    if ( is_idle_vcpu(svc->vcpu) )
+    /* don't burn budget for idle ITEM */
+    if ( is_idle_item(svc->item) )
         return;
 
     /* burn at nanoseconds level */
@@ -992,14 +988,14 @@ burn_budget(const struct scheduler *ops, struct rt_item *svc, s_time_t now)
     /* TRACE */
     {
         struct __packed {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             uint64_t cur_budget;
             int delta;
             unsigned priority_level;
             bool has_extratime;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.cur_budget = (uint64_t) svc->cur_budget;
         d.delta = delta;
         d.priority_level = svc->priority_level;
@@ -1029,9 +1025,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         iter_svc = q_elem(iter);
 
         /* mask cpu_hard_affinity & cpupool & mask */
-        online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online,
-                    iter_svc->vcpu->sched_item->cpu_hard_affinity);
+        online = cpupool_domain_cpumask(iter_svc->item->domain);
+        cpumask_and(&cpu_common, online, iter_svc->item->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1047,11 +1042,11 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         if( svc != NULL )
         {
             struct __packed {
-                unsigned vcpu:16, dom:16;
+                unsigned item:16, dom:16;
                 uint64_t cur_deadline, cur_budget;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->item->domain->domain_id;
+            d.item = svc->item->item_id;
             d.cur_deadline = (uint64_t) svc->cur_deadline;
             d.cur_budget = (uint64_t) svc->cur_budget;
             trace_var(TRC_RTDS_RUNQ_PICK, 1,
@@ -1075,6 +1070,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     struct rt_item *const scurr = rt_item(current->sched_item);
     struct rt_item *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
+    struct sched_item *curritem = current->sched_item;
 
     /* TRACE */
     {
@@ -1084,7 +1080,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
         d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_item(curritem);
         trace_var(TRC_RTDS_SCHEDULE, 1,
                   sizeof(d),
                   (unsigned char *)&d);
@@ -1093,72 +1089,70 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     /* clear ticked bit now that we've been scheduled */
     cpumask_clear_cpu(cpu, &prv->tickled);
 
-    /* burn_budget would return for IDLE VCPU */
+    /* burn_budget would return for IDLE ITEM */
     burn_budget(ops, scurr, now);
 
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_item(idle_vcpu[cpu]->sched_item);
+        snext = rt_item(sched_idle_item(cpu));
     }
     else
     {
         snext = runq_pick(ops, cpumask_of(cpu));
         if ( snext == NULL )
-            snext = rt_item(idle_vcpu[cpu]->sched_item);
+            snext = rt_item(sched_idle_item(cpu));
 
         /* if scurr has higher priority and budget, still pick scurr */
-        if ( !is_idle_vcpu(current) &&
-             vcpu_runnable(current) &&
+        if ( !is_idle_item(curritem) &&
+             item_runnable(curritem) &&
              scurr->cur_budget > 0 &&
-             ( is_idle_vcpu(snext->vcpu) ||
-               compare_vcpu_priority(scurr, snext) > 0 ) )
+             ( is_idle_item(snext->item) ||
+               compare_item_priority(scurr, snext) > 0 ) )
             snext = scurr;
     }
 
     if ( snext != scurr &&
-         !is_idle_vcpu(current) &&
-         vcpu_runnable(current) )
+         !is_idle_item(curritem) &&
+         item_runnable(curritem) )
         __set_bit(__RTDS_delayed_runq_add, &scurr->flags);
 
     snext->last_start = now;
-    ret.time =  -1; /* if an idle vcpu is picked */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    ret.time =  -1; /* if an idle item is picked */
+    if ( !is_idle_item(snext->item) )
     {
         if ( snext != scurr )
         {
             q_remove(snext);
             __set_bit(__RTDS_scheduled, &snext->flags);
         }
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_item_cpu(snext->item) != cpu )
         {
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_item->res = per_cpu(sched_res, cpu);
+            sched_set_res(snext->item, per_cpu(sched_res, cpu));
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
     }
-    ret.task = snext->vcpu->sched_item;
+    ret.task = snext->item;
 
     return ret;
 }
 
 /*
- * Remove VCPU from RunQ
+ * Remove ITEM from RunQ
  * The lock is already grabbed in schedule.c, no need to lock here
  */
 static void
 rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item * const svc = rt_item(item);
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
     SCHED_STAT_CRANK(item_sleep);
 
-    if ( curr_on_cpu(vc->processor) == item )
-        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
-    else if ( vcpu_on_q(svc) )
+    if ( curr_on_cpu(sched_item_cpu(item)) == item )
+        cpu_raise_softirq(sched_item_cpu(item), SCHEDULE_SOFTIRQ);
+    else if ( item_on_q(svc) )
     {
         q_remove(svc);
         replq_remove(ops, svc);
@@ -1168,20 +1162,20 @@ rt_item_sleep(const struct scheduler *ops, struct sched_item *item)
 }
 
 /*
- * Pick a cpu where to run a vcpu,
- * possibly kicking out the vcpu running there
+ * Pick a cpu on which to run an item,
+ * possibly kicking out the item running there
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
  * 1) if there are any idle CPUs, kick one.
       For cache benefit, we check new->cpu as first
  * 2) now all pcpus are busy;
- *    among all the running vcpus, pick lowest priority one
+ *    among all the running items, pick lowest priority one
  *    if snext has higher priority, kick it.
  *
  * TODO:
- * 1) what if these two vcpus belongs to the same domain?
- *    replace a vcpu belonging to the same domain introduces more overhead
+ * 1) what if these two items belong to the same domain?
+ *    replacing an item belonging to the same domain introduces more overhead
  *
  * lock is grabbed before calling this function
  */
@@ -1189,18 +1183,18 @@ static void
 runq_tickle(const struct scheduler *ops, struct rt_item *new)
 {
     struct rt_private *prv = rt_priv(ops);
-    struct rt_item *latest_deadline_vcpu = NULL; /* lowest priority */
+    struct rt_item *latest_deadline_item = NULL; /* lowest priority */
     struct rt_item *iter_svc;
-    struct vcpu *iter_vc;
+    struct sched_item *iter_item;
     int cpu = 0, cpu_to_tickle = 0;
     cpumask_t not_tickled;
     cpumask_t *online;
 
-    if ( new == NULL || is_idle_vcpu(new->vcpu) )
+    if ( new == NULL || is_idle_item(new->item) )
         return;
 
-    online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->sched_item->cpu_hard_affinity);
+    online = cpupool_domain_cpumask(new->item->domain);
+    cpumask_and(&not_tickled, online, new->item->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
@@ -1208,31 +1202,31 @@ runq_tickle(const struct scheduler *ops, struct rt_item *new)
      *    For cache benefit,we first search new->cpu.
      *    The same loop also find the one with lowest priority.
      */
-    cpu = cpumask_test_or_cycle(new->vcpu->processor, &not_tickled);
+    cpu = cpumask_test_or_cycle(sched_item_cpu(new->item), &not_tickled);
     while ( cpu!= nr_cpu_ids )
     {
-        iter_vc = curr_on_cpu(cpu)->vcpu;
-        if ( is_idle_vcpu(iter_vc) )
+        iter_item = curr_on_cpu(cpu);
+        if ( is_idle_item(iter_item) )
         {
             SCHED_STAT_CRANK(tickled_idle_cpu);
             cpu_to_tickle = cpu;
             goto out;
         }
-        iter_svc = rt_item(iter_vc->sched_item);
-        if ( latest_deadline_vcpu == NULL ||
-             compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
-            latest_deadline_vcpu = iter_svc;
+        iter_svc = rt_item(iter_item);
+        if ( latest_deadline_item == NULL ||
+             compare_item_priority(iter_svc, latest_deadline_item) < 0 )
+            latest_deadline_item = iter_svc;
 
         cpumask_clear_cpu(cpu, &not_tickled);
         cpu = cpumask_cycle(cpu, &not_tickled);
     }
 
-    /* 2) candicate has higher priority, kick out lowest priority vcpu */
-    if ( latest_deadline_vcpu != NULL &&
-         compare_vcpu_priority(latest_deadline_vcpu, new) < 0 )
+    /* 2) candidate has higher priority, kick out lowest priority item */
+    if ( latest_deadline_item != NULL &&
+         compare_item_priority(latest_deadline_item, new) < 0 )
     {
         SCHED_STAT_CRANK(tickled_busy_cpu);
-        cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
+        cpu_to_tickle = sched_item_cpu(latest_deadline_item->item);
         goto out;
     }
 
@@ -1258,35 +1252,34 @@ runq_tickle(const struct scheduler *ops, struct rt_item *new)
 }
 
 /*
- * Should always wake up runnable vcpu, put it back to RunQ.
+ * Should always wake up runnable item, put it back to RunQ.
  * Check priority to raise interrupt
  * The lock is already grabbed in schedule.c, no need to lock here
- * TODO: what if these two vcpus belongs to the same domain?
+ * TODO: what if these two items belong to the same domain?
  */
 static void
 rt_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item * const svc = rt_item(item);
     s_time_t now;
     bool_t missed;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == item) )
+    if ( unlikely(curr_on_cpu(sched_item_cpu(item)) == item) )
     {
         SCHED_STAT_CRANK(item_wake_running);
         return;
     }
 
     /* on RunQ/DepletedQ, just update info is ok */
-    if ( unlikely(vcpu_on_q(svc)) )
+    if ( unlikely(item_on_q(svc)) )
     {
         SCHED_STAT_CRANK(item_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(item_runnable(item)) )
         SCHED_STAT_CRANK(item_wake_runnable);
     else
         SCHED_STAT_CRANK(item_wake_not_runnable);
@@ -1302,16 +1295,16 @@ rt_item_wake(const struct scheduler *ops, struct sched_item *item)
         rt_update_deadline(now, svc);
 
     /*
-     * If context hasn't been saved for this vcpu yet, we can't put it on
+     * If context hasn't been saved for this item yet, we can't put it on
      * the run-queue/depleted-queue. Instead, we set the appropriate flag,
-     * the vcpu will be put back on queue after the context has been saved
+     * the item will be put back on queue after the context has been saved
      * (in rt_context_save()).
      */
     if ( unlikely(svc->flags & RTDS_scheduled) )
     {
         __set_bit(__RTDS_delayed_runq_add, &svc->flags);
         /*
-         * The vcpu is waking up already, and we didn't even had the time to
+         * The item is waking up already, and we didn't even have the time to
          * remove its next replenishment event from the replenishment queue
          * when it blocked! No big deal. If we did not miss the deadline in
          * the meantime, let's just leave it there. If we did, let's remove it
@@ -1332,22 +1325,21 @@ rt_item_wake(const struct scheduler *ops, struct sched_item *item)
 
 /*
  * scurr has finished context switch, insert it back to the RunQ,
- * and then pick the highest priority vcpu from runq to run
+ * and then pick the highest priority item from runq to run
  */
 static void
 rt_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct rt_item *svc = rt_item(item);
     spinlock_t *lock = item_schedule_lock_irq(item);
 
     __clear_bit(__RTDS_scheduled, &svc->flags);
-    /* not insert idle vcpu to runq */
-    if ( is_idle_vcpu(vc) )
+    /* not insert idle item to runq */
+    if ( is_idle_item(item) )
         goto out;
 
     if ( __test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
-         likely(vcpu_runnable(vc)) )
+         likely(item_runnable(item)) )
     {
         runq_insert(ops, svc);
         runq_tickle(ops, svc);
@@ -1360,7 +1352,7 @@ out:
 }
 
 /*
- * set/get each vcpu info of each domain
+ * set/get each item info of each domain
  */
 static int
 rt_dom_cntl(
@@ -1370,7 +1362,7 @@ rt_dom_cntl(
 {
     struct rt_private *prv = rt_priv(ops);
     struct rt_item *svc;
-    struct vcpu *v;
+    struct sched_item *item;
     unsigned long flags;
     int rc = 0;
     struct xen_domctl_schedparam_vcpu local_sched;
@@ -1391,9 +1383,9 @@ rt_dom_cntl(
             break;
         }
         spin_lock_irqsave(&prv->lock, flags);
-        for_each_vcpu ( d, v )
+        for_each_sched_item ( d, item )
         {
-            svc = rt_item(v->sched_item);
+            svc = rt_item(item);
             svc->period = MICROSECS(op->u.rtds.period); /* transfer to nanosec */
             svc->budget = MICROSECS(op->u.rtds.budget);
         }
@@ -1461,7 +1453,7 @@ rt_dom_cntl(
                 break;
         }
         if ( !rc )
-            /* notify upper caller how many vcpus have been processed. */
+            /* notify upper caller how many items have been processed. */
             op->u.v.nr_vcpus = index;
         break;
     }
@@ -1470,7 +1462,7 @@ rt_dom_cntl(
 }
 
 /*
- * The replenishment timer handler picks vcpus
+ * The replenishment timer handler picks items
  * from the replq and does the actual replenishment.
  */
 static void repl_timer_handler(void *data){
@@ -1488,7 +1480,7 @@ static void repl_timer_handler(void *data){
     now = NOW();
 
     /*
-     * Do the replenishment and move replenished vcpus
+     * Do the replenishment and move replenished items
      * to the temporary list to tickle.
      * If svc is on run queue, we need to put it at
      * the correct place since its deadline changes.
@@ -1504,7 +1496,7 @@ static void repl_timer_handler(void *data){
         rt_update_deadline(now, svc);
         list_add(&svc->replq_elem, &tmp_replq);
 
-        if ( vcpu_on_q(svc) )
+        if ( item_on_q(svc) )
         {
             q_remove(svc);
             runq_insert(ops, svc);
@@ -1512,26 +1504,26 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * Iterate through the list of updated vcpus.
-     * If an updated vcpu is running, tickle the head of the
+     * Iterate through the list of updated items.
+     * If an updated item is running, tickle the head of the
      * runqueue if it has a higher priority.
-     * If an updated vcpu was depleted and on the runqueue, tickle it.
-     * Finally, reinsert the vcpus back to replenishement events list.
+     * If an updated item was depleted and on the runqueue, tickle it.
+     * Finally, reinsert the items back to the replenishment events list.
      */
     list_for_each_safe ( iter, tmp, &tmp_replq )
     {
         svc = replq_elem(iter);
 
-        if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu->sched_item &&
+        if ( curr_on_cpu(sched_item_cpu(svc->item)) == svc->item &&
              !list_empty(runq) )
         {
             struct rt_item *next_on_runq = q_elem(runq->next);
 
-            if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
+            if ( compare_item_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
         }
         else if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&
-                  vcpu_on_q(svc) )
+                  item_on_q(svc) )
             runq_tickle(ops, svc);
 
         list_del(&svc->replq_elem);
@@ -1539,7 +1531,7 @@ static void repl_timer_handler(void *data){
     }
 
     /*
-     * If there are vcpus left in the replenishment event list,
+     * If there are items left in the replenishment event list,
      * set the next replenishment to happen at the deadline of
      * the one in the front.
      */
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 26/49] xen/sched: make credit scheduler vcpu agnostic.
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (24 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 25/49] xen/sched: make rt " Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 27/49] xen/sched: make credit2 " Juergen Gross
                   ` (28 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch credit scheduler completely from vcpu to sched_item usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
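The conversion in this patch is mostly mechanical: accesses such as
svc->vcpu->processor and svc->vcpu->vcpu_id become sched_item_cpu(svc->item)
and svc->item->item_id, and processor updates go through sched_set_res().
As a rough standalone sketch of what those accessors boil down to (the
struct layouts and helper bodies below are illustrative assumptions only;
the real definitions come from the earlier patches of this series that
introduce struct sched_item and struct sched_resource):

    struct sched_resource {
        unsigned int processor;           /* pcpu backing this resource */
    };

    struct sched_item {
        struct sched_resource *res;       /* resource the item runs on */
        unsigned int item_id;             /* used where vcpu_id was used */
    };

    static inline unsigned int sched_item_cpu(const struct sched_item *item)
    {
        /* assumption: the item's pcpu is derived from its resource */
        return item->res->processor;
    }

    static inline void sched_set_res(struct sched_item *item,
                                     struct sched_resource *res)
    {
        /*
         * Assumption: assigning the resource is enough for this sketch;
         * the real helper presumably also updates the processor of the
         * item's underlying vcpus.
         */
        item->res = res;
    }

With that, the old two-step update "snext->vcpu->processor = cpu;
snext->vcpu->sched_item->res = per_cpu(sched_res, cpu);" collapses into a
single sched_set_res(snext->item, per_cpu(sched_res, cpu)) call, as already
done for RTDS in the previous patch.
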
 xen/common/sched_credit.c | 502 +++++++++++++++++++++++-----------------------
 1 file changed, 251 insertions(+), 251 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 6d0639109a..babccb69f7 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -70,10 +70,10 @@
  * inconsistent set of locks. Therefore atomic-safe bit operations must
  * be used for accessing it.
  */
-#define CSCHED_FLAG_VCPU_PARKED    0x0  /* VCPU over capped credits */
-#define CSCHED_FLAG_VCPU_YIELD     0x1  /* VCPU yielding */
-#define CSCHED_FLAG_VCPU_MIGRATING 0x2  /* VCPU may have moved to a new pcpu */
-#define CSCHED_FLAG_VCPU_PINNED    0x4  /* VCPU can run only on 1 pcpu */
+#define CSCHED_FLAG_ITEM_PARKED    0x0  /* ITEM over capped credits */
+#define CSCHED_FLAG_ITEM_YIELD     0x1  /* ITEM yielding */
+#define CSCHED_FLAG_ITEM_MIGRATING 0x2  /* ITEM may have moved to a new pcpu */
+#define CSCHED_FLAG_ITEM_PINNED    0x4  /* ITEM can run only on 1 pcpu */
 
 
 /*
@@ -91,7 +91,7 @@
 /*
  * CSCHED_STATS
  *
- * Manage very basic per-vCPU counters and stats.
+ * Manage very basic per-item counters and stats.
  *
  * Useful for debugging live systems. The stats are displayed
  * with runq dumps ('r' on the Xen console).
@@ -100,23 +100,23 @@
 
 #define CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)                      \
+#define SCHED_ITEM_STATS_RESET(_V)                      \
     do                                                  \
     {                                                   \
         memset(&(_V)->stats, 0, sizeof((_V)->stats));   \
     } while ( 0 )
 
-#define SCHED_VCPU_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
+#define SCHED_ITEM_STAT_CRANK(_V, _X)       (((_V)->stats._X)++)
 
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
+#define SCHED_ITEM_STAT_SET(_V, _X, _Y)     (((_V)->stats._X) = (_Y))
 
 #else /* !SCHED_STATS */
 
 #undef CSCHED_STATS
 
-#define SCHED_VCPU_STATS_RESET(_V)         do {} while ( 0 )
-#define SCHED_VCPU_STAT_CRANK(_V, _X)      do {} while ( 0 )
-#define SCHED_VCPU_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
+#define SCHED_ITEM_STATS_RESET(_V)         do {} while ( 0 )
+#define SCHED_ITEM_STAT_CRANK(_V, _X)      do {} while ( 0 )
+#define SCHED_ITEM_STAT_SET(_V, _X, _Y)    do {} while ( 0 )
 
 #endif /* SCHED_STATS */
 
@@ -128,7 +128,7 @@
 #define TRC_CSCHED_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED, 1)
 #define TRC_CSCHED_ACCOUNT_START TRC_SCHED_CLASS_EVT(CSCHED, 2)
 #define TRC_CSCHED_ACCOUNT_STOP  TRC_SCHED_CLASS_EVT(CSCHED, 3)
-#define TRC_CSCHED_STOLEN_VCPU   TRC_SCHED_CLASS_EVT(CSCHED, 4)
+#define TRC_CSCHED_STOLEN_ITEM   TRC_SCHED_CLASS_EVT(CSCHED, 4)
 #define TRC_CSCHED_PICKED_CPU    TRC_SCHED_CLASS_EVT(CSCHED, 5)
 #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
 #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
@@ -158,15 +158,15 @@ struct csched_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Virtual ITEM
  */
 struct csched_item {
     struct list_head runq_elem;
-    struct list_head active_vcpu_elem;
+    struct list_head active_item_elem;
 
     /* Up-pointers */
     struct csched_dom *sdom;
-    struct vcpu *vcpu;
+    struct sched_item *item;
 
     s_time_t start_time;   /* When we were scheduled (used for credit) */
     unsigned flags;
@@ -192,10 +192,10 @@ struct csched_item {
  * Domain
  */
 struct csched_dom {
-    struct list_head active_vcpu;
+    struct list_head active_item;
     struct list_head active_sdom_elem;
     struct domain *dom;
-    uint16_t active_vcpu_count;
+    uint16_t active_item_count;
     uint16_t weight;
     uint16_t cap;
 };
@@ -215,7 +215,7 @@ struct csched_private {
 
     /* Period of master and tick in milliseconds */
     unsigned int tick_period_us, ticks_per_tslice;
-    s_time_t ratelimit, tslice, vcpu_migr_delay;
+    s_time_t ratelimit, tslice, item_migr_delay;
 
     struct list_head active_sdom;
     uint32_t weight;
@@ -231,7 +231,7 @@ static void csched_tick(void *_cpu);
 static void csched_acct(void *dummy);
 
 static inline int
-__vcpu_on_runq(struct csched_item *svc)
+__item_on_runq(struct csched_item *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -242,7 +242,7 @@ __runq_elem(struct list_head *elem)
     return list_entry(elem, struct csched_item, runq_elem);
 }
 
-/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */
+/* Is the first element of cpu's runq (if any) cpu's idle item? */
 static inline bool_t is_runq_idle(unsigned int cpu)
 {
     /*
@@ -251,7 +251,7 @@ static inline bool_t is_runq_idle(unsigned int cpu)
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
     return list_empty(RUNQ(cpu)) ||
-           is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu);
+           is_idle_item(__runq_elem(RUNQ(cpu)->next)->item);
 }
 
 static inline void
@@ -273,11 +273,11 @@ dec_nr_runnable(unsigned int cpu)
 static inline void
 __runq_insert(struct csched_item *svc)
 {
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_item_cpu(svc->item);
     const struct list_head * const runq = RUNQ(cpu);
     struct list_head *iter;
 
-    BUG_ON( __vcpu_on_runq(svc) );
+    BUG_ON( __item_on_runq(svc) );
 
     list_for_each( iter, runq )
     {
@@ -286,10 +286,10 @@ __runq_insert(struct csched_item *svc)
             break;
     }
 
-    /* If the vcpu yielded, try to put it behind one lower-priority
-     * runnable vcpu if we can.  The next runq_sort will bring it forward
+    /* If the item yielded, try to put it behind one lower-priority
+     * runnable item if we can.  The next runq_sort will bring it forward
      * within 30ms if the queue too long. */
-    if ( test_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags)
+    if ( test_bit(CSCHED_FLAG_ITEM_YIELD, &svc->flags)
          && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
     {
         iter=iter->next;
@@ -305,20 +305,20 @@ static inline void
 runq_insert(struct csched_item *svc)
 {
     __runq_insert(svc);
-    inc_nr_runnable(svc->vcpu->processor);
+    inc_nr_runnable(sched_item_cpu(svc->item));
 }
 
 static inline void
 __runq_remove(struct csched_item *svc)
 {
-    BUG_ON( !__vcpu_on_runq(svc) );
+    BUG_ON( !__item_on_runq(svc) );
     list_del_init(&svc->runq_elem);
 }
 
 static inline void
 runq_remove(struct csched_item *svc)
 {
-    dec_nr_runnable(svc->vcpu->processor);
+    dec_nr_runnable(sched_item_cpu(svc->item));
     __runq_remove(svc);
 }
 
@@ -329,7 +329,7 @@ static void burn_credits(struct csched_item *svc, s_time_t now)
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT( svc == CSCHED_ITEM(curr_on_cpu(svc->vcpu->processor)) );
+    ASSERT( svc == CSCHED_ITEM(curr_on_cpu(sched_item_cpu(svc->item))) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -349,8 +349,8 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 
 static inline void __runq_tickle(struct csched_item *new)
 {
-    unsigned int cpu = new->vcpu->processor;
-    struct sched_item *item = new->vcpu->sched_item;
+    unsigned int cpu = sched_item_cpu(new->item);
+    struct sched_item *item = new->item;
     struct csched_item * const cur = CSCHED_ITEM(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -364,16 +364,16 @@ static inline void __runq_tickle(struct csched_item *new)
     idlers_empty = cpumask_empty(&idle_mask);
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when an item has hard-affinity with only one
+     * cpu, and there is no other item that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the items in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
      */
-    if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
+    if ( unlikely(test_bit(CSCHED_FLAG_ITEM_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
         ASSERT(cpumask_cycle(cpu, item->cpu_hard_affinity) == cpu);
@@ -384,7 +384,7 @@ static inline void __runq_tickle(struct csched_item *new)
 
     /*
      * If the pcpu is idle, or there are no idlers and the new
-     * vcpu is a higher priority than the old vcpu, run it here.
+     * item is a higher priority than the old item, run it here.
      *
      * If there are idle cpus, first try to find one suitable to run
      * new, so we can avoid preempting cur.  If we cannot find a
@@ -403,7 +403,7 @@ static inline void __runq_tickle(struct csched_item *new)
     else if ( !idlers_empty )
     {
         /*
-         * Soft and hard affinity balancing loop. For vcpus without
+         * Soft and hard affinity balancing loop. For items without
          * a useful soft affinity, consider hard affinity only.
          */
         for_each_affinity_balance_step( balance_step )
@@ -446,10 +446,10 @@ static inline void __runq_tickle(struct csched_item *new)
             {
                 if ( cpumask_intersects(item->cpu_hard_affinity, &idle_mask) )
                 {
-                    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
-                    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+                    SCHED_ITEM_STAT_CRANK(cur, kicked_away);
+                    SCHED_ITEM_STAT_CRANK(cur, migrate_r);
                     SCHED_STAT_CRANK(migrate_kicked_away);
-                    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+                    sched_set_pause_flags_atomic(cur->item, _VPF_migrating);
                 }
                 /* Tickle cpu anyway, to let new preempt cur. */
                 SCHED_STAT_CRANK(tickled_busy_cpu);
@@ -605,7 +605,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     spc->idle_bias = nr_cpu_ids - 1;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)->vcpu));
+    BUG_ON(!is_idle_item(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
     spc->nr_runnable = 0;
 }
@@ -639,9 +639,9 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct csched_private *prv = CSCHED_PRIV(new_ops);
     struct csched_item *svc = vdata;
 
-    ASSERT(svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(svc && is_idle_item(svc->item));
 
-    idle_vcpu[cpu]->sched_item->priv = vdata;
+    sched_idle_item(cpu)->priv = vdata;
 
     /*
      * We are holding the runqueue lock already (it's been taken in
@@ -667,33 +667,33 @@ csched_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
 #ifndef NDEBUG
 static inline void
-__csched_vcpu_check(struct vcpu *vc)
+__csched_item_check(struct sched_item *item)
 {
-    struct csched_item * const svc = CSCHED_ITEM(vc->sched_item);
+    struct csched_item * const svc = CSCHED_ITEM(item);
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != CSCHED_DOM(vc->domain) );
+    BUG_ON( svc->item != item );
+    BUG_ON( sdom != CSCHED_DOM(item->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_item(item) );
+        BUG_ON( sdom->dom != item->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_item(item) );
     }
 
     SCHED_STAT_CRANK(item_check);
 }
-#define CSCHED_VCPU_CHECK(_vc)  (__csched_vcpu_check(_vc))
+#define CSCHED_ITEM_CHECK(item)  (__csched_item_check(item))
 #else
-#define CSCHED_VCPU_CHECK(_vc)
+#define CSCHED_ITEM_CHECK(item)
 #endif
 
 /*
- * Delay, in microseconds, between migrations of a VCPU between PCPUs.
- * This prevents rapid fluttering of a VCPU between CPUs, and reduces the
+ * Delay, in microseconds, between migrations of an ITEM between PCPUs.
+ * This prevents rapid fluttering of an ITEM between CPUs, and reduces the
  * implicit overheads such as cache-warming. 1ms (1000) has been measured
  * as a good value.
  */
@@ -701,10 +701,11 @@ static unsigned int vcpu_migration_delay_us;
 integer_param("vcpu_migration_delay", vcpu_migration_delay_us);
 
 static inline bool
-__csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
+__csched_item_is_cache_hot(const struct csched_private *prv,
+                           struct sched_item *item)
 {
-    bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->sched_item->last_run_time) < prv->vcpu_migr_delay;
+    bool hot = prv->item_migr_delay &&
+               (NOW() - item->last_run_time) < prv->item_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(item_hot);
@@ -713,36 +714,38 @@ __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 }
 
 static inline int
-__csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
+__csched_item_is_migrateable(const struct csched_private *prv,
+                             struct sched_item *item,
                              int dest_cpu, cpumask_t *mask)
 {
     /*
      * Don't pick up work that's hot on peer PCPU, or that can't (or
      * would prefer not to) run on cpu.
      *
-     * The caller is supposed to have already checked that vc is also
+     * The caller is supposed to have already checked that item is also
      * not running.
      */
-    ASSERT(!vcpu_running(vc));
+    ASSERT(!item->is_running);
 
-    return !__csched_vcpu_is_cache_hot(prv, vc) &&
+    return !__csched_item_is_cache_hot(prv, item) &&
            cpumask_test_cpu(dest_cpu, mask);
 }
 
 static int
-_csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
+_csched_cpu_pick(const struct scheduler *ops, struct sched_item *item,
+                 bool_t commit)
 {
-    /* We must always use vc->procssor's scratch space */
-    cpumask_t *cpus = cpumask_scratch_cpu(vc->processor);
+    int cpu = sched_item_cpu(item);
+    /* We must always use cpu's scratch space */
+    cpumask_t *cpus = cpumask_scratch_cpu(cpu);
     cpumask_t idlers;
-    cpumask_t *online = cpupool_domain_cpumask(vc->domain);
+    cpumask_t *online = cpupool_domain_cpumask(item->domain);
     struct csched_pcpu *spc = NULL;
-    int cpu = vc->processor;
     int balance_step;
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc->sched_item, balance_step, cpus);
+        affinity_balance_cpumask(item, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -761,12 +764,13 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc->sched_item) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(item) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer vc's current processor */
-        cpu = cpumask_test_cpu(vc->processor, cpus)
-                ? vc->processor : cpumask_cycle(vc->processor, cpus);
+        cpu = cpumask_test_cpu(sched_item_cpu(item), cpus)
+                ? sched_item_cpu(item)
+                : cpumask_cycle(sched_item_cpu(item), cpus);
         ASSERT(cpumask_test_cpu(cpu, cpus));
 
         /*
@@ -778,15 +782,15 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * We give preference to the idle execution vehicle with the most
          * idling neighbours in its grouping. This distributes work across
          * distinct cores first and guarantees we don't do something stupid
-         * like run two VCPUs on co-hyperthreads while there are idle cores
+         * like run two ITEMs on co-hyperthreads while there are idle cores
          * or sockets.
          *
          * Notice that, when computing the "idleness" of cpu, we may want to
-         * discount vc. That is, iff vc is the currently running and the only
-         * runnable vcpu on cpu, we add cpu to the idlers.
+         * discount item. That is, iff item is the currently running and the
+         * only runnable item on cpu, we add cpu to the idlers.
          */
         cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-        if ( vc->processor == cpu && is_runq_idle(cpu) )
+        if ( sched_item_cpu(item) == cpu && is_runq_idle(cpu) )
             __cpumask_set_cpu(cpu, &idlers);
         cpumask_and(cpus, &idlers, cpus);
 
@@ -796,7 +800,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * CPU, as we just &&-ed it with idlers). In fact, if we are on SMT, and
          * cpu points to a busy thread with an idle sibling, both the threads
          * will be considered the same, from the "idleness" calculation point
-         * of view", preventing vcpu from being moved to the thread that is
+         * of view", preventing item from being moved to the thread that is
          * actually idle.
          *
          * Notice that cpumask_test_cpu() is quicker than cpumask_empty(), so
@@ -862,7 +866,8 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     if ( commit && spc )
        spc->idle_bias = cpu;
 
-    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, item->domain->domain_id, item->item_id,
+             cpu);
 
     return cpu;
 }
@@ -870,7 +875,6 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 static struct sched_resource *
 csched_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched_item *svc = CSCHED_ITEM(item);
 
     /*
@@ -880,26 +884,26 @@ csched_res_pick(const struct scheduler *ops, struct sched_item *item)
      * csched_item_wake() (still called from vcpu_migrate()) we won't
      * get boosted, which we don't deserve as we are "only" migrating.
      */
-    set_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
-    return per_cpu(sched_res, _csched_cpu_pick(ops, vc, 1));
+    set_bit(CSCHED_FLAG_ITEM_MIGRATING, &svc->flags);
+    return per_cpu(sched_res, _csched_cpu_pick(ops, item, 1));
 }
 
 static inline void
-__csched_vcpu_acct_start(struct csched_private *prv, struct csched_item *svc)
+__csched_item_acct_start(struct csched_private *prv, struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
 
     spin_lock_irqsave(&prv->lock, flags);
 
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_item_elem) )
     {
-        SCHED_VCPU_STAT_CRANK(svc, state_active);
+        SCHED_ITEM_STAT_CRANK(svc, state_active);
         SCHED_STAT_CRANK(acct_item_active);
 
-        sdom->active_vcpu_count++;
-        list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
-        /* Make weight per-vcpu */
+        sdom->active_item_count++;
+        list_add(&svc->active_item_elem, &sdom->active_item);
+        /* Make weight per-item */
         prv->weight += sdom->weight;
         if ( list_empty(&sdom->active_sdom_elem) )
         {
@@ -908,56 +912,56 @@ __csched_vcpu_acct_start(struct csched_private *prv, struct csched_item *svc)
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->item->item_id, sdom->active_item_count);
 
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static inline void
-__csched_vcpu_acct_stop_locked(struct csched_private *prv,
+__csched_item_acct_stop_locked(struct csched_private *prv,
     struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
-    BUG_ON( list_empty(&svc->active_vcpu_elem) );
+    BUG_ON( list_empty(&svc->active_item_elem) );
 
-    SCHED_VCPU_STAT_CRANK(svc, state_idle);
+    SCHED_ITEM_STAT_CRANK(svc, state_idle);
     SCHED_STAT_CRANK(acct_item_idle);
 
     BUG_ON( prv->weight < sdom->weight );
-    sdom->active_vcpu_count--;
-    list_del_init(&svc->active_vcpu_elem);
+    sdom->active_item_count--;
+    list_del_init(&svc->active_item_elem);
     prv->weight -= sdom->weight;
-    if ( list_empty(&sdom->active_vcpu) )
+    if ( list_empty(&sdom->active_item) )
     {
         list_del_init(&sdom->active_sdom_elem);
     }
 
     TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
-             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+             svc->item->item_id, sdom->active_item_count);
 }
 
 static void
-csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
+csched_item_acct(struct csched_private *prv, unsigned int cpu)
 {
     struct sched_item *curritem = current->sched_item;
     struct csched_item * const svc = CSCHED_ITEM(curritem);
     const struct scheduler *ops = per_cpu(scheduler, cpu);
 
-    ASSERT( current->processor == cpu );
+    ASSERT( sched_item_cpu(curritem) == cpu );
     ASSERT( svc->sdom != NULL );
-    ASSERT( !is_idle_vcpu(svc->vcpu) );
+    ASSERT( !is_idle_item(svc->item) );
 
     /*
-     * If this VCPU's priority was boosted when it last awoke, reset it.
-     * If the VCPU is found here, then it's consuming a non-negligeable
+     * If this ITEM's priority was boosted when it last awoke, reset it.
+     * If the ITEM is found here, then it's consuming a non-negligible
      * amount of CPU resources and should no longer be boosted.
      */
     if ( svc->pri == CSCHED_PRI_TS_BOOST )
     {
         svc->pri = CSCHED_PRI_TS_UNDER;
         TRACE_2D(TRC_CSCHED_BOOST_END, svc->sdom->dom->domain_id,
-                 svc->vcpu->vcpu_id);
+                 svc->item->item_id);
     }
 
     /*
@@ -966,12 +970,12 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
     burn_credits(svc, NOW());
 
     /*
-     * Put this VCPU and domain back on the active list if it was
+     * Put this ITEM and domain back on the active list if it was
      * idling.
      */
-    if ( list_empty(&svc->active_vcpu_elem) )
+    if ( list_empty(&svc->active_item_elem) )
     {
-        __csched_vcpu_acct_start(prv, svc);
+        __csched_item_acct_start(prv, svc);
     }
     else
     {
@@ -984,15 +988,15 @@ csched_vcpu_acct(struct csched_private *prv, unsigned int cpu)
          * migrating it to run elsewhere (see multi-core and multi-thread
          * support in csched_res_pick()).
          */
-        new_cpu = _csched_cpu_pick(ops, current, 0);
+        new_cpu = _csched_cpu_pick(ops, curritem, 0);
 
         item_schedule_unlock_irqrestore(lock, flags, curritem);
 
         if ( new_cpu != cpu )
         {
-            SCHED_VCPU_STAT_CRANK(svc, migrate_r);
+            SCHED_ITEM_STAT_CRANK(svc, migrate_r);
             SCHED_STAT_CRANK(migrate_running);
-            set_bit(_VPF_migrating, &current->pause_flags);
+            sched_set_pause_flags_atomic(curritem, _VPF_migrating);
             /*
              * As we are about to tickle cpu, we should clear its bit in
              * idlers. But, if we are here, it means there is someone running
@@ -1009,21 +1013,20 @@ static void *
 csched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
                    void *dd)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched_item *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-ITEM info */
     svc = xzalloc(struct csched_item);
     if ( svc == NULL )
         return NULL;
 
     INIT_LIST_HEAD(&svc->runq_elem);
-    INIT_LIST_HEAD(&svc->active_vcpu_elem);
+    INIT_LIST_HEAD(&svc->active_item_elem);
     svc->sdom = dd;
-    svc->vcpu = vc;
-    svc->pri = is_idle_domain(vc->domain) ?
+    svc->item = item;
+    svc->pri = is_idle_item(item) ?
         CSCHED_PRI_IDLE : CSCHED_PRI_TS_UNDER;
-    SCHED_VCPU_STATS_RESET(svc);
+    SCHED_ITEM_STATS_RESET(svc);
     SCHED_STAT_CRANK(item_alloc);
     return svc;
 }
@@ -1031,23 +1034,21 @@ csched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
 static void
 csched_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched_item *svc = item->priv;
     spinlock_t *lock;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
 
     /* csched_res_pick() looks in vc->processor's runq, so we need the lock. */
     lock = item_schedule_lock_irq(item);
 
-    item->res = csched_res_pick(ops, item);
-    vc->processor = item->res->processor;
+    sched_set_res(item, csched_res_pick(ops, item));
 
     spin_unlock_irq(lock);
 
     lock = item_schedule_lock_irq(item);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vcpu_running(vc) )
+    if ( !__item_on_runq(svc) && item_runnable(item) && !item->is_running )
         runq_insert(svc);
 
     item_schedule_unlock_irq(lock, item);
@@ -1074,18 +1075,18 @@ csched_item_remove(const struct scheduler *ops, struct sched_item *item)
 
     SCHED_STAT_CRANK(item_remove);
 
-    ASSERT(!__vcpu_on_runq(svc));
+    ASSERT(!__item_on_runq(svc));
 
-    if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+    if ( test_and_clear_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
     {
         SCHED_STAT_CRANK(item_unpark);
-        vcpu_unpause(svc->vcpu);
+        vcpu_unpause(svc->item->vcpu);
     }
 
     spin_lock_irq(&prv->lock);
 
-    if ( !list_empty(&svc->active_vcpu_elem) )
-        __csched_vcpu_acct_stop_locked(prv, svc);
+    if ( !list_empty(&svc->active_item_elem) )
+        __csched_item_acct_stop_locked(prv, svc);
 
     spin_unlock_irq(&prv->lock);
 
@@ -1095,86 +1096,85 @@ csched_item_remove(const struct scheduler *ops, struct sched_item *item)
 static void
 csched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched_item * const svc = CSCHED_ITEM(item);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_item_cpu(item);
 
     SCHED_STAT_CRANK(item_sleep);
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
 
     if ( curr_on_cpu(cpu) == item )
     {
         /*
          * We are about to tickle cpu, so we should clear its bit in idlers.
-         * But, we are here because vc is going to sleep while running on cpu,
+         * But, we are here because item is going to sleep while running on cpu,
          * so the bit must be zero already.
          */
         ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers));
         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
     }
-    else if ( __vcpu_on_runq(svc) )
+    else if ( __item_on_runq(svc) )
         runq_remove(svc);
 }
 
 static void
 csched_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched_item * const svc = CSCHED_ITEM(item);
     bool_t migrating;
 
-    BUG_ON( is_idle_vcpu(vc) );
+    BUG_ON( is_idle_item(item) );
 
-    if ( unlikely(curr_on_cpu(vc->processor) == item) )
+    if ( unlikely(curr_on_cpu(sched_item_cpu(item)) == item) )
     {
         SCHED_STAT_CRANK(item_wake_running);
         return;
     }
-    if ( unlikely(__vcpu_on_runq(svc)) )
+    if ( unlikely(__item_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(item_wake_onrunq);
         return;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(item_runnable(item)) )
         SCHED_STAT_CRANK(item_wake_runnable);
     else
         SCHED_STAT_CRANK(item_wake_not_runnable);
 
     /*
-     * We temporarly boost the priority of awaking VCPUs!
+     * We temporarily boost the priority of awaking ITEMs!
      *
-     * If this VCPU consumes a non negligeable amount of CPU, it
+     * If this ITEM consumes a non-negligible amount of CPU, it
      * will eventually find itself in the credit accounting code
      * path where its priority will be reset to normal.
      *
-     * If on the other hand the VCPU consumes little CPU and is
+     * If on the other hand the ITEM consumes little CPU and is
      * blocking and awoken a lot (doing I/O for example), its
      * priority will remain boosted, optimizing it's wake-to-run
      * latencies.
      *
-     * This allows wake-to-run latency sensitive VCPUs to preempt
-     * more CPU resource intensive VCPUs without impacting overall 
+     * This allows wake-to-run latency sensitive ITEMs to preempt
+     * more CPU resource intensive ITEMs without impacting overall
      * system fairness.
      *
      * There are two cases, when we don't want to boost:
-     *  - VCPUs that are waking up after a migration, rather than
+     *  - ITEMs that are waking up after a migration, rather than
      *    after having block;
-     *  - VCPUs of capped domains unpausing after earning credits
+     *  - ITEMs of capped domains unpausing after earning credits
      *    they had overspent.
      */
-    migrating = test_and_clear_bit(CSCHED_FLAG_VCPU_MIGRATING, &svc->flags);
+    migrating = test_and_clear_bit(CSCHED_FLAG_ITEM_MIGRATING, &svc->flags);
 
     if ( !migrating && svc->pri == CSCHED_PRI_TS_UNDER &&
-         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+         !test_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
     {
-        TRACE_2D(TRC_CSCHED_BOOST_START, vc->domain->domain_id, vc->vcpu_id);
+        TRACE_2D(TRC_CSCHED_BOOST_START, item->domain->domain_id,
+                 item->item_id);
         SCHED_STAT_CRANK(item_boost);
         svc->pri = CSCHED_PRI_TS_BOOST;
     }
 
-    /* Put the VCPU on the runq and tickle CPUs */
+    /* Put the ITEM on the runq and tickle CPUs */
     runq_insert(svc);
     __runq_tickle(svc);
 }
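
For illustration, the boost rule spelled out in the comment above reduces
to a small predicate. A minimal, self-contained sketch with made-up names
(this is not the credit1 code itself, which works on svc->flags and
svc->pri directly):

    #include <stdbool.h>

    enum toy_pri { TOY_PRI_OVER, TOY_PRI_UNDER, TOY_PRI_BOOST };

    /*
     * Boost a waking item only if it was UNDER, is not waking up merely
     * because of a migration, and is not parked by the cap logic.
     */
    static bool toy_should_boost(enum toy_pri pri, bool migrating, bool parked)
    {
        return !migrating && !parked && pri == TOY_PRI_UNDER;
    }
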
@@ -1185,7 +1185,7 @@ csched_item_yield(const struct scheduler *ops, struct sched_item *item)
     struct csched_item * const svc = CSCHED_ITEM(item);
 
     /* Let the scheduler know that this vcpu is trying to yield */
-    set_bit(CSCHED_FLAG_VCPU_YIELD, &svc->flags);
+    set_bit(CSCHED_FLAG_ITEM_YIELD, &svc->flags);
 }
 
 static int
@@ -1214,8 +1214,8 @@ csched_dom_cntl(
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
             {
-                prv->weight -= sdom->weight * sdom->active_vcpu_count;
-                prv->weight += op->u.credit.weight * sdom->active_vcpu_count;
+                prv->weight -= sdom->weight * sdom->active_item_count;
+                prv->weight += op->u.credit.weight * sdom->active_item_count;
             }
             sdom->weight = op->u.credit.weight;
         }
@@ -1244,9 +1244,9 @@ csched_aff_cntl(const struct scheduler *ops, struct sched_item *item,
 
     /* Are we becoming exclusively pinned? */
     if ( cpumask_weight(hard) == 1 )
-        set_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        set_bit(CSCHED_FLAG_ITEM_PINNED, &svc->flags);
     else
-        clear_bit(CSCHED_FLAG_VCPU_PINNED, &svc->flags);
+        clear_bit(CSCHED_FLAG_ITEM_PINNED, &svc->flags);
 }
 
 static inline void
@@ -1289,14 +1289,14 @@ csched_sys_cntl(const struct scheduler *ops,
         else if ( prv->ratelimit && !params->ratelimit_us )
             printk(XENLOG_INFO "Disabling context switch rate limiting\n");
         prv->ratelimit = MICROSECS(params->ratelimit_us);
-        prv->vcpu_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
+        prv->item_migr_delay = MICROSECS(params->vcpu_migr_delay_us);
         spin_unlock_irqrestore(&prv->lock, flags);
 
         /* FALLTHRU */
     case XEN_SYSCTL_SCHEDOP_getinfo:
         params->tslice_ms = prv->tslice / MILLISECS(1);
         params->ratelimit_us = prv->ratelimit / MICROSECS(1);
-        params->vcpu_migr_delay_us = prv->vcpu_migr_delay / MICROSECS(1);
+        params->vcpu_migr_delay_us = prv->item_migr_delay / MICROSECS(1);
         rc = 0;
         break;
     }
@@ -1314,7 +1314,7 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom)
         return ERR_PTR(-ENOMEM);
 
     /* Initialize credit and weight */
-    INIT_LIST_HEAD(&sdom->active_vcpu);
+    INIT_LIST_HEAD(&sdom->active_item);
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
     sdom->dom = dom;
     sdom->weight = CSCHED_DEFAULT_WEIGHT;
@@ -1331,7 +1331,7 @@ csched_free_domdata(const struct scheduler *ops, void *data)
 /*
  * This is a O(n) optimized sort of the runq.
  *
- * Time-share VCPUs can only be one of two priorities, UNDER or OVER. We walk
+ * Time-share ITEMs can only be one of two priorities, UNDER or OVER. We walk
  * through the runq and move up any UNDERs that are preceded by OVERS. We
  * remember the last UNDER to make the move up operation O(1).
  */
@@ -1384,7 +1384,7 @@ csched_acct(void* dummy)
 {
     struct csched_private *prv = dummy;
     unsigned long flags;
-    struct list_head *iter_vcpu, *next_vcpu;
+    struct list_head *iter_item, *next_item;
     struct list_head *iter_sdom, *next_sdom;
     struct csched_item *svc;
     struct csched_dom *sdom;
@@ -1431,26 +1431,26 @@ csched_acct(void* dummy)
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
         BUG_ON( is_idle_domain(sdom->dom) );
-        BUG_ON( sdom->active_vcpu_count == 0 );
+        BUG_ON( sdom->active_item_count == 0 );
         BUG_ON( sdom->weight == 0 );
-        BUG_ON( (sdom->weight * sdom->active_vcpu_count) > weight_left );
+        BUG_ON( (sdom->weight * sdom->active_item_count) > weight_left );
 
-        weight_left -= ( sdom->weight * sdom->active_vcpu_count );
+        weight_left -= ( sdom->weight * sdom->active_item_count );
 
         /*
          * A domain's fair share is computed using its weight in competition
          * with that of all other active domains.
          *
-         * At most, a domain can use credits to run all its active VCPUs
+         * At most, a domain can use credits to run all its active ITEMs
          * for one full accounting period. We allow a domain to earn more
          * only when the system-wide credit balance is negative.
          */
-        credit_peak = sdom->active_vcpu_count * prv->credits_per_tslice;
+        credit_peak = sdom->active_item_count * prv->credits_per_tslice;
         if ( prv->credit_balance < 0 )
         {
             credit_peak += ( ( -prv->credit_balance
                                * sdom->weight
-                               * sdom->active_vcpu_count) +
+                               * sdom->active_item_count) +
                              (weight_total - 1)
                            ) / weight_total;
         }
@@ -1461,14 +1461,14 @@ csched_acct(void* dummy)
             if ( credit_cap < credit_peak )
                 credit_peak = credit_cap;
 
-            /* FIXME -- set cap per-vcpu as well...? */
-            credit_cap = ( credit_cap + ( sdom->active_vcpu_count - 1 )
-                         ) / sdom->active_vcpu_count;
+            /* FIXME -- set cap per-item as well...? */
+            credit_cap = ( credit_cap + ( sdom->active_item_count - 1 )
+                         ) / sdom->active_item_count;
         }
 
         credit_fair = ( ( credit_total
                           * sdom->weight
-                          * sdom->active_vcpu_count )
+                          * sdom->active_item_count )
                         + (weight_total - 1)
                       ) / weight_total;
 
@@ -1502,14 +1502,14 @@ csched_acct(void* dummy)
             credit_fair = credit_peak;
         }
 
-        /* Compute fair share per VCPU */
-        credit_fair = ( credit_fair + ( sdom->active_vcpu_count - 1 )
-                      ) / sdom->active_vcpu_count;
+        /* Compute fair share per ITEM */
+        credit_fair = ( credit_fair + ( sdom->active_item_count - 1 )
+                      ) / sdom->active_item_count;
 
 
-        list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
+        list_for_each_safe( iter_item, next_item, &sdom->active_item )
         {
-            svc = list_entry(iter_vcpu, struct csched_item, active_vcpu_elem);
+            svc = list_entry(iter_item, struct csched_item, active_item_elem);
             BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
@@ -1517,20 +1517,20 @@ csched_acct(void* dummy)
             credit = atomic_read(&svc->credit);
 
             /*
-             * Recompute priority or, if VCPU is idling, remove it from
+             * Recompute priority or, if ITEM is idling, remove it from
              * the active list.
              */
             if ( credit < 0 )
             {
                 svc->pri = CSCHED_PRI_TS_OVER;
 
-                /* Park running VCPUs of capped-out domains */
+                /* Park running ITEMs of capped-out domains */
                 if ( sdom->cap != 0U &&
                      credit < -credit_cap &&
-                     !test_and_set_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                     !test_and_set_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(item_park);
-                    vcpu_pause_nosync(svc->vcpu);
+                    vcpu_pause_nosync(svc->item->vcpu);
                 }
 
                 /* Lower bound on credits */
@@ -1546,21 +1546,21 @@ csched_acct(void* dummy)
                 svc->pri = CSCHED_PRI_TS_UNDER;
 
                 /* Unpark any capped domains whose credits go positive */
-                if ( test_and_clear_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
+                if ( test_and_clear_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
                 {
                     /*
                      * It's important to unset the flag AFTER the unpause()
-                     * call to make sure the VCPU's priority is not boosted
+                     * call to make sure the ITEM's priority is not boosted
                      * if it is woken up here.
                      */
                     SCHED_STAT_CRANK(item_unpark);
-                    vcpu_unpause(svc->vcpu);
+                    vcpu_unpause(svc->item->vcpu);
                 }
 
-                /* Upper bound on credits means VCPU stops earning */
+                /* Upper bound on credits means ITEM stops earning */
                 if ( credit > prv->credits_per_tslice )
                 {
-                    __csched_vcpu_acct_stop_locked(prv, svc);
+                    __csched_item_acct_stop_locked(prv, svc);
                     /* Divide credits in half, so that when it starts
                      * accounting again, it starts a little bit "ahead" */
                     credit /= 2;
@@ -1568,8 +1568,8 @@ csched_acct(void* dummy)
                 }
             }
 
-            SCHED_VCPU_STAT_SET(svc, credit_last, credit);
-            SCHED_VCPU_STAT_SET(svc, credit_incr, credit_fair);
+            SCHED_ITEM_STAT_SET(svc, credit_last, credit);
+            SCHED_ITEM_STAT_SET(svc, credit_incr, credit_fair);
             credit_balance += credit;
         }
     }
@@ -1595,10 +1595,10 @@ csched_tick(void *_cpu)
     spc->tick++;
 
     /*
-     * Accounting for running VCPU
+     * Accounting for running ITEM
      */
-    if ( !is_idle_vcpu(current) )
-        csched_vcpu_acct(prv, cpu);
+    if ( !is_idle_item(current->sched_item) )
+        csched_item_acct(prv, cpu);
 
     /*
      * Check if runq needs to be sorted
@@ -1619,7 +1619,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
     struct csched_item *speer;
     struct list_head *iter;
-    struct vcpu *vc;
+    struct sched_item *item;
 
     ASSERT(peer_pcpu != NULL);
 
@@ -1627,7 +1627,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
      * Don't steal from an idle CPU's runq because it's about to
      * pick up work from it itself.
      */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu)->vcpu)) )
+    if ( unlikely(is_idle_item(curr_on_cpu(peer_cpu))) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1635,45 +1635,44 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
         speer = __runq_elem(iter);
 
         /*
-         * If next available VCPU here is not of strictly higher
+         * If next available ITEM here is not of strictly higher
          * priority than ours, this PCPU is useless to us.
          */
         if ( speer->pri <= pri )
             break;
 
-        /* Is this VCPU runnable on our PCPU? */
-        vc = speer->vcpu;
-        BUG_ON( is_idle_vcpu(vc) );
+        /* Is this ITEM runnable on our PCPU? */
+        item = speer->item;
+        BUG_ON( is_idle_item(item) );
 
         /*
-         * If the vcpu is still in peer_cpu's scheduling tail, or if it
+         * If the item is still in peer_cpu's scheduling tail, or if it
          * has no useful soft affinity, skip it.
          *
          * In fact, what we want is to check if we have any "soft-affine
          * work" to steal, before starting to look at "hard-affine work".
          *
-         * Notice that, if not even one vCPU on this runq has a useful
+         * Notice that, if not even one item on this runq has a useful
          * soft affinity, we could have avoid considering this runq for
          * a soft balancing step in the first place. This, for instance,
          * can be implemented by taking note of on what runq there are
-         * vCPUs with useful soft affinities in some sort of bitmap
+         * items with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vcpu_running(vc) || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                  !has_soft_affinity(vc->sched_item)) )
+        if ( item->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
+                                  !has_soft_affinity(item)) )
             continue;
 
-        affinity_balance_cpumask(vc->sched_item, balance_step, cpumask_scratch);
-        if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
+        affinity_balance_cpumask(item, balance_step, cpumask_scratch);
+        if ( __csched_item_is_migrateable(prv, item, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
-            TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
-                     vc->domain->domain_id, vc->vcpu_id);
-            SCHED_VCPU_STAT_CRANK(speer, migrate_q);
+            TRACE_3D(TRC_CSCHED_STOLEN_ITEM, peer_cpu,
+                     item->domain->domain_id, item->item_id);
+            SCHED_ITEM_STAT_CRANK(speer, migrate_q);
             SCHED_STAT_CRANK(migrate_queued);
-            WARN_ON(vc->is_urgent);
             runq_remove(speer);
-            sched_set_res(vc->sched_item, per_cpu(sched_res, cpu));
+            sched_set_res(item, per_cpu(sched_res, cpu));
             /*
              * speer will start executing directly on cpu, without having to
              * go through runq_insert(). So we must update the runnable count
@@ -1699,7 +1698,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
     int peer_cpu, first_cpu, peer_node, bstep;
     int node = cpu_to_node(cpu);
 
-    BUG_ON( cpu != snext->vcpu->processor );
+    BUG_ON( cpu != sched_item_cpu(snext->item) );
     online = cpupool_online_cpumask(c);
 
     /*
@@ -1728,7 +1727,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
          * it is more likely that we find some affine work on our same
-         * node, not to mention that migrating vcpus within the same node
+         * node, not to mention that migrating items within the same node
          * could well expected to be cheaper than across-nodes (memory
          * stays local, there might be some node-wide cache[s], etc.).
          */
@@ -1749,7 +1748,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 spinlock_t *lock;
 
                 /*
-                 * If there is only one runnable vCPU on peer_cpu, it means
+                 * If there is only one runnable item on peer_cpu, it means
                  * there's no one to be stolen in its runqueue, so skip it.
                  *
                  * Checking this without holding the lock is racy... But that's
@@ -1762,13 +1761,13 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  *   And we can avoid that by re-checking nr_runnable after
                  *   having grabbed the lock, if we want;
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
-                 *   have runnable vCPUs in its runqueue, but that's not a
+                 *   have runnable items in its runqueue, but that's not a
                  *   problem because:
                  *   + if racing with csched_item_insert() or csched_item_wake(),
-                 *     __runq_tickle() will be called afterwords, so the vCPU
+                 *     __runq_tickle() will be called afterwards, so the item
                  *     won't get stuck in the runqueue for too long;
-                 *   + if racing with csched_runq_steal(), it may be that a
-                 *     vCPU that we could have picked up, stays in a runqueue
+                 *   + if racing with csched_runq_steal(), it may be that an
+                 *     item that we could have picked up, stays in a runqueue
                  *     until someone else tries to steal it again. But this is
                  *     no worse than what can happen already (without this
                  *     optimization), it the pCPU would schedule right after we
@@ -1803,7 +1802,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                     csched_runq_steal(peer_cpu, cpu, snext->pri, bstep) : NULL;
                 pcpu_schedule_unlock(lock, peer_cpu);
 
-                /* As soon as one vcpu is found, balancing ends */
+                /* As soon as one item is found, balancing ends */
                 if ( speer != NULL )
                 {
                     *stolen = 1;
@@ -1842,14 +1841,15 @@ csched_schedule(
 {
     const int cpu = smp_processor_id();
     struct list_head * const runq = RUNQ(cpu);
-    struct csched_item * const scurr = CSCHED_ITEM(current->sched_item);
+    struct sched_item *item = current->sched_item;
+    struct csched_item * const scurr = CSCHED_ITEM(item);
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_item *snext;
     struct task_slice ret;
     s_time_t runtime, tslice;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED_VCPU_CHECK(current);
+    CSCHED_ITEM_CHECK(item);
 
     /*
      * Here in Credit1 code, we usually just call TRACE_nD() helpers, and
@@ -1863,30 +1863,30 @@ csched_schedule(
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_item(item);
         __trace_var(TRC_CSCHED_SCHEDULE, 1, sizeof(d),
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->sched_item->state_entry_time;
+    runtime = now - item->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
-    if ( !is_idle_vcpu(scurr->vcpu) )
+    if ( !is_idle_item(item) )
     {
-        /* Update credits of a non-idle VCPU. */
+        /* Update credits of a non-idle ITEM. */
         burn_credits(scurr, now);
         scurr->start_time -= now;
     }
     else
     {
-        /* Re-instate a boosted idle VCPU as normal-idle. */
+        /* Re-instate a boosted idle ITEM as normal-idle. */
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
     /* Choices, choices:
-     * - If we have a tasklet, we need to run the idle vcpu no matter what.
-     * - If sched rate limiting is in effect, and the current vcpu has
+     * - If we have a tasklet, we need to run the idle item no matter what.
+     * - If sched rate limiting is in effect, and the current item has
      *   run for less than that amount of time, continue the current one,
      *   but with a shorter timeslice and return it immediately
      * - Otherwise, chose the one with the highest priority (which may
@@ -1904,11 +1904,11 @@ csched_schedule(
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !test_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags)
+    if ( !test_bit(CSCHED_FLAG_ITEM_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && vcpu_runnable(current)
-         && !is_idle_vcpu(current)
+         && item_runnable(item)
+         && !is_idle_item(item)
          && runtime < prv->ratelimit )
     {
         snext = scurr;
@@ -1926,11 +1926,11 @@ csched_schedule(
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned item:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
+            d.dom = item->domain->domain_id;
+            d.item = item->item_id;
             d.runtime = runtime;
             __trace_var(TRC_CSCHED_RATELIMIT, 1, sizeof(d),
                         (unsigned char *)&d);
@@ -1942,13 +1942,13 @@ csched_schedule(
     tslice = prv->tslice;
 
     /*
-     * Select next runnable local VCPU (ie top of local runq)
+     * Select next runnable local ITEM (ie top of local runq)
      */
-    if ( vcpu_runnable(current) )
+    if ( item_runnable(item) )
         __runq_insert(scurr);
     else
     {
-        BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
+        BUG_ON( is_idle_item(item) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
         dec_nr_runnable(cpu);
     }
@@ -1956,23 +1956,23 @@ csched_schedule(
     snext = __runq_elem(runq->next);
     ret.migrated = 0;
 
-    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
+    /* Tasklet work (which runs in idle ITEM context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_ITEM(idle_vcpu[cpu]->sched_item);
+        snext = CSCHED_ITEM(sched_idle_item(cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
     /*
      * Clear YIELD flag before scheduling out
      */
-    clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags);
+    clear_bit(CSCHED_FLAG_ITEM_YIELD, &scurr->flags);
 
     /*
      * SMP Load balance:
      *
-     * If the next highest priority local runnable VCPU has already eaten
+     * If the next highest priority local runnable ITEM has already eaten
      * through its credits, look on other PCPUs to see if we have more
      * urgent work... If not, csched_load_balance() will return snext, but
      * already removed from the runq.
@@ -1996,32 +1996,32 @@ csched_schedule(
         cpumask_clear_cpu(cpu, prv->idlers);
     }
 
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_item(snext->item) )
         snext->start_time += now;
 
 out:
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_vcpu(snext->vcpu) ?
+    ret.time = (is_idle_item(snext->item) ?
                 -1 : tslice);
-    ret.task = snext->vcpu->sched_item;
+    ret.task = snext->item;
 
-    CSCHED_VCPU_CHECK(ret.task->vcpu);
+    CSCHED_ITEM_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched_dump_vcpu(struct csched_item *svc)
+csched_dump_item(struct csched_item *svc)
 {
     struct csched_dom * const sdom = svc->sdom;
 
     printk("[%i.%i] pri=%i flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->item->domain->domain_id,
+            svc->item->item_id,
             svc->pri,
             svc->flags,
-            svc->vcpu->processor);
+            sched_item_cpu(svc->item));
 
     if ( sdom )
     {
@@ -2055,7 +2055,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
 
     /*
      * We need both locks:
-     * - csched_dump_vcpu() wants to access domains' scheduling
+     * - csched_dump_item() wants to access domains' scheduling
      *   parameters, which are protected by the private scheduler lock;
      * - we scan through the runqueue, so we need the proper runqueue
      *   lock (the one of the runqueue of this cpu).
@@ -2071,12 +2071,12 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu). */
+    /* current ITEM (nothing to say if that's the idle item). */
     svc = CSCHED_ITEM(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_item(svc->item) )
     {
         printk("\trun: ");
-        csched_dump_vcpu(svc);
+        csched_dump_item(svc);
     }
 
     loop = 0;
@@ -2086,7 +2086,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
         if ( svc )
         {
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_item(svc);
         }
     }
 
@@ -2128,29 +2128,29 @@ csched_dump(const struct scheduler *ops)
            prv->ratelimit / MICROSECS(1),
            CSCHED_CREDITS_PER_MSEC,
            prv->ticks_per_tslice,
-           prv->vcpu_migr_delay/ MICROSECS(1));
+           prv->item_migr_delay/ MICROSECS(1));
 
     printk("idlers: %*pb\n", nr_cpu_ids, cpumask_bits(prv->idlers));
 
-    printk("active vcpus:\n");
+    printk("active items:\n");
     loop = 0;
     list_for_each( iter_sdom, &prv->active_sdom )
     {
         struct csched_dom *sdom;
         sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
 
-        list_for_each( iter_svc, &sdom->active_vcpu )
+        list_for_each( iter_svc, &sdom->active_item )
         {
             struct csched_item *svc;
             spinlock_t *lock;
 
-            svc = list_entry(iter_svc, struct csched_item, active_vcpu_elem);
-            lock = item_schedule_lock(svc->vcpu->sched_item);
+            svc = list_entry(iter_svc, struct csched_item, active_item_elem);
+            lock = item_schedule_lock(svc->item);
 
             printk("\t%3d: ", ++loop);
-            csched_dump_vcpu(svc);
+            csched_dump_item(svc);
 
-            item_schedule_unlock(lock, svc->vcpu->sched_item);
+            item_schedule_unlock(lock, svc->item);
         }
     }
 
@@ -2224,7 +2224,7 @@ csched_init(struct scheduler *ops)
     else
         prv->ratelimit = MICROSECS(sched_ratelimit_us);
 
-    prv->vcpu_migr_delay = MICROSECS(vcpu_migration_delay_us);
+    prv->item_migr_delay = MICROSECS(vcpu_migration_delay_us);
 
     return 0;
 }
-- 
2.16.4



* [PATCH RFC 27/49] xen/sched: make credit2 scheduler vcpu agnostic.
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (25 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 26/49] xen/sched: make credit " Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 28/49] xen/sched: make arinc653 " Juergen Gross
                   ` (27 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Switch credit2 scheduler completely from vcpu to sched_item usage.

As we are touching lots of lines anyway, remove some trailing white
space, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit2.c | 820 ++++++++++++++++++++++-----------------------
 1 file changed, 403 insertions(+), 417 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 5aa819b2c5..7918d46a23 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -45,7 +45,7 @@
 #define TRC_CSCHED2_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(CSCHED2, 8)
 #define TRC_CSCHED2_UPDATE_LOAD      TRC_SCHED_CLASS_EVT(CSCHED2, 9)
 #define TRC_CSCHED2_RUNQ_ASSIGN      TRC_SCHED_CLASS_EVT(CSCHED2, 10)
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
+#define TRC_CSCHED2_UPDATE_ITEM_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
 #define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 12)
 #define TRC_CSCHED2_TICKLE_NEW       TRC_SCHED_CLASS_EVT(CSCHED2, 13)
 #define TRC_CSCHED2_RUNQ_MAX_WEIGHT  TRC_SCHED_CLASS_EVT(CSCHED2, 14)
@@ -74,13 +74,13 @@
  * Design:
  *
  * VMs "burn" credits based on their weight; higher weight means
- * credits burn more slowly.  The highest weight vcpu burns credits at
+ * credits burn more slowly.  The highest weight item burns credits at
  * a rate of 1 credit per nanosecond.  Others burn proportionally
  * more.
  *
- * vcpus are inserted into the runqueue by credit order.
+ * items are inserted into the runqueue by credit order.
  *
- * Credits are "reset" when the next vcpu in the runqueue is less than
+ * Credits are "reset" when the next item in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
  */
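
The "clip and top up" reset just described can be sketched as follows;
the type, constants and clipping threshold are made up for illustration
and do not reproduce the in-tree reset logic:

    struct toy_item { long long credit; };

    #define TOY_CREDIT_INIT  (10LL * 1000 * 1000)  /* fixed top-up         */
    #define TOY_CREDIT_CLIP  TOY_CREDIT_INIT       /* clip floor (made up) */

    static void toy_reset_credit(struct toy_item *items, unsigned int n)
    {
        for ( unsigned int i = 0; i < n; i++ )
        {
            long long c = items[i].credit;

            /* "Clip" deep debt to a small value... */
            if ( c < -TOY_CREDIT_CLIP )
                c = -TOY_CREDIT_CLIP;
            /* ... and add a fixed amount of credit to everyone. */
            items[i].credit = c + TOY_CREDIT_INIT;
        }
    }
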
@@ -95,33 +95,33 @@
  *   be given a cap of 25%; a domain that must not use more than 1+1/2 of
  *   physical CPU time, will be given a cap of 150%;
  *
- * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and
- *   a 40% cap, that one vCPU will use 40% of one pCPU. If a somain has 4
- *   vCPUs, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
- *   split among the v vCPUs. How much each of the vCPUs will actually get,
+ * - caps are per-domain (not per-item). If a domain has only 1 item, and
+ *   a 40% cap, that one item will use 40% of one pCPU. If a domain has 4
+ *   items, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
+ *   split among the 4 items. How much each of the items will actually get,
  *   during any given interval of time, is unspecified (as it depends on
  *   various aspects: workload, system load, etc.). For instance, it is
- *   possible that, during a given time interval, 2 vCPUs use 100% each,
+ *   possible that, during a given time interval, 2 items use 100% each,
  *   and the other two use nothing; while during another time interval,
- *   two vCPUs use 80%, one uses 10% and the other 30%; or that each use
+ *   two items use 80%, one uses 10% and the other 30%; or that each use
  *   50% (and so on and so forth).
  *
  * For implementing this, we use the following approach:
  *
  * - each domain is given a 'budget', an each domain has a timer, which
  *   replenishes the domain's budget periodically. The budget is the amount
- *   of time the vCPUs of the domain can use every 'period';
+ *   of time the items of the domain can use every 'period';
  *
  * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
  *   (but each domain has its own timer; so the all are periodic by the same
  *   period, but replenishment of the budgets of the various domains, at
  *   periods boundaries, are not synchronous);
  *
- * - when vCPUs run, they consume budget. When they don't run, they don't
- *   consume budget. If there is no budget left for the domain, no vCPU of
- *   that domain can run. If a vCPU tries to run and finds that there is no
+ * - when items run, they consume budget. When they don't run, they don't
+ *   consume budget. If there is no budget left for the domain, no item of
+ *   that domain can run. If an item tries to run and finds that there is no
  *   budget, it blocks.
- *   At whatever time a vCPU wants to run, it must check the domain's budget,
+ *   At whatever time an item wants to run, it must check the domain's budget,
  *   and if there is some, it can use it.
  *
  * - budget is replenished to the top of the capacity for the domain once
@@ -129,39 +129,39 @@
  *   though, the budget after a replenishment will always be at most equal
  *   to the total capacify of the domain ('tot_budget');
  *
- * - when a budget replenishment occurs, if there are vCPUs that had been
+ * - when a budget replenishment occurs, if there are items that had been
  *   blocked because of lack of budget, they'll be unblocked, and they will
  *   (potentially) be able to run again.
  *
  * Finally, some even more implementation related detail:
  *
- * - budget is stored in a domain-wide pool. vCPUs of the domain that want
+ * - budget is stored in a domain-wide pool. Items of the domain that want
  *   to run go to such pool, and grub some. When they do so, the amount
  *   they grabbed is _immediately_ removed from the pool. This happens in
- *   vcpu_grab_budget();
+ *   item_grab_budget();
  *
- * - when vCPUs stop running, if they've not consumed all the budget they
+ * - when items stop running, if they've not consumed all the budget they
  *   took, the leftover is put back in the pool. This happens in
- *   vcpu_return_budget();
+ *   item_return_budget();
  *
- * - the above means that a vCPU can find out that there is no budget and
+ * - the above means that an item can find out that there is no budget and
  *   block, not only if the cap has actually been reached (for this period),
- *   but also if some other vCPUs, in order to run, have grabbed a certain
+ *   but also if some other items, in order to run, have grabbed a certain
  *   quota of budget, no matter whether they've already used it all or not.
- *   A vCPU blocking because (any form of) lack of budget is said to be
- *   "parked", and such blocking happens in park_vcpu();
+ *   An item blocking because (any form of) lack of budget is said to be
+ *   "parked", and such blocking happens in park_item();
  *
- * - when a vCPU stops running, and puts back some budget in the domain pool,
+ * - when an item stops running, and puts back some budget in the domain pool,
  *   we need to check whether there is someone which has been parked and that
- *   can be unparked. This happens in unpark_parked_vcpus(), called from
+ *   can be unparked. This happens in unpark_parked_items(), called from
  *   csched2_context_saved();
  *
  * - of course, unparking happens also as a consequence of the domain's budget
  *   being replenished by the periodic timer. This also occurs by means of
  *   calling csched2_context_saved() (but from replenish_domain_budget());
  *
- * - parked vCPUs of a domain are kept in a (per-domain) list, called
- *   'parked_vcpus'). Manipulation of the list and of the domain-wide budget
+ * - parked items of a domain are kept in a (per-domain) list, called
+ *   'parked_items'. Manipulation of the list and of the domain-wide budget
  *   pool, must occur only when holding the 'budget_lock'.
  */
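
A minimal sketch of the pool handling just described (the caller is
assumed to already hold the domain's budget_lock; names and signatures
are illustrative, not those of the in-tree item_grab_budget() and
item_return_budget()):

    #include <stdint.h>

    typedef int64_t s_time_t;

    struct toy_dom { s_time_t budget; };   /* domain-wide budget pool */

    static s_time_t toy_grab_budget(struct toy_dom *d, s_time_t want)
    {
        s_time_t got = d->budget < want ? d->budget : want;

        d->budget -= got;  /* removed from the pool immediately          */
        return got;        /* 0 means no budget left: the item must park */
    }

    static void toy_return_budget(struct toy_dom *d, s_time_t leftover)
    {
        d->budget += leftover;  /* unused budget goes back into the pool */
    }
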
 
@@ -174,9 +174,9 @@
  *     pcpu_schedule_lock() / item_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
- *  + serializes runqueue operations (removing and inserting vcpus);
+ *  + serializes runqueue operations (removing and inserting items);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_item for the vcpu in the
+ *  + protects item parameters in csched2_item for the item in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -190,8 +190,8 @@
  *  + it is per-domain;
  *  + protects, in domains that have an utilization cap;
  *   * manipulation of the total budget of the domain (as it is shared
- *     among all vCPUs of the domain),
- *   * manipulation of the list of vCPUs that are blocked waiting for
+ *     among all items of the domain),
+ *   * manipulation of the list of items that are blocked waiting for
  *     some budget to be available.
  *
  * - Type:
@@ -228,9 +228,9 @@
  */
 #define CSCHED2_CREDIT_INIT          MILLISECS(10)
 /*
- * Amount of credit the idle vcpus have. It never changes, as idle
- * vcpus does not consume credits, and it must be lower than whatever
- * amount of credit 'regular' vcpu would end up with.
+ * Amount of credit the idle items have. It never changes, as idle
+ * items do not consume credits, and it must be lower than whatever
+ * amount of credit a 'regular' item would end up with.
  */
 #define CSCHED2_IDLE_CREDIT          (-(1U<<30))
 /*
@@ -243,9 +243,9 @@
  * MIN_TIMER.
  */
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
-/* How much to "compensate" a vcpu for L2 migration. */
+/* How much to "compensate" an item for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
-/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+/* How tolerant we should be when peeking at runtime of items on other cpus */
 #define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
@@ -258,7 +258,7 @@
  * Flags
  */
 /*
- * CSFLAG_scheduled: Is this vcpu either running on, or context-switching off,
+ * CSFLAG_scheduled: Is this item either running on, or context-switching off,
  * a physical cpu?
  * + Accessed only with runqueue lock held
  * + Set when chosen as next in csched2_schedule().
@@ -280,21 +280,21 @@
 #define __CSFLAG_delayed_runq_add 2
 #define CSFLAG_delayed_runq_add (1U<<__CSFLAG_delayed_runq_add)
 /*
- * CSFLAG_runq_migrate_request: This vcpu is being migrated as a result of a
+ * CSFLAG_runq_migrate_request: This item is being migrated as a result of a
  * credit2-initiated runq migrate request; migrate it to the runqueue indicated
- * in the svc struct. 
+ * in the svc struct.
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1U<<__CSFLAG_runq_migrate_request)
 /*
- * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * CSFLAG_item_yield: this item was running, and has called vcpu_yield(). The
  * scheduler is invoked to see if we can give the cpu to someone else, and
- * get back to the yielding vcpu in a while.
+ * get back to the yielding item in a while.
  */
-#define __CSFLAG_vcpu_yield 4
-#define CSFLAG_vcpu_yield (1U<<__CSFLAG_vcpu_yield)
+#define __CSFLAG_item_yield 4
+#define CSFLAG_item_yield (1U<<__CSFLAG_item_yield)
 /*
- * CSFLAGS_pinned: this vcpu is currently 'pinned', i.e., has its hard
+ * CSFLAGS_pinned: this item is currently 'pinned', i.e., has its hard
  * affinity set to one and only 1 cpu (and, hence, can only run there).
  */
 #define __CSFLAG_pinned 5
@@ -306,7 +306,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 /*
  * Load tracking and load balancing
  *
- * Load history of runqueues and vcpus is accounted for by using an
+ * Load history of runqueues and items is accounted for by using an
  * exponential weighted moving average algorithm. However, instead of using
  * fractions,we shift everything to left by the number of bits we want to
  * use for representing the fractional part (Q-format).
@@ -326,7 +326,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  * where W is the length of the window, P the multiplier for transitiong into
  * Q-format fixed point arithmetic and load is the instantaneous load of a
- * runqueue, which basically is the number of runnable vcpus there are on the
+ * runqueue, which basically is the number of runnable items there are on the
  * runqueue (for the meaning of the other terms, look at the doc comment to
  *  update_runq_load()).
  *
@@ -338,7 +338,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  * The maximum possible value for the average load, which we want to store in
  * s_time_t type variables (i.e., we have 63 bits available) is load*P. This
  * means that, with P 18 bits wide, load can occupy 45 bits. This in turn
- * means we can have 2^45 vcpus in each runqueue, before overflow occurs!
+ * means we can have 2^45 items in each runqueue, before overflow occurs!
  *
  * However, it can happen that, at step j+1, if:
  *
@@ -354,13 +354,13 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  *  2^(63 - 30 - 18) = 2^15 = 32768
  *
- * So 32768 is the maximum number of vcpus the we can have in a runqueue,
+ * So 32768 is the maximum number of items we can have in a runqueue,
  * at any given time, and still not have problems with the load tracking
  * calculations... and this is more than fine.
  *
  * As a matter of fact, since we are using microseconds granularity, we have
  * W=2^20. So, still with 18 fractional bits and a 1 second long window, there
- * may be 2^25 = 33554432 vcpus in a runq before we have to start thinking
+ * may be 2^25 = 33554432 items in a runq before we have to start thinking
  * about overflow.
  */
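
The shifted moving average described above boils down to the following
update step (a sketch mirroring the shape of update_svc_load() /
update_runq_load() further down; the shift values are the defaults
discussed in the comment and are only illustrative here):

    #include <stdint.h>

    typedef int64_t s_time_t;

    #define P 18  /* load_precision_shift: Q-format fractional bits     */
    #define W 30  /* load_window_shift: window length is 2^W time units */

    static s_time_t ewma_update(s_time_t avgload, int inst_load, s_time_t delta)
    {
        if ( delta >= (1LL << W) )           /* a whole window has passed, */
            return (s_time_t)inst_load << P; /* so just start over         */

        return avgload
               + ((delta * ((s_time_t)inst_load << P)) >> W)
               - ((delta * avgload) >> W);
    }
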
 
@@ -468,7 +468,7 @@ struct csched2_runqueue_data {
     struct list_head runq;     /* Ordered list of runnable vms               */
     int id;                    /* ID of this runqueue (-1 if invalid)        */
 
-    int load;                  /* Instantaneous load (num of non-idle vcpus) */
+    int load;                  /* Instantaneous load (num of non-idle items) */
     s_time_t load_last_update; /* Last time average was updated              */
     s_time_t avgload;          /* Decaying queue load                        */
     s_time_t b_avgload;        /* Decaying queue load modified by balancing  */
@@ -478,8 +478,8 @@ struct csched2_runqueue_data {
         tickled,               /* Have been asked to go through schedule     */
         idle;                  /* Currently idle pcpus                       */
 
-    struct list_head svc;      /* List of all vcpus assigned to the runqueue */
-    unsigned int max_weight;   /* Max weight of the vcpus in this runqueue   */
+    struct list_head svc;      /* List of all items assigned to the runqueue */
+    unsigned int max_weight;   /* Max weight of the items in this runqueue   */
     unsigned int pick_bias;    /* Last picked pcpu. Start from it next time  */
 };
 
@@ -509,20 +509,20 @@ struct csched2_pcpu {
 };
 
 /*
- * Virtual CPU
+ * Schedule Item
  */
 struct csched2_item {
     struct csched2_dom *sdom;          /* Up-pointer to domain                */
-    struct vcpu *vcpu;                 /* Up-pointer, to vcpu                 */
+    struct sched_item *item;           /* Up-pointer, to schedule item        */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
 
     int credit;                        /* Current amount of credit            */
-    unsigned int weight;               /* Weight of this vcpu                 */
+    unsigned int weight;               /* Weight of this item                 */
     unsigned int residual;             /* Reminder of div(max_weight/weight)  */
     unsigned flags;                    /* Status flags (16 bits would be ok,  */
     s_time_t budget;                   /* Current budget (if domains has cap) */
                                        /* but clear_bit() does not like that) */
-    s_time_t budget_quota;             /* Budget to which vCPU is entitled    */
+    s_time_t budget_quota;             /* Budget to which item is entitled    */
 
     s_time_t start_time;               /* Time we were scheduled (for credit) */
 
@@ -531,7 +531,7 @@ struct csched2_item {
     s_time_t avgload;                  /* Decaying queue load                 */
 
     struct list_head runq_elem;        /* On the runqueue (rqd->runq)         */
-    struct list_head parked_elem;      /* On the parked_vcpus list            */
+    struct list_head parked_elem;      /* On the parked_items list            */
     struct list_head rqd_elem;         /* On csched2_runqueue_data's svc list */
     struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
     int tickled_cpu;                   /* Cpu that will pick us (-1 if none)  */
@@ -549,12 +549,12 @@ struct csched2_dom {
 
     struct timer repl_timer;    /* Timer for periodic replenishment of budget */
     s_time_t next_repl;         /* Time at which next replenishment occurs    */
-    struct list_head parked_vcpus; /* List of CPUs waiting for budget         */
+    struct list_head parked_items; /* List of items waiting for budget        */
 
     struct list_head sdom_elem; /* On csched2_runqueue_data's sdom list       */
     uint16_t weight;            /* User specified weight                      */
     uint16_t cap;               /* User specified cap                         */
-    uint16_t nr_vcpus;          /* Number of vcpus of this domain             */
+    uint16_t nr_items;          /* Number of items of this domain             */
 };
 
 /*
@@ -593,7 +593,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
     return &csched2_priv(ops)->rqd[c2r(cpu)];
 }
 
-/* Does the domain of this vCPU have a cap? */
+/* Does the domain of this item have a cap? */
 static inline bool has_cap(const struct csched2_item *svc)
 {
     return svc->budget != STIME_MAX;
@@ -611,24 +611,24 @@ static inline bool has_cap(const struct csched2_item *svc)
  *    smt_idle mask.
  *
  * Once we have such a mask, it is easy to implement a policy that, either:
- *  - uses fully idle cores first: it is enough to try to schedule the vcpus
+ *  - uses fully idle cores first: it is enough to try to schedule the items
  *    on pcpus from smt_idle mask first. This is what happens if
  *    sched_smt_power_savings was not set at boot (default), and it maximizes
  *    true parallelism, and hence performance;
- *  - uses already busy cores first: it is enough to try to schedule the vcpus
+ *  - uses already busy cores first: it is enough to try to schedule the items
  *    on pcpus that are idle, but are not in smt_idle. This is what happens if
  *    sched_smt_power_savings is set at boot, and it allows as more cores as
  *    possible to stay in low power states, minimizing power consumption.
  *
  * This logic is entirely implemented in runq_tickle(), and that is enough.
- * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
+ * In fact, in this scheduler, placement of an item on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_item_wake(), which calls
+ *  - when an item wakes up, it calls csched2_item_wake(), which calls
  *    runq_tickle();
  *  - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *    csched2_item_migrate() (which calls migrate()) and csched2_item_wake().
  *    csched2_res_pick() looks for the least loaded runq and return just any
- *    of its processors. Then, csched2_item_migrate() just moves the vcpu to
+ *    of its processors. Then, csched2_item_migrate() just moves the item to
  *    the chosen runq, and it is again runq_tickle(), called by
  *    csched2_item_wake() that actually decides what pcpu to use within the
  *    chosen runq;
@@ -643,7 +643,7 @@ static inline bool has_cap(const struct csched2_item *svc)
  *
  * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
  * records pcpus that at are merely idle (i.e., at the moment do not
- * have a vcpu running on them).  But you have to manually filter out
+ * have an item running on them).  But you have to manually filter out
  * which pcpus have been tickled in order to find cores that are not
  * going to be busy soon.  Filtering out tickled cpus pairwise is a
  * lot of extra pain; so for rqd->smt_idle, we explicitly make so that
@@ -690,24 +690,24 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
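
The core-first vs. thread-first policy described in the comment above can
be illustrated with a plain 64-bit mask standing in for cpumask_t (purely
a sketch; in the scheduler itself the choice is made inside runq_tickle()
as explained above):

    #include <stdbool.h>
    #include <stdint.h>

    /* idle: all idle pcpus; smt_idle: pcpus whose whole core is idle. */
    static int toy_pick_cpu(uint64_t idle, uint64_t smt_idle, bool power_savings)
    {
        /* Default: fully idle cores first; power saving: busy cores first. */
        uint64_t preferred = power_savings ? (idle & ~smt_idle) : smt_idle;
        uint64_t pick = preferred ? preferred : idle;

        if ( !pick )
            return -1;                  /* nothing idle on this runqueue */
        return __builtin_ctzll(pick);   /* lowest set bit as cpu number  */
    }
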
 static int get_fallback_cpu(struct csched2_item *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_item *item = svc->item;
     unsigned int bs;
 
     SCHED_STAT_CRANK(need_fallback_cpu);
 
     for_each_affinity_balance_step( bs )
     {
-        int cpu = v->processor;
+        int cpu = sched_item_cpu(item);
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_item) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(item) )
             continue;
 
-        affinity_balance_cpumask(v->sched_item, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(item, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    cpupool_domain_cpumask(v->domain));
+                    cpupool_domain_cpumask(item->domain));
 
         /*
-         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
+         * This is cases 1 or 3 (depending on bs): if the item's cpu is (still)
          * in our affinity, go for it, for cache betterness.
          */
         if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
@@ -729,7 +729,7 @@ static int get_fallback_cpu(struct csched2_item *svc)
          * We may well pick any valid pcpu from our soft-affinity, outside
          * of our current runqueue, but we decide not to. In fact, changing
          * runqueue is slow, affects load distribution, and is a source of
-         * overhead for the vcpus running on the other runqueue (we need the
+         * overhead for the items running on the other runqueue (we need the
          * lock). So, better do that as a consequence of a well informed
          * decision (or if we really don't have any other chance, as we will,
          * at step 5, if we get to there).
@@ -761,7 +761,7 @@ static int get_fallback_cpu(struct csched2_item *svc)
      * We can't be here.  But if that somehow happen (in non-debug builds),
      * at least return something which both online and in our hard-affinity.
      */
-    return cpumask_any(cpumask_scratch_cpu(v->processor));
+    return cpumask_any(cpumask_scratch_cpu(sched_item_cpu(item)));
 }
 
 /*
@@ -790,7 +790,7 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */
 
-static inline int vcpu_on_runq(struct csched2_item *svc)
+static inline int item_on_runq(struct csched2_item *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -948,17 +948,17 @@ _runq_assign(struct csched2_item *svc, struct csched2_runqueue_data *rqd)
 
     update_max_weight(svc->rqd, svc->weight, 0);
 
-    /* Expected new load based on adding this vcpu */
+    /* Expected new load based on adding this item */
     rqd->b_avgload += svc->avgload;
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned rqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.rqi=rqd->id;
         __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
                     sizeof(d),
@@ -968,13 +968,13 @@ _runq_assign(struct csched2_item *svc, struct csched2_runqueue_data *rqd)
 }
 
 static void
-runq_assign(const struct scheduler *ops, struct vcpu *vc)
+runq_assign(const struct scheduler *ops, struct sched_item *item)
 {
-    struct csched2_item *svc = vc->sched_item->priv;
+    struct csched2_item *svc = item->priv;
 
     ASSERT(svc->rqd == NULL);
 
-    _runq_assign(svc, c2rqd(ops, vc->processor));
+    _runq_assign(svc, c2rqd(ops, sched_item_cpu(item)));
 }
 
 static void
@@ -982,24 +982,24 @@ _runq_deassign(struct csched2_item *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;
 
-    ASSERT(!vcpu_on_runq(svc));
+    ASSERT(!item_on_runq(svc));
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_del_init(&svc->rqd_elem);
     update_max_weight(rqd, 0, svc->weight);
 
-    /* Expected new load based on removing this vcpu */
+    /* Expected new load based on removing this item */
     rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
 
     svc->rqd = NULL;
 }
 
 static void
-runq_deassign(const struct scheduler *ops, struct vcpu *vc)
+runq_deassign(const struct scheduler *ops, struct sched_item *item)
 {
-    struct csched2_item *svc = vc->sched_item->priv;
+    struct csched2_item *svc = item->priv;
 
-    ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+    ASSERT(svc->rqd == c2rqd(ops, sched_item_cpu(item)));
 
     _runq_deassign(svc);
 }
@@ -1202,15 +1202,15 @@ update_svc_load(const struct scheduler *ops,
                 struct csched2_item *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    s_time_t delta, vcpu_load;
+    s_time_t delta, item_load;
     unsigned int P, W;
 
     if ( change == -1 )
-        vcpu_load = 1;
+        item_load = 1;
     else if ( change == 1 )
-        vcpu_load = 0;
+        item_load = 0;
     else
-        vcpu_load = vcpu_runnable(svc->vcpu);
+        item_load = item_runnable(svc->item);
 
     W = prv->load_window_shift;
     P = prv->load_precision_shift;
@@ -1218,7 +1218,7 @@ update_svc_load(const struct scheduler *ops,
 
     if ( svc->load_last_update + (1ULL << W) < now )
     {
-        svc->avgload = vcpu_load << P;
+        svc->avgload = item_load << P;
     }
     else
     {
@@ -1231,7 +1231,7 @@ update_svc_load(const struct scheduler *ops,
         }
 
         svc->avgload = svc->avgload +
-                       ((delta * (vcpu_load << P)) >> W) -
+                       ((delta * (item_load << P)) >> W) -
                        ((delta * svc->avgload) >> W);
     }
     svc->load_last_update = now;
@@ -1243,14 +1243,14 @@ update_svc_load(const struct scheduler *ops,
     {
         struct {
             uint64_t v_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned shift;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.v_avgload = svc->avgload;
         d.shift = P;
-        __trace_var(TRC_CSCHED2_UPDATE_VCPU_LOAD, 1,
+        __trace_var(TRC_CSCHED2_UPDATE_ITEM_LOAD, 1,
                     sizeof(d),
                     (unsigned char *)&d);
     }
@@ -1272,18 +1272,18 @@ static void
 runq_insert(const struct scheduler *ops, struct csched2_item *svc)
 {
     struct list_head *iter;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_item_cpu(svc->item);
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
-    ASSERT(!vcpu_on_runq(svc));
-    ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
+    ASSERT(!item_on_runq(svc));
+    ASSERT(c2r(cpu) == c2r(sched_item_cpu(svc->item)));
 
     ASSERT(&svc->rqd->runq == runq);
-    ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!vcpu_running(svc->vcpu));
+    ASSERT(!is_idle_item(svc->item));
+    ASSERT(!svc->item->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1300,11 +1300,11 @@ runq_insert(const struct scheduler *ops, struct csched2_item *svc)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned pos;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.pos = pos;
         __trace_var(TRC_CSCHED2_RUNQ_POS, 1,
                     sizeof(d),
@@ -1314,7 +1314,7 @@ runq_insert(const struct scheduler *ops, struct csched2_item *svc)
 
 static inline void runq_remove(struct csched2_item *svc)
 {
-    ASSERT(vcpu_on_runq(svc));
+    ASSERT(item_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }
 
@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_item *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(vcpu_running(svc->vcpu));
-    return now - svc->vcpu->sched_item->state_entry_time >
+    ASSERT(svc->item->is_running);
+    return now - svc->item->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -1369,17 +1369,17 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 
     /*
      * We are dealing with cpus that are marked non-idle (i.e., that are not
-     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * in rqd->idle). However, some of them may be running their idle item,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+    if ( unlikely(is_idle_item(cur->item) ||
          !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
 
     score = new->credit - cur->credit;
-    if ( new->vcpu->processor != cpu )
+    if ( sched_item_cpu(new->item) != cpu )
         score -= CSCHED2_MIGRATE_RESIST;
 
     /*
@@ -1390,21 +1390,21 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->sched_item->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->item->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_item->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->item->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             int credit, score;
         } d;
-        d.dom = cur->vcpu->domain->domain_id;
-        d.vcpu = cur->vcpu->vcpu_id;
+        d.dom = cur->item->domain->domain_id;
+        d.item = cur->item->item_id;
         d.credit = cur->credit;
         d.score = score;
         __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
@@ -1416,14 +1416,14 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 }
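
As an aside, the scoring idea tickle_score() implements can be shown in a
minimal, self-contained sketch: compare the waking item's credit with that of
whatever currently runs on a candidate cpu, resist migrations, and reward soft
affinity on both sides. All types and constants below are made-up stand-ins,
not the hypervisor's.

#include <stdbool.h>
#include <stdint.h>

#define TOY_MIGRATE_RESIST   500   /* made-up resistance value */
#define TOY_SOFT_AFF_BONUS  1000   /* made-up soft-affinity bonus */

struct toy_item {
    int64_t credit;
    unsigned int cpu;            /* cpu the item currently runs on / ran on */
    bool soft_affine_to_cpu;     /* precomputed: candidate cpu is in its soft affinity */
};

/* Positive score => worth tickling 'cpu' so that 'new' preempts 'cur' there. */
static int64_t toy_tickle_score(const struct toy_item *new,
                                const struct toy_item *cur,
                                unsigned int cpu)
{
    int64_t score = new->credit - cur->credit;

    /* Running 'new' somewhere else than where it was costs a migration. */
    if ( new->cpu != cpu )
        score -= TOY_MIGRATE_RESIST;

    if ( score > 0 )
    {
        /* Reward landing 'new' inside its soft affinity... */
        if ( new->soft_affine_to_cpu )
            score += TOY_SOFT_AFF_BONUS;
        /* ...and displacing 'cur' from a cpu it does not prefer anyway. */
        if ( !cur->soft_affine_to_cpu )
            score += TOY_SOFT_AFF_BONUS;
    }

    return score;
}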
 
 /*
- * Check what processor it is best to 'wake', for picking up a vcpu that has
+ * Check what processor it is best to 'wake', for picking up an item that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
- *  2. if there aren't idle processor, check the one were the vcpu was
+ *  2. if there aren't idle processors, check the one where the item was
  *     running before to see if we can preempt what's running there now
  *     (and hence doing just one migration);
- *  3. last stand: check all processors and see if the vcpu is in right
- *     of preempting any of the other vcpus running on them (this requires
+ *  3. last stand: check all processors and see if the item has the right
+ *     to preempt any of the other items running on them (this requires
  *     two migrations, and that's indeed why it is left as the last stand).
  *
  * Note that when we say 'idle processors' what we really mean is (pretty
@@ -1436,10 +1436,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
-    struct sched_item *item = new->vcpu->sched_item;
-    unsigned int bs, cpu = new->vcpu->processor;
+    struct sched_item *item = new->item;
+    unsigned int bs, cpu = sched_item_cpu(item);
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
+    cpumask_t *online = cpupool_domain_cpumask(item->domain);
     cpumask_t mask;
 
     ASSERT(new->rqd == rqd);
@@ -1447,13 +1447,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned processor;
             int credit;
         } d;
-        d.dom = new->vcpu->domain->domain_id;
-        d.vcpu = new->vcpu->vcpu_id;
-        d.processor = new->vcpu->processor;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
+        d.processor = cpu;
         d.credit = new->credit;
         __trace_var(TRC_CSCHED2_TICKLE_NEW, 1,
                     sizeof(d),
@@ -1461,11 +1461,11 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
     }
 
     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when an item has hard-affinity with only one
+     * cpu, and there is no other item that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, it is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the items in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
@@ -1532,7 +1532,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
     /*
      * Note that, if we are here, it means we have done the hard-affinity
      * balancing step of the loop, and hence what we have in cpumask_scratch
-     * is what we put there for last, i.e., new's vcpu_hard_affinity & online
+     * is what we put there for last, i.e., new's item_hard_affinity & online
      * which is exactly what we need for the next part of the function.
      */
 
@@ -1543,7 +1543,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
      *
      * For deciding which cpu to tickle, we use tickle_score(), which will
      * factor in both new's soft-affinity, and the soft-affinity of the
-     * vcpu running on each cpu that we consider.
+     * item running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
@@ -1588,7 +1588,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_item *new, s_time_t now)
         return;
     }
 
-    ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu));
+    ASSERT(!is_idle_item(curr_on_cpu(ipid)));
     SCHED_STAT_CRANK(tickled_busy_cpu);
  tickle:
     BUG_ON(ipid == -1);
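
For reference, the three-step selection described in the comment above
runq_tickle() boils down to something like the following self-contained
sketch (exclusive pinning and tracing left out; types are simplified and not
Xen's):

#include <stdbool.h>

#define TOY_NR_CPUS 8

struct toy_rq {
    bool idle[TOY_NR_CPUS];      /* cpu currently running the idle item? */
    bool tickled[TOY_NR_CPUS];   /* cpu already poked and not yet rescheduled? */
};

/* Return the cpu to tickle for a newly queued item, or -1 for "nobody". */
static int toy_pick_cpu_to_tickle(const struct toy_rq *rqd,
                                  const bool hard_aff[TOY_NR_CPUS],
                                  int prev_cpu,
                                  long (*score)(int cpu))
{
    long best = 0;
    int cpu, ipid = -1;

    /* 1) an idle, not yet tickled cpu the item is allowed to run on. */
    for ( cpu = 0; cpu < TOY_NR_CPUS; cpu++ )
        if ( hard_aff[cpu] && rqd->idle[cpu] && !rqd->tickled[cpu] )
            return cpu;

    /* 2) the cpu it ran on before: preempting there avoids a migration. */
    if ( prev_cpu >= 0 && !rqd->tickled[prev_cpu] && score(prev_cpu) > 0 )
        return prev_cpu;

    /* 3) last stand: scan the busy cpus for the best positive score. */
    for ( cpu = 0; cpu < TOY_NR_CPUS; cpu++ )
        if ( cpu != prev_cpu && !rqd->idle[cpu] && !rqd->tickled[cpu] &&
             score(cpu) > best )
        {
            best = score(cpu);
            ipid = cpu;
        }

    return ipid;
}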
@@ -1623,16 +1623,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
 
     /*
      * Under normal circumstances, snext->credit should never be less
-     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, a
-     * vcpu with low credits may be allowed to run long enough that
+     * than -CSCHED2_MIN_TIMER.  However, under some circumstances, an
+     * item with low credits may be allowed to run long enough that
      * its credits are actually less than -CSCHED2_CREDIT_INIT.
-     * (Instances have been observed, for example, where a vcpu with
+     * (Instances have been observed, for example, where an item with
      * 200us of credit was allowed to run for 11ms, giving it -10.8ms
      * of credit.  Thus it was still negative even after the reset.)
      *
      * If this is the case for snext, we simply want to keep moving
      * everyone up until it is in the black again.  This is fair because
-     * none of the other vcpus want to run at the moment.
+     * none of the other items want to run at the moment.
      *
      * Rather than looping, however, we just calculate a multiplier,
      * avoiding an integer division and multiplication in the common
@@ -1649,16 +1649,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         int start_credit;
 
         svc = list_entry(iter, struct csched2_item, rqd_elem);
-        svc_cpu = svc->vcpu->processor;
+        svc_cpu = sched_item_cpu(svc->item);
 
-        ASSERT(!is_idle_vcpu(svc->vcpu));
+        ASSERT(!is_idle_item(svc->item));
         ASSERT(svc->rqd == rqd);
 
         /*
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
+        if ( svc->item == curr_on_cpu(svc_cpu) )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1689,12 +1689,12 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned item:16, dom:16;
                 int credit_start, credit_end;
                 unsigned multiplier;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->item->domain->domain_id;
+            d.item = svc->item->item_id;
             d.credit_start = start_credit;
             d.credit_end = svc->credit;
             d.multiplier = m;
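
The "multiplier instead of a loop" trick mentioned in the comment can be
sketched in isolation as follows; the CREDIT_INIT value is made up and this
is illustrative only, not the patch's code:

#include <stdint.h>

#define TOY_CREDIT_INIT (10 * 1000 * 1000)   /* made-up "one reset worth" of credit */

/*
 * How many whole CREDIT_INIT steps must be handed out so that an item whose
 * credit is 'snext_credit' (possibly far below zero after a long overrun)
 * gets back into the black?  One step is the common case.
 */
static unsigned int toy_reset_multiplier(int64_t snext_credit)
{
    unsigned int m = 1;

    if ( snext_credit < -(int64_t)TOY_CREDIT_INIT )
        m += (unsigned int)((-snext_credit) / TOY_CREDIT_INIT);

    return m;
}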
@@ -1714,9 +1714,9 @@ void burn_credits(struct csched2_runqueue_data *rqd,
 {
     s_time_t delta;
 
-    ASSERT(svc == csched2_item(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_item(curr_on_cpu(sched_item_cpu(svc->item))));
 
-    if ( unlikely(is_idle_vcpu(svc->vcpu)) )
+    if ( unlikely(is_idle_item(svc->item)) )
     {
         ASSERT(svc->credit == CSCHED2_IDLE_CREDIT);
         return;
@@ -1745,12 +1745,12 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             int credit, budget;
             int delta;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->item->domain->domain_id;
+        d.item = svc->item->item_id;
         d.credit = svc->credit;
         d.budget = has_cap(svc) ?  svc->budget : INT_MIN;
         d.delta = delta;
@@ -1764,39 +1764,39 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */
 
-static void park_vcpu(struct csched2_item *svc)
+static void park_item(struct csched2_item *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_item *item = svc->item;
 
     ASSERT(spin_is_locked(&svc->sdom->budget_lock));
 
     /*
-     * It was impossible to find budget for this vCPU, so it has to be
+     * It was impossible to find budget for this item, so it has to be
      * "parked". This implies it is not runnable, so we mark it as such in
-     * its pause_flags. If the vCPU is currently scheduled (which means we
+     * its pause_flags. If the item is currently scheduled (which means we
      * are here after being called from within csched2_schedule()), flagging
      * is enough, as we'll choose someone else, and then context_saved()
      * will take care of updating the load properly.
      *
-     * If, OTOH, the vCPU is sitting in the runqueue (which means we are here
+     * If, OTOH, the item is sitting in the runqueue (which means we are here
      * after being called from within runq_candidate()), we must go all the
      * way down to taking it out of there, and updating the load accordingly.
      *
-     * In both cases, we also add it to the list of parked vCPUs of the domain.
+     * In both cases, we also add it to the list of parked items of the domain.
      */
-    __set_bit(_VPF_parked, &v->pause_flags);
-    if ( vcpu_on_runq(svc) )
+    sched_set_pause_flags(item, _VPF_parked);
+    if ( item_on_runq(svc) )
     {
         runq_remove(svc);
         update_load(svc->sdom->dom->cpupool->sched, svc->rqd, svc, -1, NOW());
     }
-    list_add(&svc->parked_elem, &svc->sdom->parked_vcpus);
+    list_add(&svc->parked_elem, &svc->sdom->parked_items);
 }
 
-static bool vcpu_grab_budget(struct csched2_item *svc)
+static bool item_grab_budget(struct csched2_item *svc)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_item_cpu(svc->item);
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
@@ -1808,9 +1808,9 @@ static bool vcpu_grab_budget(struct csched2_item *svc)
 
     /*
      * Here, svc->budget is <= 0 (as, if it was > 0, we'd have taken the if
-     * above!). That basically means the vCPU has overrun a bit --because of
+     * above!). That basically means the item has overrun a bit --because of
      * various reasons-- and we want to take that into account. With the +=,
-     * we are actually subtracting the amount of budget the vCPU has
+     * we are actually subtracting the amount of budget the item has
      * overconsumed, from the total domain budget.
      */
     sdom->budget += svc->budget;
@@ -1831,7 +1831,7 @@ static bool vcpu_grab_budget(struct csched2_item *svc)
     else
     {
         svc->budget = 0;
-        park_vcpu(svc);
+        park_item(svc);
     }
 
     spin_unlock(&sdom->budget_lock);
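
Only part of item_grab_budget() is visible in this hunk, so here is a
self-contained sketch of the overall flow the comments describe: pay back any
overrun, take up to one quota from the domain pool, and park the item if the
pool is empty. Names and locking are simplified, not the real code.

#include <stdbool.h>
#include <stdint.h>

struct toy_dom {
    int64_t budget;          /* what is left in the domain pool this period */
};

struct toy_item_budget {
    int64_t budget;          /* <= 0 here: spent, or negative if overrun */
    int64_t quota;           /* how much one item may take in one go */
    bool parked;
};

static bool toy_grab_budget(struct toy_dom *sdom, struct toy_item_budget *svc)
{
    /* svc->budget is <= 0, so this pays the overrun back to the pool. */
    sdom->budget += svc->budget;
    svc->budget = 0;

    if ( sdom->budget > 0 )
    {
        /* Take one quota, or whatever is left if that is less. */
        svc->budget = sdom->budget < svc->quota ? sdom->budget : svc->quota;
        sdom->budget -= svc->budget;
        return true;
    }

    /* Pool is empty: park until the replenishment timer refills it. */
    svc->parked = true;
    return false;
}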
@@ -1840,10 +1840,10 @@ static bool vcpu_grab_budget(struct csched2_item *svc)
 }
 
 static void
-vcpu_return_budget(struct csched2_item *svc, struct list_head *parked)
+item_return_budget(struct csched2_item *svc, struct list_head *parked)
 {
     struct csched2_dom *sdom = svc->sdom;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_item_cpu(svc->item);
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
     ASSERT(list_empty(parked));
@@ -1852,7 +1852,7 @@ vcpu_return_budget(struct csched2_item *svc, struct list_head *parked)
     spin_lock(&sdom->budget_lock);
 
     /*
-     * The vCPU is stopping running (e.g., because it's blocking, or it has
+     * The item is stopping running (e.g., because it's blocking, or it has
      * been preempted). If it hasn't consumed all the budget it got when
      * starting to run, put that remaining amount back in the domain's budget
      * pool.
@@ -1861,58 +1861,58 @@ vcpu_return_budget(struct csched2_item *svc, struct list_head *parked)
     svc->budget = 0;
 
     /*
-     * Making budget available again to the domain means that parked vCPUs
-     * may be unparked and run. They are, if any, in the domain's parked_vcpus
+     * Making budget available again to the domain means that parked items
+     * may be unparked and run. They are, if any, in the domain's parked_items
      * list, so we want to go through that and unpark them (so they can try
      * to get some budget).
      *
      * Touching the list requires the budget_lock, which we hold. Let's
      * therefore put everyone in that list in another, temporary list, which
-     * then the caller will traverse, unparking the vCPUs it finds there.
+     * then the caller will traverse, unparking the items it finds there.
      *
      * In fact, we can't do the actual unparking here, because that requires
-     * taking the runqueue lock of the vCPUs being unparked, and we can't
+     * taking the runqueue lock of the items being unparked, and we can't
      * take any runqueue locks while we hold a budget_lock.
      */
     if ( sdom->budget > 0 )
-        list_splice_init(&sdom->parked_vcpus, parked);
+        list_splice_init(&sdom->parked_items, parked);
 
     spin_unlock(&sdom->budget_lock);
 }
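
The lock-ordering constraint spelled out in the comment (no runqueue lock may
be taken while budget_lock is held) leads to the "splice to a private list
first" pattern. A stripped-down, self-contained illustration, with pthread
mutexes and a plain singly-linked list standing in for the scheduler's
spinlocks and list_head:

#include <pthread.h>
#include <stddef.h>

struct toy_parked {
    struct toy_parked *next;
};

struct toy_dom {
    pthread_mutex_t budget_lock;
    struct toy_parked *parked;          /* items waiting for budget */
};

static void toy_unpark_one(struct toy_parked *p)
{
    (void)p;  /* the real code takes the item's runqueue lock and requeues it */
}

static void toy_return_budget_and_unpark(struct toy_dom *sdom)
{
    struct toy_parked *batch, *p;

    /* Phase 1: under budget_lock, only detach the list -- no runqueue locks. */
    pthread_mutex_lock(&sdom->budget_lock);
    batch = sdom->parked;
    sdom->parked = NULL;
    pthread_mutex_unlock(&sdom->budget_lock);

    /* Phase 2: budget_lock dropped, now runqueue locks may be taken safely. */
    for ( p = batch; p != NULL; p = p->next )
        toy_unpark_one(p);
}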
 
 static void
-unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
+unpark_parked_items(const struct scheduler *ops, struct list_head *items)
 {
     struct csched2_item *svc, *tmp;
     spinlock_t *lock;
 
-    list_for_each_entry_safe(svc, tmp, vcpus, parked_elem)
+    list_for_each_entry_safe(svc, tmp, items, parked_elem)
     {
         unsigned long flags;
         s_time_t now;
 
-        lock = item_schedule_lock_irqsave(svc->vcpu->sched_item, &flags);
+        lock = item_schedule_lock_irqsave(svc->item, &flags);
 
-        __clear_bit(_VPF_parked, &svc->vcpu->pause_flags);
+        sched_clear_pause_flags(svc->item, _VPF_parked);
         if ( unlikely(svc->flags & CSFLAG_scheduled) )
         {
             /*
              * We end here if a budget replenishment arrived between
              * csched2_schedule() (and, in particular, after a call to
-             * vcpu_grab_budget() that returned false), and
+             * item_grab_budget() that returned false), and
              * context_saved(). By setting __CSFLAG_delayed_runq_add,
-             * we tell context_saved() to put the vCPU back in the
+             * we tell context_saved() to put the item back in the
              * runqueue, from where it will compete with the others
              * for the newly replenished budget.
              */
             ASSERT( svc->rqd != NULL );
-            ASSERT( c2rqd(ops, svc->vcpu->processor) == svc->rqd );
+            ASSERT( c2rqd(ops, sched_item_cpu(svc->item)) == svc->rqd );
             __set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
         }
-        else if ( vcpu_runnable(svc->vcpu) )
+        else if ( item_runnable(svc->item) )
         {
             /*
-             * The vCPU should go back to the runqueue, and compete for
+             * The item should go back to the runqueue, and compete for
              * the newly replenished budget, but only if it is actually
              * runnable (and was therefore offline only because of the
              * lack of budget).
@@ -1924,7 +1924,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus)
         }
         list_del_init(&svc->parked_elem);
 
-        item_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_item);
+        item_schedule_unlock_irqrestore(lock, flags, svc->item);
     }
 }
 
@@ -1954,7 +1954,7 @@ static void replenish_domain_budget(void* data)
      *
      * Even in cases of overrun or delay, however, we expect that in 99% of
      * cases, doing just one replenishment will be good enough for being able
-     * to unpark the vCPUs that are waiting for some budget.
+     * to unpark the items that are waiting for some budget.
      */
     do_replenish(sdom);
 
@@ -1974,7 +1974,7 @@ static void replenish_domain_budget(void* data)
     }
     /*
      * 2) if we overrun by more than tot_budget, then budget+tot_budget is
-     * still < 0, which means that we can't unpark the vCPUs. Let's bail,
+     * still < 0, which means that we can't unpark the items. Let's bail,
      * and wait for future replenishments.
      */
     if ( unlikely(sdom->budget <= 0) )
@@ -1988,14 +1988,14 @@ static void replenish_domain_budget(void* data)
 
     /*
      * As above, let's prepare the temporary list, out of the domain's
-     * parked_vcpus list, now that we hold the budget_lock. Then, drop such
+     * parked_items list, now that we hold the budget_lock. Then, drop such
      * lock, and pass the list to the unparking function.
      */
-    list_splice_init(&sdom->parked_vcpus, &parked);
+    list_splice_init(&sdom->parked_items, &parked);
 
     spin_unlock_irqrestore(&sdom->budget_lock, flags);
 
-    unpark_parked_vcpus(sdom->dom->cpupool->sched, &parked);
+    unpark_parked_items(sdom->dom->cpupool->sched, &parked);
 
  out:
     set_timer(&sdom->repl_timer, sdom->next_repl);
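
The replenishment path above can be reduced to the following sketch: add one
period worth of budget, never accumulate more than one period, and only
unpark if the pool actually went positive again. This is a made-up
simplification of do_replenish() and the checks around it, not the real code.

#include <stdbool.h>
#include <stdint.h>

struct toy_dom_budget {
    int64_t budget;        /* may be negative if items overran their budget */
    int64_t tot_budget;    /* budget granted to the domain per period */
};

/* Returns true if parked items may now be unparked. */
static bool toy_replenish(struct toy_dom_budget *sdom)
{
    /* One period worth of budget; a negative balance is paid back first. */
    sdom->budget += sdom->tot_budget;
    if ( sdom->budget > sdom->tot_budget )
        sdom->budget = sdom->tot_budget;

    /* Overrun larger than a whole period: wait for future replenishments. */
    return sdom->budget > 0;
}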
@@ -2003,37 +2003,36 @@ static void replenish_domain_budget(void* data)
 
 #ifndef NDEBUG
 static inline void
-csched2_vcpu_check(struct vcpu *vc)
+csched2_item_check(struct sched_item *item)
 {
-    struct csched2_item * const svc = csched2_item(vc->sched_item);
+    struct csched2_item * const svc = csched2_item(item);
     struct csched2_dom * const sdom = svc->sdom;
 
-    BUG_ON( svc->vcpu != vc );
-    BUG_ON( sdom != csched2_dom(vc->domain) );
+    BUG_ON( svc->item != item );
+    BUG_ON( sdom != csched2_dom(item->domain) );
     if ( sdom )
     {
-        BUG_ON( is_idle_vcpu(vc) );
-        BUG_ON( sdom->dom != vc->domain );
+        BUG_ON( is_idle_item(item) );
+        BUG_ON( sdom->dom != item->domain );
     }
     else
     {
-        BUG_ON( !is_idle_vcpu(vc) );
+        BUG_ON( !is_idle_item(item) );
     }
     SCHED_STAT_CRANK(item_check);
 }
-#define CSCHED2_VCPU_CHECK(_vc)  (csched2_vcpu_check(_vc))
+#define CSCHED2_ITEM_CHECK(item)  (csched2_item_check(item))
 #else
-#define CSCHED2_VCPU_CHECK(_vc)
+#define CSCHED2_ITEM_CHECK(item)
 #endif
 
 static void *
 csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
                     void *dd)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item *svc;
 
-    /* Allocate per-VCPU info */
+    /* Allocate per-item info */
     svc = xzalloc(struct csched2_item);
     if ( svc == NULL )
         return NULL;
@@ -2042,10 +2041,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
     INIT_LIST_HEAD(&svc->runq_elem);
 
     svc->sdom = dd;
-    svc->vcpu = vc;
+    svc->item = item;
     svc->flags = 0U;
 
-    if ( ! is_idle_vcpu(vc) )
+    if ( ! is_idle_item(item) )
     {
         ASSERT(svc->sdom != NULL);
         svc->credit = CSCHED2_CREDIT_INIT;
@@ -2074,19 +2073,18 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
 static void
 csched2_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item * const svc = csched2_item(item);
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_item(item));
     SCHED_STAT_CRANK(item_sleep);
 
-    if ( curr_on_cpu(vc->processor) == item )
+    if ( curr_on_cpu(sched_item_cpu(item)) == item )
     {
-        tickle_cpu(vc->processor, svc->rqd);
+        tickle_cpu(sched_item_cpu(item), svc->rqd);
     }
-    else if ( vcpu_on_runq(svc) )
+    else if ( item_on_runq(svc) )
     {
-        ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+        ASSERT(svc->rqd == c2rqd(ops, sched_item_cpu(item)));
         update_load(ops, svc->rqd, svc, -1, NOW());
         runq_remove(svc);
     }
@@ -2097,14 +2095,13 @@ csched2_item_sleep(const struct scheduler *ops, struct sched_item *item)
 static void
 csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item * const svc = csched2_item(item);
-    unsigned int cpu = vc->processor;
+    unsigned int cpu = sched_item_cpu(item);
     s_time_t now;
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_item(item));
 
     if ( unlikely(curr_on_cpu(cpu) == item) )
     {
@@ -2112,18 +2109,18 @@ csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
         goto out;
     }
 
-    if ( unlikely(vcpu_on_runq(svc)) )
+    if ( unlikely(item_on_runq(svc)) )
     {
         SCHED_STAT_CRANK(item_wake_onrunq);
         goto out;
     }
 
-    if ( likely(vcpu_runnable(vc)) )
+    if ( likely(item_runnable(item)) )
         SCHED_STAT_CRANK(item_wake_runnable);
     else
         SCHED_STAT_CRANK(item_wake_not_runnable);
 
-    /* If the context hasn't been saved for this vcpu yet, we can't put it on
+    /* If the context hasn't been saved for this item yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
      * after the context has been saved. */
     if ( unlikely(svc->flags & CSFLAG_scheduled) )
@@ -2134,15 +2131,15 @@ csched2_item_wake(const struct scheduler *ops, struct sched_item *item)
 
     /* Add into the new runqueue if necessary */
     if ( svc->rqd == NULL )
-        runq_assign(ops, vc);
+        runq_assign(ops, item);
     else
-        ASSERT(c2rqd(ops, vc->processor) == svc->rqd );
+        ASSERT(c2rqd(ops, sched_item_cpu(item)) == svc->rqd );
 
     now = NOW();
 
     update_load(ops, svc->rqd, svc, 1, now);
-        
-    /* Put the VCPU on the runq */
+
+    /* Put the item on the runq */
     runq_insert(ops, svc);
     runq_tickle(ops, svc, now);
 
@@ -2155,49 +2152,48 @@ csched2_item_yield(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched2_item * const svc = csched2_item(item);
 
-    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+    __set_bit(__CSFLAG_item_yield, &svc->flags);
 }
 
 static void
 csched2_context_saved(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item * const svc = csched2_item(item);
     spinlock_t *lock = item_schedule_lock_irq(item);
     s_time_t now = NOW();
     LIST_HEAD(were_parked);
 
-    BUG_ON( !is_idle_vcpu(vc) && svc->rqd != c2rqd(ops, vc->processor));
-    ASSERT(is_idle_vcpu(vc) || svc->rqd == c2rqd(ops, vc->processor));
+    BUG_ON( !is_idle_item(item) && svc->rqd != c2rqd(ops, sched_item_cpu(item)));
+    ASSERT(is_idle_item(item) || svc->rqd == c2rqd(ops, sched_item_cpu(item)));
 
-    /* This vcpu is now eligible to be put on the runqueue again */
+    /* This item is now eligible to be put on the runqueue again */
     __clear_bit(__CSFLAG_scheduled, &svc->flags);
 
     if ( unlikely(has_cap(svc) && svc->budget > 0) )
-        vcpu_return_budget(svc, &were_parked);
+        item_return_budget(svc, &were_parked);
 
     /* If someone wants it on the runqueue, put it there. */
     /*
      * NB: We can get rid of CSFLAG_scheduled by checking for
-     * vc->is_running and vcpu_on_runq(svc) here.  However,
+     * vc->is_running and item_on_runq(svc) here.  However,
      * since we're accessing the flags cacheline anyway,
      * it seems a bit pointless; especially as we have plenty of
      * bits free.
      */
     if ( __test_and_clear_bit(__CSFLAG_delayed_runq_add, &svc->flags)
-         && likely(vcpu_runnable(vc)) )
+         && likely(item_runnable(item)) )
     {
-        ASSERT(!vcpu_on_runq(svc));
+        ASSERT(!item_on_runq(svc));
 
         runq_insert(ops, svc);
         runq_tickle(ops, svc, now);
     }
-    else if ( !is_idle_vcpu(vc) )
+    else if ( !is_idle_item(item) )
         update_load(ops, svc->rqd, svc, -1, now);
 
     item_schedule_unlock_irq(lock, item);
 
-    unpark_parked_vcpus(ops, &were_parked);
+    unpark_parked_items(ops, &were_parked);
 }
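
The CSFLAG_scheduled / __CSFLAG_delayed_runq_add handshake that several of
the comments above refer to can be illustrated in isolation like this; the
flags and helpers are simplified stand-ins, not the scheduler's:

#include <stdbool.h>

struct toy_sched_item {
    bool scheduled;          /* context not yet saved on its old cpu */
    bool delayed_runq_add;   /* someone wanted it back on the runqueue */
    bool runnable;
    bool on_runq;
};

static void toy_runq_insert(struct toy_sched_item *it) { it->on_runq = true; }

/* Called on wakeup (or unpark) while holding the item's scheduler lock. */
static void toy_wake(struct toy_sched_item *it)
{
    if ( it->scheduled )
        it->delayed_runq_add = true;   /* too early: context still live */
    else if ( it->runnable )
        toy_runq_insert(it);
}

/* Called once the old cpu has finished saving the item's context. */
static void toy_context_saved(struct toy_sched_item *it)
{
    it->scheduled = false;

    if ( it->delayed_runq_add && it->runnable )
    {
        it->delayed_runq_add = false;
        toy_runq_insert(it);
    }
}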
 
 #define MAX_LOAD (STIME_MAX)
@@ -2205,9 +2201,8 @@ static struct sched_resource *
 csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    struct vcpu *vc = item->vcpu;
     int i, min_rqi = -1, min_s_rqi = -1;
-    unsigned int new_cpu, cpu = vc->processor;
+    unsigned int new_cpu, cpu = sched_item_cpu(item);
     struct csched2_item *svc = csched2_item(item);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
@@ -2245,7 +2240,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
     }
 
     cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
-                cpupool_domain_cpumask(vc->domain));
+                cpupool_domain_cpumask(item->domain));
 
     /*
      * First check to see if we're here because someone else suggested a place
@@ -2356,7 +2351,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
          * We have soft affinity, and we have a candidate runq, so go for it.
          *
          * Note that, to obtain the soft-affinity mask, we "just" put what we
-         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * have in cpumask_scratch in && with item->cpu_soft_affinity. This is
          * ok because:
          * - we know that item->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
@@ -2379,7 +2374,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
          * any suitable runq. But we did find one when considering hard
          * affinity, so go for it.
          *
-         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpumask_scratch already has item->cpu_hard_affinity &
          * cpupool_domain_cpumask() in it, so it's enough that we filter
          * with the cpus of the runq.
          */
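
The fallback order described here (best runqueue for soft affinity, otherwise
best runqueue that at least satisfies hard affinity) amounts to the sketch
below; the load values and intersection flags are assumed to be precomputed,
which the real code does with cpumask operations:

#include <stdint.h>

#define TOY_NR_RQ 4

struct toy_rq_info {
    int64_t avgload;
    int intersects_hard;   /* runq has a cpu in hard affinity & cpupool */
    int intersects_soft;   /* ... and also in soft affinity */
};

/* Returns the chosen runqueue index, or -1 if hard affinity fits nowhere. */
static int toy_pick_runq(const struct toy_rq_info rq[TOY_NR_RQ])
{
    int i, min_rqi = -1, min_s_rqi = -1;

    for ( i = 0; i < TOY_NR_RQ; i++ )
    {
        if ( !rq[i].intersects_hard )
            continue;
        if ( min_rqi < 0 || rq[i].avgload < rq[min_rqi].avgload )
            min_rqi = i;
        if ( rq[i].intersects_soft &&
             (min_s_rqi < 0 || rq[i].avgload < rq[min_s_rqi].avgload) )
            min_s_rqi = i;
    }

    /* Prefer the best runqueue for soft affinity, fall back to hard. */
    return min_s_rqi >= 0 ? min_s_rqi : min_rqi;
}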
@@ -2410,11 +2405,11 @@ csched2_res_pick(const struct scheduler *ops, struct sched_item *item)
     {
         struct {
             uint64_t b_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned rq_id:16, new_cpu:16;
         } d;
-        d.dom = vc->domain->domain_id;
-        d.vcpu = vc->vcpu_id;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
         d.rq_id = min_rqi;
         d.b_avgload = min_avgload;
         d.new_cpu = new_cpu;
@@ -2433,10 +2428,10 @@ typedef struct {
     struct csched2_item * best_push_svc, *best_pull_svc;
     /* NB: Read by consider() */
     struct csched2_runqueue_data *lrqd;
-    struct csched2_runqueue_data *orqd;                  
+    struct csched2_runqueue_data *orqd;
 } balance_state_t;
 
-static void consider(balance_state_t *st, 
+static void consider(balance_state_t *st,
                      struct csched2_item *push_svc,
                      struct csched2_item *pull_svc)
 {
@@ -2475,17 +2470,17 @@ static void migrate(const struct scheduler *ops,
                     struct csched2_runqueue_data *trqd,
                     s_time_t now)
 {
-    int cpu = svc->vcpu->processor;
-    struct sched_item *item = svc->vcpu->sched_item;
+    struct sched_item *item = svc->item;
+    int cpu = sched_item_cpu(item);
 
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned rqi:16, trqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = item->domain->domain_id;
+        d.item = item->item_id;
         d.rqi = svc->rqd->id;
         d.trqi = trqd->id;
         __trace_var(TRC_CSCHED2_MIGRATE, 1,
@@ -2497,7 +2492,7 @@ static void migrate(const struct scheduler *ops,
     {
         /* It's running; mark it to migrate. */
         svc->migrate_rqd = trqd;
-        __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
+        sched_set_pause_flags(item, _VPF_migrating);
         __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         SCHED_STAT_CRANK(migrate_requested);
         tickle_cpu(cpu, svc->rqd);
@@ -2506,7 +2501,7 @@ static void migrate(const struct scheduler *ops,
     {
         int on_runq = 0;
         /* It's not running; just move it */
-        if ( vcpu_on_runq(svc) )
+        if ( item_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
@@ -2515,14 +2510,14 @@ static void migrate(const struct scheduler *ops,
         _runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
-                    cpupool_domain_cpumask(svc->vcpu->domain));
+                    cpupool_domain_cpumask(item->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
-        svc->vcpu->processor = cpumask_cycle(trqd->pick_bias,
-                                             cpumask_scratch_cpu(cpu));
-        svc->vcpu->sched_item->res = per_cpu(sched_res, svc->vcpu->processor);
-        trqd->pick_bias = svc->vcpu->processor;
-        ASSERT(svc->vcpu->processor < nr_cpu_ids);
+        sched_set_res(item, per_cpu(sched_res,
+                                    cpumask_cycle(trqd->pick_bias,
+                                                  cpumask_scratch_cpu(cpu))));
+        trqd->pick_bias = sched_item_cpu(item);
+        ASSERT(sched_item_cpu(item) < nr_cpu_ids);
 
         _runq_assign(svc, trqd);
         if ( on_runq )
@@ -2542,14 +2537,14 @@ static void migrate(const struct scheduler *ops,
  *  - svc is not already flagged to migrate,
  *  - if svc is allowed to run on at least one of the pcpus of rqd.
  */
-static bool vcpu_is_migrateable(struct csched2_item *svc,
+static bool item_is_migrateable(struct csched2_item *svc,
                                   struct csched2_runqueue_data *rqd)
 {
-    struct vcpu *v = svc->vcpu;
-    int cpu = svc->vcpu->processor;
+    struct sched_item *item = svc->item;
+    int cpu = sched_item_cpu(item);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_item->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
+                cpupool_domain_cpumask(item->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
            cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
@@ -2586,7 +2581,7 @@ retry:
     for_each_cpu(i, &prv->active_queues)
     {
         s_time_t delta;
-        
+
         st.orqd = prv->rqd + i;
 
         if ( st.orqd == st.lrqd
@@ -2594,7 +2589,7 @@ retry:
             continue;
 
         update_runq_load(ops, st.orqd, 0, now);
-    
+
         delta = st.lrqd->b_avgload - st.orqd->b_avgload;
         if ( delta < 0 )
             delta = -delta;
@@ -2617,7 +2612,7 @@ retry:
         s_time_t load_max;
         int cpus_max;
 
-        
+
         load_max = st.lrqd->b_avgload;
         if ( st.orqd->b_avgload > load_max )
             load_max = st.orqd->b_avgload;
@@ -2656,7 +2651,7 @@ retry:
                                            opt_overload_balance_tolerance)) )
                 goto out;
     }
-             
+
     /* Try to grab the other runqueue lock; if it's been taken in the
      * meantime, try the process over again.  This can't deadlock
      * because if it doesn't get any other rqd locks, it will simply
@@ -2696,17 +2691,17 @@ retry:
 
         update_svc_load(ops, push_svc, 0, now);
 
-        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
+        if ( !item_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
             struct csched2_item * pull_svc = list_entry(pull_iter, struct csched2_item, rqd_elem);
-            
+
             if ( !inner_load_updated )
                 update_svc_load(ops, pull_svc, 0, now);
-        
-            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+            if ( !item_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
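
consider()'s body is not part of this hunk; purely as an illustration of the
kind of evaluation it is expected to perform (an assumption on my side, not
taken from the patch), one can picture it keeping the push/pull pair that
leaves the two runqueues' loads closest together:

#include <stdint.h>

struct toy_balance {
    int64_t load_delta;              /* current (local - other) load gap */
    const void *best_push, *best_pull;
    int64_t best_delta;              /* best |gap| found so far */
};

static int64_t toy_abs64(int64_t v) { return v < 0 ? -v : v; }

/*
 * What would the load gap look like if an item of load 'push_load' moved
 * away and one of load 'pull_load' moved in?  Keep the best pair seen.
 * Pass a NULL pointer and a zero load for "push only" or "pull only".
 */
static void toy_consider(struct toy_balance *st,
                         const void *push, int64_t push_load,
                         const void *pull, int64_t pull_load)
{
    int64_t delta = st->load_delta - 2 * push_load + 2 * pull_load;

    if ( toy_abs64(delta) < st->best_delta )
    {
        st->best_delta = toy_abs64(delta);
        st->best_push = push;
        st->best_pull = pull;
    }
}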
@@ -2721,8 +2716,8 @@ retry:
     list_for_each( pull_iter, &st.orqd->svc )
     {
         struct csched2_item * pull_svc = list_entry(pull_iter, struct csched2_item, rqd_elem);
-        
-        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
+
+        if ( !item_is_migrateable(pull_svc, st.lrqd) )
             continue;
 
         /* Consider pull only */
@@ -2745,8 +2740,7 @@ static void
 csched2_item_migrate(
     const struct scheduler *ops, struct sched_item *item, unsigned int new_cpu)
 {
-    struct vcpu *vc = item->vcpu;
-    struct domain *d = vc->domain;
+    struct domain *d = item->domain;
     struct csched2_item * const svc = csched2_item(item);
     struct csched2_runqueue_data *trqd;
     s_time_t now = NOW();
@@ -2758,25 +2752,24 @@ csched2_item_migrate(
      * cpupool.
      *
      * And since there indeed is the chance that it is not part of it, all
-     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * we must do is remove _and_ unassign the item from any runqueue, as
      * well as updating v->processor with the target, so that the suspend
      * process can continue.
      *
      * It will then be during resume that a new, meaningful, value for
      * v->processor will be chosen, and during actual domain unpause that
-     * the vCPU will be assigned to and added to the proper runqueue.
+     * the item will be assigned to and added to the proper runqueue.
      */
     if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
     {
         ASSERT(system_state == SYS_STATE_suspend);
-        if ( vcpu_on_runq(svc) )
+        if ( item_on_runq(svc) )
         {
             runq_remove(svc);
             update_load(ops, svc->rqd, NULL, -1, now);
         }
         _runq_deassign(svc);
-        vc->processor = new_cpu;
-        item->res = per_cpu(sched_res, new_cpu);
+        sched_set_res(item, per_cpu(sched_res, new_cpu));
         return;
     }
 
@@ -2790,17 +2783,14 @@ csched2_item_migrate(
      * Do the actual movement toward new_cpu, and update vc->processor.
      * If we are changing runqueue, migrate() takes care of everything.
      * If we are not changing runqueue, we need to update vc->processor
-     * here. In fact, if, for instance, we are here because the vcpu's
+     * here. In fact, if, for instance, we are here because the item's
      * hard affinity changed, we don't want to risk leaving vc->processor
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, now);
     else
-    {
-        vc->processor = new_cpu;
-        item->res = per_cpu(sched_res, new_cpu);
-    }
+        sched_set_res(item, per_cpu(sched_res, new_cpu));
 }
 
 static int
@@ -2812,18 +2802,18 @@ csched2_dom_cntl(
     struct csched2_dom * const sdom = csched2_dom(d);
     struct csched2_private *prv = csched2_priv(ops);
     unsigned long flags;
-    struct vcpu *v;
+    struct sched_item *item;
     int rc = 0;
 
     /*
      * Locking:
      *  - we must take the private lock for accessing the weights of the
-     *    vcpus of d, and/or the cap;
+     *    items of d, and/or the cap;
      *  - in the putinfo case, we also need the runqueue lock(s), for
      *    updating the max weight of the runqueue(s).
      *    If changing the cap, we also need the budget_lock, for updating
      *    the value of the domain budget pool (and the runqueue lock,
-     *    for adjusting the parameters and rescheduling any vCPU that is
+     *    for adjusting the parameters and rescheduling any item that is
      *    running at the time of the change).
      */
     switch ( op->cmd )
@@ -2845,18 +2835,18 @@ csched2_dom_cntl(
 
             sdom->weight = op->u.credit2.weight;
 
-            /* Update weights for vcpus, and max_weight for runqueues on which they reside */
-            for_each_vcpu ( d, v )
+            /* Update weights for items, and max_weight for runqueues on which they reside */
+            for_each_sched_item ( d, item )
             {
-                struct csched2_item *svc = csched2_item(v->sched_item);
-                spinlock_t *lock = item_schedule_lock(svc->vcpu->sched_item);
+                struct csched2_item *svc = csched2_item(item);
+                spinlock_t *lock = item_schedule_lock(item);
 
-                ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
+                ASSERT(svc->rqd == c2rqd(ops, sched_item_cpu(item)));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                item_schedule_unlock(lock, svc->vcpu->sched_item);
+                item_schedule_unlock(lock, item);
             }
         }
         /* Cap */
@@ -2865,8 +2855,8 @@ csched2_dom_cntl(
             struct csched2_item *svc;
             spinlock_t *lock;
 
-            /* Cap is only valid if it's below 100 * nr_of_vCPUS */
-            if ( op->u.credit2.cap > 100 * sdom->nr_vcpus )
+            /* Cap is only valid if it's below 100 * nr_of_items */
+            if ( op->u.credit2.cap > 100 * sdom->nr_items )
             {
                 rc = -EINVAL;
                 write_unlock_irqrestore(&prv->lock, flags);
@@ -2879,23 +2869,23 @@ csched2_dom_cntl(
             spin_unlock(&sdom->budget_lock);
 
             /*
-             * When trying to get some budget and run, each vCPU will grab
-             * from the pool 1/N (with N = nr of vCPUs of the domain) of
-             * the total budget. Roughly speaking, this means each vCPU will
+             * When trying to get some budget and run, each item will grab
+             * from the pool 1/N (with N = nr of items of the domain) of
+             * the total budget. Roughly speaking, this means each item will
              * have at least one chance to run during every period.
              */
-            for_each_vcpu ( d, v )
+            for_each_sched_item ( d, item )
             {
-                svc = csched2_item(v->sched_item);
-                lock = item_schedule_lock(svc->vcpu->sched_item);
+                svc = csched2_item(item);
+                lock = item_schedule_lock(item);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
                  * which then won't happen because, in csched2_runtime(),
                  * CSCHED2_MIN_TIMER is what would be used anyway.
                  */
-                svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
+                svc->budget_quota = max(sdom->tot_budget / sdom->nr_items,
                                         CSCHED2_MIN_TIMER);
-                item_schedule_unlock(lock, svc->vcpu->sched_item);
+                item_schedule_unlock(lock, item);
             }
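
The quota computation in the loop above is small enough to restate on its
own; the floor constant here is made up, the real one is CSCHED2_MIN_TIMER:

#include <stdint.h>

#define TOY_MIN_TIMER 500000   /* made-up lower bound for a time slice */

/* Each item gets 1/N of the domain's budget, but never a uselessly tiny
 * slice.  Assumes nr_items >= 1. */
static int64_t toy_budget_quota(int64_t tot_budget, unsigned int nr_items)
{
    int64_t quota = tot_budget / nr_items;

    return quota > TOY_MIN_TIMER ? quota : TOY_MIN_TIMER;
}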
 
             if ( sdom->cap == 0 )
@@ -2905,7 +2895,7 @@ csched2_dom_cntl(
                  * and queue its first replenishment event.
                  *
                  * Since cap is currently disabled for this domain, we
-                 * know no vCPU is messing with the domain's budget, and
+                 * know no item is messing with the domain's budget, and
                  * the replenishment timer is still off.
                  * For these reasons, it is safe to do the following without
                  * taking the budget_lock.
@@ -2915,42 +2905,42 @@ csched2_dom_cntl(
                 set_timer(&sdom->repl_timer, sdom->next_repl);
 
                 /*
-                 * Now, let's enable budget accounting for all the vCPUs.
+                 * Now, let's enable budget accounting for all the items.
                  * For making sure that they will start to honour the domain's
                  * cap, we set their budget to 0.
                  * This way, as soon as they will try to run, they will have
                  * to get some budget.
                  *
-                 * For the vCPUs that are already running, we trigger the
+                 * For the items that are already running, we trigger the
                  * scheduler on their pCPU. When, as a consequence of this,
                  * csched2_schedule() will run, it will figure out there is
-                 * no budget, and the vCPU will try to get some (and be parked,
+                 * no budget, and the item will try to get some (and be parked,
                  * if there's none, and we'll switch to someone else).
                  */
-                for_each_vcpu ( d, v )
+                for_each_sched_item ( d, item )
                 {
-                    svc = csched2_item(v->sched_item);
-                    lock = item_schedule_lock(svc->vcpu->sched_item);
-                    if ( vcpu_running(v) )
+                    svc = csched2_item(item);
+                    lock = item_schedule_lock(item);
+                    if ( item->is_running )
                     {
-                        unsigned int cpu = v->processor;
+                        unsigned int cpu = sched_item_cpu(item);
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
+                        ASSERT(curr_on_cpu(cpu) == item);
 
                         /*
-                         * We are triggering a reschedule on the vCPU's
+                         * We are triggering a reschedule on the item's
                          * pCPU. That will run burn_credits() and, since
-                         * the vCPU is capped now, it would charge all the
+                         * the item is capped now, it would charge all the
                          * execution time of this last round as budget as
-                         * well. That will make the vCPU budget go negative,
+                         * well. That will make the item budget go negative,
                          * potentially by a large amount, and it's unfair.
                          *
                          * To avoid that, call burn_credit() here, to do the
                          * accounting of this current running instance now,
                          * with budgetting still disabled. This does not
                          * prevent some small amount of budget being charged
-                         * to the vCPU (i.e., the amount of time it runs from
+                         * to the item (i.e., the amount of time it runs from
                          * now, to when scheduling happens). The budget will
                          * also go below 0, but a lot less than how it would
                          * if we don't do this.
@@ -2961,7 +2951,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    item_schedule_unlock(lock, svc->vcpu->sched_item);
+                    item_schedule_unlock(lock, item);
                 }
             }
 
@@ -2973,30 +2963,30 @@ csched2_dom_cntl(
 
             stop_timer(&sdom->repl_timer);
 
-            /* Disable budget accounting for all the vCPUs. */
-            for_each_vcpu ( d, v )
+            /* Disable budget accounting for all the items. */
+            for_each_sched_item ( d, item )
             {
-                struct csched2_item *svc = csched2_item(v->sched_item);
-                spinlock_t *lock = item_schedule_lock(svc->vcpu->sched_item);
+                struct csched2_item *svc = csched2_item(item);
+                spinlock_t *lock = item_schedule_lock(item);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                item_schedule_unlock(lock, svc->vcpu->sched_item);
+                item_schedule_unlock(lock, item);
             }
             sdom->cap = 0;
             /*
              * We are disabling the cap for this domain, which may have
-             * vCPUs waiting for a replenishment, so we unpark them all.
+             * items waiting for a replenishment, so we unpark them all.
              * Note that, since we have already disabled budget accounting
-             * for all the vCPUs of the domain, no currently running vCPU
-             * will be added to the parked vCPUs list any longer.
+             * for all the items of the domain, no currently running item
+             * will be added to the parked items list any longer.
              */
             spin_lock(&sdom->budget_lock);
-            list_splice_init(&sdom->parked_vcpus, &parked);
+            list_splice_init(&sdom->parked_items, &parked);
             spin_unlock(&sdom->budget_lock);
 
-            unpark_parked_vcpus(ops, &parked);
+            unpark_parked_items(ops, &parked);
         }
         write_unlock_irqrestore(&prv->lock, flags);
         break;
@@ -3073,12 +3063,12 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     sdom->dom = dom;
     sdom->weight = CSCHED2_DEFAULT_WEIGHT;
     sdom->cap = 0U;
-    sdom->nr_vcpus = 0;
+    sdom->nr_items = 0;
 
     init_timer(&sdom->repl_timer, replenish_domain_budget, sdom,
                cpumask_any(cpupool_domain_cpumask(dom)));
     spin_lock_init(&sdom->budget_lock);
-    INIT_LIST_HEAD(&sdom->parked_vcpus);
+    INIT_LIST_HEAD(&sdom->parked_items);
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3112,34 +3102,32 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 static void
 csched2_item_insert(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item *svc = item->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_item(item));
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
     lock = item_schedule_lock_irq(item);
 
-    item->res = csched2_res_pick(ops, item);
-    vc->processor = item->res->processor;
+    sched_set_res(item, csched2_res_pick(ops, item));
 
     spin_unlock_irq(lock);
 
     lock = item_schedule_lock_irq(item);
 
-    /* Add vcpu to runqueue of initial processor */
-    runq_assign(ops, vc);
+    /* Add item to runqueue of initial processor */
+    runq_assign(ops, item);
 
     item_schedule_unlock_irq(lock, item);
 
-    sdom->nr_vcpus++;
+    sdom->nr_items++;
 
     SCHED_STAT_CRANK(item_insert);
 
-    CSCHED2_VCPU_CHECK(vc);
+    CSCHED2_ITEM_CHECK(item);
 }
 
 static void
@@ -3153,11 +3141,10 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 static void
 csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     struct csched2_item * const svc = csched2_item(item);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_item(item));
     ASSERT(list_empty(&svc->runq_elem));
 
     SCHED_STAT_CRANK(item_remove);
@@ -3165,14 +3152,14 @@ csched2_item_remove(const struct scheduler *ops, struct sched_item *item)
     /* Remove from runqueue */
     lock = item_schedule_lock_irq(item);
 
-    runq_deassign(ops, vc);
+    runq_deassign(ops, item);
 
     item_schedule_unlock_irq(lock, item);
 
-    svc->sdom->nr_vcpus--;
+    svc->sdom->nr_items--;
 }
 
-/* How long should we let this vcpu run for? */
+/* How long should we let this item run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
                 struct csched2_item *snext, s_time_t now)
@@ -3187,7 +3174,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      * If we're idle, just stay so. Others (or external events)
      * will poke us when necessary.
      */
-    if ( is_idle_vcpu(snext->vcpu) )
+    if ( is_idle_item(snext->item) )
         return -1;
 
     /* General algorithm:
@@ -3204,8 +3191,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( vcpu_running(snext->vcpu) )
-            ratelimit_min = snext->vcpu->sched_item->state_entry_time +
+        if ( snext->item->is_running )
+            ratelimit_min = snext->item->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
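
Put together, the time-slice computation outlined by the comments in
csched2_runtime() behaves roughly like the sketch below: start from the
credit lead over the next runnable waiter (already converted to time here),
raise the lower bound while the rate limit still applies, and clamp. The
constants and the credit-to-time conversion are simplified stand-ins:

#include <stdint.h>

#define TOY_MIN_SLICE   500000     /* made up, in ns */
#define TOY_MAX_SLICE 10000000     /* made up, in ns */

static int64_t toy_runtime(int64_t credit_lead_as_time, int64_t ratelimit_left)
{
    int64_t min_time = TOY_MIN_SLICE;
    int64_t time = credit_lead_as_time;

    /* A just-(re)scheduled item should run at least until ratelimit expires. */
    if ( ratelimit_left > min_time )
        min_time = ratelimit_left;

    if ( time < min_time )
        time = min_time;
    else if ( time > TOY_MAX_SLICE )
        time = TOY_MAX_SLICE;

    return time;
}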
@@ -3222,7 +3209,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     {
         struct csched2_item *swait = runq_elem(runq->next);
 
-        if ( ! is_idle_vcpu(swait->vcpu)
+        if ( ! is_idle_item(swait->item)
              && swait->credit > 0 )
         {
             rt_credit = snext->credit - swait->credit;
@@ -3236,7 +3223,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      *
      * FIXME: See if we can eliminate this conversion if we know time
      * will be outside (MIN,MAX).  Probably requires pre-calculating
-     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * credit values of MIN,MAX per item, since each item burns credit
      * at a different rate.
      */
     if ( rt_credit > 0 )
@@ -3284,36 +3271,35 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
     *skipped = 0;
 
-    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    if ( unlikely(is_idle_item(scurr->item)) )
     {
         snext = scurr;
         goto check_runq;
     }
 
-    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    yield = __test_and_clear_bit(__CSFLAG_item_yield, &scurr->flags);
 
     /*
-     * Return the current vcpu if it has executed for less than ratelimit.
-     * Adjuststment for the selected vcpu's credit and decision
+     * Return the current item if it has executed for less than ratelimit.
+     * Adjustment for the selected item's credit and decision
      * for how long it will run will be taken in csched2_runtime.
      *
      * Note that, if scurr is yielding, we don't let rate limiting kick in.
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->sched_item->state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( !yield && prv->ratelimit_us && item_runnable(scurr->item) &&
+         (now - scurr->item->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned item:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->sched_item->state_entry_time;
+            d.dom = scurr->item->domain->domain_id;
+            d.item = scurr->item->item_id;
+            d.runtime = now - scurr->item->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3322,13 +3308,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu->sched_item) )
+    if ( has_soft_affinity(scurr->item) )
     {
-        affinity_balance_cpumask(scurr->vcpu->sched_item, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->item, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
-            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+            cpumask_t *online = cpupool_domain_cpumask(scurr->item->domain);
 
             /* Ok, is any of the pcpus in scurr soft-affinity idle? */
             cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
@@ -3356,10 +3342,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
+    if ( item_runnable(scurr->item) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_item(idle_vcpu[cpu]->sched_item);
+        snext = csched2_item(sched_idle_item(cpu));
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
@@ -3369,24 +3355,24 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned item:16, dom:16;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->item->domain->domain_id;
+            d.item = svc->item->item_id;
             __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1,
                         sizeof(d),
                         (unsigned char *)&d);
         }
 
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_item->cpu_hard_affinity) )
+        /* Only consider items that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->item->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
         }
 
         /*
-         * If a vcpu is meant to be picked up by another processor, and such
+         * If an item is meant to be picked up by another processor, and such
          * processor has not scheduled yet, leave it in the runqueue for him.
          */
         if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
@@ -3401,7 +3387,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher.
          */
-        if ( svc->vcpu->processor != cpu
+        if ( sched_item_cpu(svc->item) != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
@@ -3416,7 +3402,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || vcpu_grab_budget(svc)) )
+             (!has_cap(svc) || item_grab_budget(svc)) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
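
The filters applied in the scan above (hard affinity, a pending tickle for
another cpu, migration resistance, yield/credit comparison and the cap's
budget) can be condensed into a self-contained predicate like this sketch;
all fields and the constant are simplified stand-ins:

#include <stdbool.h>
#include <stdint.h>

#define TOY_MIGRATE_RESIST 500      /* made-up value */

struct toy_cand {
    int64_t credit;
    int cpu;                  /* cpu the candidate last ran on */
    int tickled_cpu;          /* cpu a tickle reserved it for, or -1 */
    bool hard_affine_here;    /* allowed to run on this cpu at all */
    bool has_budget;          /* no cap, or budget successfully grabbed */
};

/* Should 'cand' replace the best choice found so far for 'cpu'? */
static bool toy_candidate_beats(const struct toy_cand *cand,
                                const struct toy_cand *best,
                                int cpu, bool yield)
{
    if ( !cand->hard_affine_here )
        return false;                      /* not allowed on this cpu */

    if ( cand->tickled_cpu != -1 && cand->tickled_cpu != cpu )
        return false;                      /* leave it for the tickled cpu */

    /* Pulling from another cpu must overcome the migration resistance. */
    if ( cand->cpu != cpu &&
         best->credit + TOY_MIGRATE_RESIST > cand->credit )
        return false;

    /* Otherwise: more credit wins, or anything beats a yielding current
     * item, provided the candidate has budget to actually run. */
    return (yield || cand->credit > best->credit) && cand->has_budget;
}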
@@ -3426,12 +3412,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned item:16, dom:16;
             unsigned tickled_cpu, skipped;
             int credit;
         } d;
-        d.dom = snext->vcpu->domain->domain_id;
-        d.vcpu = snext->vcpu->vcpu_id;
+        d.dom = snext->item->domain->domain_id;
+        d.item = snext->item->item_id;
         d.credit = snext->credit;
         d.tickled_cpu = snext->tickled_cpu;
         d.skipped = *skipped;
@@ -3463,14 +3449,15 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_item * const scurr = csched2_item(current->sched_item);
+    struct sched_item *curritem = current->sched_item;
+    struct csched2_item * const scurr = csched2_item(curritem);
     struct csched2_item *snext = NULL;
-    unsigned int skipped_vcpus = 0;
+    unsigned int skipped_items = 0;
     struct task_slice ret;
     bool tickled;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED2_VCPU_CHECK(current);
+    CSCHED2_ITEM_CHECK(curritem);
 
     BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
 
@@ -3479,7 +3466,7 @@ csched2_schedule(
 
     ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
 
-    BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
+    BUG_ON(!is_idle_item(curritem) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
     tickled = cpumask_test_cpu(cpu, &rqd->tickled);
@@ -3499,7 +3486,7 @@ csched2_schedule(
         d.cpu = cpu;
         d.rq_id = c2r(cpu);
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_item(curritem);
         d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
@@ -3513,55 +3500,55 @@ csched2_schedule(
     /*
      *  Below 0, means that we are capped and we have overrun our  budget.
      *  Let's try to get some more but, if we fail (e.g., because of the
-     *  other running vcpus), we will be parked.
+     *  other running items), we will be parked.
      */
     if ( unlikely(scurr->budget <= 0) )
-        vcpu_grab_budget(scurr);
+        item_grab_budget(scurr);
 
     /*
-     * Select next runnable local VCPU (ie top of local runq).
+     * Select next runnable local ITEM (ie top of local runq).
      *
-     * If the current vcpu is runnable, and has higher credit than
+     * If the current item is runnable, and has higher credit than
      * the next guy on the queue (or there is noone else), we want to
      * run him again.
      *
-     * If there's tasklet work to do, we want to chose the idle vcpu
+     * If there's tasklet work to do, we want to choose the idle item
      * for this processor, and mark the current for delayed runqueue
      * add.
      *
-     * If the current vcpu is runnable, and there's another runnable
+     * If the current item is runnable, and there's another runnable
      * candidate, we want to mark current for delayed runqueue add,
      * and remove the next guy from the queue.
      *
-     * If the current vcpu is not runnable, we want to chose the idle
-     * vcpu for this processor.
+     * If the current item is not runnable, we want to choose the idle
+     * item for this processor.
      */
     if ( tasklet_work_scheduled )
     {
-        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        __clear_bit(__CSFLAG_item_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_item(idle_vcpu[cpu]->sched_item);
+        snext = csched2_item(sched_idle_item(cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
+        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_items);
 
-    /* If switching from a non-idle runnable vcpu, put it
+    /* If switching from a non-idle runnable item, put it
      * back on the runqueue. */
     if ( snext != scurr
-         && !is_idle_vcpu(scurr->vcpu)
-         && vcpu_runnable(current) )
+         && !is_idle_item(curritem)
+         && item_runnable(curritem) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_item(snext->item) )
     {
         /* If switching, remove this from the runqueue and mark it scheduled */
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!vcpu_running(snext->vcpu));
+            ASSERT(!snext->item->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
@@ -3576,19 +3563,19 @@ csched2_schedule(
 
         /*
          * The reset condition is "has a scheduler epoch come to an end?".
-         * The way this is enforced is checking whether the vcpu at the top
+         * The way this is enforced is checking whether the item at the top
          * of the runqueue has negative credits. This means the epochs have
          * variable length, as in one epoch expores when:
-         *  1) the vcpu at the top of the runqueue has executed for
+         *  1) the item at the top of the runqueue has executed for
          *     around 10 ms (with default parameters);
-         *  2) no other vcpu with higher credits wants to run.
+         *  2) no other item with higher credits wants to run.
          *
          * Here, where we want to check for reset, we need to make sure the
-         * proper vcpu is being used. In fact, runqueue_candidate() may have
-         * not returned the first vcpu in the runqueue, for various reasons
+         * proper item is being used. In fact, runq_candidate() may have
+         * not returned the first item in the runqueue, for various reasons
          * (e.g., affinity). Only trigger a reset when it does.
          */
-        if ( skipped_vcpus == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
+        if ( skipped_items == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
             reset_credit(ops, cpu, now, snext);
             balance_load(ops, cpu, now);
@@ -3598,11 +3585,10 @@ csched2_schedule(
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_item_cpu(snext->item) != cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_item->res = per_cpu(sched_res, cpu);
+            sched_set_res(snext->item, per_cpu(sched_res, cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3636,20 +3622,20 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu->sched_item;
+    ret.task = snext->item;
 
-    CSCHED2_VCPU_CHECK(ret.task->vcpu);
+    CSCHED2_ITEM_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_item *svc)
+csched2_dump_item(struct csched2_private *prv, struct csched2_item *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->item->domain->domain_id,
+            svc->item->item_id,
             svc->flags,
-            svc->vcpu->processor);
+            sched_item_cpu(svc->item));
 
     printk(" credit=%" PRIi32" [w=%u]", svc->credit, svc->weight);
 
@@ -3674,12 +3660,12 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current ITEM (nothing to say if that's the idle item) */
     svc = csched2_item(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_item(svc->item) )
     {
         printk("\trun: ");
-        csched2_dump_vcpu(prv, svc);
+        csched2_dump_item(prv, svc);
     }
 }
 
@@ -3736,7 +3722,7 @@ csched2_dump(const struct scheduler *ops)
     list_for_each( iter_sdom, &prv->sdom )
     {
         struct csched2_dom *sdom;
-        struct vcpu *v;
+        struct sched_item *item;
 
         sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
 
@@ -3744,19 +3730,19 @@ csched2_dump(const struct scheduler *ops)
                sdom->dom->domain_id,
                sdom->weight,
                sdom->cap,
-               sdom->nr_vcpus);
+               sdom->nr_items);
 
-        for_each_vcpu( sdom->dom, v )
+        for_each_sched_item( sdom->dom, item )
         {
-            struct csched2_item * const svc = csched2_item(v->sched_item);
+            struct csched2_item * const svc = csched2_item(item);
             spinlock_t *lock;
 
-            lock = item_schedule_lock(svc->vcpu->sched_item);
+            lock = item_schedule_lock(item);
 
             printk("\t%3d: ", ++loop);
-            csched2_dump_vcpu(prv, svc);
+            csched2_dump_item(prv, svc);
 
-            item_schedule_unlock(lock, svc->vcpu->sched_item);
+            item_schedule_unlock(lock, item);
         }
     }
 
@@ -3782,7 +3768,7 @@ csched2_dump(const struct scheduler *ops)
             if ( svc )
             {
                 printk("\t%3d: ", loop++);
-                csched2_dump_vcpu(prv, svc);
+                csched2_dump_item(prv, svc);
             }
         }
         spin_unlock(&rqd->lock);
@@ -3882,7 +3868,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct csched2_item *svc = vdata;
     unsigned rqi;
 
-    ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(pdata && svc && is_idle_item(svc->item));
 
     /*
      * We own one runqueue lock already (from schedule_cpu_switch()). This
@@ -3895,7 +3881,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_item->priv = vdata;
+    sched_idle_item(cpu)->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
@@ -3946,7 +3932,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
      */
     ASSERT(spc && spc->runq_id != -1);
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
-    
+
     /* Find the old runqueue and remove this cpu from it */
     rqd = prv->rqd + spc->runq_id;
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC 28/49] xen/sched: make arinc653 scheduler vcpu agnostic.
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (26 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 27/49] xen/sched: make credit2 " Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 29/49] xen: add sched_item_pause_nosync() and sched_item_unpause() Juergen Gross
                   ` (26 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, George Dunlap, Josh Whitehead, Robert VanVossen,
	Dario Faggioli

Switch arinc653 scheduler completely from vcpu to sched_item usage.
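
To make the shape of the conversion explicit, here is an illustration-only
sketch of the pattern used throughout the file (simplified stand-in types,
not the real Xen definitions): the scheduler's private data now carries a
sched_item back-pointer instead of a struct vcpu pointer, and the accessor
macro is keyed off the item.

    /* Simplified stand-ins, to show the accessor pattern only. */
    struct sched_item { void *priv; int item_id; };

    typedef struct arinc653_item_s {
        struct sched_item *item;   /* was: struct vcpu *vc */
        int awake;
    } arinc653_item_t;

    /* was: #define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_item->priv) */
    #define AITEM(item) ((arinc653_item_t *)(item)->priv)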

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 208 +++++++++++++++++++++-----------------------
 1 file changed, 101 insertions(+), 107 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 5733a2a6b8..61f9ea6824 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -45,15 +45,15 @@
 #define DEFAULT_TIMESLICE MILLISECS(10)
 
 /**
- * Retrieve the idle VCPU for a given physical CPU
+ * Retrieve the idle ITEM for a given physical CPU
  */
-#define IDLETASK(cpu)  (idle_vcpu[cpu])
+#define IDLETASK(cpu)  (sched_idle_item(cpu))
 
 /**
  * Return a pointer to the ARINC 653-specific scheduler data information
- * associated with the given VCPU (vc)
+ * associated with the given ITEM (item)
  */
-#define AVCPU(vc) ((arinc653_vcpu_t *)(vc)->sched_item->priv)
+#define AITEM(item) ((arinc653_item_t *)(item)->priv)
 
 /**
  * Return the global scheduler private data given the scheduler ops pointer
@@ -65,20 +65,20 @@
  **************************************************************************/
 
 /**
- * The arinc653_vcpu_t structure holds ARINC 653-scheduler-specific
- * information for all non-idle VCPUs
+ * The arinc653_item_t structure holds ARINC 653-scheduler-specific
+ * information for all non-idle ITEMs
  */
-typedef struct arinc653_vcpu_s
+typedef struct arinc653_item_s
 {
-    /* vc points to Xen's struct vcpu so we can get to it from an
-     * arinc653_vcpu_t pointer. */
-    struct vcpu *       vc;
-    /* awake holds whether the VCPU has been woken with vcpu_wake() */
+    /* item points to Xen's struct sched_item so we can get to it from an
+     * arinc653_item_t pointer. */
+    struct sched_item * item;
+    /* awake holds whether the ITEM has been woken with vcpu_wake() */
     bool_t              awake;
-    /* list holds the linked list information for the list this VCPU
+    /* list holds the linked list information for the list this ITEM
      * is stored in */
     struct list_head    list;
-} arinc653_vcpu_t;
+} arinc653_item_t;
 
 /**
  * The sched_entry_t structure holds a single entry of the
@@ -89,14 +89,14 @@ typedef struct sched_entry_s
     /* dom_handle holds the handle ("UUID") for the domain that this
      * schedule entry refers to. */
     xen_domain_handle_t dom_handle;
-    /* vcpu_id holds the VCPU number for the VCPU that this schedule
+    /* item_id holds the ITEM number for the ITEM that this schedule
      * entry refers to. */
-    int                 vcpu_id;
-    /* runtime holds the number of nanoseconds that the VCPU for this
+    int                 item_id;
+    /* runtime holds the number of nanoseconds that the ITEM for this
      * schedule entry should be allowed to run per major frame. */
     s_time_t            runtime;
-    /* vc holds a pointer to the Xen VCPU structure */
-    struct vcpu *       vc;
+    /* item holds a pointer to the Xen sched_item structure */
+    struct sched_item * item;
 } sched_entry_t;
 
 /**
@@ -110,9 +110,9 @@ typedef struct a653sched_priv_s
     /**
      * This array holds the active ARINC 653 schedule.
      *
-     * When the system tries to start a new VCPU, this schedule is scanned
-     * to look for a matching (handle, VCPU #) pair. If both the handle (UUID)
-     * and VCPU number match, then the VCPU is allowed to run. Its run time
+     * When the system tries to start a new ITEM, this schedule is scanned
+     * to look for a matching (handle, ITEM #) pair. If both the handle (UUID)
+     * and ITEM number match, then the ITEM is allowed to run. Its run time
      * (per major frame) is given in the third entry of the schedule.
      */
     sched_entry_t schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
@@ -123,8 +123,8 @@ typedef struct a653sched_priv_s
      *
      * This is not necessarily the same as the number of domains in the
      * schedule. A domain could be listed multiple times within the schedule,
-     * or a domain with multiple VCPUs could have a different
-     * schedule entry for each VCPU.
+     * or a domain with multiple ITEMs could have a different
+     * schedule entry for each ITEM.
      */
     unsigned int num_schedule_entries;
 
@@ -139,9 +139,9 @@ typedef struct a653sched_priv_s
     s_time_t next_major_frame;
 
     /**
-     * pointers to all Xen VCPU structures for iterating through
+     * pointers to all Xen ITEM structures for iterating through
      */
-    struct list_head vcpu_list;
+    struct list_head item_list;
 } a653sched_priv_t;
 
 /**************************************************************************
@@ -167,50 +167,50 @@ static int dom_handle_cmp(const xen_domain_handle_t h1,
 }
 
 /**
- * This function searches the vcpu list to find a VCPU that matches
- * the domain handle and VCPU ID specified.
+ * This function searches the item list to find an ITEM that matches
+ * the domain handle and ITEM ID specified.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param handle    Pointer to handler
- * @param vcpu_id   VCPU ID
+ * @param item_id   ITEM ID
  *
  * @return          <ul>
- *                  <li> Pointer to the matching VCPU if one is found
+ *                  <li> Pointer to the matching ITEM if one is found
  *                  <li> NULL otherwise
  *                  </ul>
  */
-static struct vcpu *find_vcpu(
+static struct sched_item *find_item(
     const struct scheduler *ops,
     xen_domain_handle_t handle,
-    int vcpu_id)
+    int item_id)
 {
-    arinc653_vcpu_t *avcpu;
+    arinc653_item_t *aitem;
 
-    /* loop through the vcpu_list looking for the specified VCPU */
-    list_for_each_entry ( avcpu, &SCHED_PRIV(ops)->vcpu_list, list )
-        if ( (dom_handle_cmp(avcpu->vc->domain->handle, handle) == 0)
-             && (vcpu_id == avcpu->vc->vcpu_id) )
-            return avcpu->vc;
+    /* loop through the item_list looking for the specified ITEM */
+    list_for_each_entry ( aitem, &SCHED_PRIV(ops)->item_list, list )
+        if ( (dom_handle_cmp(aitem->item->domain->handle, handle) == 0)
+             && (item_id == aitem->item->item_id) )
+            return aitem->item;
 
     return NULL;
 }
 
 /**
- * This function updates the pointer to the Xen VCPU structure for each entry
+ * This function updates the pointer to the Xen ITEM structure for each entry
  * in the ARINC 653 schedule.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @return          <None>
  */
-static void update_schedule_vcpus(const struct scheduler *ops)
+static void update_schedule_items(const struct scheduler *ops)
 {
     unsigned int i, n_entries = SCHED_PRIV(ops)->num_schedule_entries;
 
     for ( i = 0; i < n_entries; i++ )
-        SCHED_PRIV(ops)->schedule[i].vc =
-            find_vcpu(ops,
+        SCHED_PRIV(ops)->schedule[i].item =
+            find_item(ops,
                       SCHED_PRIV(ops)->schedule[i].dom_handle,
-                      SCHED_PRIV(ops)->schedule[i].vcpu_id);
+                      SCHED_PRIV(ops)->schedule[i].item_id);
 }
 
 /**
@@ -268,12 +268,12 @@ arinc653_sched_set(
         memcpy(sched_priv->schedule[i].dom_handle,
                schedule->sched_entries[i].dom_handle,
                sizeof(sched_priv->schedule[i].dom_handle));
-        sched_priv->schedule[i].vcpu_id =
+        sched_priv->schedule[i].item_id =
             schedule->sched_entries[i].vcpu_id;
         sched_priv->schedule[i].runtime =
             schedule->sched_entries[i].runtime;
     }
-    update_schedule_vcpus(ops);
+    update_schedule_items(ops);
 
     /*
      * The newly-installed schedule takes effect immediately. We do not even
@@ -319,7 +319,7 @@ arinc653_sched_get(
         memcpy(schedule->sched_entries[i].dom_handle,
                sched_priv->schedule[i].dom_handle,
                sizeof(sched_priv->schedule[i].dom_handle));
-        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].vcpu_id;
+        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].item_id;
         schedule->sched_entries[i].runtime = sched_priv->schedule[i].runtime;
     }
 
@@ -355,7 +355,7 @@ a653sched_init(struct scheduler *ops)
 
     prv->next_major_frame = 0;
     spin_lock_init(&prv->lock);
-    INIT_LIST_HEAD(&prv->vcpu_list);
+    INIT_LIST_HEAD(&prv->item_list);
 
     return 0;
 }
@@ -373,7 +373,7 @@ a653sched_deinit(struct scheduler *ops)
 }
 
 /**
- * This function allocates scheduler-specific data for a VCPU
+ * This function allocates scheduler-specific data for an ITEM
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param item      Pointer to struct sched_item
@@ -385,35 +385,34 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
                       void *dd)
 {
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    struct vcpu *vc = item->vcpu;
-    arinc653_vcpu_t *svc;
+    arinc653_item_t *svc;
     unsigned int entry;
     unsigned long flags;
 
     /*
      * Allocate memory for the ARINC 653-specific scheduler data information
-     * associated with the given VCPU (vc).
+     * associated with the given ITEM (item).
      */
-    svc = xmalloc(arinc653_vcpu_t);
+    svc = xmalloc(arinc653_item_t);
     if ( svc == NULL )
         return NULL;
 
     spin_lock_irqsave(&sched_priv->lock, flags);
 
-    /* 
-     * Add every one of dom0's vcpus to the schedule, as long as there are
+    /*
+     * Add every one of dom0's items to the schedule, as long as there are
      * slots available.
      */
-    if ( vc->domain->domain_id == 0 )
+    if ( item->domain->domain_id == 0 )
     {
         entry = sched_priv->num_schedule_entries;
 
         if ( entry < ARINC653_MAX_DOMAINS_PER_SCHEDULE )
         {
             sched_priv->schedule[entry].dom_handle[0] = '\0';
-            sched_priv->schedule[entry].vcpu_id = vc->vcpu_id;
+            sched_priv->schedule[entry].item_id = item->item_id;
             sched_priv->schedule[entry].runtime = DEFAULT_TIMESLICE;
-            sched_priv->schedule[entry].vc = vc;
+            sched_priv->schedule[entry].item = item;
 
             sched_priv->major_frame += DEFAULT_TIMESLICE;
             ++sched_priv->num_schedule_entries;
@@ -421,16 +420,16 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
     }
 
     /*
-     * Initialize our ARINC 653 scheduler-specific information for the VCPU.
-     * The VCPU starts "asleep." When Xen is ready for the VCPU to run, it
+     * Initialize our ARINC 653 scheduler-specific information for the ITEM.
+     * The ITEM starts "asleep." When Xen is ready for the ITEM to run, it
      * will call the vcpu_wake scheduler callback function and our scheduler
-     * will mark the VCPU awake.
+     * will mark the ITEM awake.
      */
-    svc->vc = vc;
+    svc->item = item;
     svc->awake = 0;
-    if ( !is_idle_vcpu(vc) )
-        list_add(&svc->list, &SCHED_PRIV(ops)->vcpu_list);
-    update_schedule_vcpus(ops);
+    if ( !is_idle_item(item) )
+        list_add(&svc->list, &SCHED_PRIV(ops)->item_list);
+    update_schedule_items(ops);
 
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 
@@ -438,27 +437,27 @@ a653sched_alloc_vdata(const struct scheduler *ops, struct sched_item *item,
 }
 
 /**
- * This function frees scheduler-specific VCPU data
+ * This function frees scheduler-specific ITEM data
  *
  * @param ops       Pointer to this instance of the scheduler structure
  */
 static void
 a653sched_free_vdata(const struct scheduler *ops, void *priv)
 {
-    arinc653_vcpu_t *av = priv;
+    arinc653_item_t *av = priv;
 
     if (av == NULL)
         return;
 
-    if ( !is_idle_vcpu(av->vc) )
+    if ( !is_idle_item(av->item) )
         list_del(&av->list);
 
     xfree(av);
-    update_schedule_vcpus(ops);
+    update_schedule_items(ops);
 }
 
 /**
- * Xen scheduler callback function to sleep a VCPU
+ * Xen scheduler callback function to sleep an ITEM
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param item      Pointer to struct sched_item
@@ -466,21 +465,19 @@ a653sched_free_vdata(const struct scheduler *ops, void *priv)
 static void
 a653sched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
-
-    if ( AVCPU(vc) != NULL )
-        AVCPU(vc)->awake = 0;
+    if ( AITEM(item) != NULL )
+        AITEM(item)->awake = 0;
 
     /*
-     * If the VCPU being put to sleep is the same one that is currently
+     * If the ITEM being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( per_cpu(sched_res, vc->processor)->curr == item )
-        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    if ( per_cpu(sched_res, sched_item_cpu(item))->curr == item )
+        cpu_raise_softirq(sched_item_cpu(item), SCHEDULE_SOFTIRQ);
 }
 
 /**
- * Xen scheduler callback function to wake up a VCPU
+ * Xen scheduler callback function to wake up an ITEM
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param item      Pointer to struct sched_item
@@ -488,24 +485,22 @@ a653sched_item_sleep(const struct scheduler *ops, struct sched_item *item)
 static void
 a653sched_item_wake(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
+    if ( AITEM(item) != NULL )
+        AITEM(item)->awake = 1;
 
-    if ( AVCPU(vc) != NULL )
-        AVCPU(vc)->awake = 1;
-
-    cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+    cpu_raise_softirq(sched_item_cpu(item), SCHEDULE_SOFTIRQ);
 }
 
 /**
- * Xen scheduler callback function to select a VCPU to run.
+ * Xen scheduler callback function to select an ITEM to run.
  * This is the main scheduler routine.
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param now       Current time
  *
- * @return          Address of the VCPU structure scheduled to be run next
- *                  Amount of time to execute the returned VCPU
- *                  Flag for whether the VCPU was migrated
+ * @return          Address of the ITEM structure scheduled to be run next
+ *                  Amount of time to execute the returned ITEM
+ *                  Flag for whether the ITEM was migrated
  */
 static struct task_slice
 a653sched_do_schedule(
@@ -514,7 +509,7 @@ a653sched_do_schedule(
     bool_t tasklet_work_scheduled)
 {
     struct task_slice ret;                      /* hold the chosen domain */
-    struct vcpu * new_task = NULL;
+    struct sched_item *new_task = NULL;
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
@@ -559,14 +554,14 @@ a653sched_do_schedule(
      * sched_item structure.
      */
     new_task = (sched_index < sched_priv->num_schedule_entries)
-        ? sched_priv->schedule[sched_index].vc
+        ? sched_priv->schedule[sched_index].item
         : IDLETASK(cpu);
 
     /* Check to see if the new task can be run (awake & runnable). */
     if ( !((new_task != NULL)
-           && (AVCPU(new_task) != NULL)
-           && AVCPU(new_task)->awake
-           && vcpu_runnable(new_task)) )
+           && (AITEM(new_task) != NULL)
+           && AITEM(new_task)->awake
+           && item_runnable(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
@@ -578,21 +573,21 @@ a653sched_do_schedule(
 
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 
-    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
+    /* Tasklet work (which runs in idle ITEM context) overrides all else. */
     if ( tasklet_work_scheduled )
         new_task = IDLETASK(cpu);
 
     /* Running this task would result in a migration */
-    if ( !is_idle_vcpu(new_task)
-         && (new_task->processor != cpu) )
+    if ( !is_idle_item(new_task)
+         && (sched_item_cpu(new_task) != cpu) )
         new_task = IDLETASK(cpu);
 
     /*
      * Return the amount of time the next domain has to run and the address
-     * of the selected task's VCPU structure.
+     * of the selected task's ITEM structure.
      */
     ret.time = next_switch_time - now;
-    ret.task = new_task->sched_item;
+    ret.task = new_task;
     ret.migrated = 0;
 
     BUG_ON(ret.time <= 0);
@@ -601,7 +596,7 @@ a653sched_do_schedule(
 }
 
 /**
- * Xen scheduler callback function to select a resource for the VCPU to run on
+ * Xen scheduler callback function to select a resource for the ITEM to run on
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param item      Pointer to struct sched_item
@@ -611,21 +606,20 @@ a653sched_do_schedule(
 static struct sched_resource *
 a653sched_pick_resource(const struct scheduler *ops, struct sched_item *item)
 {
-    struct vcpu *vc = item->vcpu;
     cpumask_t *online;
     unsigned int cpu;
 
-    /* 
-     * If present, prefer vc's current processor, else
-     * just find the first valid vcpu .
+    /*
+     * If present, prefer item's current processor, else
+     * just find the first valid item.
      */
-    online = cpupool_domain_cpumask(vc->domain);
+    online = cpupool_domain_cpumask(item->domain);
 
     cpu = cpumask_first(online);
 
-    if ( cpumask_test_cpu(vc->processor, online)
+    if ( cpumask_test_cpu(sched_item_cpu(item), online)
          || (cpu >= nr_cpu_ids) )
-        cpu = vc->processor;
+        cpu = sched_item_cpu(item);
 
     return per_cpu(sched_res, cpu);
 }
@@ -636,18 +630,18 @@ a653sched_pick_resource(const struct scheduler *ops, struct sched_item *item)
  * @param new_ops   Pointer to this instance of the scheduler structure
  * @param cpu       The cpu that is changing scheduler
  * @param pdata     scheduler specific PCPU data (we don't have any)
- * @param vdata     scheduler specific VCPU data of the idle vcpu
+ * @param vdata     scheduler specific ITEM data of the idle item
  */
 static void
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
     struct sched_resource *sd = per_cpu(sched_res, cpu);
-    arinc653_vcpu_t *svc = vdata;
+    arinc653_item_t *svc = vdata;
 
-    ASSERT(!pdata && svc && is_idle_vcpu(svc->vc));
+    ASSERT(!pdata && svc && is_idle_item(svc->item));
 
-    idle_vcpu[cpu]->sched_item->priv = vdata;
+    sched_idle_item(cpu)->priv = vdata;
 
     per_cpu(scheduler, cpu) = new_ops;
     per_cpu(sched_res, cpu)->sched_priv = NULL; /* no pdata */
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC 29/49] xen: add sched_item_pause_nosync() and sched_item_unpause()
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (27 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 28/49] xen/sched: make arinc653 " Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 30/49] xen: let vcpu_create() select processor Juergen Gross
                   ` (25 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

The credit scheduler calls vcpu_pause_nosync() and vcpu_unpause()
today. Add sched_item_pause_nosync() and sched_item_unpause() to
perform the same operations on scheduler items instead.
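
As a hedged sketch only (not part of this patch): once a sched_item can
carry more than one vcpu, these wrappers are the natural place to fan the
operation out to all vcpus of the item. The for_each_sched_item_vcpu()
iterator used below is an assumption, not something introduced here.

    /* Sketch under the assumption of a per-item vcpu iterator. */
    static inline void sched_item_pause_nosync(struct sched_item *item)
    {
        struct vcpu *v;

        for_each_sched_item_vcpu ( item, v )
            vcpu_pause_nosync(v);
    }

    static inline void sched_item_unpause(struct sched_item *item)
    {
        struct vcpu *v;

        for_each_sched_item_vcpu ( item, v )
            vcpu_unpause(v);
    }

For now, with one vcpu per item, the wrappers below simply forward to the
existing vcpu_pause_nosync()/vcpu_unpause() calls.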

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit.c  |  6 +++---
 xen/include/xen/sched-if.h | 10 ++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index babccb69f7..9db5c3fc71 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1080,7 +1080,7 @@ csched_item_remove(const struct scheduler *ops, struct sched_item *item)
     if ( test_and_clear_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
     {
         SCHED_STAT_CRANK(item_unpark);
-        vcpu_unpause(svc->item->vcpu);
+        sched_item_unpause(svc->item);
     }
 
     spin_lock_irq(&prv->lock);
@@ -1530,7 +1530,7 @@ csched_acct(void* dummy)
                      !test_and_set_bit(CSCHED_FLAG_ITEM_PARKED, &svc->flags) )
                 {
                     SCHED_STAT_CRANK(item_park);
-                    vcpu_pause_nosync(svc->item->vcpu);
+                    sched_item_pause_nosync(svc->item);
                 }
 
                 /* Lower bound on credits */
@@ -1554,7 +1554,7 @@ csched_acct(void* dummy)
                      * if it is woken up here.
                      */
                     SCHED_STAT_CRANK(item_unpark);
-                    vcpu_unpause(svc->item->vcpu);
+                    sched_item_unpause(svc->item);
                 }
 
                 /* Upper bound on credits means ITEM stops earning */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 5cacede473..18134c7972 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -308,6 +308,16 @@ static inline void sched_free_domdata(const struct scheduler *s,
         ASSERT(!data);
 }
 
+static inline void sched_item_pause_nosync(struct sched_item *item)
+{
+    vcpu_pause_nosync(item->vcpu);
+}
+
+static inline void sched_item_unpause(struct sched_item *item)
+{
+    vcpu_unpause(item->vcpu);
+}
+
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
   __used_section(".data.schedulers") = &x;
 
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC 30/49] xen: let vcpu_create() select processor
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (28 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 29/49] xen: add sched_item_pause_nosync() and sched_item_unpause() Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 19:17   ` Andrew Cooper
  2019-03-29 15:09 ` [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers Juergen Gross
                   ` (24 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Today there are two distinct scenarios for vcpu_create(): either the
creation of idle-domain vcpus (vcpuid == processor), or the creation of
"normal" domain vcpus (including dom0), where the caller selects the
initial processor in a round-robin fashion from the allowed processors
(allowed meaning based on cpupool and affinities).

Instead of passing the initial processor to vcpu_create() and on to
sched_init_vcpu(), let sched_init_vcpu() do the processor selection
itself. To support dom0 vcpu creation, use the node_affinity of the
domain as the basis for selecting the processors. User domains initially
have all nodes set, so their behavior is unchanged compared to today.
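
For readers who want the selection logic in isolation, the following is a
condensed, stand-alone model of what sched_select_initial_cpu() does in the
patch below. It uses a plain 64-bit mask instead of Xen's cpumask_t, so the
helper names are illustrative, not the real Xen API; it assumes the cpupool
mask is never empty.

    #include <assert.h>
    #include <stdint.h>

    static unsigned int first_cpu(uint64_t mask)
    {
        unsigned int cpu = 0;

        assert(mask);
        while ( !(mask & (1ULL << cpu)) )
            cpu++;
        return cpu;
    }

    static unsigned int cycle_cpu(unsigned int prev, uint64_t mask)
    {
        unsigned int cpu = (prev + 1) % 64;

        assert(mask);
        while ( !(mask & (1ULL << cpu)) )
            cpu = (cpu + 1) % 64;
        return cpu;
    }

    /* node_cpus: cpus of the domain's node affinity; pool_cpus: its cpupool. */
    static unsigned int select_initial_cpu(uint64_t node_cpus, uint64_t pool_cpus,
                                           unsigned int vcpu_id, unsigned int prev)
    {
        uint64_t cpus = node_cpus & pool_cpus;

        if ( !cpus )               /* fall back to the whole cpupool */
            cpus = pool_cpus;

        return vcpu_id == 0 ? first_cpu(cpus) : cycle_cpu(prev, cpus);
    }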

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/arm/domain_build.c      | 13 ++++++-------
 xen/arch/x86/dom0_build.c        | 10 +++-------
 xen/arch/x86/hvm/dom0_build.c    |  9 ++-------
 xen/arch/x86/pv/dom0_build.c     | 10 ++--------
 xen/common/domain.c              |  5 ++---
 xen/common/domctl.c              | 10 ++--------
 xen/common/schedule.c            | 32 +++++++++++++++++++++++++++++---
 xen/include/asm-x86/dom0_build.h |  3 +--
 xen/include/xen/domain.h         |  3 +--
 xen/include/xen/sched.h          |  2 +-
 10 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d9836779d1..d5294b0d26 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -80,7 +80,7 @@ unsigned int __init dom0_max_vcpus(void)
 
 struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 {
-    return vcpu_create(dom0, 0, 0);
+    return vcpu_create(dom0, 0);
 }
 
 static unsigned int __init get_allocation_size(paddr_t size)
@@ -1923,7 +1923,7 @@ static void __init find_gnttab_region(struct domain *d,
 
 static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
 {
-    int i, cpu;
+    int i;
     struct vcpu *v = d->vcpu[0];
     struct cpu_user_regs *regs = &v->arch.cpu_info->guest_cpu_user_regs;
 
@@ -1986,12 +1986,11 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
     }
 #endif
 
-    for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
+    for ( i = 1; i < d->max_vcpus; i++ )
     {
-        cpu = cpumask_cycle(cpu, &cpu_online_map);
-        if ( vcpu_create(d, i, cpu) == NULL )
+        if ( vcpu_create(d, i) == NULL )
         {
-            printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
+            printk("Failed to allocate dom0 vcpu %d\n", i);
             break;
         }
 
@@ -2026,7 +2025,7 @@ static int __init construct_domU(struct domain *d,
 
     kinfo.vpl011 = dt_property_read_bool(node, "vpl011");
 
-    if ( vcpu_create(d, 0, 0) == NULL )
+    if ( vcpu_create(d, 0) == NULL )
         return -ENOMEM;
     d->max_pages = ~0U;
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 6ebe36766b..77b5646424 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -198,12 +198,9 @@ custom_param("dom0_nodes", parse_dom0_nodes);
 
 static cpumask_t __initdata dom0_cpus;
 
-struct vcpu *__init dom0_setup_vcpu(struct domain *d,
-                                    unsigned int vcpu_id,
-                                    unsigned int prev_cpu)
+struct vcpu *__init dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id)
 {
-    unsigned int cpu = cpumask_cycle(prev_cpu, &dom0_cpus);
-    struct vcpu *v = vcpu_create(d, vcpu_id, cpu);
+    struct vcpu *v = vcpu_create(d, vcpu_id);
 
     if ( v )
     {
@@ -273,8 +270,7 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     dom0->node_affinity = dom0_nodes;
     dom0->auto_node_affinity = !dom0_nr_pxms;
 
-    return dom0_setup_vcpu(dom0, 0,
-                           cpumask_last(&dom0_cpus) /* so it wraps around to first pcpu */);
+    return dom0_setup_vcpu(dom0, 0);
 }
 
 #ifdef CONFIG_SHADOW_PAGING
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index aa599f09ef..15166bbaa9 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -614,7 +614,7 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
                                  paddr_t start_info)
 {
     struct vcpu *v = d->vcpu[0];
-    unsigned int cpu = v->processor, i;
+    unsigned int i;
     int rc;
     /*
      * This sets the vCPU state according to the state described in
@@ -636,12 +636,7 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
     };
 
     for ( i = 1; i < d->max_vcpus; i++ )
-    {
-        const struct vcpu *p = dom0_setup_vcpu(d, i, cpu);
-
-        if ( p )
-            cpu = p->processor;
-    }
+        dom0_setup_vcpu(d, i);
 
     domain_update_node_affinity(d);
 
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index cef2d42254..800b3e6b7d 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -285,7 +285,7 @@ int __init dom0_construct_pv(struct domain *d,
                              module_t *initrd,
                              char *cmdline)
 {
-    int i, cpu, rc, compatible, order, machine;
+    int i, rc, compatible, order, machine;
     struct cpu_user_regs *regs;
     unsigned long pfn, mfn;
     unsigned long nr_pages;
@@ -693,14 +693,8 @@ int __init dom0_construct_pv(struct domain *d,
 
     printk("Dom%u has maximum %u VCPUs\n", d->domain_id, d->max_vcpus);
 
-    cpu = v->processor;
     for ( i = 1; i < d->max_vcpus; i++ )
-    {
-        const struct vcpu *p = dom0_setup_vcpu(d, i, cpu);
-
-        if ( p )
-            cpu = p->processor;
-    }
+        dom0_setup_vcpu(d, i);
 
     domain_update_node_affinity(d);
     d->arch.paging.mode = 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2045e762ac..a5f0146459 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -129,8 +129,7 @@ static void vcpu_destroy(struct vcpu *v)
     free_vcpu_struct(v);
 }
 
-struct vcpu *vcpu_create(
-    struct domain *d, unsigned int vcpu_id, unsigned int cpu_id)
+struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
     struct vcpu *v;
 
@@ -162,7 +161,7 @@ struct vcpu *vcpu_create(
         init_waitqueue_vcpu(v);
     }
 
-    if ( sched_init_vcpu(v, cpu_id) != 0 )
+    if ( sched_init_vcpu(v) != 0 )
         goto fail_wq;
 
     if ( arch_vcpu_create(v) != 0 )
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 6a9a54130d..ccde1ba706 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -540,8 +540,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
     case XEN_DOMCTL_max_vcpus:
     {
-        unsigned int i, max = op->u.max_vcpus.max, cpu;
-        cpumask_t *online;
+        unsigned int i, max = op->u.max_vcpus.max;
 
         ret = -EINVAL;
         if ( (d == current->domain) || /* no domain_pause() */
@@ -552,18 +551,13 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         domain_pause(d);
 
         ret = -ENOMEM;
-        online = cpupool_domain_cpumask(d);
 
         for ( i = 0; i < max; i++ )
         {
             if ( d->vcpu[i] != NULL )
                 continue;
 
-            cpu = (i == 0) ?
-                cpumask_any(online) :
-                cpumask_cycle(d->vcpu[i-1]->processor, online);
-
-            if ( vcpu_create(d, i, cpu) == NULL )
+            if ( vcpu_create(d, i) == NULL )
                 goto maxvcpu_out;
         }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ae2a6d0323..9b5527c1eb 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -318,14 +318,40 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
     return NULL;
 }
 
-int sched_init_vcpu(struct vcpu *v, unsigned int processor)
+static unsigned int sched_select_initial_cpu(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    nodeid_t node;
+    cpumask_t cpus;
+
+    cpumask_clear(&cpus);
+    for_each_node_mask ( node, d->node_affinity )
+        cpumask_or(&cpus, &cpus, &node_to_cpumask(node));
+    cpumask_and(&cpus, &cpus, cpupool_domain_cpumask(d));
+    if ( cpumask_empty(&cpus) )
+        cpumask_copy(&cpus, cpupool_domain_cpumask(d));
+
+    if ( v->vcpu_id == 0 )
+        return cpumask_first(&cpus);
+
+    /* We can rely on previous vcpu being available. */
+    return cpumask_cycle(d->vcpu[v->vcpu_id - 1]->processor, &cpus);
+}
+
+int sched_init_vcpu(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct sched_item *item;
+    unsigned int processor;
 
     if ( (item = sched_alloc_item(v)) == NULL )
         return 1;
 
+    if ( is_idle_domain(d) || d->is_pinned )
+        processor = v->vcpu_id;
+    else
+        processor = sched_select_initial_cpu(v);
+
     sched_set_res(item, per_cpu(sched_res, processor));
 
     /* Initialise the per-vcpu timers. */
@@ -1684,7 +1710,7 @@ static int cpu_schedule_up(unsigned int cpu)
         return 0;
 
     if ( idle_vcpu[cpu] == NULL )
-        vcpu_create(idle_vcpu[0]->domain, cpu, cpu);
+        vcpu_create(idle_vcpu[0]->domain, cpu);
     else
     {
         struct vcpu *idle = idle_vcpu[cpu];
@@ -1878,7 +1904,7 @@ void __init scheduler_init(void)
     BUG_ON(nr_cpu_ids > ARRAY_SIZE(idle_vcpu));
     idle_domain->vcpu = idle_vcpu;
     idle_domain->max_vcpus = nr_cpu_ids;
-    if ( vcpu_create(idle_domain, 0, 0) == NULL )
+    if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
     this_cpu(sched_res)->curr = idle_vcpu[0]->sched_item;
     this_cpu(sched_res)->sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
diff --git a/xen/include/asm-x86/dom0_build.h b/xen/include/asm-x86/dom0_build.h
index 33a5483739..3eb4b036e1 100644
--- a/xen/include/asm-x86/dom0_build.h
+++ b/xen/include/asm-x86/dom0_build.h
@@ -11,8 +11,7 @@ extern unsigned int dom0_memflags;
 unsigned long dom0_compute_nr_pages(struct domain *d,
                                     struct elf_dom_parms *parms,
                                     unsigned long initrd_len);
-struct vcpu *dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id,
-                             unsigned int cpu);
+struct vcpu *dom0_setup_vcpu(struct domain *d, unsigned int vcpu_id);
 int dom0_setup_permissions(struct domain *d);
 
 int dom0_construct_pv(struct domain *d, const module_t *image,
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index d1bfc82f57..a6e929685c 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -13,8 +13,7 @@ typedef union {
     struct compat_vcpu_guest_context *cmp;
 } vcpu_guest_context_u __attribute__((__transparent_union__));
 
-struct vcpu *vcpu_create(
-    struct domain *d, unsigned int vcpu_id, unsigned int cpu_id);
+struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id);
 
 unsigned int dom0_max_vcpus(void);
 struct vcpu *alloc_dom0_vcpu0(struct domain *dom0);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 21a7fa14ce..f7eb138d86 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -625,7 +625,7 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
-int  sched_init_vcpu(struct vcpu *v, unsigned int processor);
+int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
 void sched_destroy_domain(struct domain *d);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (29 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 30/49] xen: let vcpu_create() select processor Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 19:36   ` Andrew Cooper
  2019-03-29 15:09 ` [PATCH RFC 32/49] xen/sched: switch schedule() from vcpus to sched_items Juergen Gross
                   ` (23 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

Especially in the do_schedule() functions of the different schedulers,
using smp_processor_id() for the local cpu number is correct only as
long as a sched_item consists of a single vcpu. As soon as larger
sched_items are used, most of those uses should be replaced by the cpu
number of the local sched_resource instead.

Add a helper to get that sched_resource cpu and modify the schedulers
to use it correctly.
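
Independent of the exact definition in sched-if.h, a hedged sketch of what
such a helper can look like is given here; the 'processor' field name of
struct sched_resource is an assumption, not taken from the quoted code.

    /* Sketch: map a physical cpu to the cpu owning its sched_resource. */
    static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
    {
        return per_cpu(sched_res, cpu)->processor;
    }

With one vcpu per sched_item this is presumably an identity mapping, which
matches the reasoning above that behaviour only changes once larger items
are used.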

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 19 ++++++++--------
 xen/common/sched_credit2.c  | 53 +++++++++++++++++++++++----------------------
 xen/common/sched_null.c     | 17 ++++++++-------
 xen/common/sched_rt.c       | 17 ++++++++-------
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  |  5 +++++
 7 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 61f9ea6824..3919c0a3e9 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -513,7 +513,7 @@ a653sched_do_schedule(
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
     a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    const unsigned int cpu = smp_processor_id();
+    const unsigned int cpu = sched_get_resource_cpu(smp_processor_id());
     unsigned long flags;
 
     spin_lock_irqsave(&sched_priv->lock, flags);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9db5c3fc71..4734f52fc7 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1839,8 +1839,9 @@ static struct task_slice
 csched_schedule(
     const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
-    struct list_head * const runq = RUNQ(cpu);
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
+    struct list_head * const runq = RUNQ(sched_cpu);
     struct sched_item *item = current->sched_item;
     struct csched_item * const scurr = CSCHED_ITEM(item);
     struct csched_private *prv = CSCHED_PRIV(ops);
@@ -1950,7 +1951,7 @@ csched_schedule(
     {
         BUG_ON( is_idle_item(item) || list_empty(runq) );
         /* Current has blocked. Update the runnable counter for this cpu. */
-        dec_nr_runnable(cpu);
+        dec_nr_runnable(sched_cpu);
     }
 
     snext = __runq_elem(runq->next);
@@ -1960,7 +1961,7 @@ csched_schedule(
     if ( tasklet_work_scheduled )
     {
         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_ITEM(sched_idle_item(cpu));
+        snext = CSCHED_ITEM(sched_idle_item(sched_cpu));
         snext->pri = CSCHED_PRI_TS_BOOST;
     }
 
@@ -1980,7 +1981,7 @@ csched_schedule(
     if ( snext->pri > CSCHED_PRI_TS_OVER )
         __runq_remove(snext);
     else
-        snext = csched_load_balance(prv, cpu, snext, &ret.migrated);
+        snext = csched_load_balance(prv, sched_cpu, snext, &ret.migrated);
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
@@ -1988,12 +1989,12 @@ csched_schedule(
      */
     if ( !tasklet_work_scheduled && snext->pri == CSCHED_PRI_IDLE )
     {
-        if ( !cpumask_test_cpu(cpu, prv->idlers) )
-            cpumask_set_cpu(cpu, prv->idlers);
+        if ( !cpumask_test_cpu(sched_cpu, prv->idlers) )
+            cpumask_set_cpu(sched_cpu, prv->idlers);
     }
-    else if ( cpumask_test_cpu(cpu, prv->idlers) )
+    else if ( cpumask_test_cpu(sched_cpu, prv->idlers) )
     {
-        cpumask_clear_cpu(cpu, prv->idlers);
+        cpumask_clear_cpu(sched_cpu, prv->idlers);
     }
 
     if ( !is_idle_item(snext->item) )
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 7918d46a23..d5cb8c0200 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3447,7 +3447,8 @@ static struct task_slice
 csched2_schedule(
     const struct scheduler *ops, s_time_t now, bool tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct csched2_runqueue_data *rqd;
     struct sched_item *curritem = current->sched_item;
     struct csched2_item * const scurr = csched2_item(curritem);
@@ -3459,22 +3460,22 @@ csched2_schedule(
     SCHED_STAT_CRANK(schedule);
     CSCHED2_ITEM_CHECK(curritem);
 
-    BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
+    BUG_ON(!cpumask_test_cpu(sched_cpu, &csched2_priv(ops)->initialized));
 
-    rqd = c2rqd(ops, cpu);
-    BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
+    rqd = c2rqd(ops, sched_cpu);
+    BUG_ON(!cpumask_test_cpu(sched_cpu, &rqd->active));
 
-    ASSERT(spin_is_locked(per_cpu(sched_res, cpu)->schedule_lock));
+    ASSERT(spin_is_locked(per_cpu(sched_res, sched_cpu)->schedule_lock));
 
     BUG_ON(!is_idle_item(curritem) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
-    tickled = cpumask_test_cpu(cpu, &rqd->tickled);
+    tickled = cpumask_test_cpu(sched_cpu, &rqd->tickled);
     if ( tickled )
     {
-        __cpumask_clear_cpu(cpu, &rqd->tickled);
+        __cpumask_clear_cpu(sched_cpu, &rqd->tickled);
         cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
-        smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
+        smt_idle_mask_set(sched_cpu, cpumask_scratch, &rqd->smt_idle);
     }
 
     if ( unlikely(tb_init_done) )
@@ -3484,10 +3485,10 @@ csched2_schedule(
             unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
         } d;
         d.cpu = cpu;
-        d.rq_id = c2r(cpu);
+        d.rq_id = c2r(sched_cpu);
         d.tasklet = tasklet_work_scheduled;
         d.idle = is_idle_item(curritem);
-        d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
+        d.smt_idle = cpumask_test_cpu(sched_cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
                     sizeof(d),
@@ -3527,10 +3528,10 @@ csched2_schedule(
     {
         __clear_bit(__CSFLAG_item_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_item(sched_idle_item(cpu));
+        snext = csched2_item(sched_idle_item(sched_cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_items);
+        snext = runq_candidate(rqd, scurr, sched_cpu, now, &skipped_items);
 
     /* If switching from a non-idle runnable item, put it
      * back on the runqueue. */
@@ -3555,10 +3556,10 @@ csched2_schedule(
         }
 
         /* Clear the idle mask if necessary */
-        if ( cpumask_test_cpu(cpu, &rqd->idle) )
+        if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
         {
-            __cpumask_clear_cpu(cpu, &rqd->idle);
-            smt_idle_mask_clear(cpu, &rqd->smt_idle);
+            __cpumask_clear_cpu(sched_cpu, &rqd->idle);
+            smt_idle_mask_clear(sched_cpu, &rqd->smt_idle);
         }
 
         /*
@@ -3577,18 +3578,18 @@ csched2_schedule(
          */
         if ( skipped_items == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
-            reset_credit(ops, cpu, now, snext);
-            balance_load(ops, cpu, now);
+            reset_credit(ops, sched_cpu, now, snext);
+            balance_load(ops, sched_cpu, now);
         }
 
         snext->start_time = now;
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( sched_item_cpu(snext->item) != cpu )
+        if ( sched_item_cpu(snext->item) != sched_cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            sched_set_res(snext->item, per_cpu(sched_res, cpu));
+            sched_set_res(snext->item, per_cpu(sched_res, sched_cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3601,17 +3602,17 @@ csched2_schedule(
          */
         if ( tasklet_work_scheduled )
         {
-            if ( cpumask_test_cpu(cpu, &rqd->idle) )
+            if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
             {
-                __cpumask_clear_cpu(cpu, &rqd->idle);
-                smt_idle_mask_clear(cpu, &rqd->smt_idle);
+                __cpumask_clear_cpu(sched_cpu, &rqd->idle);
+                smt_idle_mask_clear(sched_cpu, &rqd->smt_idle);
             }
         }
-        else if ( !cpumask_test_cpu(cpu, &rqd->idle) )
+        else if ( !cpumask_test_cpu(sched_cpu, &rqd->idle) )
         {
-            __cpumask_set_cpu(cpu, &rqd->idle);
+            __cpumask_set_cpu(sched_cpu, &rqd->idle);
             cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
-            smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
+            smt_idle_mask_set(sched_cpu, cpumask_scratch, &rqd->smt_idle);
         }
         /* Make sure avgload gets updated periodically even
          * if there's no activity */
@@ -3621,7 +3622,7 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, cpu, snext, now);
+    ret.time = csched2_runtime(ops, sched_cpu, snext, now);
     ret.task = snext->item;
 
     CSCHED2_ITEM_CHECK(ret.task);
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index ceb026c8af..34ce7a05d3 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -709,6 +709,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
 {
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct null_private *prv = null_priv(ops);
     struct null_item *wvc;
     struct task_slice ret;
@@ -724,14 +725,14 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        if ( per_cpu(npc, cpu).item == NULL )
+        if ( per_cpu(npc, sched_cpu).item == NULL )
         {
             d.item = d.dom = -1;
         }
         else
         {
-            d.item = per_cpu(npc, cpu).item->item_id;
-            d.dom = per_cpu(npc, cpu).item->domain->domain_id;
+            d.item = per_cpu(npc, sched_cpu).item->item_id;
+            d.dom = per_cpu(npc, sched_cpu).item->domain->domain_id;
         }
         __trace_var(TRC_SNULL_SCHEDULE, 1, sizeof(d), &d);
     }
@@ -739,10 +740,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = sched_idle_item(cpu);
+        ret.task = sched_idle_item(sched_cpu);
     }
     else
-        ret.task = per_cpu(npc, cpu).item;
+        ret.task = per_cpu(npc, sched_cpu).item;
     ret.migrated = 0;
     ret.time = -1;
 
@@ -773,9 +774,9 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                      !has_soft_affinity(wvc->item) )
                     continue;
 
-                if ( item_check_affinity(wvc->item, cpu, bs) )
+                if ( item_check_affinity(wvc->item, sched_cpu, bs) )
                 {
-                    item_assign(prv, wvc->item, cpu);
+                    item_assign(prv, wvc->item, sched_cpu);
                     list_del_init(&wvc->waitq_elem);
                     ret.task = wvc->item;
                     goto unlock;
@@ -787,7 +788,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     }
 
     if ( unlikely(ret.task == NULL || !item_runnable(ret.task)) )
-        ret.task = sched_idle_item(cpu);
+        ret.task = sched_idle_item(sched_cpu);
 
     NULL_ITEM_CHECK(ret.task);
     return ret;
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 730aa292d4..2366e33beb 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1065,7 +1065,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 static struct task_slice
 rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
 {
-    const int cpu = smp_processor_id();
+    const unsigned int cpu = smp_processor_id();
+    const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct rt_private *prv = rt_priv(ops);
     struct rt_item *const scurr = rt_item(current->sched_item);
     struct rt_item *snext = NULL;
@@ -1079,7 +1080,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         } d;
         d.cpu = cpu;
         d.tasklet = tasklet_work_scheduled;
-        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
+        d.tickled = cpumask_test_cpu(sched_cpu, &prv->tickled);
         d.idle = is_idle_item(curritem);
         trace_var(TRC_RTDS_SCHEDULE, 1,
                   sizeof(d),
@@ -1087,7 +1088,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     }
 
     /* clear ticked bit now that we've been scheduled */
-    cpumask_clear_cpu(cpu, &prv->tickled);
+    cpumask_clear_cpu(sched_cpu, &prv->tickled);
 
     /* burn_budget would return for IDLE ITEM */
     burn_budget(ops, scurr, now);
@@ -1095,13 +1096,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0,  NULL);
-        snext = rt_item(sched_idle_item(cpu));
+        snext = rt_item(sched_idle_item(sched_cpu));
     }
     else
     {
-        snext = runq_pick(ops, cpumask_of(cpu));
+        snext = runq_pick(ops, cpumask_of(sched_cpu));
         if ( snext == NULL )
-            snext = rt_item(sched_idle_item(cpu));
+            snext = rt_item(sched_idle_item(sched_cpu));
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_item(curritem) &&
@@ -1126,9 +1127,9 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
             q_remove(snext);
             __set_bit(__RTDS_scheduled, &snext->flags);
         }
-        if ( sched_item_cpu(snext->item) != cpu )
+        if ( sched_item_cpu(snext->item) != sched_cpu )
         {
-            sched_set_res(snext->item, per_cpu(sched_res, cpu));
+            sched_set_res(snext->item, per_cpu(sched_res, sched_cpu));
             ret.migrated = 1;
         }
         ret.time = snext->cur_budget; /* invoke the scheduler next time */
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 9b5527c1eb..0b5e5e566b 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -347,7 +347,7 @@ int sched_init_vcpu(struct vcpu *v)
     if ( (item = sched_alloc_item(v)) == NULL )
         return 1;
 
-    if ( is_idle_domain(d) || d->is_pinned )
+    if ( is_idle_domain(d) )
         processor = v->vcpu_id;
     else
         processor = sched_select_initial_cpu(v);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 18134c7972..38b403dfbf 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -141,6 +141,11 @@ static inline bool vcpu_running(struct vcpu *v)
     return v->sched_item->is_running;
 }
 
+static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
+{
+    return per_cpu(sched_res, cpu)->processor;
+}
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
-- 
2.16.4



* [PATCH RFC 32/49] xen/sched: switch schedule() from vcpus to sched_items
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (30 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 33/49] xen/sched: switch sched_move_irqs() to take sched_item as parameter Juergen Gross
                   ` (22 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Use sched_items instead of vcpus in schedule(). This includes the
introduction of sched_item_runstate_change() as a replacement for
vcpu_runstate_change() in schedule().
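
For illustration, here is a rough, compilable standalone sketch of the
helper's decision logic (plain C; the toy_* names are invented for this
example and are not Xen code): schedule() only knows whether an item is
being scheduled in or out, and the helper maps that single flag onto the
finer-grained vcpu runstates:

#include <stdbool.h>
#include <stdio.h>

enum runstate { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked,
                RUNSTATE_offline };

struct toy_vcpu {
    bool blocked;            /* stands in for (pause_flags & VPF_blocked) */
    bool runnable;           /* stands in for vcpu_runnable() */
    enum runstate state;
};

/* mirrors the shape of sched_item_runstate_change(): a single "running"
 * flag from schedule() selects the new runstate of the item's vcpu */
static void toy_runstate_change(struct toy_vcpu *v, bool running)
{
    if ( running )
        v->state = RUNSTATE_running;
    else
        v->state = v->blocked ? RUNSTATE_blocked :
                   (v->runnable ? RUNSTATE_runnable : RUNSTATE_offline);
}

int main(void)
{
    struct toy_vcpu v = { .blocked = false, .runnable = true };

    toy_runstate_change(&v, true);   /* item scheduled in */
    toy_runstate_change(&v, false);  /* item scheduled out -> runnable */
    printf("state after deschedule: %d\n", (int)v.state);
    return 0;
}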

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 70 +++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0b5e5e566b..c16f548b63 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -196,6 +196,20 @@ static inline void vcpu_runstate_change(
     v->runstate.state = new_state;
 }
 
+static inline void sched_item_runstate_change(struct sched_item *item,
+    bool running, s_time_t new_entry_time)
+{
+    struct vcpu *v = item->vcpu;
+
+    if ( running )
+        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+    else
+        vcpu_runstate_change(v,
+            ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+             (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
+            new_entry_time);
+}
+
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
     spinlock_t *lock = likely(v == current)
@@ -1532,7 +1546,7 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
  */
 static void schedule(void)
 {
-    struct vcpu          *prev = current, *next = NULL;
+    struct sched_item    *prev = current->sched_item, *next = NULL;
     s_time_t              now;
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
@@ -1576,9 +1590,9 @@ static void schedule(void)
     sched = this_cpu(scheduler);
     next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
 
-    next = next_slice.task->vcpu;
+    next = next_slice.task;
 
-    sd->curr = next->sched_item;
+    sd->curr = next;
 
     if ( next_slice.time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + next_slice.time);
@@ -1587,59 +1601,55 @@ static void schedule(void)
     {
         pcpu_schedule_unlock_irq(lock, cpu);
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
-                 next->domain->domain_id, next->vcpu_id,
-                 now - prev->runstate.state_entry_time,
+                 next->domain->domain_id, next->item_id,
+                 now - prev->state_entry_time,
                  next_slice.time);
-        trace_continue_running(next);
-        return continue_running(prev);
+        trace_continue_running(next->vcpu);
+        return continue_running(prev->vcpu);
     }
 
     TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id, prev->vcpu_id,
-             now - prev->runstate.state_entry_time);
+             prev->domain->domain_id, prev->item_id,
+             now - prev->state_entry_time);
     TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id, next->vcpu_id,
-             (next->runstate.state == RUNSTATE_runnable) ?
-             (now - next->runstate.state_entry_time) : 0,
+             next->domain->domain_id, next->item_id,
+             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+             (now - next->state_entry_time) : 0,
              next_slice.time);
 
-    ASSERT(prev->runstate.state == RUNSTATE_running);
+    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
     TRACE_4D(TRC_SCHED_SWITCH,
-             prev->domain->domain_id, prev->vcpu_id,
-             next->domain->domain_id, next->vcpu_id);
+             prev->domain->domain_id, prev->item_id,
+             next->domain->domain_id, next->item_id);
 
-    vcpu_runstate_change(
-        prev,
-        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-         (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
-        now);
-    prev->sched_item->last_run_time = now;
+    sched_item_runstate_change(prev, false, now);
+    prev->last_run_time = now;
 
-    ASSERT(next->runstate.state != RUNSTATE_running);
-    vcpu_runstate_change(next, RUNSTATE_running, now);
+    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    sched_item_runstate_change(next, true, now);
 
     /*
      * NB. Don't add any trace records from here until the actual context
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!vcpu_running(next));
-    next->sched_item->is_running = 1;
-    next->sched_item->state_entry_time = now;
+    ASSERT(!next->is_running);
+    next->is_running = 1;
+    next->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
     SCHED_STAT_CRANK(sched_ctx);
 
-    stop_timer(&prev->periodic_timer);
+    stop_timer(&prev->vcpu->periodic_timer);
 
     if ( next_slice.migrated )
-        sched_move_irqs(next);
+        sched_move_irqs(next->vcpu);
 
-    vcpu_periodic_timer_work(next);
+    vcpu_periodic_timer_work(next->vcpu);
 
-    context_switch(prev, next);
+    context_switch(prev->vcpu, next->vcpu);
 }
 
 void context_saved(struct vcpu *prev)
-- 
2.16.4



* [PATCH RFC 33/49] xen/sched: switch sched_move_irqs() to take sched_item as parameter
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (31 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 32/49] xen/sched: switch schedule() from vcpus to sched_items Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 34/49] xen: switch from for_each_vcpu() to for_each_sched_item() Juergen Gross
                   ` (21 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

sched_move_irqs() should work on a sched_item, as that is the item
moved between cpus.

Rename the current function to vcpu_move_irqs() as it is still needed
in schedule().
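
The split follows a pattern used throughout this series: keep the
per-vcpu work in a vcpu-level helper and let the item-level function
loop over the item's vcpus. A minimal standalone sketch (the toy_*
names are made up for the example, not Xen code):

#include <stdio.h>

struct toy_vcpu { int id; };

struct toy_item {
    struct toy_vcpu *vcpus;     /* all vcpus sharing this schedule item */
    unsigned int nr_vcpus;      /* > 1 once core/socket granularity is used */
};

/* per-vcpu work, kept as its own helper (cf. vcpu_move_irqs()) */
static void toy_vcpu_move_irqs(struct toy_vcpu *v)
{
    printf("move IRQs targeting vcpu %d\n", v->id);
}

/* item-level wrapper (cf. sched_move_irqs()): just loop over the vcpus */
static void toy_item_move_irqs(struct toy_item *item)
{
    unsigned int i;

    for ( i = 0; i < item->nr_vcpus; i++ )
        toy_vcpu_move_irqs(&item->vcpus[i]);
}

int main(void)
{
    struct toy_vcpu v[2] = { { 0 }, { 1 } };
    struct toy_item item = { v, 2 };

    toy_item_move_irqs(&item);
    return 0;
}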

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c16f548b63..a5147b9481 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -407,12 +407,20 @@ int sched_init_vcpu(struct vcpu *v)
     return 0;
 }
 
-static void sched_move_irqs(struct vcpu *v)
+static void vcpu_move_irqs(struct vcpu *v)
 {
     arch_move_irqs(v);
     evtchn_move_pirqs(v);
 }
 
+static void sched_move_irqs(struct sched_item *item)
+{
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        vcpu_move_irqs(v);
+}
+
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
@@ -492,7 +500,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
         v->sched_item->priv = vcpu_priv[v->vcpu_id];
         if ( !d->is_dying )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_item);
 
         new_p = cpumask_cycle(new_p, c->cpu_valid);
 
@@ -787,7 +795,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
     if ( old_cpu != new_cpu )
-        sched_move_irqs(v);
+        sched_move_irqs(v->sched_item);
 
     /* Wake on new CPU. */
     vcpu_wake(v);
@@ -865,7 +873,7 @@ void restore_vcpu_affinity(struct domain *d)
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
-            sched_move_irqs(v);
+            sched_move_irqs(v->sched_item);
     }
 
     domain_update_node_affinity(d);
@@ -1645,7 +1653,7 @@ static void schedule(void)
     stop_timer(&prev->vcpu->periodic_timer);
 
     if ( next_slice.migrated )
-        sched_move_irqs(next->vcpu);
+        vcpu_move_irqs(next->vcpu);
 
     vcpu_periodic_timer_work(next->vcpu);
 
-- 
2.16.4



* [PATCH RFC 34/49] xen: switch from for_each_vcpu() to for_each_sched_item()
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (32 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 33/49] xen/sched: switch sched_move_irqs() to take sched_item as parameter Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 35/49] xen/sched: add runstate counters to struct sched_item Juergen Gross
                   ` (20 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli

Where appropriate switch from for_each_vcpu() to for_each_sched_item()
in order to prepare for core scheduling.
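
A rough standalone sketch of the resulting pattern (plain C; the 64-bit
mask merely stands in for a cpumask_t, and the toy_* names are invented
for the example): per-domain aggregation walks the schedule items, which
already carry the affinity masks, instead of the individual vcpus:

#include <stdint.h>
#include <stdio.h>

struct toy_item {
    uint64_t hard_affinity;     /* stands in for item->cpu_hard_affinity */
    uint64_t soft_affinity;     /* stands in for item->cpu_soft_affinity */
};

struct toy_domain {
    struct toy_item *items;
    unsigned int nr_items;      /* #vcpus / vcpus-per-item */
};

/* cf. domain_update_node_affinity(): union the masks over all items */
static uint64_t toy_domain_hard_mask(const struct toy_domain *d)
{
    uint64_t mask = 0;
    unsigned int i;

    for ( i = 0; i < d->nr_items; i++ )     /* for_each_sched_item() */
        mask |= d->items[i].hard_affinity;

    return mask;
}

int main(void)
{
    struct toy_item items[2] = { { 0x3, 0x1 }, { 0xc, 0x4 } };
    struct toy_domain d = { items, 2 };

    printf("domain hard affinity mask: %#llx\n",
           (unsigned long long)toy_domain_hard_mask(&d));
    return 0;
}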

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/domain.c   |   9 ++--
 xen/common/schedule.c | 112 ++++++++++++++++++++++++++------------------------
 2 files changed, 63 insertions(+), 58 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a5f0146459..2773a21129 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -511,7 +511,7 @@ void domain_update_node_affinity(struct domain *d)
     cpumask_var_t dom_cpumask, dom_cpumask_soft;
     cpumask_t *dom_affinity;
     const cpumask_t *online;
-    struct vcpu *v;
+    struct sched_item *item;
     unsigned int cpu;
 
     /* Do we have vcpus already? If not, no need to update node-affinity. */
@@ -544,12 +544,11 @@ void domain_update_node_affinity(struct domain *d)
          * and the full mask of where it would prefer to run (the union of
          * the soft affinity of all its various vcpus). Let's build them.
          */
-        for_each_vcpu ( d, v )
+        for_each_sched_item ( d, item )
         {
-            cpumask_or(dom_cpumask, dom_cpumask,
-                       v->sched_item->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask, item->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->sched_item->cpu_soft_affinity);
+                       item->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a5147b9481..5a12d9bdc7 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -424,16 +424,17 @@ static void sched_move_irqs(struct sched_item *item)
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
+    struct sched_item *item;
     unsigned int new_p;
-    void **vcpu_priv;
+    void **item_priv;
     void *domdata;
-    void *vcpudata;
+    void *itemdata;
     struct scheduler *old_ops;
     void *old_domdata;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_item ( d, item )
     {
-        if ( v->sched_item->affinity_broken )
+        if ( item->affinity_broken )
             return -EBUSY;
     }
 
@@ -441,22 +442,22 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     if ( IS_ERR(domdata) )
         return PTR_ERR(domdata);
 
-    vcpu_priv = xzalloc_array(void *, d->max_vcpus);
-    if ( vcpu_priv == NULL )
+    item_priv = xzalloc_array(void *, d->max_vcpus);
+    if ( item_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
         return -ENOMEM;
     }
 
-    for_each_vcpu ( d, v )
+    for_each_sched_item ( d, item )
     {
-        vcpu_priv[v->vcpu_id] = SCHED_OP(c->sched, alloc_vdata,
-                                         v->sched_item, domdata);
-        if ( vcpu_priv[v->vcpu_id] == NULL )
+        item_priv[item->item_id] = SCHED_OP(c->sched, alloc_vdata,
+                                            item, domdata);
+        if ( item_priv[item->item_id] == NULL )
         {
-            for_each_vcpu ( d, v )
-                xfree(vcpu_priv[v->vcpu_id]);
-            xfree(vcpu_priv);
+            for_each_sched_item ( d, item )
+                xfree(item_priv[item->item_id]);
+            xfree(item_priv);
             sched_free_domdata(c->sched, domdata);
             return -ENOMEM;
         }
@@ -467,30 +468,35 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     old_ops = dom_scheduler(d);
     old_domdata = d->sched_priv;
 
-    for_each_vcpu ( d, v )
+    for_each_sched_item ( d, item )
     {
-        SCHED_OP(old_ops, remove_item, v->sched_item);
+        SCHED_OP(old_ops, remove_item, item);
     }
 
     d->cpupool = c;
     d->sched_priv = domdata;
 
     new_p = cpumask_first(c->cpu_valid);
-    for_each_vcpu ( d, v )
+    for_each_sched_item ( d, item )
     {
         spinlock_t *lock;
+        unsigned int item_p = new_p;
 
-        vcpudata = v->sched_item->priv;
+        itemdata = item->priv;
 
-        migrate_timer(&v->periodic_timer, new_p);
-        migrate_timer(&v->singleshot_timer, new_p);
-        migrate_timer(&v->poll_timer, new_p);
+        for_each_sched_item_vcpu( item, v )
+        {
+            migrate_timer(&v->periodic_timer, new_p);
+            migrate_timer(&v->singleshot_timer, new_p);
+            migrate_timer(&v->poll_timer, new_p);
+            new_p = cpumask_cycle(new_p, c->cpu_valid);
+        }
 
-        lock = item_schedule_lock_irq(v->sched_item);
+        lock = item_schedule_lock_irq(item);
 
-        sched_set_affinity(v, &cpumask_all, &cpumask_all);
+        sched_set_affinity(item->vcpu, &cpumask_all, &cpumask_all);
 
-        sched_set_res(v->sched_item, per_cpu(sched_res, new_p));
+        sched_set_res(item, per_cpu(sched_res, item_p));
         /*
          * With v->processor modified we must not
          * - make any further changes assuming we hold the scheduler lock,
@@ -498,15 +504,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        v->sched_item->priv = vcpu_priv[v->vcpu_id];
+        item->priv = item_priv[item->item_id];
         if ( !d->is_dying )
             sched_move_irqs(v->sched_item);
 
-        new_p = cpumask_cycle(new_p, c->cpu_valid);
+        SCHED_OP(c->sched, insert_item, item);
 
-        SCHED_OP(c->sched, insert_item, v->sched_item);
-
-        SCHED_OP(old_ops, free_vdata, vcpudata);
+        SCHED_OP(old_ops, free_vdata, itemdata);
     }
 
     domain_update_node_affinity(d);
@@ -515,7 +519,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     sched_free_domdata(old_ops, old_domdata);
 
-    xfree(vcpu_priv);
+    xfree(item_priv);
 
     return 0;
 }
@@ -822,15 +826,14 @@ void vcpu_force_reschedule(struct vcpu *v)
 void restore_vcpu_affinity(struct domain *d)
 {
     unsigned int cpu = smp_processor_id();
-    struct vcpu *v;
+    struct sched_item *item;
 
     ASSERT(system_state == SYS_STATE_resume);
 
-    for_each_vcpu ( d, v )
+    for_each_sched_item ( d, item )
     {
         spinlock_t *lock;
-        unsigned int old_cpu = v->processor;
-        struct sched_item *item = v->sched_item;
+        unsigned int old_cpu = sched_item_cpu(item);
         struct sched_resource *res;
 
         ASSERT(!item_runnable(item));
@@ -849,7 +852,8 @@ void restore_vcpu_affinity(struct domain *d)
         {
             if ( item->affinity_broken )
             {
-                sched_set_affinity(v, item->cpu_hard_affinity_saved, NULL);
+                sched_set_affinity(item->vcpu, item->cpu_hard_affinity_saved,
+                                   NULL);
                 item->affinity_broken = 0;
                 cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
@@ -857,8 +861,8 @@ void restore_vcpu_affinity(struct domain *d)
 
             if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
             {
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
-                sched_set_affinity(v, &cpumask_all, NULL);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", item->vcpu);
+                sched_set_affinity(item->vcpu, &cpumask_all, NULL);
                 cpumask_and(cpumask_scratch_cpu(cpu), item->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
@@ -868,12 +872,12 @@ void restore_vcpu_affinity(struct domain *d)
         sched_set_res(item, res);
 
         lock = item_schedule_lock_irq(item);
-        res = SCHED_OP(vcpu_scheduler(v), pick_resource, item);
+        res = SCHED_OP(vcpu_scheduler(item->vcpu), pick_resource, item);
         sched_set_res(item, res);
         spin_unlock_irq(lock);
 
-        if ( old_cpu != v->processor )
-            sched_move_irqs(v->sched_item);
+        if ( old_cpu != sched_item_cpu(item) )
+            sched_move_irqs(item);
     }
 
     domain_update_node_affinity(d);
@@ -887,7 +891,6 @@ void restore_vcpu_affinity(struct domain *d)
 int cpu_disable_scheduler(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
     int ret = 0;
@@ -898,10 +901,11 @@ int cpu_disable_scheduler(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
     {
-        for_each_vcpu ( d, v )
+        struct sched_item *item;
+
+        for_each_sched_item ( d, item )
         {
             unsigned long flags;
-            struct sched_item *item = v->sched_item;
             spinlock_t *lock = item_schedule_lock_irqsave(item, &flags);
 
             cpumask_and(&online_affinity, item->cpu_hard_affinity, c->cpu_valid);
@@ -916,14 +920,14 @@ int cpu_disable_scheduler(unsigned int cpu)
                     break;
                 }
 
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", item->vcpu);
 
-                sched_set_affinity(v, &cpumask_all, NULL);
+                sched_set_affinity(item->vcpu, &cpumask_all, NULL);
             }
 
-            if ( v->processor != cpu )
+            if ( sched_item_cpu(item) != sched_get_resource_cpu(cpu) )
             {
-                /* The vcpu is not on this cpu, so we can move on. */
+                /* The item is not on this cpu, so we can move on. */
                 item_schedule_unlock_irqrestore(lock, flags, item);
                 continue;
             }
@@ -936,17 +940,17 @@ int cpu_disable_scheduler(unsigned int cpu)
              *  * the scheduler will always find a suitable solution, or
              *    things would have failed before getting in here.
              */
-            vcpu_migrate_start(v);
+            vcpu_migrate_start(item->vcpu);
             item_schedule_unlock_irqrestore(lock, flags, item);
 
-            vcpu_migrate_finish(v);
+            vcpu_migrate_finish(item->vcpu);
 
             /*
              * The only caveat, in this case, is that if a vcpu active in
              * the hypervisor isn't migratable. In this case, the caller
              * should try again after releasing and reaquiring all locks.
              */
-            if ( v->processor == cpu )
+            if ( sched_item_cpu(item) == sched_get_resource_cpu(cpu) )
                 ret = -EAGAIN;
         }
     }
@@ -957,7 +961,6 @@ int cpu_disable_scheduler(unsigned int cpu)
 static int cpu_disable_scheduler_check(unsigned int cpu)
 {
     struct domain *d;
-    struct vcpu *v;
     struct cpupool *c;
 
     c = per_cpu(cpupool, cpu);
@@ -966,11 +969,14 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
     {
-        for_each_vcpu ( d, v )
+        struct sched_item *item;
+
+        for_each_sched_item ( d, item )
         {
-            if ( v->sched_item->affinity_broken )
+            if ( item->affinity_broken )
                 return -EADDRINUSE;
-            if ( system_state != SYS_STATE_suspend && v->processor == cpu )
+            if ( system_state != SYS_STATE_suspend &&
+                 sched_item_cpu(item) == sched_get_resource_cpu(cpu) )
                 return -EAGAIN;
         }
     }
-- 
2.16.4



* [PATCH RFC 35/49] xen/sched: add runstate counters to struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (33 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 34/49] xen: switch from for_each_vcpu() to for_each_sched_item() Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 36/49] xen/sched: rework and rename vcpu_force_reschedule() Juergen Gross
                   ` (19 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Add counters to struct sched_item indicating how many of its vcpus are
either running/runnable or blocked/offline. The counters are updated
whenever a vcpu's runstate changes.
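
A compilable standalone model of the accounting (the toy_* names are
invented for the example, not Xen code): only transitions between the
running/runnable group and the blocked/offline group move a vcpu from
one counter to the other, so run_cnt + idle_cnt always equals the number
of vcpus in the item:

#include <stdbool.h>

enum runstate { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked,
                RUNSTATE_offline };

struct toy_item {
    unsigned int run_cnt;   /* vcpus running or runnable */
    unsigned int idle_cnt;  /* vcpus blocked or offline */
};

static bool is_run_state(enum runstate s)
{
    return s == RUNSTATE_running || s == RUNSTATE_runnable;
}

/* cf. the hunk in vcpu_runstate_change(): only run<->idle group changes
 * touch the counters */
static void toy_account_transition(struct toy_item *item,
                                   enum runstate old_state,
                                   enum runstate new_state)
{
    bool old_run = is_run_state(old_state), new_run = is_run_state(new_state);

    if ( old_run == new_run )
        return;
    if ( old_run )
    {
        item->run_cnt--;
        item->idle_cnt++;
    }
    else
    {
        item->run_cnt++;
        item->idle_cnt--;
    }
}

int main(void)
{
    struct toy_item item = { .run_cnt = 0, .idle_cnt = 2 };

    toy_account_transition(&item, RUNSTATE_blocked, RUNSTATE_runnable);
    toy_account_transition(&item, RUNSTATE_runnable, RUNSTATE_running); /* no change */
    return (item.run_cnt == 1 && item.idle_cnt == 1) ? 0 : 1;
}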

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c      | 27 +++++++++++++++++++++++++++
 xen/include/xen/sched-if.h |  4 ++++
 2 files changed, 31 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5a12d9bdc7..0a94505b89 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -178,6 +178,7 @@ static inline void vcpu_runstate_change(
     struct vcpu *v, int new_state, s_time_t new_entry_time)
 {
     s_time_t delta;
+    bool old_run, new_run;
 
     ASSERT(v->runstate.state != new_state);
     ASSERT(spin_is_locked(per_cpu(sched_res, v->processor)->schedule_lock));
@@ -186,6 +187,26 @@ static inline void vcpu_runstate_change(
 
     trace_runstate_change(v, new_state);
 
+    old_run = (v->runstate.state == RUNSTATE_running ||
+               v->runstate.state == RUNSTATE_runnable);
+    new_run = (new_state == RUNSTATE_running || new_state == RUNSTATE_runnable);
+
+    if ( old_run != new_run )
+    {
+        struct sched_item *item = v->sched_item;
+
+        if ( old_run )
+        {
+            item->run_cnt--;
+            item->idle_cnt++;
+        }
+        else
+        {
+            item->run_cnt++;
+            item->idle_cnt--;
+        }
+    }
+
     delta = new_entry_time - v->runstate.state_entry_time;
     if ( delta > 0 )
     {
@@ -362,9 +383,15 @@ int sched_init_vcpu(struct vcpu *v)
         return 1;
 
     if ( is_idle_domain(d) )
+    {
         processor = v->vcpu_id;
+        item->run_cnt++;
+    }
     else
+    {
         processor = sched_select_initial_cpu(v);
+        item->idle_cnt++;
+    }
 
     sched_set_res(item, per_cpu(sched_res, processor));
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 38b403dfbf..795b2fafe5 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -62,6 +62,10 @@ struct sched_item {
     /* Last time item got (de-)scheduled. */
     uint64_t               state_entry_time;
 
+    /* Vcpu state summary. */
+    unsigned int           run_cnt;   /* vcpus running or runnable */
+    unsigned int           idle_cnt;  /* vcpus blocked or offline */
+
     /* Currently running on a CPU? */
     bool                   is_running;
     /* Item needs affinity restored */
-- 
2.16.4



* [PATCH RFC 36/49] xen/sched: rework and rename vcpu_force_reschedule()
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (34 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 35/49] xen/sched: add runstate counters to struct sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 37/49] xen/sched: Change vcpu_migrate_*() to operate on schedule item Juergen Gross
                   ` (18 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

vcpu_force_reschedule() is only used for modifying the periodic timer
of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
is kind of brutal.

So instead of doing the reschedule dance just operate on the timer
directly.

If we are modifying the timer of the currently running vcpu we can just
do that directly. If it is a foreign vcpu, pause it for the update, like
we do for all other vcpu state modifications.

Rename the function to vcpu_set_periodic_timer(), as that now reflects
its functionality.
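
A rough standalone model of the resulting control flow (the toy_* fields
merely stand in for vcpu_pause()/vcpu_unpause() and
stop_timer()/set_timer(); this is not Xen code):

#include <stdbool.h>

typedef long long s_time_t;

struct toy_vcpu {
    bool is_current;        /* stands in for (v == current) */
    bool paused;
    bool timer_armed;
    s_time_t periodic_period;
    s_time_t periodic_last_event;
};

/* cf. vcpu_set_periodic_timer(): current vcpu -> touch the timer
 * directly, foreign vcpu -> pause around the update instead of forcing
 * a reschedule */
static void toy_set_periodic_timer(struct toy_vcpu *v, s_time_t value,
                                   s_time_t now)
{
    if ( !v->is_current )
        v->paused = true;           /* vcpu_pause(v) */
    else
        v->timer_armed = false;     /* stop_timer(&v->periodic_timer) */

    v->periodic_period = value;
    v->periodic_last_event = now;

    if ( !v->is_current )
        v->paused = false;          /* vcpu_unpause(v) */
    else if ( value != 0 )
        v->timer_armed = true;      /* set_timer(..., now + value) */
}

int main(void)
{
    struct toy_vcpu v = { .is_current = true };

    toy_set_periodic_timer(&v, 10000000LL, 0);  /* e.g. 10ms, as in pv_shim_shutdown() */
    return v.timer_armed ? 0 : 1;
}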

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/pv/shim.c  |  4 +---
 xen/common/domain.c     |  6 ++----
 xen/common/schedule.c   | 23 +++++++++++++----------
 xen/include/xen/sched.h |  2 +-
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 324ca27f93..5edbcd9ac5 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -410,7 +410,7 @@ int pv_shim_shutdown(uint8_t reason)
         unmap_vcpu_info(v);
 
         /* Reset the periodic timer to the default value. */
-        v->periodic_period = MILLISECS(10);
+        vcpu_set_periodic_timer(v, MILLISECS(10));
         /* Stop the singleshot timer. */
         stop_timer(&v->singleshot_timer);
 
@@ -419,8 +419,6 @@ int pv_shim_shutdown(uint8_t reason)
 
         if ( v != current )
             vcpu_unpause_by_systemcontroller(v);
-        else
-            vcpu_force_reschedule(v);
     }
 
     return 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2773a21129..b448d20d40 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1442,15 +1442,13 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( set.period_ns > STIME_DELTA_MAX )
             return -EINVAL;
 
-        v->periodic_period = set.period_ns;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, set.period_ns);
 
         break;
     }
 
     case VCPUOP_stop_periodic_timer:
-        v->periodic_period = 0;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, 0);
         break;
 
     case VCPUOP_set_singleshot_timer:
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0a94505b89..7c7735bf33 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -833,21 +833,24 @@ static void vcpu_migrate_finish(struct vcpu *v)
 }
 
 /*
- * Force a VCPU through a deschedule/reschedule path.
- * For example, using this when setting the periodic timer period means that
- * most periodic-timer state need only be touched from within the scheduler
- * which can thus be done without need for synchronisation.
+ * Set the periodic timer of a vcpu.
  */
-void vcpu_force_reschedule(struct vcpu *v)
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
 {
-    spinlock_t *lock = item_schedule_lock_irq(v->sched_item);
+    s_time_t now = NOW();
 
-    if ( vcpu_running(v) )
-        vcpu_migrate_start(v);
+    if ( v != current )
+        vcpu_pause(v);
+    else
+        stop_timer(&v->periodic_timer);
 
-    item_schedule_unlock_irq(lock, v->sched_item);
+    v->periodic_period = value;
+    v->periodic_last_event = now;
 
-    vcpu_migrate_finish(v);
+    if ( v != current )
+        vcpu_unpause(v);
+    else if ( value != 0 )
+        set_timer(&v->periodic_timer, now + value);
 }
 
 void restore_vcpu_affinity(struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index f7eb138d86..873a903977 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -857,7 +857,7 @@ struct scheduler *scheduler_get_default(void);
 struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr);
 void scheduler_free(struct scheduler *sched);
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c);
-void vcpu_force_reschedule(struct vcpu *v);
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
 int cpu_disable_scheduler(unsigned int cpu);
 /* We need it in dom0_setup_vcpu */
 void sched_set_affinity(struct vcpu *v, const cpumask_t *hard,
-- 
2.16.4



* [PATCH RFC 37/49] xen/sched: Change vcpu_migrate_*() to operate on schedule item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (35 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 36/49] xen/sched: rework and rename vcpu_force_reschedule() Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 38/49] xen/sched: move struct task_slice into struct sched_item Juergen Gross
                   ` (17 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Now that vcpu_migrate_start() and vcpu_migrate_finish() are used only
to ensure a vcpu is running on a suitable processor, they can be
switched to operate on schedule items instead of vcpus.

While doing that rename them accordingly and make the _start() variant
static.

vcpu_move_locked() is switched to operate on a schedule item, too.
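
A minimal standalone sketch of the item-level protocol (toy_* names are
invented; not Xen code): _start() flags every vcpu of the item, and
_finish() may only move the item once none of its vcpus is still
running and all of them are still flagged for migration:

#include <stdbool.h>

struct toy_vcpu {
    bool running;
    bool migrating;     /* stands in for the _VPF_migrating pause flag */
};

struct toy_item {
    struct toy_vcpu *vcpus;
    unsigned int nr_vcpus;
    unsigned int cpu;   /* resource the item currently runs on */
};

/* cf. sched_item_migrate_start(): flag every vcpu of the item */
static void toy_migrate_start(struct toy_item *item)
{
    unsigned int i;

    for ( i = 0; i < item->nr_vcpus; i++ )
        item->vcpus[i].migrating = true;
}

/* cf. sched_item_migrate_finish(): bail out if any vcpu is still
 * running or no longer flagged; otherwise move the whole item */
static bool toy_migrate_finish(struct toy_item *item, unsigned int new_cpu)
{
    unsigned int i;

    for ( i = 0; i < item->nr_vcpus; i++ )
        if ( item->vcpus[i].running || !item->vcpus[i].migrating )
            return false;           /* context_saved() will retry later */

    for ( i = 0; i < item->nr_vcpus; i++ )
        item->vcpus[i].migrating = false;
    item->cpu = new_cpu;
    return true;
}

int main(void)
{
    struct toy_vcpu v[2] = { { false, false }, { false, false } };
    struct toy_item item = { v, 2, 0 };

    toy_migrate_start(&item);
    return toy_migrate_finish(&item, 2) ? 0 : 1;
}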

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 107 +++++++++++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 45 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7c7735bf33..22e43d88cc 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -687,38 +687,43 @@ void vcpu_unblock(struct vcpu *v)
 }
 
 /*
- * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
+ * Do the actual movement of an item from old to new CPU. Locks for *both*
  * CPUs needs to have been taken already when calling this!
  */
-static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
+static void sched_item_move_locked(struct sched_item *item,
+                                   unsigned int new_cpu)
 {
-    unsigned int old_cpu = v->processor;
+    unsigned int old_cpu = item->res->processor;
+    struct vcpu *v;
 
     /*
      * Transfer urgency status to new CPU before switching CPUs, as
      * once the switch occurs, v->is_urgent is no longer protected by
      * the per-CPU scheduler lock we are holding.
      */
-    if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
+    for_each_sched_item_vcpu ( item, v )
     {
-        atomic_inc(&per_cpu(sched_res, new_cpu)->urgent_count);
-        atomic_dec(&per_cpu(sched_res, old_cpu)->urgent_count);
+        if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
+        {
+            atomic_inc(&per_cpu(sched_res, new_cpu)->urgent_count);
+            atomic_dec(&per_cpu(sched_res, old_cpu)->urgent_count);
+        }
     }
 
     /*
      * Actual CPU switch to new CPU.  This is safe because the lock
-     * pointer cant' change while the current lock is held.
+     * pointer can't change while the current lock is held.
      */
-    if ( vcpu_scheduler(v)->migrate )
-        SCHED_OP(vcpu_scheduler(v), migrate, v->sched_item, new_cpu);
+    if ( vcpu_scheduler(item->vcpu)->migrate )
+        SCHED_OP(vcpu_scheduler(item->vcpu), migrate, item, new_cpu);
     else
-        sched_set_res(v->sched_item, per_cpu(sched_res, new_cpu));
+        sched_set_res(item, per_cpu(sched_res, new_cpu));
 }
 
 /*
  * Initiating migration
  *
- * In order to migrate, we need the vcpu in question to have stopped
+ * In order to migrate, we need the item in question to have stopped
  * running and had SCHED_OP(sleep) called (to take it off any
  * runqueues, for instance); and if it is currently running, it needs
  * to be scheduled out.  Finally, we need to hold the scheduling locks
@@ -734,36 +739,45 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
  * should be called like this:
  *
  *     lock = item_schedule_lock_irq(item);
- *     vcpu_migrate_start(v);
+ *     sched_item_migrate_start(item);
  *     item_schedule_unlock_irq(lock, item)
- *     vcpu_migrate_finish(v);
+ *     sched_item_migrate_finish(item);
  *
- * vcpu_migrate_finish() will do the work now if it can, or simply
- * return if it can't (because v is still running); in that case
- * vcpu_migrate_finish() will be called by context_saved().
+ * sched_item_migrate_finish() will do the work now if it can, or simply
+ * return if it can't (because item is still running); in that case
+ * sched_item_migrate_finish() will be called by context_saved().
  */
-void vcpu_migrate_start(struct vcpu *v)
+static void sched_item_migrate_start(struct sched_item *item)
 {
-    set_bit(_VPF_migrating, &v->pause_flags);
-    vcpu_sleep_nosync_locked(v);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu ( item, v )
+    {
+        set_bit(_VPF_migrating, &v->pause_flags);
+        vcpu_sleep_nosync_locked(v);
+    }
 }
 
-static void vcpu_migrate_finish(struct vcpu *v)
+static void sched_item_migrate_finish(struct sched_item *item)
 {
     unsigned long flags;
     unsigned int old_cpu, new_cpu;
     spinlock_t *old_lock, *new_lock;
     bool_t pick_called = 0;
+    struct vcpu *v;
 
     /*
-     * If the vcpu is currently running, this will be handled by
+     * If the item is currently running, this will be handled by
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( vcpu_running(v) || !test_bit(_VPF_migrating, &v->pause_flags) )
-        return;
+    for_each_sched_item_vcpu ( item, v )
+    {
+        if ( vcpu_running(v) || !test_bit(_VPF_migrating, &v->pause_flags) )
+            return;
+    }
 
-    old_cpu = new_cpu = v->processor;
+    old_cpu = new_cpu = item->res->processor;
     for ( ; ; )
     {
         /*
@@ -776,7 +790,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
 
         sched_spin_lock_double(old_lock, new_lock, &flags);
 
-        old_cpu = v->processor;
+        old_cpu = item->res->processor;
         if ( old_lock == per_cpu(sched_res, old_cpu)->schedule_lock )
         {
             /*
@@ -785,15 +799,15 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == per_cpu(sched_res, new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->sched_item->cpu_hard_affinity) &&
-                 cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
+                 cpumask_test_cpu(new_cpu, item->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, item->domain->cpupool->cpu_valid) )
                 break;
 
             /* Select a new CPU. */
-            new_cpu = SCHED_OP(vcpu_scheduler(v), pick_resource,
-                               v->sched_item)->processor;
+            new_cpu = SCHED_OP(vcpu_scheduler(item->vcpu), pick_resource,
+                               item)->processor;
             if ( (new_lock == per_cpu(sched_res, new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
+                 cpumask_test_cpu(new_cpu, item->domain->cpupool->cpu_valid) )
                 break;
             pick_called = 1;
         }
@@ -814,22 +828,26 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( vcpu_running(v) ||
-         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
+    for_each_sched_item_vcpu ( item, v )
     {
-        sched_spin_unlock_double(old_lock, new_lock, flags);
-        return;
+        if ( vcpu_running(v) ||
+             !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
+        {
+            sched_spin_unlock_double(old_lock, new_lock, flags);
+            return;
+        }
     }
 
-    vcpu_move_locked(v, new_cpu);
+    sched_item_move_locked(item, new_cpu);
 
     sched_spin_unlock_double(old_lock, new_lock, flags);
 
     if ( old_cpu != new_cpu )
-        sched_move_irqs(v->sched_item);
+        sched_move_irqs(item);
 
     /* Wake on new CPU. */
-    vcpu_wake(v);
+    for_each_sched_item_vcpu ( item, v )
+        vcpu_wake(v);
 }
 
 /*
@@ -970,10 +988,9 @@ int cpu_disable_scheduler(unsigned int cpu)
              *  * the scheduler will always find a suitable solution, or
              *    things would have failed before getting in here.
              */
-            vcpu_migrate_start(item->vcpu);
+            sched_item_migrate_start(item);
             item_schedule_unlock_irqrestore(lock, flags, item);
-
-            vcpu_migrate_finish(item->vcpu);
+            sched_item_migrate_finish(item);
 
             /*
              * The only caveat, in this case, is that if a vcpu active in
@@ -1064,14 +1081,14 @@ static int vcpu_set_affinity(
             ASSERT(which == item->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
-        vcpu_migrate_start(v);
+        sched_item_migrate_start(item);
     }
 
     item_schedule_unlock_irq(lock, item);
 
     domain_update_node_affinity(v->domain);
 
-    vcpu_migrate_finish(v);
+    sched_item_migrate_finish(item);
 
     return ret;
 }
@@ -1318,13 +1335,13 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     }
 
     if ( ret == 0 )
-        vcpu_migrate_start(v);
+        sched_item_migrate_start(item);
 
     item_schedule_unlock_irq(lock, item);
 
     domain_update_node_affinity(v->domain);
 
-    vcpu_migrate_finish(v);
+    sched_item_migrate_finish(item);
 
     return ret;
 }
@@ -1709,7 +1726,7 @@ void context_saved(struct vcpu *prev)
 
     SCHED_OP(vcpu_scheduler(prev), context_saved, prev->sched_item);
 
-    vcpu_migrate_finish(prev);
+    sched_item_migrate_finish(prev->sched_item);
 }
 
 /* The scheduler timer: force a run through the scheduler */
-- 
2.16.4



* [PATCH RFC 38/49] xen/sched: move struct task_slice into struct sched_item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (36 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 37/49] xen/sched: Change vcpu_migrate_*() to operate on schedule item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 39/49] xen/sched: add code to sync scheduling of all vcpus of a sched item Juergen Gross
                   ` (16 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich

In order to prepare for multiple vcpus per schedule item, move struct
task_slice in schedule() from the local stack into the struct sched_item
of the currently running item. To make access easier for the individual
schedulers, pass a pointer to the currently running item as a parameter
of do_schedule().

While at it switch the tasklet_work_scheduled parameter of
do_schedule() from bool_t to bool.

As struct task_slice is only ever modified with the local schedule
lock held, it is safe to set the fields directly in struct sched_item
instead of using an on-stack copy for returning the data.
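
A rough standalone sketch of the changed interface (toy_* names are
invented; not Xen code): the scheduler callback writes its decision
into the currently running item, and schedule() picks it up from there
instead of receiving a struct task_slice on the stack:

#include <stdbool.h>

typedef long long s_time_t;

struct toy_item {
    int id;
    /* former struct task_slice fields, now living in the item itself */
    struct toy_item *next_task;
    s_time_t next_time;
    bool migrated;
};

/* cf. the new do_schedule() signature: prev is passed in, the decision
 * is stored in prev->next_task / prev->next_time */
static void toy_do_schedule(struct toy_item *prev, struct toy_item *candidate,
                            struct toy_item *idle, bool tasklet_work)
{
    /* tasklet work overrides everything else and forces the idle item */
    prev->next_task = tasklet_work ? idle : candidate;
    prev->next_time = tasklet_work ? -1 : 1000000;  /* -1 means "no limit" */
    prev->next_task->migrated = false;
}

int main(void)
{
    struct toy_item prev = { .id = 0 }, next = { .id = 1 }, idle = { .id = -1 };

    toy_do_schedule(&prev, &next, &idle, false);
    /* the caller (schedule()) then uses prev.next_task / prev.next_time */
    return prev.next_task == &next ? 0 : 1;
}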

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_arinc653.c | 20 +++++++-------------
 xen/common/sched_credit.c   | 25 +++++++++++--------------
 xen/common/sched_credit2.c  | 21 +++++++++------------
 xen/common/sched_null.c     | 26 ++++++++++++--------------
 xen/common/sched_rt.c       | 22 +++++++++++-----------
 xen/common/schedule.c       | 21 ++++++++++-----------
 xen/include/xen/sched-if.h  | 17 +++++++++--------
 7 files changed, 69 insertions(+), 83 deletions(-)

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 3919c0a3e9..e98e98116b 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -497,18 +497,14 @@ a653sched_item_wake(const struct scheduler *ops, struct sched_item *item)
  *
  * @param ops       Pointer to this instance of the scheduler structure
  * @param now       Current time
- *
- * @return          Address of the ITEM structure scheduled to be run next
- *                  Amount of time to execute the returned ITEM
- *                  Flag for whether the ITEM was migrated
  */
-static struct task_slice
+static void
 a653sched_do_schedule(
     const struct scheduler *ops,
+    struct sched_item *prev,
     s_time_t now,
-    bool_t tasklet_work_scheduled)
+    bool tasklet_work_scheduled)
 {
-    struct task_slice ret;                      /* hold the chosen domain */
     struct sched_item *new_task = NULL;
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
@@ -586,13 +582,11 @@ a653sched_do_schedule(
      * Return the amount of time the next domain has to run and the address
      * of the selected task's ITEM structure.
      */
-    ret.time = next_switch_time - now;
-    ret.task = new_task;
-    ret.migrated = 0;
-
-    BUG_ON(ret.time <= 0);
+    prev->next_time = next_switch_time - now;
+    prev->next_task = new_task;
+    new_task->migrated = false;
 
-    return ret;
+    BUG_ON(prev->next_time <= 0);
 }
 
 /**
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 4734f52fc7..064f88ab23 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1689,7 +1689,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 
 static struct csched_item *
 csched_load_balance(struct csched_private *prv, int cpu,
-    struct csched_item *snext, bool_t *stolen)
+    struct csched_item *snext, bool *stolen)
 {
     struct cpupool *c = per_cpu(cpupool, cpu);
     struct csched_item *speer;
@@ -1805,7 +1805,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 /* As soon as one item is found, balancing ends */
                 if ( speer != NULL )
                 {
-                    *stolen = 1;
+                    *stolen = true;
                     /*
                      * Next time we'll look for work to steal on this node, we
                      * will start from the next pCPU, with respect to this one,
@@ -1835,19 +1835,18 @@ csched_load_balance(struct csched_private *prv, int cpu,
  * This function is in the critical path. It is designed to be simple and
  * fast for the common case.
  */
-static struct task_slice
-csched_schedule(
-    const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+static void csched_schedule(
+    const struct scheduler *ops, struct sched_item *item, s_time_t now,
+    bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct list_head * const runq = RUNQ(sched_cpu);
-    struct sched_item *item = current->sched_item;
     struct csched_item * const scurr = CSCHED_ITEM(item);
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_item *snext;
-    struct task_slice ret;
     s_time_t runtime, tslice;
+    bool migrated = false;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED_ITEM_CHECK(item);
@@ -1937,7 +1936,6 @@ csched_schedule(
                         (unsigned char *)&d);
         }
 
-        ret.migrated = 0;
         goto out;
     }
     tslice = prv->tslice;
@@ -1955,7 +1953,6 @@ csched_schedule(
     }
 
     snext = __runq_elem(runq->next);
-    ret.migrated = 0;
 
     /* Tasklet work (which runs in idle ITEM context) overrides all else. */
     if ( tasklet_work_scheduled )
@@ -1981,7 +1978,7 @@ csched_schedule(
     if ( snext->pri > CSCHED_PRI_TS_OVER )
         __runq_remove(snext);
     else
-        snext = csched_load_balance(prv, sched_cpu, snext, &ret.migrated);
+        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
@@ -2004,12 +2001,12 @@ out:
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_item(snext->item) ?
+    item->next_time = (is_idle_item(snext->item) ?
                 -1 : tslice);
-    ret.task = snext->item;
+    item->next_task = snext->item;
+    snext->item->migrated = migrated;
 
-    CSCHED_ITEM_CHECK(ret.task);
-    return ret;
+    CSCHED_ITEM_CHECK(item->next_task);
 }
 
 static void
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index d5cb8c0200..f1074be25d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3443,19 +3443,18 @@ runq_candidate(struct csched2_runqueue_data *rqd,
  * This function is in the critical path. It is designed to be simple and
  * fast for the common case.
  */
-static struct task_slice
-csched2_schedule(
-    const struct scheduler *ops, s_time_t now, bool tasklet_work_scheduled)
+static void csched2_schedule(
+    const struct scheduler *ops, struct sched_item *curritem, s_time_t now,
+    bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct csched2_runqueue_data *rqd;
-    struct sched_item *curritem = current->sched_item;
     struct csched2_item * const scurr = csched2_item(curritem);
     struct csched2_item *snext = NULL;
     unsigned int skipped_items = 0;
-    struct task_slice ret;
     bool tickled;
+    bool migrated = false;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_ITEM_CHECK(curritem);
@@ -3540,8 +3539,6 @@ csched2_schedule(
          && item_runnable(curritem) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
-    ret.migrated = 0;
-
     /* Accounting for non-idle tasks */
     if ( !is_idle_item(snext->item) )
     {
@@ -3591,7 +3588,7 @@ csched2_schedule(
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
             sched_set_res(snext->item, per_cpu(sched_res, sched_cpu));
             SCHED_STAT_CRANK(migrated);
-            ret.migrated = 1;
+            migrated = true;
         }
     }
     else
@@ -3622,11 +3619,11 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, sched_cpu, snext, now);
-    ret.task = snext->item;
+    curritem->next_time = csched2_runtime(ops, sched_cpu, snext, now);
+    curritem->next_task = snext->item;
+    snext->item->migrated = migrated;
 
-    CSCHED2_ITEM_CHECK(ret.task);
-    return ret;
+    CSCHED2_ITEM_CHECK(curritem->next_task);
 }
 
 static void
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 34ce7a05d3..1af396dcdb 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -703,16 +703,14 @@ static inline void null_item_check(struct sched_item *item)
  *  - the item assigned to the pCPU, if there's one and it can run;
  *  - the idle item, otherwise.
  */
-static struct task_slice null_schedule(const struct scheduler *ops,
-                                       s_time_t now,
-                                       bool_t tasklet_work_scheduled)
+static void null_schedule(const struct scheduler *ops, struct sched_item *prev,
+                          s_time_t now, bool tasklet_work_scheduled)
 {
     unsigned int bs;
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct null_private *prv = null_priv(ops);
     struct null_item *wvc;
-    struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
     NULL_ITEM_CHECK(current->sched_item);
@@ -740,19 +738,18 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     if ( tasklet_work_scheduled )
     {
         trace_var(TRC_SNULL_TASKLET, 1, 0, NULL);
-        ret.task = sched_idle_item(sched_cpu);
+        prev->next_task = sched_idle_item(sched_cpu);
     }
     else
-        ret.task = per_cpu(npc, sched_cpu).item;
-    ret.migrated = 0;
-    ret.time = -1;
+        prev->next_task = per_cpu(npc, sched_cpu).item;
+    prev->next_time = -1;
 
     /*
      * We may be new in the cpupool, or just coming back online. In which
      * case, there may be items in the waitqueue that we can assign to us
      * and run.
      */
-    if ( unlikely(ret.task == NULL) )
+    if ( unlikely(prev->next_task == NULL) )
     {
         spin_lock(&prv->waitq_lock);
 
@@ -778,7 +775,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
                 {
                     item_assign(prv, wvc->item, sched_cpu);
                     list_del_init(&wvc->waitq_elem);
-                    ret.task = wvc->item;
+                    prev->next_task = wvc->item;
                     goto unlock;
                 }
             }
@@ -787,11 +784,12 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(ret.task == NULL || !item_runnable(ret.task)) )
-        ret.task = sched_idle_item(sched_cpu);
+    if ( unlikely(prev->next_task == NULL || !item_runnable(prev->next_task)) )
+        prev->next_task = sched_idle_item(sched_cpu);
 
-    NULL_ITEM_CHECK(ret.task);
-    return ret;
+    NULL_ITEM_CHECK(prev->next_task);
+
+    prev->next_task->migrated = false;
 }
 
 static inline void dump_item(struct null_private *prv, struct null_item *nvc)
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 2366e33beb..c5e8b559f3 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1062,16 +1062,16 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
  * schedule function for rt scheduler.
  * The lock is already grabbed in schedule.c, no need to lock here
  */
-static struct task_slice
-rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
+static void
+rt_schedule(const struct scheduler *ops, struct sched_item *curritem,
+            s_time_t now, bool tasklet_work_scheduled)
 {
     const unsigned int cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cpu);
     struct rt_private *prv = rt_priv(ops);
-    struct rt_item *const scurr = rt_item(current->sched_item);
+    struct rt_item *const scurr = rt_item(curritem);
     struct rt_item *snext = NULL;
-    struct task_slice ret = { .migrated = 0 };
-    struct sched_item *curritem = current->sched_item;
+    bool migrated = false;
 
     /* TRACE */
     {
@@ -1119,7 +1119,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         __set_bit(__RTDS_delayed_runq_add, &scurr->flags);
 
     snext->last_start = now;
-    ret.time =  -1; /* if an idle item is picked */
+    curritem->next_time =  -1; /* if an idle item is picked */
     if ( !is_idle_item(snext->item) )
     {
         if ( snext != scurr )
@@ -1130,13 +1130,13 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         if ( sched_item_cpu(snext->item) != sched_cpu )
         {
             sched_set_res(snext->item, per_cpu(sched_res, sched_cpu));
-            ret.migrated = 1;
+            migrated = true;
         }
-        ret.time = snext->cur_budget; /* invoke the scheduler next time */
+        /* Invoke the scheduler next time. */
+        curritem->next_time = snext->cur_budget;
     }
-    ret.task = snext->item;
-
-    return ret;
+    curritem->next_task = snext->item;
+    snext->item->migrated = migrated;
 }
 
 /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 22e43d88cc..082225d173 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1611,10 +1611,9 @@ static void schedule(void)
     s_time_t              now;
     struct scheduler     *sched;
     unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
-    bool_t                tasklet_work_scheduled = 0;
+    bool                  tasklet_work_scheduled = false;
     struct sched_resource *sd;
     spinlock_t           *lock;
-    struct task_slice     next_slice;
     int cpu = smp_processor_id();
 
     ASSERT_NOT_IN_ATOMIC();
@@ -1630,12 +1629,12 @@ static void schedule(void)
         set_bit(_TASKLET_scheduled, tasklet_work);
         /* fallthrough */
     case TASKLET_enqueued|TASKLET_scheduled:
-        tasklet_work_scheduled = 1;
+        tasklet_work_scheduled = true;
         break;
     case TASKLET_scheduled:
         clear_bit(_TASKLET_scheduled, tasklet_work);
     case 0:
-        /*tasklet_work_scheduled = 0;*/
+        /*tasklet_work_scheduled = false;*/
         break;
     default:
         BUG();
@@ -1649,14 +1648,14 @@ static void schedule(void)
 
     /* get policy-specific decision on scheduling... */
     sched = this_cpu(scheduler);
-    next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
+    sched->do_schedule(sched, prev, now, tasklet_work_scheduled);
 
-    next = next_slice.task;
+    next = prev->next_task;
 
     sd->curr = next;
 
-    if ( next_slice.time >= 0 ) /* -ve means no limit */
-        set_timer(&sd->s_timer, now + next_slice.time);
+    if ( prev->next_time >= 0 ) /* -ve means no limit */
+        set_timer(&sd->s_timer, now + prev->next_time);
 
     if ( unlikely(prev == next) )
     {
@@ -1664,7 +1663,7 @@ static void schedule(void)
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
                  next->domain->domain_id, next->item_id,
                  now - prev->state_entry_time,
-                 next_slice.time);
+                 prev->next_time);
         trace_continue_running(next->vcpu);
         return continue_running(prev->vcpu);
     }
@@ -1676,7 +1675,7 @@ static void schedule(void)
              next->domain->domain_id, next->item_id,
              (next->vcpu->runstate.state == RUNSTATE_runnable) ?
              (now - next->state_entry_time) : 0,
-             next_slice.time);
+             prev->next_time);
 
     ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
@@ -1705,7 +1704,7 @@ static void schedule(void)
 
     stop_timer(&prev->vcpu->periodic_timer);
 
-    if ( next_slice.migrated )
+    if ( next->migrated )
         vcpu_move_irqs(next->vcpu);
 
     vcpu_periodic_timer_work(next->vcpu);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 795b2fafe5..e2bc8f7284 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -72,6 +72,8 @@ struct sched_item {
     bool                   affinity_broken;
     /* Does soft affinity actually play a role (given hard affinity)? */
     bool                   soft_aff_effective;
+    /* Item has been migrated to other cpu(s). */
+    bool                   migrated;
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t          cpu_hard_affinity;
     /* Used to change affinity temporarily. */
@@ -80,6 +82,10 @@ struct sched_item {
     cpumask_var_t          cpu_hard_affinity_saved;
     /* Bitmask of CPUs on which this VCPU prefers to run. */
     cpumask_var_t          cpu_soft_affinity;
+
+    /* Next item to run. */
+    struct sched_item      *next_task;
+    s_time_t                next_time;
 };
 
 #define for_each_sched_item(d, e)                                         \
@@ -225,12 +231,6 @@ static inline spinlock_t *pcpu_schedule_trylock(unsigned int cpu)
     return NULL;
 }
 
-struct task_slice {
-    struct sched_item *task;
-    s_time_t           time;
-    bool_t             migrated;
-};
-
 struct scheduler {
     char *name;             /* full name for this scheduler      */
     char *opt_name;         /* option name for this scheduler    */
@@ -273,8 +273,9 @@ struct scheduler {
     void         (*context_saved)  (const struct scheduler *,
                                     struct sched_item *);
 
-    struct task_slice (*do_schedule) (const struct scheduler *, s_time_t,
-                                      bool_t tasklet_work_scheduled);
+    void         (*do_schedule)    (const struct scheduler *,
+                                    struct sched_item *, s_time_t,
+                                    bool tasklet_work_scheduled);
 
     struct sched_resource * (*pick_resource) (const struct scheduler *,
                                               struct sched_item *);
-- 
2.16.4
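
For orientation, here is a condensed sketch of the new do_schedule() contract
introduced above (purely illustrative; pick_next_item() is a made-up
placeholder for the scheduler specific selection): the hook no longer returns
a struct task_slice, but records its decision in the sched_item that was
running.

    static void example_schedule(const struct scheduler *ops,
                                 struct sched_item *prev, s_time_t now,
                                 bool tasklet_work_scheduled)
    {
        const unsigned int sched_cpu = sched_get_resource_cpu(smp_processor_id());

        /* Tasklet work forces the idle item; otherwise ask the scheduler. */
        prev->next_task = tasklet_work_scheduled ? sched_idle_item(sched_cpu)
                                                 : pick_next_item(ops, sched_cpu);
        prev->next_time = -1;               /* -ve means no time limit */
        prev->next_task->migrated = false;  /* no cpu change in this sketch */
    }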



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 39/49] xen/sched: add code to sync scheduling of all vcpus of a sched item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (37 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 38/49] xen/sched: move struct task_slice into struct sched_item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 40/49] xen/sched: add support for multiple vcpus per sched item where missing Juergen Gross
                   ` (15 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

When switching sched items, synchronize all vcpus of the new item so that
they are scheduled at the same time.

A variable sched_granularity is added which holds the number of vcpus
per schedule item.

As tasklets require the idle item to be scheduled, the tasklet_work_scheduled
parameter of do_schedule() must be set to true if any cpu covered by the
current schedule() call has pending tasklet work.

For joining the other vcpus of a schedule item a new softirq
SCHED_SLAVE_SOFTIRQ is added. It provides a way to initiate a context
switch without calling the generic schedule() function to select the
vcpu to switch to, as we already know which vcpu we want to run. This
has the additional advantage of not losing any concurrent
SCHEDULE_SOFTIRQ events.
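
A stripped-down sketch of the two rendezvous phases may help before reading
the full patch (illustrative only: the helper names are made up, and locking,
softirq handling and the actual scheduling decision are omitted):

    /* Entry side: called with the schedule lock held, plain counter. */
    static void rendezvous_in(struct sched_item *item)
    {
        if ( !--item->rendezvous_in_cnt )
            return;              /* last cpu in: take the scheduling decision */
        while ( item->rendezvous_in_cnt )
            cpu_relax();         /* others wait (the patch drops the lock here) */
    }

    /* Exit side: no lock held, hence the atomic counter. */
    static void rendezvous_out(struct sched_item *item)
    {
        if ( atomic_dec_return(&item->rendezvous_out_cnt) == 1 )
            atomic_set(&item->rendezvous_out_cnt, 0);   /* release the waiters */
        else
            while ( atomic_read(&item->rendezvous_out_cnt) )
                cpu_relax();
    }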

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c      |  37 +++++-
 xen/common/schedule.c      | 275 ++++++++++++++++++++++++++++++++-------------
 xen/common/softirq.c       |   6 +-
 xen/include/xen/sched-if.h |   7 ++
 xen/include/xen/softirq.h  |   1 +
 5 files changed, 247 insertions(+), 79 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 53b8fa1c9d..7daba4fb91 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1709,12 +1709,45 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
+/*
+ * Rendezvous on end of context switch.
+ * As no lock is protecting this rendezvous function we need to use atomic
+ * access functions on the counter.
+ * The counter will be 0 in case no rendezvous is needed. For the rendezvous
+ * case it is initialised to the number of cpus to rendezvous plus 1. Each
+ * member entering decrements the counter. The last one will decrement it to
+ * 1 and perform the final needed action in that case (calling context_saved()
+ * if prev was specified) and then set the counter to zero. The other members
+ * will wait until the counter becomes zero before they proceed.
+ */
+static void context_wait_rendezvous_out(struct sched_item *item,
+                                        struct vcpu *prev)
+{
+    if ( atomic_read(&item->rendezvous_out_cnt) )
+    {
+        int cnt = atomic_dec_return(&item->rendezvous_out_cnt);
+
+        /* Call context_saved() before releasing other waiters. */
+        if ( cnt == 1 )
+        {
+            if ( prev )
+                context_saved(prev);
+            atomic_set(&item->rendezvous_out_cnt, 0);
+        }
+        else
+            while ( atomic_read(&item->rendezvous_out_cnt) )
+                cpu_relax();
+    }
+    else if ( prev )
+        context_saved(prev);
+}
 
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
     unsigned int cpu = smp_processor_id();
     const struct domain *prevd = prev->domain, *nextd = next->domain;
     unsigned int dirty_cpu = next->dirty_cpu;
+    struct sched_item *item = next->sched_item;
 
     ASSERT(local_irq_is_enabled());
 
@@ -1787,7 +1820,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         }
     }
 
-    context_saved(prev);
+    context_wait_rendezvous_out(item, prev);
 
     if ( prev != next )
     {
@@ -1812,6 +1845,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 void continue_running(struct vcpu *same)
 {
+    context_wait_rendezvous_out(same->sched_item, NULL);
+
     /* See the comment above. */
     same->domain->arch.ctxt_switch->tail(same);
     BUG();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 082225d173..d3474e6565 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -54,6 +54,10 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
  * */
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
+
+/* Number of vcpus per struct sched_item. */
+static unsigned int sched_granularity = 1;
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
@@ -1600,116 +1604,235 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
     set_timer(&v->periodic_timer, periodic_next_event);
 }
 
-/*
- * The main function
- * - deschedule the current domain (scheduler independent).
- * - pick a new domain (scheduler dependent).
- */
-static void schedule(void)
+static void sched_switch_items(struct sched_resource *sd,
+                               struct sched_item *next, struct sched_item *prev,
+                               s_time_t now)
 {
-    struct sched_item    *prev = current->sched_item, *next = NULL;
-    s_time_t              now;
-    struct scheduler     *sched;
-    unsigned long        *tasklet_work = &this_cpu(tasklet_work_to_do);
-    bool                  tasklet_work_scheduled = false;
-    struct sched_resource *sd;
-    spinlock_t           *lock;
-    int cpu = smp_processor_id();
+    sd->curr = next;
 
-    ASSERT_NOT_IN_ATOMIC();
+    TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->item_id,
+             now - prev->state_entry_time);
+    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->item_id,
+             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+             (now - next->state_entry_time) : 0, prev->next_time);
 
-    SCHED_STAT_CRANK(sched_run);
+    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
 
-    sd = this_cpu(sched_res);
+    TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->item_id,
+             next->domain->domain_id, next->item_id);
+
+    sched_item_runstate_change(prev, false, now);
+    prev->last_run_time = now;
+
+    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    sched_item_runstate_change(next, true, now);
+
+    /*
+     * NB. Don't add any trace records from here until the actual context
+     * switch, else lost_records resume will not work properly.
+     */
+
+    ASSERT(!next->is_running);
+    next->is_running = 1;
+}
+
+static bool sched_tasklet_check(void)
+{
+    unsigned long *tasklet_work;
+    bool tasklet_work_scheduled = false;
+    const cpumask_t *mask = this_cpu(sched_res)->cpus;
+    int cpu;
 
-    /* Update tasklet scheduling status. */
-    switch ( *tasklet_work )
+    for_each_cpu ( cpu, mask )
     {
-    case TASKLET_enqueued:
-        set_bit(_TASKLET_scheduled, tasklet_work);
-        /* fallthrough */
-    case TASKLET_enqueued|TASKLET_scheduled:
-        tasklet_work_scheduled = true;
-        break;
-    case TASKLET_scheduled:
-        clear_bit(_TASKLET_scheduled, tasklet_work);
-    case 0:
-        /*tasklet_work_scheduled = false;*/
-        break;
-    default:
-        BUG();
-    }
+        tasklet_work = &per_cpu(tasklet_work_to_do, cpu);
 
-    lock = pcpu_schedule_lock_irq(cpu);
+        switch ( *tasklet_work )
+        {
+        case TASKLET_enqueued:
+            set_bit(_TASKLET_scheduled, tasklet_work);
+            /* fallthrough */
+        case TASKLET_enqueued|TASKLET_scheduled:
+            tasklet_work_scheduled = true;
+            break;
+        case TASKLET_scheduled:
+            clear_bit(_TASKLET_scheduled, tasklet_work);
+        case 0:
+            /*tasklet_work_scheduled = false;*/
+            break;
+        default:
+            BUG();
+        }
+    }
 
-    now = NOW();
+    return tasklet_work_scheduled;
+}
 
-    stop_timer(&sd->s_timer);
+static struct sched_item *do_schedule(struct sched_item *prev, s_time_t now)
+{
+    struct scheduler *sched = this_cpu(scheduler);
+    struct sched_resource *sd = this_cpu(sched_res);
+    struct sched_item *next;
 
     /* get policy-specific decision on scheduling... */
-    sched = this_cpu(scheduler);
-    sched->do_schedule(sched, prev, now, tasklet_work_scheduled);
+    sched->do_schedule(sched, prev, now, sched_tasklet_check());
 
     next = prev->next_task;
 
-    sd->curr = next;
-
     if ( prev->next_time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + prev->next_time);
 
-    if ( unlikely(prev == next) )
+    if ( likely(prev != next) )
+        sched_switch_items(sd, next, prev, now);
+
+    return next;
+}
+
+/*
+ * Rendezvous before taking a scheduling decision.
+ * Called with schedule lock held, so all accesses to the rendezvous counter
+ * can be normal ones (no atomic accesses needed).
+ * The counter is initialized to the number of cpus that have to rendezvous.
+ * Each cpu entering will decrement the counter. In case the counter becomes
+ * zero do_schedule() is called and the rendezvous counter for leaving
+ * context_switch() is set. All other members will wait until the counter
+ * becomes zero, dropping the schedule lock in between.
+ */
+static struct sched_item *sched_wait_rendezvous_in(struct sched_item *prev,
+                                                   spinlock_t *lock, int cpu,
+                                                   s_time_t now)
+{
+    struct sched_item *next;
+
+    if ( !--prev->rendezvous_in_cnt )
+    {
+        next = do_schedule(prev, now);
+        atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
+        return next;
+    }
+
+    while ( prev->rendezvous_in_cnt )
     {
         pcpu_schedule_unlock_irq(lock, cpu);
+        cpu_relax();
+        pcpu_schedule_lock_irq(cpu);
+    }
+
+    return prev->next_task;
+}
+
+static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
+                                 s_time_t now)
+{
+    if ( unlikely(vprev == vnext) )
+    {
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
-                 next->domain->domain_id, next->item_id,
-                 now - prev->state_entry_time,
-                 prev->next_time);
-        trace_continue_running(next->vcpu);
-        return continue_running(prev->vcpu);
+                 vnext->domain->domain_id, vnext->sched_item->item_id,
+                 now - vprev->runstate.state_entry_time,
+                 vprev->sched_item->next_time);
+        trace_continue_running(vnext);
+        return continue_running(vprev);
     }
 
-    TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id, prev->item_id,
-             now - prev->state_entry_time);
-    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id, next->item_id,
-             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
-             (now - next->state_entry_time) : 0,
-             prev->next_time);
+    SCHED_STAT_CRANK(sched_ctx);
 
-    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
+    stop_timer(&vprev->periodic_timer);
 
-    TRACE_4D(TRC_SCHED_SWITCH,
-             prev->domain->domain_id, prev->item_id,
-             next->domain->domain_id, next->item_id);
+    if ( vnext->sched_item->migrated )
+        vcpu_move_irqs(vnext);
 
-    sched_item_runstate_change(prev, false, now);
-    prev->last_run_time = now;
+    vcpu_periodic_timer_work(vnext);
 
-    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
-    sched_item_runstate_change(next, true, now);
+    context_switch(vprev, vnext);
+}
 
-    /*
-     * NB. Don't add any trace records from here until the actual context
-     * switch, else lost_records resume will not work properly.
-     */
+static void sched_slave(void)
+{
+    struct vcpu          *vprev = current;
+    struct sched_item    *prev = vprev->sched_item, *next;
+    s_time_t              now;
+    spinlock_t           *lock;
+    int cpu = smp_processor_id();
 
-    ASSERT(!next->is_running);
-    next->is_running = 1;
-    next->state_entry_time = now;
+    ASSERT_NOT_IN_ATOMIC();
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    now = NOW();
+
+    if ( !prev->rendezvous_in_cnt )
+    {
+        pcpu_schedule_unlock_irq(lock, cpu);
+        return;
+    }
+
+    stop_timer(&this_cpu(sched_res)->s_timer);
+
+    next = sched_wait_rendezvous_in(prev, lock, cpu, now);
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    SCHED_STAT_CRANK(sched_ctx);
+    sched_context_switch(vprev, next->vcpu, now);
+}
 
-    stop_timer(&prev->vcpu->periodic_timer);
+/*
+ * The main function
+ * - deschedule the current domain (scheduler independent).
+ * - pick a new domain (scheduler dependent).
+ */
+static void schedule(void)
+{
+    struct vcpu          *vnext, *vprev = current;
+    struct sched_item    *prev = vprev->sched_item, *next = NULL;
+    s_time_t              now;
+    struct sched_resource *sd;
+    spinlock_t           *lock;
+    int cpu = smp_processor_id();
+
+    ASSERT_NOT_IN_ATOMIC();
 
-    if ( next->migrated )
-        vcpu_move_irqs(next->vcpu);
+    SCHED_STAT_CRANK(sched_run);
 
-    vcpu_periodic_timer_work(next->vcpu);
+    sd = this_cpu(sched_res);
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    if ( prev->rendezvous_in_cnt )
+    {
+        /*
+         * We have a race: sched_slave() should be called, so raise a softirq
+         * in order to re-enter schedule() later and call sched_slave() now.
+         */
+        pcpu_schedule_unlock_irq(lock, cpu);
+
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return sched_slave();
+    }
+
+    now = NOW();
+
+    stop_timer(&sd->s_timer);
+
+    if ( sched_granularity > 1 )
+    {
+        cpumask_t mask;
+
+        prev->rendezvous_in_cnt = sched_granularity;
+        cpumask_andnot(&mask, sd->cpus, cpumask_of(cpu));
+        cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ);
+        next = sched_wait_rendezvous_in(prev, lock, cpu, now);
+    }
+    else
+    {
+        prev->rendezvous_in_cnt = 0;
+        next = do_schedule(prev, now);
+        atomic_set(&next->rendezvous_out_cnt, 0);
+    }
+
+    pcpu_schedule_unlock_irq(lock, cpu);
 
-    context_switch(prev->vcpu, next->vcpu);
+    vnext = next->vcpu;
+    sched_context_switch(vprev, vnext, now);
 }
 
 void context_saved(struct vcpu *prev)
@@ -1767,6 +1890,7 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( sd == NULL )
         return -ENOMEM;
     sd->processor = cpu;
+    sd->cpus = cpumask_of(cpu);
     per_cpu(sched_res, cpu) = sd;
 
     per_cpu(scheduler, cpu) = &ops;
@@ -1926,6 +2050,7 @@ void __init scheduler_init(void)
     int i;
 
     open_softirq(SCHEDULE_SOFTIRQ, schedule);
+    open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave);
 
     for ( i = 0; i < NUM_SCHEDULERS; i++)
     {
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 83c3c09bd5..2d66193203 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -33,8 +33,8 @@ static void __do_softirq(unsigned long ignore_mask)
     for ( ; ; )
     {
         /*
-         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move
-         * us to another processor.
+         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ or
+         * SCHED_SLAVE_SOFTIRQ may move us to another processor.
          */
         cpu = smp_processor_id();
 
@@ -55,7 +55,7 @@ void process_pending_softirqs(void)
 {
     ASSERT(!in_irq() && local_irq_is_enabled());
     /* Do not enter scheduler as it can preempt the calling context. */
-    __do_softirq(1ul<<SCHEDULE_SOFTIRQ);
+    __do_softirq((1ul << SCHEDULE_SOFTIRQ) | (1ul << SCHED_SLAVE_SOFTIRQ));
 }
 
 void do_softirq(void)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index e2bc8f7284..9688d174e4 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -41,6 +41,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned            processor;
+    const cpumask_t    *cpus;           /* cpus covered by this struct     */
 };
 
 #define curr_on_cpu(c)    (per_cpu(sched_res, c)->curr)
@@ -86,6 +87,12 @@ struct sched_item {
     /* Next item to run. */
     struct sched_item      *next_task;
     s_time_t                next_time;
+
+    /* Number of vcpus not yet joined for context switch. */
+    unsigned int            rendezvous_in_cnt;
+
+    /* Number of vcpus not yet finished with context switch. */
+    atomic_t                rendezvous_out_cnt;
 };
 
 #define for_each_sched_item(d, e)                                         \
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index c327c9b6cd..d7273b389b 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -4,6 +4,7 @@
 /* Low-latency softirqs come first in the following list. */
 enum {
     TIMER_SOFTIRQ = 0,
+    SCHED_SLAVE_SOFTIRQ,
     SCHEDULE_SOFTIRQ,
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 40/49] xen/sched: add support for multiple vcpus per sched item where missing
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (38 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 39/49] xen/sched: add code to sync scheduling of all vcpus of a sched item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 41/49] x86: make loading of GDT at context switch more modular Juergen Gross
                   ` (14 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

In several places support for multiple vcpus per sched item is still
missing. Add that missing support (with the exception of the initial
allocation) and the helpers needed for it.
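
Most of the helpers below simply replace a single item->vcpu access with a
loop over all vcpus of the item using for_each_sched_item_vcpu(), which was
introduced earlier in the series. As a point of reference, a minimal sketch
of what such an iterator could look like (the real macro may differ):

    #define for_each_sched_item_vcpu(i, v)                        \
        for ( (v) = (i)->vcpu;                                    \
              (v) != NULL && (v)->sched_item == (i);              \
              (v) = (v)->next_in_list )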

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c      | 28 +++++++++++++---------
 xen/include/xen/sched-if.h | 60 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 69 insertions(+), 19 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d3474e6565..d33efbcdc5 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -184,8 +184,9 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
     bool old_run, new_run;
 
-    ASSERT(v->runstate.state != new_state);
     ASSERT(spin_is_locked(per_cpu(sched_res, v->processor)->schedule_lock));
+    if ( v->runstate.state == new_state )
+        return;
 
     vcpu_urgent_count_update(v);
 
@@ -221,18 +222,23 @@ static inline void vcpu_runstate_change(
     v->runstate.state = new_state;
 }
 
+static inline void vcpu_runstate_helper(struct vcpu *v, int new_state,
+                                        s_time_t new_entry_time)
+{
+    vcpu_runstate_change(v,
+        ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+         (vcpu_runnable(v) ? new_state : RUNSTATE_offline)),
+        new_entry_time);
+}
+
 static inline void sched_item_runstate_change(struct sched_item *item,
     bool running, s_time_t new_entry_time)
 {
-    struct vcpu *v = item->vcpu;
+    int new_state = running ? RUNSTATE_running : RUNSTATE_runnable;
+    struct vcpu *v;
 
-    if ( running )
-        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
-    else
-        vcpu_runstate_change(v,
-            ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-             (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
-            new_entry_time);
+    for_each_sched_item_vcpu( item, v )
+        vcpu_runstate_helper(v, new_state, new_entry_time);
 }
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
@@ -1616,7 +1622,7 @@ static void sched_switch_items(struct sched_resource *sd,
              (next->vcpu->runstate.state == RUNSTATE_runnable) ?
              (now - next->state_entry_time) : 0, prev->next_time);
 
-    ASSERT(prev->vcpu->runstate.state == RUNSTATE_running);
+    ASSERT(item_running(prev));
 
     TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->item_id,
              next->domain->domain_id, next->item_id);
@@ -1624,7 +1630,7 @@ static void sched_switch_items(struct sched_resource *sd,
     sched_item_runstate_change(prev, false, now);
     prev->last_run_time = now;
 
-    ASSERT(next->vcpu->runstate.state != RUNSTATE_running);
+    ASSERT(!item_running(next));
     sched_item_runstate_change(next, true, now);
 
     /*
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9688d174e4..49724aafd0 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -107,15 +107,41 @@ static inline bool is_idle_item(const struct sched_item *item)
     return is_idle_vcpu(item->vcpu);
 }
 
+static inline bool item_running(const struct sched_item *item)
+{
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        if ( v->runstate.state == RUNSTATE_running )
+            return true;
+
+    return false;
+}
+
 static inline bool item_runnable(const struct sched_item *item)
 {
-    return vcpu_runnable(item->vcpu);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        if ( vcpu_runnable(v) )
+            return true;
+
+    return false;
 }
 
 static inline void sched_set_res(struct sched_item *item,
                                  struct sched_resource *res)
 {
-    item->vcpu->processor = res->processor;
+    int cpu = cpumask_first(res->cpus);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+    {
+        ASSERT(cpu < nr_cpu_ids);
+        v->processor = cpu;
+        cpu = cpumask_next(cpu, res->cpus);
+    }
+
     item->res = res;
 }
 
@@ -127,25 +153,37 @@ static inline unsigned int sched_item_cpu(struct sched_item *item)
 static inline void sched_set_pause_flags(struct sched_item *item,
                                          unsigned int bit)
 {
-    __set_bit(bit, &item->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        __set_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_clear_pause_flags(struct sched_item *item,
                                            unsigned int bit)
 {
-    __clear_bit(bit, &item->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        __clear_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_set_pause_flags_atomic(struct sched_item *item,
                                                 unsigned int bit)
 {
-    set_bit(bit, &item->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        set_bit(bit, &v->pause_flags);
 }
 
 static inline void sched_clear_pause_flags_atomic(struct sched_item *item,
                                                   unsigned int bit)
 {
-    clear_bit(bit, &item->vcpu->pause_flags);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        clear_bit(bit, &v->pause_flags);
 }
 
 static inline struct sched_item *sched_idle_item(unsigned int cpu)
@@ -327,12 +365,18 @@ static inline void sched_free_domdata(const struct scheduler *s,
 
 static inline void sched_item_pause_nosync(struct sched_item *item)
 {
-    vcpu_pause_nosync(item->vcpu);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        vcpu_pause_nosync(v);
 }
 
 static inline void sched_item_unpause(struct sched_item *item)
 {
-    vcpu_unpause(item->vcpu);
+    struct vcpu *v;
+
+    for_each_sched_item_vcpu( item, v )
+        vcpu_unpause(v);
 }
 
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 41/49] x86: make loading of GDT at context switch more modular
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (39 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 40/49] xen/sched: add support for multiple vcpus per sched item where missing Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 42/49] xen/sched: add support for guest vcpu idle Juergen Gross
                   ` (13 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

In preparation for core scheduling carve out the GDT related
functionality (writing GDT related PTEs, loading the default or full GDT)
into sub-functions.

Instead of dynamically deciding whether the previous vcpu was using the
full or the default GDT, just add a percpu variable for that purpose.
This also removes the need to test twice whether the vcpu_ids differ.

Cache the need_full_gdt(nd) value in a local variable.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c | 71 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7daba4fb91..5e764d8a54 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -73,6 +73,8 @@
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
+static DEFINE_PER_CPU(bool, full_gdt_loaded);
+
 static void default_idle(void);
 void (*pm_idle) (void) __read_mostly = default_idle;
 void (*dead_idle) (void) __read_mostly = default_dead_idle;
@@ -1614,6 +1616,41 @@ static inline bool need_full_gdt(const struct domain *d)
     return is_pv_domain(d) && !is_idle_domain(d);
 }
 
+static inline void write_full_gdt_ptes(seg_desc_t *gdt, struct vcpu *v)
+{
+    unsigned long mfn = virt_to_mfn(gdt);
+    l1_pgentry_t *pl1e = pv_gdt_ptes(v);
+    unsigned int i;
+
+    for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
+        l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
+                  l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
+}
+
+static inline void load_full_gdt(struct vcpu *v, unsigned int cpu)
+{
+    struct desc_ptr gdt_desc;
+
+    gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
+    gdt_desc.base = GDT_VIRT_START(v);
+
+    lgdt(&gdt_desc);
+
+    per_cpu(full_gdt_loaded, cpu) = true;
+}
+
+static inline void load_default_gdt(seg_desc_t *gdt, unsigned int cpu)
+{
+    struct desc_ptr gdt_desc;
+
+    gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
+    gdt_desc.base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY);
+
+    lgdt(&gdt_desc);
+
+    per_cpu(full_gdt_loaded, cpu) = false;
+}
+
 static void __context_switch(void)
 {
     struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
@@ -1622,7 +1659,7 @@ static void __context_switch(void)
     struct vcpu          *n = current;
     struct domain        *pd = p->domain, *nd = n->domain;
     seg_desc_t           *gdt;
-    struct desc_ptr       gdt_desc;
+    bool                  need_full_gdt_n;
 
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
@@ -1664,25 +1701,15 @@ static void __context_switch(void)
 
     gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
                                     per_cpu(compat_gdt_table, cpu);
-    if ( need_full_gdt(nd) )
-    {
-        unsigned long mfn = virt_to_mfn(gdt);
-        l1_pgentry_t *pl1e = pv_gdt_ptes(n);
-        unsigned int i;
 
-        for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
-            l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
-                      l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
-    }
+    need_full_gdt_n = need_full_gdt(nd);
 
-    if ( need_full_gdt(pd) &&
-         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
-    {
-        gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
-        gdt_desc.base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY);
+    if ( need_full_gdt_n )
+        write_full_gdt_ptes(gdt, n);
 
-        lgdt(&gdt_desc);
-    }
+    if ( per_cpu(full_gdt_loaded, cpu) &&
+         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt_n) )
+        load_default_gdt(gdt, cpu);
 
     write_ptbase(n);
 
@@ -1693,14 +1720,8 @@ static void __context_switch(void)
         svm_load_segs(0, 0, 0, 0, 0, 0, 0);
 #endif
 
-    if ( need_full_gdt(nd) &&
-         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
-    {
-        gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
-        gdt_desc.base = GDT_VIRT_START(n);
-
-        lgdt(&gdt_desc);
-    }
+    if ( need_full_gdt_n && !per_cpu(full_gdt_loaded, cpu) )
+        load_full_gdt(n, cpu);
 
     if ( pd != nd )
         cpumask_clear_cpu(cpu, pd->dirty_cpumask);
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 42/49] xen/sched: add support for guest vcpu idle
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (40 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 41/49] x86: make loading of GDT at context switch more modular Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 43/49] xen/sched: modify cpupool_domain_cpumask() to be an item mask Juergen Gross
                   ` (12 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

With core scheduling active a single vcpu might need to idle while
other vcpus are running. In order to avoid having to mix vcpus from
different sched items on the same sched resource we need a new idle
mode for an active guest vcpu.

This idle is similar to the idle_loop() of the idle vcpus, but
without any tasklet work, memory scrubbing or live patch work. We
avoid deep sleep states by setting the vcpu to "urgent".

As the guest idle vcpu should still be active from the hypervisor's
point of view we need a valid cr3 value even if the vcpu has not been
initialized yet. For this purpose allocate an L4 page table for PV
domains, or allocate the monitor table early for HVM domains.

Some assertions need to be modified to accept that an offline vcpu may
now appear to be running.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/arm/domain.c               | 14 ++++++++++++++
 xen/arch/x86/domain.c               | 20 +++++++++++++++++++-
 xen/arch/x86/hvm/hvm.c              |  2 ++
 xen/arch/x86/mm.c                   | 10 +++++++++-
 xen/arch/x86/pv/descriptor-tables.c |  6 +++---
 xen/arch/x86/pv/domain.c            | 19 +++++++++++++++++++
 xen/common/schedule.c               | 13 ++++++++++++-
 xen/include/asm-x86/domain.h        |  3 +++
 xen/include/xen/sched-if.h          |  3 +++
 xen/include/xen/sched.h             |  2 ++
 10 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6dc633ed50..881523d87f 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -59,6 +59,18 @@ static void do_idle(void)
     sched_tick_resume();
 }
 
+void guest_idle_loop(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    for ( ; ; )
+    {
+        if ( !softirq_pending(cpu) )
+            do_idle();
+        do_softirq();
+    }
+}
+
 void idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
@@ -329,6 +341,8 @@ static void continue_new_vcpu(struct vcpu *prev)
 
     if ( is_idle_vcpu(current) )
         reset_stack_and_jump(idle_loop);
+    else if ( !vcpu_runnable(current) )
+        sched_vcpu_idle(current);
     else if ( is_32bit_domain(current->domain) )
         /* check_wakeup_from_wait(); */
         reset_stack_and_jump(return_to_new_vcpu32);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 5e764d8a54..9acf2e9792 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -126,6 +126,18 @@ static void play_dead(void)
     (*dead_idle)();
 }
 
+void guest_idle_loop(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    for ( ; ; )
+    {
+        if ( !softirq_pending(cpu) )
+            pm_idle();
+        do_softirq();
+    }
+}
+
 static void idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
@@ -1702,7 +1714,7 @@ static void __context_switch(void)
     gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
                                     per_cpu(compat_gdt_table, cpu);
 
-    need_full_gdt_n = need_full_gdt(nd);
+    need_full_gdt_n = need_full_gdt(nd) && is_vcpu_online(n);
 
     if ( need_full_gdt_n )
         write_full_gdt_ptes(gdt, n);
@@ -1855,6 +1867,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
+    if ( !vcpu_runnable(next) )
+        sched_vcpu_idle(next);
+
     /*
      * Schedule tail *should* be a terminal function pointer, but leave a
      * bug frame around just in case it returns, to save going back into the
@@ -1868,6 +1883,9 @@ void continue_running(struct vcpu *same)
 {
     context_wait_rendezvous_out(same->sched_item, NULL);
 
+    if ( !vcpu_runnable(same) )
+        sched_vcpu_idle(same);
+
     /* See the comment above. */
     same->domain->arch.ctxt_switch->tail(same);
     BUG();
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f184136f81..6668df9f3b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1541,6 +1541,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
         hvm_set_guest_tsc(v, 0);
     }
 
+    paging_update_paging_modes(v);
+
     return 0;
 
  fail6:
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index dbec130da0..a3d97adfca 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3016,9 +3016,15 @@ int vcpu_destroy_pagetables(struct vcpu *v)
 {
     unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
     struct page_info *page;
-    l4_pgentry_t *l4tab = NULL;
+    l4_pgentry_t *l4tab = v->domain->arch.pv.l4tab_idle;
     int rc = put_old_guest_table(v);
 
+    if ( l4tab && mfn == __virt_to_mfn(l4tab) )
+    {
+        v->arch.guest_table = pagetable_null();
+        mfn = 0;
+    }
+
     if ( rc )
         return rc;
 
@@ -3027,6 +3033,8 @@ int vcpu_destroy_pagetables(struct vcpu *v)
         l4tab = map_domain_page(_mfn(mfn));
         mfn = l4e_get_pfn(*l4tab);
     }
+    else
+        l4tab = NULL;
 
     if ( mfn )
     {
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index 940804b18a..1bcb1c2dd6 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -43,7 +43,7 @@ bool pv_destroy_ldt(struct vcpu *v)
     if ( v->arch.pv.shadow_ldt_mapcnt == 0 )
         goto out;
 #else
-    ASSERT(v == current || !vcpu_cpu_dirty(v));
+    ASSERT(v == current || !vcpu_cpu_dirty(v) || (v->pause_flags & VPF_down));
 #endif
 
     pl1e = pv_ldt_ptes(v);
@@ -80,7 +80,7 @@ void pv_destroy_gdt(struct vcpu *v)
     l1_pgentry_t zero_l1e = l1e_from_mfn(zero_mfn, __PAGE_HYPERVISOR_RO);
     unsigned int i;
 
-    ASSERT(v == current || !vcpu_cpu_dirty(v));
+    ASSERT(v == current || !vcpu_cpu_dirty(v) || (v->pause_flags & VPF_down));
 
     v->arch.pv.gdt_ents = 0;
     for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
@@ -102,7 +102,7 @@ long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
     l1_pgentry_t *pl1e;
     unsigned int i, nr_frames = DIV_ROUND_UP(entries, 512);
 
-    ASSERT(v == current || !vcpu_cpu_dirty(v));
+    ASSERT(v == current || !vcpu_cpu_dirty(v) || (v->pause_flags & VPF_down));
 
     if ( entries > FIRST_RESERVED_GDT_ENTRY )
         return -EINVAL;
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 4b6f48dea2..3ecdb96e8e 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -259,6 +259,12 @@ int pv_vcpu_initialise(struct vcpu *v)
             goto done;
     }
 
+    if ( d->arch.pv.l4tab_idle )
+    {
+        v->arch.guest_table = pagetable_from_paddr(__pa(d->arch.pv.l4tab_idle));
+        update_cr3(v);
+    }
+
  done:
     if ( rc )
         pv_vcpu_destroy(v);
@@ -275,6 +281,7 @@ void pv_domain_destroy(struct domain *d)
     XFREE(d->arch.pv.cpuidmasks);
 
     FREE_XENHEAP_PAGE(d->arch.pv.gdt_ldt_l1tab);
+    FREE_XENHEAP_PAGE(d->arch.pv.l4tab_idle);
 }
 
 
@@ -307,6 +314,18 @@ int pv_domain_initialise(struct domain *d)
 
     d->arch.ctxt_switch = &pv_csw;
 
+    if ( sched_granularity > 1 )
+    {
+        l4_pgentry_t *l4;
+
+        l4 = alloc_xenheap_pages(0, MEMF_node(domain_to_node(d)));
+        if ( !l4 )
+            goto fail;
+        clear_page(l4);
+        init_xen_l4_slots(l4, _mfn(virt_to_mfn(l4)), d, INVALID_MFN, true);
+        d->arch.pv.l4tab_idle = l4;
+    }
+
     /* 64-bit PV guest by default. */
     d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d33efbcdc5..d2a02aea34 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -56,7 +56,7 @@ int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_item. */
-static unsigned int sched_granularity = 1;
+unsigned int sched_granularity = 1;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -1124,6 +1124,17 @@ int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
     return vcpu_set_affinity(v, affinity, v->sched_item->cpu_soft_affinity);
 }
 
+void sched_vcpu_idle(struct vcpu *v)
+{
+    if ( !v->is_urgent )
+    {
+        v->is_urgent = 1;
+        atomic_inc(&per_cpu(sched_res, v->processor)->urgent_count);
+    }
+
+    reset_stack_and_jump(guest_idle_loop);
+}
+
 /* Block the currently-executing domain until a pertinent event occurs. */
 void vcpu_block(void)
 {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 214e44ce1c..695292456b 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -254,6 +254,9 @@ struct pv_domain
 
     atomic_t nr_l4_pages;
 
+    /* L4 tab for offline vcpus with scheduling granularity > 1. */
+    l4_pgentry_t *l4tab_idle;
+
     /* XPTI active? */
     bool xpti;
     /* Use PCID feature? */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 49724aafd0..4a3fb092c2 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -201,6 +201,9 @@ static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
     return per_cpu(sched_res, cpu)->processor;
 }
 
+void sched_vcpu_idle(struct vcpu *v);
+void guest_idle_loop(void);
+
 /*
  * Scratch space, for avoiding having too many cpumask_t on the stack.
  * Within each scheduler, when using the scratch mask of one pCPU:
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 873a903977..52a1abfca9 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -488,6 +488,8 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
 #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
 #define is_idle_vcpu(v)   (is_idle_domain((v)->domain))
 
+extern unsigned int sched_granularity;
+
 static inline bool is_system_domain(const struct domain *d)
 {
     return d->domain_id >= DOMID_FIRST_RESERVED;
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 43/49] xen/sched: modify cpupool_domain_cpumask() to be an item mask
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (41 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 42/49] xen/sched: add support for guest vcpu idle Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity Juergen Gross
                   ` (11 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

cpupool_domain_cpumask() is used by scheduling code to select cpus or to
iterate over them. In order to support scheduling items spanning
multiple cpus, let cpupool_domain_cpumask() return a cpumask with only
one bit set per scheduling resource.
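
As an example, assuming a granularity of 2 on a 4-thread host where threads
{0,1} and {2,3} are siblings (so sched_res_mask contains cpus 0 and 2):

    cpu_valid = { 0, 1, 2, 3 }
    res_valid = cpu_valid & sched_res_mask = { 0, 2 }

Schedulers iterating over cpupool_domain_cpumask() then visit one cpu per
scheduling resource instead of one per thread.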

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c       | 30 +++++++++++++++++++++---------
 xen/common/schedule.c      |  5 +++--
 xen/include/xen/sched-if.h |  5 ++++-
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 31ac323e40..ba76045937 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -38,26 +38,35 @@ DEFINE_PER_CPU(struct cpupool *, cpupool);
 
 #define cpupool_dprintk(x...) ((void)0)
 
+static void free_cpupool_struct(struct cpupool *c)
+{
+    if ( c )
+    {
+        free_cpumask_var(c->res_valid);
+        free_cpumask_var(c->cpu_valid);
+    }
+    xfree(c);
+}
+
 static struct cpupool *alloc_cpupool_struct(void)
 {
     struct cpupool *c = xzalloc(struct cpupool);
 
-    if ( !c || !zalloc_cpumask_var(&c->cpu_valid) )
+    if ( !c )
+        return NULL;
+
+    zalloc_cpumask_var(&c->cpu_valid);
+    zalloc_cpumask_var(&c->res_valid);
+
+    if ( !c->cpu_valid || !c->res_valid )
     {
-        xfree(c);
+        free_cpupool_struct(c);
         c = NULL;
     }
 
     return c;
 }
 
-static void free_cpupool_struct(struct cpupool *c)
-{
-    if ( c )
-        free_cpumask_var(c->cpu_valid);
-    xfree(c);
-}
-
 /*
  * find a cpupool by it's id. to be called with cpupool lock held
  * if exact is not specified, the first cpupool with an id larger or equal to
@@ -271,6 +280,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
         cpupool_cpu_moving = NULL;
     }
     cpumask_set_cpu(cpu, c->cpu_valid);
+    cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
 
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool(d, c)
@@ -393,6 +403,7 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
+    cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask);
     spin_unlock(&cpupool_lock);
 
     work_cpu = smp_processor_id();
@@ -509,6 +520,7 @@ static int cpupool_cpu_remove(unsigned int cpu)
          * allowed only for CPUs in pool0.
          */
         cpumask_clear_cpu(cpu, cpupool0->cpu_valid);
+        cpumask_and(cpupool0->res_valid, cpupool0->cpu_valid, sched_res_mask);
         ret = 0;
     }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d2a02aea34..7fb0b1ed4e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -57,6 +57,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_item. */
 unsigned int sched_granularity = 1;
+const cpumask_t *sched_res_mask = &cpumask_all;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -372,9 +373,9 @@ static unsigned int sched_select_initial_cpu(struct vcpu *v)
     cpumask_clear(&cpus);
     for_each_node_mask ( node, d->node_affinity )
         cpumask_or(&cpus, &cpus, &node_to_cpumask(node));
-    cpumask_and(&cpus, &cpus, cpupool_domain_cpumask(d));
+    cpumask_and(&cpus, &cpus, d->cpupool->cpu_valid);
     if ( cpumask_empty(&cpus) )
-        cpumask_copy(&cpus, cpupool_domain_cpumask(d));
+        cpumask_copy(&cpus, d->cpupool->cpu_valid);
 
     if ( v->vcpu_id == 0 )
         return cpumask_first(&cpus);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4a3fb092c2..2b2612302d 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -22,6 +22,8 @@ extern cpumask_t cpupool_free_cpus;
 #define SCHED_DEFAULT_RATELIMIT_US 1000
 extern int sched_ratelimit_us;
 
+/* Scheduling resource mask. */
+extern const cpumask_t *sched_res_mask;
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -389,6 +391,7 @@ struct cpupool
 {
     int              cpupool_id;
     cpumask_var_t    cpu_valid;      /* all cpus assigned to pool */
+    cpumask_var_t    res_valid;      /* all scheduling resources of pool */
     struct cpupool   *next;
     unsigned int     n_dom;
     struct scheduler *sched;
@@ -405,7 +408,7 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
      * be interested in calling this for the idle domain.
      */
     ASSERT(d->cpupool != NULL);
-    return d->cpupool->cpu_valid;
+    return d->cpupool->res_valid;
 }
 
 /*
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (42 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 43/49] xen/sched: modify cpupool_domain_cpumask() to be an item mask Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-04-01  8:50   ` Andrew Cooper
  2019-03-29 15:09 ` [PATCH RFC 45/49] xen/sched: support allocating multiple vcpus into one sched item Juergen Gross
                   ` (10 subsequent siblings)
  54 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Roger Pau Monné

Make sure the number of vcpus is always a multiple of the scheduling
granularity. Note that we don't support a scheduling granularity above
one on ARM.
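
For illustration, assuming sched_granularity == 2 (core scheduling on a
2-thread SMT system), the rounding below behaves as:

    sched_max_vcpus(1) == 2    /* DIV_ROUND_UP(1, 2) * 2 */
    sched_max_vcpus(4) == 4
    sched_max_vcpus(5) == 6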

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/dom0_build.c | 1 +
 xen/common/domain.c       | 1 +
 xen/common/domctl.c       | 1 +
 xen/include/xen/sched.h   | 5 +++++
 4 files changed, 8 insertions(+)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 77b5646424..76a81dd4a9 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -258,6 +258,7 @@ unsigned int __init dom0_max_vcpus(void)
         max_vcpus = opt_dom0_max_vcpus_min;
     if ( opt_dom0_max_vcpus_max < max_vcpus )
         max_vcpus = opt_dom0_max_vcpus_max;
+    max_vcpus = sched_max_vcpus(max_vcpus);
     limit = dom0_pvh ? HVM_MAX_VCPUS : MAX_VIRT_CPUS;
     if ( max_vcpus > limit )
         max_vcpus = limit;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b448d20d40..d338a2204c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    config->max_vcpus = sched_max_vcpus(config->max_vcpus);
     if ( config->max_vcpus < 1 )
     {
         dprintk(XENLOG_INFO, "No vCPUS\n");
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index ccde1ba706..80837a2a5e 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -542,6 +542,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     {
         unsigned int i, max = op->u.max_vcpus.max;
 
+        max = sched_max_vcpus(max);
         ret = -EINVAL;
         if ( (d == current->domain) || /* no domain_pause() */
              (max != d->max_vcpus) )   /* max_vcpus set up in createdomain */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 52a1abfca9..314a453a60 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -490,6 +490,11 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
 
 extern unsigned int sched_granularity;
 
+static inline unsigned int sched_max_vcpus(unsigned int n_vcpus)
+{
+    return DIV_ROUND_UP(n_vcpus, sched_granularity) * sched_granularity;
+}
+
 static inline bool is_system_domain(const struct domain *d)
 {
     return d->domain_id >= DOMID_FIRST_RESERVED;
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 45/49] xen/sched: support allocating multiple vcpus into one sched item
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (43 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 46/49] xen/sched: add a scheduler_percpu_init() function Juergen Gross
                   ` (9 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

With a scheduling granularity greater than 1 multiple vcpus share the
same struct sched_item. Support that.

Setting the initial processor must be done carefully: we can't use
sched_set_res(), as that relies on for_each_sched_item_vcpu(), which in
turn requires the vcpu to already be a member of the domain's vcpu
linked list, and at this point it isn't yet.
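
For illustration, with sched_granularity == 2 the grouping done by
sched_alloc_item() below works out as:

    vcpu_id 0, 1  ->  one sched_item, item_id 0
    vcpu_id 2, 3  ->  one sched_item, item_id 2

i.e. vcpus whose vcpu_id / sched_granularity values match share an item, and
the item_id is the lowest vcpu_id placed into it.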

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 75 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 62 insertions(+), 13 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7fb0b1ed4e..a2140b3d7c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -300,10 +300,25 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
-static void sched_free_item(struct sched_item *item)
+static void sched_free_item(struct sched_item *item, struct vcpu *v)
 {
     struct sched_item *prev_item;
     struct domain *d = item->domain;
+    struct vcpu *vitem;
+    unsigned int cnt = 0;
+
+    /* Don't count to be released vcpu, might be not in vcpu list yet. */
+    for_each_sched_item_vcpu ( item, vitem )
+        if ( vitem != v )
+            cnt++;
+
+    v->sched_item = NULL;
+
+    if ( cnt )
+        return;
+
+    if ( item->vcpu == v )
+        item->vcpu = v->next_in_list;
 
     if ( d->sched_item_list == item )
         d->sched_item_list = item->next_in_list;
@@ -319,8 +334,6 @@ static void sched_free_item(struct sched_item *item)
         }
     }
 
-    item->vcpu->sched_item = NULL;
-
     free_cpumask_var(item->cpu_hard_affinity);
     free_cpumask_var(item->cpu_hard_affinity_tmp);
     free_cpumask_var(item->cpu_hard_affinity_saved);
@@ -329,17 +342,36 @@ static void sched_free_item(struct sched_item *item)
     xfree(item);
 }
 
+static void sched_item_add_vcpu(struct sched_item *item, struct vcpu *v)
+{
+    v->sched_item = item;
+    if ( !item->vcpu || item->vcpu->vcpu_id > v->vcpu_id )
+    {
+        item->vcpu = v;
+        item->item_id = v->vcpu_id;
+    }
+}
+
 static struct sched_item *sched_alloc_item(struct vcpu *v)
 {
     struct sched_item *item, **prev_item;
     struct domain *d = v->domain;
 
+    for_each_sched_item ( d, item )
+        if ( item->vcpu->vcpu_id / sched_granularity ==
+             v->vcpu_id / sched_granularity )
+            break;
+
+    if ( item )
+    {
+        sched_item_add_vcpu(item, v);
+        return item;
+    }
+
     if ( (item = xzalloc(struct sched_item)) == NULL )
         return NULL;
 
-    v->sched_item = item;
-    item->vcpu = v;
-    item->item_id = v->vcpu_id;
+    sched_item_add_vcpu(item, v);
     item->domain = d;
 
     for ( prev_item = &d->sched_item_list; *prev_item;
@@ -360,7 +392,7 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
     return item;
 
  fail:
-    sched_free_item(item);
+    sched_free_item(item, v);
     return NULL;
 }
 
@@ -404,8 +436,6 @@ int sched_init_vcpu(struct vcpu *v)
         item->idle_cnt++;
     }
 
-    sched_set_res(item, per_cpu(sched_res, processor));
-
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -414,10 +444,22 @@ int sched_init_vcpu(struct vcpu *v)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
+    /* If this is not the first vcpu of the item we are done. */
+    if ( item->priv != NULL )
+    {
+        /* We can rely on previous vcpu to exist. */
+        v->processor = cpumask_next(d->vcpu[v->vcpu_id - 1]->processor,
+                                    item->res->cpus);
+        return 0;
+    }
+
+    /* The first vcpu of an item can be set via sched_set_res(). */
+    sched_set_res(item, per_cpu(sched_res, processor));
+
     item->priv = SCHED_OP(dom_scheduler(d), alloc_vdata, item, d->sched_priv);
     if ( item->priv == NULL )
     {
-        sched_free_item(item);
+        sched_free_item(item, v);
         return 1;
     }
 
@@ -571,9 +613,16 @@ void sched_destroy_vcpu(struct vcpu *v)
     kill_timer(&v->poll_timer);
     if ( test_and_clear_bool(v->is_urgent) )
         atomic_dec(&per_cpu(sched_res, v->processor)->urgent_count);
-    SCHED_OP(vcpu_scheduler(v), remove_item, item);
-    SCHED_OP(vcpu_scheduler(v), free_vdata, item->priv);
-    sched_free_item(item);
+    /*
+     * Vcpus are being destroyed top-down. So being the first vcpu of an item
+     * is the same as being the only one.
+     */
+    if ( item->vcpu == v )
+    {
+        SCHED_OP(vcpu_scheduler(v), remove_item, item);
+        SCHED_OP(vcpu_scheduler(v), free_vdata, item->priv);
+        sched_free_item(item, v);
+    }
 }
 
 int sched_init_domain(struct domain *d, int poolid)
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 46/49] xen/sched: add a scheduler_percpu_init() function
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (44 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 45/49] xen/sched: support allocating multiple vcpus into one sched item Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 47/49] xen/sched: support core scheduling in continue_running() Juergen Gross
                   ` (8 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

To support core scheduling, the scheduler cpu callback for
CPU_STARTING has to be moved into a dedicated function called by
start_secondary(), as it will then need to run before
spin_debug_enable() due to potentially calling xfree().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/arm/smpboot.c  |  2 ++
 xen/arch/x86/smpboot.c  |  2 ++
 xen/common/schedule.c   | 19 ++++++++++++-------
 xen/include/xen/sched.h |  1 +
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 0728a9b505..3ae32449d7 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -350,6 +350,8 @@ void start_secondary(unsigned long boot_phys_offset,
 
     setup_cpu_sibling_map(cpuid);
 
+    scheduler_percpu_init(cpu);
+
     /* Run local notifiers */
     notify_cpu_starting(cpuid);
     /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index b7a0a4a419..1c4f628b97 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -383,6 +383,8 @@ void start_secondary(void *unused)
 
     set_cpu_sibling_map(cpu);
 
+    scheduler_percpu_init(cpu);
+
     init_percpu_time();
 
     setup_secondary_APIC_clock();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index a2140b3d7c..f43d00b59f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2029,6 +2029,15 @@ static void cpu_schedule_down(unsigned int cpu)
     per_cpu(sched_res, cpu) = NULL;
 }
 
+void scheduler_percpu_init(unsigned int cpu)
+{
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct sched_resource *sd = per_cpu(sched_res, cpu);
+
+    if ( system_state != SYS_STATE_resume )
+        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+}
+
 static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -2047,8 +2056,8 @@ static int cpu_schedule_callback(
      * data can avoid implementing alloc_pdata. init_pdata may, however, be
      * necessary/useful in this case too (e.g., it can contain the "register
      * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called during
-     * CPU_STARTING.
+     * during CPU_UP_PREPARE. init_pdata (if present) is called before
+     * CPU_STARTING in scheduler_percpu_init().
      *
      * On the other hand, at teardown, we need to reverse what has been done
      * during initialization, and then free the per-pCPU specific data. This
@@ -2071,10 +2080,6 @@ static int cpu_schedule_callback(
      */
     switch ( action )
     {
-    case CPU_STARTING:
-        if ( system_state != SYS_STATE_resume )
-            SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
-        break;
     case CPU_UP_PREPARE:
         if ( system_state != SYS_STATE_resume )
             rc = cpu_schedule_up(cpu);
@@ -2171,7 +2176,7 @@ void __init scheduler_init(void)
     this_cpu(sched_res)->curr = idle_vcpu[0]->sched_item;
     this_cpu(sched_res)->sched_priv = SCHED_OP(&ops, alloc_pdata, 0);
     BUG_ON(IS_ERR(this_cpu(sched_res)->sched_priv));
-    SCHED_OP(&ops, init_pdata, this_cpu(sched_res)->sched_priv, 0);
+    scheduler_percpu_init(0);
 }
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 314a453a60..51b8b6a44f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -632,6 +632,7 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
+void scheduler_percpu_init(unsigned int cpu);
 int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 47/49] xen/sched: support core scheduling in continue_running()
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (45 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 46/49] xen/sched: add a scheduler_percpu_init() function Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 48/49] xen/sched: make vcpu_wake() core scheduling aware Juergen Gross
                   ` (7 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

For core scheduling a transition from an offline vcpu to a running one
must be special cased: the vcpu might already be in guest idle on its
physical cpu, but the context has to be loaded as if a full context
switch were being done. For that purpose add a flag to the vcpu
structure indicating that condition. The flag is tested in
continue_running(), and if it is set the context is loaded as required.

Carve out some of the context loading functionality of
__context_switch() into a new function, as it is now needed in
continue_running(), too.
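
A minimal standalone model of that handshake (illustrative only; struct
vcpu_model and the load step are stand-ins for the real struct vcpu and
csw_load_regs()):

#include <stdbool.h>
#include <stdio.h>

struct vcpu_model {
    bool online;
    bool reload_context;
    bool context_loaded;
};

/* Bringing a previously offline vcpu online marks its context as needing
 * a (re)load, mirroring the places where _VPF_down gets cleared. */
static void model_set_online(struct vcpu_model *v)
{
    v->online = true;
    v->reload_context = true;
}

/* continue_running() path: only load the context when the flag is set. */
static void model_continue_running(struct vcpu_model *v)
{
    if ( v->reload_context )
    {
        v->context_loaded = true;   /* csw_load_regs() etc. in the patch */
        v->reload_context = false;
    }
}

int main(void)
{
    struct vcpu_model v = { 0 };

    model_set_online(&v);
    model_continue_running(&v);
    printf("loaded=%d pending=%d\n", v.context_loaded, v.reload_context);
    return 0;
}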

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c     | 114 +++++++++++++++++++++++++++++++++++++++-------
 xen/arch/x86/hvm/hvm.c    |   2 +
 xen/arch/x86/hvm/vlapic.c |   1 +
 xen/common/domain.c       |   2 +
 xen/common/schedule.c     |  19 +++++---
 xen/include/xen/sched.h   |   3 ++
 6 files changed, 117 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9acf2e9792..7a51064de0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1171,7 +1171,10 @@ int arch_set_info_guest(
 
  out:
     if ( flags & VGCF_online )
+    {
+        v->reload_context = true;
         clear_bit(_VPF_down, &v->pause_flags);
+    }
     else
         set_bit(_VPF_down, &v->pause_flags);
     return 0;
@@ -1663,6 +1666,24 @@ static inline void load_default_gdt(seg_desc_t *gdt, unsigned int cpu)
     per_cpu(full_gdt_loaded, cpu) = false;
 }
 
+static void inline csw_load_regs(struct vcpu *v,
+                                 struct cpu_user_regs *stack_regs)
+{
+    memcpy(stack_regs, &v->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
+    if ( cpu_has_xsave )
+    {
+        u64 xcr0 = v->arch.xcr0 ?: XSTATE_FP_SSE;
+
+        if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
+            BUG();
+
+        if ( cpu_has_xsaves && is_hvm_vcpu(v) )
+            set_msr_xss(v->arch.hvm.msr_xss);
+    }
+    vcpu_restore_fpu_nonlazy(v, false);
+    v->domain->arch.ctxt_switch->to(v);
+}
+
 static void __context_switch(void)
 {
     struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
@@ -1676,7 +1697,7 @@ static void __context_switch(void)
     ASSERT(p != n);
     ASSERT(!vcpu_cpu_dirty(n));
 
-    if ( !is_idle_domain(pd) )
+    if ( !is_idle_domain(pd) && is_vcpu_online(p) && !p->reload_context )
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
@@ -1692,22 +1713,8 @@ static void __context_switch(void)
         cpumask_set_cpu(cpu, nd->dirty_cpumask);
     write_atomic(&n->dirty_cpu, cpu);
 
-    if ( !is_idle_domain(nd) )
-    {
-        memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
-        if ( cpu_has_xsave )
-        {
-            u64 xcr0 = n->arch.xcr0 ?: XSTATE_FP_SSE;
-
-            if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
-                BUG();
-
-            if ( cpu_has_xsaves && is_hvm_vcpu(n) )
-                set_msr_xss(n->arch.hvm.msr_xss);
-        }
-        vcpu_restore_fpu_nonlazy(n, false);
-        nd->arch.ctxt_switch->to(n);
-    }
+    if ( !is_idle_domain(nd) && is_vcpu_online(n) )
+        csw_load_regs(n, stack_regs);
 
     psr_ctxt_switch_to(nd);
 
@@ -1775,6 +1782,72 @@ static void context_wait_rendezvous_out(struct sched_item *item,
         context_saved(prev);
 }
 
+static void __continue_running(struct vcpu *same)
+{
+    struct domain *d = same->domain;
+    seg_desc_t *gdt;
+    bool full_gdt = need_full_gdt(d);
+    unsigned int cpu = smp_processor_id();
+
+    gdt = !is_pv_32bit_domain(d) ? per_cpu(gdt_table, cpu) :
+                                   per_cpu(compat_gdt_table, cpu);
+
+    if ( same->reload_context )
+    {
+        struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
+
+        get_cpu_info()->use_pv_cr3 = false;
+        get_cpu_info()->xen_cr3 = 0;
+
+        local_irq_disable();
+
+        csw_load_regs(same, stack_regs);
+
+        psr_ctxt_switch_to(d);
+
+        if ( full_gdt )
+            write_full_gdt_ptes(gdt, same);
+
+        write_ptbase(same);
+
+#if defined(CONFIG_PV) && defined(CONFIG_HVM)
+        /* Prefetch the VMCB if we expect to use it later in context switch */
+        if ( cpu_has_svm && is_pv_domain(d) && !is_pv_32bit_domain(d) &&
+             !(read_cr4() & X86_CR4_FSGSBASE) )
+            svm_load_segs(0, 0, 0, 0, 0, 0, 0);
+#endif
+
+        if ( full_gdt )
+            load_full_gdt(same, cpu);
+
+        local_irq_enable();
+
+        if ( is_pv_domain(d) )
+            load_segments(same);
+
+        same->reload_context = false;
+
+        _update_runstate_area(same);
+
+        update_vcpu_system_time(same);
+    }
+    else if ( !is_idle_vcpu(same) && full_gdt != per_cpu(full_gdt_loaded, cpu) )
+    {
+        local_irq_disable();
+
+        if ( full_gdt )
+        {
+            write_full_gdt_ptes(gdt, same);
+            write_ptbase(same);
+            load_full_gdt(same, cpu);
+        }
+        else
+            load_default_gdt(gdt, cpu);
+
+        local_irq_enable();
+    }
+}
+
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
     unsigned int cpu = smp_processor_id();
@@ -1811,6 +1884,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
          (is_idle_domain(nextd) && cpu_online(cpu)) )
     {
         local_irq_enable();
+
+        if ( !is_idle_domain(nextd) )
+            __continue_running(next);
     }
     else
     {
@@ -1822,6 +1898,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         if ( is_pv_domain(nextd) )
             load_segments(next);
 
+        next->reload_context = false;
+
         ctxt_switch_levelling(next);
 
         if ( opt_ibpb && !is_idle_domain(nextd) )
@@ -1886,6 +1964,8 @@ void continue_running(struct vcpu *same)
     if ( !vcpu_runnable(same) )
         sched_vcpu_idle(same);
 
+    __continue_running(same);
+
     /* See the comment above. */
     same->domain->arch.ctxt_switch->tail(same);
     BUG();
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6668df9f3b..12a6d62dc8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1133,6 +1133,7 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 
     /* Auxiliary processors should be woken immediately. */
     v->is_initialised = 1;
+    v->reload_context = true;
     clear_bit(_VPF_down, &v->pause_flags);
     vcpu_wake(v);
 
@@ -3913,6 +3914,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
 
     v->arch.flags |= TF_kernel_mode;
     v->is_initialised = 1;
+    v->reload_context = true;
     clear_bit(_VPF_down, &v->pause_flags);
 
  out:
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index a1a43cd792..41f8050c02 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -367,6 +367,7 @@ static void vlapic_accept_irq(struct vcpu *v, uint32_t icr_low)
             domain_lock(v->domain);
             if ( v->is_initialised )
                 wake = test_and_clear_bit(_VPF_down, &v->pause_flags);
+            v->reload_context = wake;
             domain_unlock(v->domain);
             if ( wake )
                 vcpu_wake(v);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d338a2204c..b467197f05 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1383,6 +1383,8 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
                 rc = -EINVAL;
             else
                 wake = test_and_clear_bit(_VPF_down, &v->pause_flags);
+            if ( wake )
+                v->reload_context = true;
             domain_unlock(d);
             if ( wake )
                 vcpu_wake(v);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index f43d00b59f..7b30a153df 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1775,17 +1775,22 @@ static struct sched_item *sched_wait_rendezvous_in(struct sched_item *prev,
     {
         next = do_schedule(prev, now);
         atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
-        return next;
     }
-
-    while ( prev->rendezvous_in_cnt )
+    else
     {
-        pcpu_schedule_unlock_irq(lock, cpu);
-        cpu_relax();
-        pcpu_schedule_lock_irq(cpu);
+        while ( prev->rendezvous_in_cnt )
+        {
+            pcpu_schedule_unlock_irq(lock, cpu);
+            cpu_relax();
+            pcpu_schedule_lock_irq(cpu);
+        }
+        next = prev->next_task;
     }
 
-    return prev->next_task;
+    if ( unlikely(prev == next) )
+        vcpu_runstate_helper(current, RUNSTATE_running, now);
+
+    return next;
 }
 
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 51b8b6a44f..13085ddf90 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -206,6 +206,9 @@ struct vcpu
     bool             hcall_compat;
 #endif
 
+    /* VCPU was down before (context might need to be reloaded). */
+    bool             reload_context;
+
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 48/49] xen/sched: make vcpu_wake() core scheduling aware
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (46 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 47/49] xen/sched: support core scheduling in continue_running() Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:09 ` [PATCH RFC 49/49] xen/sched: add scheduling granularity enum Juergen Gross
                   ` (6 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

With core scheduling active, a vcpu being woken up via vcpu_wake() might
already be assigned to a physical cpu that is sitting in guest idle. In
this case it just needs to be set to "running" and its cpu pinged via
cpu_raise_softirq().
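
The added condition boils down to the following standalone model
(illustrative only; the types are stand-ins for the real sched_item and
runstate handling):

#include <assert.h>
#include <stdbool.h>

enum runstate_model { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED };

/* The item (core) is already running, but this particular vcpu isn't:
 * a SCHEDULE_SOFTIRQ ping on its cpu is enough, no full reschedule. */
static bool needs_ping(bool item_is_running, enum runstate_model vcpu_state)
{
    return item_is_running && vcpu_state != RS_RUNNING;
}

int main(void)
{
    assert(needs_ping(true, RS_RUNNABLE));    /* sibling in guest idle */
    assert(!needs_ping(true, RS_RUNNING));    /* vcpu already running  */
    assert(!needs_ping(false, RS_RUNNABLE));  /* normal wakeup path    */
    return 0;
}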

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7b30a153df..ba03b588c8 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -705,16 +705,19 @@ void vcpu_wake(struct vcpu *v)
 {
     unsigned long flags;
     spinlock_t *lock;
+    struct sched_item *item = v->sched_item;
 
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 
-    lock = item_schedule_lock_irqsave(v->sched_item, &flags);
+    lock = item_schedule_lock_irqsave(item, &flags);
 
     if ( likely(vcpu_runnable(v)) )
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(vcpu_scheduler(v), wake, v->sched_item);
+        SCHED_OP(vcpu_scheduler(v), wake, item);
+        if ( item->is_running && v->runstate.state != RUNSTATE_running )
+            cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -722,7 +725,7 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
     }
 
-    item_schedule_unlock_irqrestore(lock, flags, v->sched_item);
+    item_schedule_unlock_irqrestore(lock, flags, item);
 }
 
 void vcpu_unblock(struct vcpu *v)
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH RFC 49/49] xen/sched: add scheduling granularity enum
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (47 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 48/49] xen/sched: make vcpu_wake() core scheduling aware Juergen Gross
@ 2019-03-29 15:09 ` Juergen Gross
  2019-03-29 15:37 ` [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (5 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Add a scheduling granularity enum ("thread", "core", "socket") for
specifying the scheduling granularity. Initially it is set to
"thread"; this can be modified via the new x86-only boot parameter
"sched_granularity".

According to the selected granularity, sched_granularity is set once
all cpus are online. In case sched_granularity > 1 the sched items of
the idle vcpus and the sched resources of the physical cpus need to be
combined; this happens before the init_pdata hook of the active
scheduler is called.

A check is added that all sched resources hold the same number of
cpus. For now Xen panics if this is not the case.
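
The consistency check can be modelled standalone (illustrative only, not
Xen code; a plain array of per-cpu sibling counts stands in for the cpu
sibling/core masks):

#include <stdio.h>

/* Returns the common sibling count, or 0 if the online cpus disagree. */
static unsigned int check_granularity(const unsigned int *siblings,
                                      unsigned int nr_cpus)
{
    unsigned int i, gran = 0;

    for ( i = 0; i < nr_cpus; i++ )
    {
        if ( gran == 0 )
            gran = siblings[i];
        else if ( gran != siblings[i] )
            return 0;
    }

    return gran;
}

int main(void)
{
    unsigned int uniform[] = { 2, 2, 2, 2 };  /* SMT-2 on every core       */
    unsigned int mixed[]   = { 2, 2, 1, 2 };  /* one thread offline/parked */

    printf("uniform: %u\n", check_granularity(uniform, 4)); /* 2          */
    printf("mixed:   %u\n", check_granularity(mixed, 4));   /* 0 -> panic */
    return 0;
}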

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/setup.c       |   2 +
 xen/common/schedule.c      | 196 +++++++++++++++++++++++++++++++++++++++------
 xen/include/xen/sched-if.h |   4 +-
 xen/include/xen/sched.h    |   1 +
 4 files changed, 177 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3440794275..83854eeef8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1701,6 +1701,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
+    scheduler_smp_init();
+
     do_initcalls();
 
     if ( opt_watchdog ) 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ba03b588c8..dceae08691 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -55,9 +55,32 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
+static enum {
+    SCHED_GRAN_thread,
+    SCHED_GRAN_core,
+    SCHED_GRAN_socket
+} opt_sched_granularity = SCHED_GRAN_thread;
+
+#ifdef CONFIG_X86
+static int __init sched_select_granularity(const char *str)
+{
+    if (strcmp("thread", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_thread;
+    else if (strcmp("core", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_core;
+    else if (strcmp("socket", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_socket;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+custom_param("sched_granularity", sched_select_granularity);
+#endif
+
 /* Number of vcpus per struct sched_item. */
 unsigned int sched_granularity = 1;
-const cpumask_t *sched_res_mask = &cpumask_all;
+cpumask_var_t sched_res_mask;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -68,6 +91,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU(struct sched_resource *, sched_res);
+static DEFINE_PER_CPU(unsigned int, sched_res_idx);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -82,6 +106,12 @@ static struct scheduler __read_mostly ops;
          (( (opsptr)->fn != NULL ) ? (opsptr)->fn(opsptr, ##__VA_ARGS__ )  \
           : (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
 
+static inline struct vcpu *sched_item2vcpu_cpu(const struct sched_item *item,
+                                               unsigned int cpu)
+{
+    return item->domain->vcpu[item->item_id + per_cpu(sched_res_idx, cpu)];
+}
+
 static inline struct scheduler *dom_scheduler(const struct domain *d)
 {
     if ( likely(d->cpupool != NULL) )
@@ -300,25 +330,10 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
     spin_unlock_irqrestore(lock1, flags);
 }
 
-static void sched_free_item(struct sched_item *item, struct vcpu *v)
+static void sched_free_item_mem(struct sched_item *item)
 {
-    struct sched_item *prev_item;
     struct domain *d = item->domain;
-    struct vcpu *vitem;
-    unsigned int cnt = 0;
-
-    /* Don't count to be released vcpu, might be not in vcpu list yet. */
-    for_each_sched_item_vcpu ( item, vitem )
-        if ( vitem != v )
-            cnt++;
-
-    v->sched_item = NULL;
-
-    if ( cnt )
-        return;
-
-    if ( item->vcpu == v )
-        item->vcpu = v->next_in_list;
+    struct sched_item *prev_item;
 
     if ( d->sched_item_list == item )
         d->sched_item_list = item->next_in_list;
@@ -342,6 +357,30 @@ static void sched_free_item(struct sched_item *item, struct vcpu *v)
     xfree(item);
 }
 
+static void sched_free_item(struct sched_item *item, struct vcpu *v)
+{
+    struct vcpu *vitem;
+    unsigned int cnt = 0;
+
+    /* Don't count to be released vcpu, might be not in vcpu list yet. */
+    for_each_sched_item_vcpu ( item, vitem )
+        if ( vitem != v )
+            cnt++;
+
+    v->sched_item = NULL;
+
+    if ( item->vcpu == v )
+        item->vcpu = v->next_in_list;
+
+    if ( is_idle_domain(item->domain) )
+        item->run_cnt--;
+    else
+        item->idle_cnt--;
+
+    if ( !cnt )
+        sched_free_item_mem(item);
+}
+
 static void sched_item_add_vcpu(struct sched_item *item, struct vcpu *v)
 {
     v->sched_item = item;
@@ -1847,7 +1886,7 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, next->vcpu, now);
+    sched_context_switch(vprev, sched_item2vcpu_cpu(next, cpu), now);
 }
 
 /*
@@ -1906,7 +1945,7 @@ static void schedule(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    vnext = next->vcpu;
+    vnext = sched_item2vcpu_cpu(next, cpu);
     sched_context_switch(vprev, vnext, now);
 }
 
@@ -1964,8 +2003,14 @@ static int cpu_schedule_up(unsigned int cpu)
     sd = xmalloc(struct sched_resource);
     if ( sd == NULL )
         return -ENOMEM;
+    if ( !zalloc_cpumask_var(&sd->cpus) )
+    {
+        xfree(sd);
+        return -ENOMEM;
+    }
+
     sd->processor = cpu;
-    sd->cpus = cpumask_of(cpu);
+    cpumask_copy(sd->cpus, cpumask_of(cpu));
     per_cpu(sched_res, cpu) = sd;
 
     per_cpu(scheduler, cpu) = &ops;
@@ -2025,6 +2070,12 @@ static void cpu_schedule_down(unsigned int cpu)
     struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
+    cpumask_clear_cpu(cpu, sd->cpus);
+    per_cpu(sched_res, cpu) = NULL;
+
+    if ( cpumask_weight(sd->cpus) )
+        return;
+
     SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
     SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_item->priv);
 
@@ -2032,18 +2083,67 @@ static void cpu_schedule_down(unsigned int cpu)
     sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
+    free_cpumask_var(sd->cpus);
+    cpumask_clear_cpu(cpu, sched_res_mask);
 
-    xfree(per_cpu(sched_res, cpu));
-    per_cpu(sched_res, cpu) = NULL;
+    xfree(sd);
 }
 
 void scheduler_percpu_init(unsigned int cpu)
 {
     struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = per_cpu(sched_res, cpu);
+    const cpumask_t *mask;
+    unsigned int master_cpu;
+    spinlock_t *lock;
+    struct sched_item *old_item, *master_item;
+
+    if ( system_state == SYS_STATE_resume )
+        return;
+
+    switch ( opt_sched_granularity )
+    {
+    case SCHED_GRAN_thread:
+        mask = cpumask_of(cpu);
+        break;
+    case SCHED_GRAN_core:
+        mask = per_cpu(cpu_sibling_mask, cpu);
+        break;
+    case SCHED_GRAN_socket:
+        mask = per_cpu(cpu_core_mask, cpu);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        return;
+    }
 
-    if ( system_state != SYS_STATE_resume )
+    if ( cpu == 0 || cpumask_weight(mask) == 1 )
+    {
+        cpumask_set_cpu(cpu, sched_res_mask);
         SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+        return;
+    }
+
+    master_cpu = cpumask_first(mask);
+    master_item = idle_vcpu[master_cpu]->sched_item;
+    lock = pcpu_schedule_lock(master_cpu);
+
+    /* Merge idle_vcpu item and sched_resource into master cpu. */
+    old_item = idle_vcpu[cpu]->sched_item;
+    idle_vcpu[cpu]->sched_item = master_item;
+    per_cpu(sched_res, cpu) = per_cpu(sched_res, master_cpu);
+    per_cpu(sched_res_idx, cpu) = cpumask_weight(per_cpu(sched_res, cpu)->cpus);
+    cpumask_set_cpu(cpu, per_cpu(sched_res, cpu)->cpus);
+    master_item->run_cnt += old_item->run_cnt;
+    master_item->idle_cnt += old_item->idle_cnt;
+
+    pcpu_schedule_unlock(lock, master_cpu);
+
+    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_vdata, old_item->priv);
+
+    xfree(sd);
+    sched_free_item_mem(old_item);
 }
 
 static int cpu_schedule_callback(
@@ -2123,6 +2223,51 @@ static struct notifier_block cpu_schedule_nfb = {
     .notifier_call = cpu_schedule_callback
 };
 
+static unsigned int __init sched_check_granularity(void)
+{
+    unsigned int cpu;
+    unsigned int siblings, gran = 0;
+
+    for_each_online_cpu( cpu )
+    {
+        switch ( opt_sched_granularity )
+        {
+        case SCHED_GRAN_thread:
+            /* If granularity is "thread" we are fine already. */
+            return 1;
+        case SCHED_GRAN_core:
+            siblings = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+            break;
+        case SCHED_GRAN_socket:
+            siblings = cpumask_weight(per_cpu(cpu_core_mask, cpu));
+            break;
+        default:
+            ASSERT_UNREACHABLE();
+            return 0;
+        }
+
+        if ( gran == 0 )
+            gran = siblings;
+        else if ( gran != siblings )
+            return 0;
+    }
+
+    return gran;
+}
+
+/* Setup data for selected scheduler granularity. */
+void __init scheduler_smp_init(void)
+{
+    unsigned int gran;
+
+    gran = sched_check_granularity();
+    if ( gran == 0 )
+        panic("Illegal cpu configuration for scheduling granularity!\n"
+              "Please use thread scheduling.\n");
+
+    sched_granularity = gran;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
@@ -2154,6 +2299,9 @@ void __init scheduler_init(void)
         printk("Using '%s' (%s)\n", ops.name, ops.opt_name);
     }
 
+    if ( !zalloc_cpumask_var(&sched_res_mask) )
+        BUG();
+
     if ( cpu_schedule_up(0) )
         BUG();
     register_cpu_notifier(&cpu_schedule_nfb);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 2b2612302d..09ee7281c6 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -23,7 +23,7 @@ extern cpumask_t cpupool_free_cpus;
 extern int sched_ratelimit_us;
 
 /* Scheduling resource mask. */
-extern const cpumask_t *sched_res_mask;
+extern cpumask_var_t sched_res_mask;
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -43,7 +43,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned            processor;
-    const cpumask_t    *cpus;           /* cpus covered by this struct     */
+    cpumask_var_t       cpus;           /* cpus covered by this struct     */
 };
 
 #define curr_on_cpu(c)    (per_cpu(sched_res, c)->curr)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 13085ddf90..7cd83155f7 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -636,6 +636,7 @@ void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
 void scheduler_percpu_init(unsigned int cpu);
+void scheduler_smp_init(void);
 int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (48 preceding siblings ...)
  2019-03-29 15:09 ` [PATCH RFC 49/49] xen/sched: add scheduling granularity enum Juergen Gross
@ 2019-03-29 15:37 ` Juergen Gross
  2019-03-29 15:39 ` Jan Beulich
                   ` (4 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jan Beulich,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Josh Whitehead, Meng Xu, Jun Nakajima, Ian Jackson,
	Roger Pau Monné

On 29/03/2019 16:08, Juergen Gross wrote:
> This series is very RFC!!!!
> 
> Add support for core- and socket-scheduling in the Xen hypervisor.
> 
> Via boot parameter sched_granularity=core (or sched_granularity=socket)
> it is possible to change the scheduling granularity from thread (the
> default) to either whole cores or even sockets.
> 
> All logical cpus (threads) of the core or socket are always scheduled
> together. This means that on a core always vcpus of the same domain
> will be active, and those vcpus will always be scheduled at the same
> time.
> 
> This is achieved by switching the scheduler to no longer see vcpus as
> the primary object to schedule, but "schedule items". Each schedule
> item consists of as many vcpus as each core has threads on the current
> system. The vcpu->item relation is fixed.
> 
> I have done some very basic performance testing: on a 4 cpu system
> (2 cores with 2 threads each) I did a "make -j 4" for building the Xen
> hypervisor. With This test has been run on dom0, once with no other
> guest active and once with another guest with 4 vcpus running the same
> test. The results are (always elapsed time, system time, user time):
> 
> sched_granularity=thread, no other guest: 116.10 177.65 207.84
> sched_granularity=core,   no other guest: 114.04 175.47 207.45
> sched_granularity=thread, other guest:    202.30 334.21 384.63
> sched_granularity=core,   other guest:    207.24 293.04 371.37
> 
> All tests have been performed with credit2, the other schedulers are
> untested up to now.
> 
> Cpupools are not yet working, as moving cpus between cpupools needs
> more work.
> 
> HVM domains do not work yet, there is a doublefault in Xen at the
> end of Seabios. I'm currently investigating this issue.
> 
> This is x86-only for the moment. ARM doesn't even build with the
> series applied. For full ARM support I might need some help with the
> ARM specific context switch handling.
> 
> The first 7 patches have been sent to xen-devel already, I'm just
> adding them here for convenience as they are prerequisites.
> 
> I'm especially looking for feedback regarding the overall idea and
> design.

I have put the patches in a repository:

github.com/jgross1/xen.git sched-rfc


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (49 preceding siblings ...)
  2019-03-29 15:37 ` [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
@ 2019-03-29 15:39 ` Jan Beulich
       [not found] ` <5C9E3C3D0200007800222FB0@suse.com>
                   ` (3 subsequent siblings)
  54 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2019-03-29 15:39 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monne

>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
> Via boot parameter sched_granularity=core (or sched_granularity=socket)
> it is possible to change the scheduling granularity from thread (the
> default) to either whole cores or even sockets.
> 
> All logical cpus (threads) of the core or socket are always scheduled
> together. This means that on a core always vcpus of the same domain
> will be active, and those vcpus will always be scheduled at the same
> time.
> 
> This is achieved by switching the scheduler to no longer see vcpus as
> the primary object to schedule, but "schedule items". Each schedule
> item consists of as many vcpus as each core has threads on the current
> system. The vcpu->item relation is fixed.

Hmm, I find this surprising: A typical guest would have more vCPU-s
than there are threads per core. So if two of them want to run, but
each is associated with a different core, you'd need two cores instead
of one to actually fulfill the request? I could see this necessarily being
the case if you arranged vCPU-s into virtual threads, cores, sockets,
and nodes, but at least from the patch titles it doesn't look as if you
did in this series. Are there other reasons to make this a fixed
relationship?

As a minor cosmetic request visible from this cover letter right away:
Could the command line option please become "sched-granularity="
or even "sched-gran="?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
       [not found] ` <5C9E3C3D0200007800222FB0@suse.com>
@ 2019-03-29 15:46   ` Juergen Gross
  2019-03-29 16:56     ` Dario Faggioli
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 15:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monne

On 29/03/2019 16:39, Jan Beulich wrote:
>>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
>> Via boot parameter sched_granularity=core (or sched_granularity=socket)
>> it is possible to change the scheduling granularity from thread (the
>> default) to either whole cores or even sockets.
>>
>> All logical cpus (threads) of the core or socket are always scheduled
>> together. This means that on a core always vcpus of the same domain
>> will be active, and those vcpus will always be scheduled at the same
>> time.
>>
>> This is achieved by switching the scheduler to no longer see vcpus as
>> the primary object to schedule, but "schedule items". Each schedule
>> item consists of as many vcpus as each core has threads on the current
>> system. The vcpu->item relation is fixed.
> 
> Hmm, I find this surprising: A typical guest would have more vCPU-s
> than there are threads per core. So if two of them want to run, but
> each is associated with a different core, you'd need two cores instead
> of one to actually fulfill the request? I could see this necessarily being

Correct.

> the case if you arranged vCPU-s into virtual threads, cores, sockets,
> and nodes, but at least from the patch titles it doesn't look as if you
> did in this series. Are there other reasons to make this a fixed
> relationship?

In fact I'm doing it, but only implicitly and without adapting the
cpuid related information. The idea is to pass the topology information
at least below the scheduling granularity to the guest later.

Not having the fixed relationship would result in something like the
co-scheduling series Dario already sent, which would need more than
mechanical changes in each scheduler.

> As a minor cosmetic request visible from this cover letter right away:
> Could the command line option please become "sched-granularity="
> or even "sched-gran="?

Of course!


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 15:46   ` Juergen Gross
@ 2019-03-29 16:56     ` Dario Faggioli
  2019-03-29 17:00       ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Dario Faggioli @ 2019-03-29 16:56 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Ian Jackson, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 2826 bytes --]

On Fri, 2019-03-29 at 16:46 +0100, Juergen Gross wrote:
> On 29/03/2019 16:39, Jan Beulich wrote:
> > > > > On 29.03.19 at 16:08, <jgross@suse.com> wrote:
> > > This is achieved by switching the scheduler to no longer see
> > > vcpus as
> > > the primary object to schedule, but "schedule items". Each
> > > schedule
> > > item consists of as many vcpus as each core has threads on the
> > > current
> > > system. The vcpu->item relation is fixed.
> > 
> > the case if you arranged vCPU-s into virtual threads, cores,
> > sockets,
> > and nodes, but at least from the patch titles it doesn't look as if
> > you
> > did in this series. Are there other reasons to make this a fixed
> > relationship?
> 
> In fact I'm doing it, but only implicitly and without adapting the
> cpuid related information. The idea is to pass the topology
> information
> at least below the scheduling granularity to the guest later.
> 
> Not having the fixed relationship would result in something like the
> co-scheduling series Dario already sent, which would need more than
> mechanical changes in each scheduler.
> 
Yep. So, just for the records, those series are, this one for Credit1:
https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html

And this one for Credit2:
https://lists.xenproject.org/archives/html/xen-devel/2018-10/msg01113.html

Both are RFC, but the Credit2 one was much, much better (more complete,
more tested, more stable, achieving better fairness, etc).

In these series, the "relationship" being discussed here is not fixed.
Not right now, at least, but it can become so (I didn't do it as we
currently lack the info for doing that properly).

It is/was, IMO, a good thing that everything works both with and without
topology enlightenment (even once we have it, in case someone, for
whatever reason, doesn't care).

As said by Juergen, the two approaches (and hence the structure of the
series) are quite different. This series is more generic, acts on the
common scheduler code and logic. It's quite intrusive, as we can see
:-D, but enables the feature for all the schedulers all at once (well,
they all need changes, but mostly mechanical).

My series, OTOH, act on each scheduler specifically (and in fact there
is one for Credit and one for Credit2, and there would need to be one
for RTDS, if wanted, etc). They're much more self contained, but less
generic; and the changes necessary within each scheduler are specific
to the scheduler itself, and non-mechanical.

Regards,
Dario
-- 
<<This happens because _I_ choose it to happen!>> (Raistlin)
------------------------------------------------------------
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 16:56     ` Dario Faggioli
@ 2019-03-29 17:00       ` Juergen Gross
  2019-03-29 17:29         ` Dario Faggioli
  2019-03-29 17:39         ` Rian Quinn
  0 siblings, 2 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-29 17:00 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Ian Jackson, Roger Pau Monne

On 29/03/2019 17:56, Dario Faggioli wrote:
> On Fri, 2019-03-29 at 16:46 +0100, Juergen Gross wrote:
>> On 29/03/2019 16:39, Jan Beulich wrote:
>>>>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
>>>> This is achieved by switching the scheduler to no longer see
>>>> vcpus as
>>>> the primary object to schedule, but "schedule items". Each
>>>> schedule
>>>> item consists of as many vcpus as each core has threads on the
>>>> current
>>>> system. The vcpu->item relation is fixed.
>>>
>>> the case if you arranged vCPU-s into virtual threads, cores,
>>> sockets,
>>> and nodes, but at least from the patch titles it doesn't look as if
>>> you
>>> did in this series. Are there other reasons to make this a fixed
>>> relationship?
>>
>> In fact I'm doing it, but only implicitly and without adapting the
>> cpuid related information. The idea is to pass the topology
>> information
>> at least below the scheduling granularity to the guest later.
>>
>> Not having the fixed relationship would result in something like the
>> co-scheduling series Dario already sent, which would need more than
>> mechanical changes in each scheduler.
>>
> Yep. So, just for the records, those series are, this one for Credit1:
> https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html
> 
> And this one for Credit2:
> https://lists.xenproject.org/archives/html/xen-devel/2018-10/msg01113.html
> 
> Both are RFC, but the Credit2 one was much, much better (more complete,
> more tested, more stable, achieving better fairness, etc).
> 
> In these series, the "relationship" being discussed here is not fixed.
> Not right now, at least, but it can become so (I didn't do it as we
> currently lack the info for doing that properly).
> 
> It is/was, IMO, a good thing that everything work both with or without
> topology enlightenment (even for one we'll have it, in case one, for
> whatever reason, doesn't care).
> 
> As said by Juergen, the two approaches (and hence the structure of the
> series) are quite different. This series is more generic, acts on the
> common scheduler code and logic. It's quite intrusive, as we can see
> :-D, but enables the feature for all the schedulers all at once (well,
> they all need changes, but mostly mechanical).
> 
> My series, OTOH, act on each scheduler specifically (and in fact there
> is one for Credit and one for Credit2, and there would need to be one
> for RTDS, if wanted, etc). They're much more self contained, but less
> generic; and the changes necessary within each scheduler are specific
> to the scheduler itself, and non-mechanical.

Another line of thought: in case we want core scheduling for security
reasons (to ensure always vcpus of the same guest are sharing a core)
the same might apply to the guest itself: it might want to ensure
only threads of the same process are sharing a core. This would be
quite easy with my series, but impossible for Dario's solution without
the fixed relationship between guest siblings.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 17:00       ` Juergen Gross
@ 2019-03-29 17:29         ` Dario Faggioli
  2019-03-29 17:39         ` Rian Quinn
  1 sibling, 0 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-03-29 17:29 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Ian Jackson, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1848 bytes --]

On Fri, 2019-03-29 at 18:00 +0100, Juergen Gross wrote:
> On 29/03/2019 17:56, Dario Faggioli wrote:
> > As said by Juergen, the two approaches (and hence the structure of
> > the
> > series) are quite different. This series is more generic, acts on
> > the
> > common scheduler code and logic. It's quite intrusive, as we can
> > see
> > :-D, but enables the feature for all the schedulers all at once
> > (well,
> > they all need changes, but mostly mechanical).
> > 
> > My series, OTOH, act on each scheduler specifically (and in fact
> > there
> > is one for Credit and one for Credit2, and there would need to be
> > one
> > for RTDS, if wanted, etc). They're much more self contained, but
> > less
> > generic; and the changes necessary within each scheduler are
> > specific
> > to the scheduler itself, and non-mechanical.
> 
> Another line of thought: in case we want core scheduling for security
> reasons (to ensure always vcpus of the same guest are sharing a core)
> the same might apply to the guest itself: it might want to ensure
> only threads of the same process are sharing a core.
>
Sure, as soon as we'll manage to "passthrough" to it the necessary
topology information.

> This would be
> quite easy with my series, but impossible for Dario's solution
> without
> the fixed relationship between guest siblings.
>
Well, not "impossible". :-)

As said above, that's not there, but it can be added/implemented.

Anyway... Lemme go back looking at the patches, and preparing for
running benchmarks. :-D :-D

Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 17:00       ` Juergen Gross
  2019-03-29 17:29         ` Dario Faggioli
@ 2019-03-29 17:39         ` Rian Quinn
  2019-03-29 17:48           ` Andrew Cooper
  1 sibling, 1 reply; 111+ messages in thread
From: Rian Quinn @ 2019-03-29 17:39 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jan Beulich, xen-devel, Ian Jackson,
	Roger Pau Monne

Out of curiosity, has there been any research done on whether or not
it makes more sense to just disable CPU threading with respect to
overall performance? In some of the testing that we did with OpenXT,
we noticed in some of our tests a performance increase when
hyperthreading was disabled. I would be curious what other research
has been done in this regard.

Either way, if threading is enabled, grouping up threads makes a lot
of sense WRT some of the recent security issues that have come up with
Intel CPUs.


On Fri, Mar 29, 2019 at 11:03 AM Juergen Gross <jgross@suse.com> wrote:
>
> On 29/03/2019 17:56, Dario Faggioli wrote:
> > On Fri, 2019-03-29 at 16:46 +0100, Juergen Gross wrote:
> >> On 29/03/2019 16:39, Jan Beulich wrote:
> >>>>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
> >>>> This is achieved by switching the scheduler to no longer see
> >>>> vcpus as
> >>>> the primary object to schedule, but "schedule items". Each
> >>>> schedule
> >>>> item consists of as many vcpus as each core has threads on the
> >>>> current
> >>>> system. The vcpu->item relation is fixed.
> >>>
> >>> the case if you arranged vCPU-s into virtual threads, cores,
> >>> sockets,
> >>> and nodes, but at least from the patch titles it doesn't look as if
> >>> you
> >>> did in this series. Are there other reasons to make this a fixed
> >>> relationship?
> >>
> >> In fact I'm doing it, but only implicitly and without adapting the
> >> cpuid related information. The idea is to pass the topology
> >> information
> >> at least below the scheduling granularity to the guest later.
> >>
> >> Not having the fixed relationship would result in something like the
> >> co-scheduling series Dario already sent, which would need more than
> >> mechanical changes in each scheduler.
> >>
> > Yep. So, just for the records, those series are, this one for Credit1:
> > https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html
> >
> > And this one for Credit2:
> > https://lists.xenproject.org/archives/html/xen-devel/2018-10/msg01113.html
> >
> > Both are RFC, but the Credit2 one was much, much better (more complete,
> > more tested, more stable, achieving better fairness, etc).
> >
> > In these series, the "relationship" being discussed here is not fixed.
> > Not right now, at least, but it can become so (I didn't do it as we
> > currently lack the info for doing that properly).
> >
> > It is/was, IMO, a good thing that everything work both with or without
> > topology enlightenment (even for one we'll have it, in case one, for
> > whatever reason, doesn't care).
> >
> > As said by Juergen, the two approaches (and hence the structure of the
> > series) are quite different. This series is more generic, acts on the
> > common scheduler code and logic. It's quite intrusive, as we can see
> > :-D, but enables the feature for all the schedulers all at once (well,
> > they all need changes, but mostly mechanical).
> >
> > My series, OTOH, act on each scheduler specifically (and in fact there
> > is one for Credit and one for Credit2, and there would need to be one
> > for RTDS, if wanted, etc). They're much more self contained, but less
> > generic; and the changes necessary within each scheduler are specific
> > to the scheduler itself, and non-mechanical.
>
> Another line of thought: in case we want core scheduling for security
> reasons (to ensure always vcpus of the same guest are sharing a core)
> the same might apply to the guest itself: it might want to ensure
> only threads of the same process are sharing a core. This would be
> quite easy with my series, but impossible for Dario's solution without
> the fixed relationship between guest siblings.
>
>
> Juergen
>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 17:39         ` Rian Quinn
@ 2019-03-29 17:48           ` Andrew Cooper
  2019-03-29 18:35             ` Rian Quinn
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-03-29 17:48 UTC (permalink / raw)
  To: Rian Quinn, Juergen Gross
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jan Beulich, xen-devel,
	Roger Pau Monne

On 29/03/2019 17:39, Rian Quinn wrote:
> Out of curiosity, has there been any research done on whether or not
> it makes more sense to just disable CPU threading with respect to
> overall performance? In some of the testing that we did with OpenXT,
> we noticed in some of our tests a performance increase when
> hyperthreading was disabled. I would be curious what other research
> has been done in this regard.
>
> Either way, if threading is enabled, grouping up threads makes a lot
> of sense WRT some of the recent security issues that have come up with
> Intel CPUs.

There has been plenty of academic research done, and there are real
usecases where disabling HT improves performance.

However, there are plenty where it doesn't.  During L1TF testing,
XenServer measured one typical usecase (aggregate small packet IO
throughput, which is representative of a load of webserver VMs) which
took a 60% perf hit.

10% of this was the raw L1D_FLUSH hit, while 50% of it was actually due
to the increased IO latency of halving the number of vcpus which could
be run concurrently.

As for core aware scheduling, even if nothing else, grouping things up
will get you better cache sharing from the VM's point of view.

As you can probably tell, the answer is far too workload dependent to
come up with a general rule, but at least having the options available
will let people experiment.

~Andrew


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (51 preceding siblings ...)
       [not found] ` <5C9E3C3D0200007800222FB0@suse.com>
@ 2019-03-29 18:16 ` Dario Faggioli
  2019-03-30  9:55   ` Juergen Gross
  2019-04-11  0:34     ` [Xen-devel] " Dario Faggioli
  2019-04-01  6:41 ` Jan Beulich
       [not found] ` <5CA1B285020000780022361D@suse.com>
  54 siblings, 2 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-03-29 18:16 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jan Beulich,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Julien Grall, Paul Durrant, Josh Whitehead,
	Meng Xu, Jun Nakajima, Ian Jackson, Roger Pau Monné



Even if I've only skimmed through it... cool series! :-D

On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
> 
> I have done some very basic performance testing: on a 4 cpu system
> (2 cores with 2 threads each) I did a "make -j 4" for building the
> Xen
> hypervisor. With This test has been run on dom0, once with no other
> guest active and once with another guest with 4 vcpus running the
> same
> test. The results are (always elapsed time, system time, user time):
> 
> sched_granularity=thread, no other guest: 116.10 177.65 207.84
> sched_granularity=core,   no other guest: 114.04 175.47 207.45
> sched_granularity=thread, other guest:    202.30 334.21 384.63
> sched_granularity=core,   other guest:    207.24 293.04 371.37
> 
So, just to be sure I'm reading this properly,
"sched_granularity=thread" means no co-scheduling of any sort is in
effect, right? Basically the patch series is applied, but "not used",
correct?

If yes, these are interesting, and promising, numbers. :-)

> All tests have been performed with credit2, the other schedulers are
> untested up to now.
> 
Just as a heads-up for people (as Juergen knows this already :-D), I'm
planning to run some performance evaluation of these patches.

I've got an 8 CPU system (4 cores, 2 threads each, no NUMA) and a 16
CPU system (2 sockets/NUMA nodes, 4 cores each, 2 threads each) on
which I should be able to get some bench suite running relatively easily
and (hopefully) quickly.

I'm planning to evaluate:
- vanilla (i.e., without this series), SMT enabled in BIOS
- vanilla (i.e., without this series), SMT disabled in BIOS
- patched (i.e., with this series), granularity=thread
- patched (i.e., with this series), granularity=core

I'll start with no overcommitment, and then move to 2x
overcommitment (as you did above).

And I'll also be focusing on Credit2 only.

Everyone else who also want to do some stress and performance testing
and share the results, that's very much appreciated. :-)

Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling
  2019-03-29 15:08 ` [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling Juergen Gross
@ 2019-03-29 18:22   ` Dario Faggioli
  0 siblings, 0 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-03-29 18:22 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: George Dunlap



On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
> Credit2's smt_idle_mask_set() and smt_idle_mask_clear() are used to
> identify idle cores where vcpus can be moved to. A core is thought to
> be idle when all siblings are known to have the idle vcpu running on
> them.
> 
> Unfortunately the information of a vcpu running on a cpu is per
> runqueue. So in case not all siblings are in the same runqueue a core
> will never be regarded to be idle, as the sibling not in the runqueue
> is never known to run the idle vcpu.
> 
> Use a credit2 specific cpumask of siblings with only those cpus
> being marked which are in the same runqueue as the cpu in question.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - use credit2 per-cpu specific sibling mask
> ---
>
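A minimal sketch of the idea described in the quoted commit message, using a
hypothetical per-cpu mask name (the actual patch uses credit2's own per-cpu
data and may differ in detail):

/*
 * Sketch only: a core counts as idle when all siblings known to this
 * runqueue are idle.  "csched2_runq_siblings" is a made-up name for the
 * credit2-private per-cpu sibling mask the patch introduces.
 */
static inline void
smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers, cpumask_t *mask)
{
    const cpumask_t *sib = per_cpu(csched2_runq_siblings, cpu);

    if ( cpumask_subset(sib, idlers) )
        cpumask_or(mask, mask, sib);
}

static inline void
smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
{
    cpumask_andnot(mask, mask, per_cpu(csched2_runq_siblings, cpu));
}
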
FYI, I've sent my RoB to (I think) patches 1 to 7 in the other threads
where they've been posted.

Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 17:48           ` Andrew Cooper
@ 2019-03-29 18:35             ` Rian Quinn
  0 siblings, 0 replies; 111+ messages in thread
From: Rian Quinn @ 2019-03-29 18:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Wei Liu,
	Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Tim Deegan, Robert VanVossen, Dario Faggioli, Julien Grall,
	Paul Durrant, Joshua Whitehead, Meng Xu, Jan Beulich, xen-devel,
	Roger Pau Monne

Makes sense. The reason I ask is that we currently have to disable HT
due to L1TF until a scheduler change is made to address the issue, and
the #1 question everyone asks is what that will do to performance. So
any info on that topic, and on how a patch like this will address the
L1TF issue, is most helpful.

On Fri, Mar 29, 2019 at 11:49 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 29/03/2019 17:39, Rian Quinn wrote:
> > Out of curiosity, has there been any research done on whether or not
> > it makes more sense to just disable CPU threading with respect to
> > overall performance? In some of the testing that we did with OpenXT,
> > we noticed in some of our tests a performance increase when
> > hyperthreading was disabled. I would be curious what other research
> > has been done in this regard.
> >
> > Either way, if threading is enabled, grouping up threads makes a lot
> > of sense WRT some of the recent security issues that have come up with
> > Intel CPUs.
>
> There has been plenty of academic research done, and there are real
> usecases where disabling HT improves performance.
>
> However, there are plenty where it doesn't.  During L1TF testing,
> XenServer measured one typical usecase (aggregate small packet IO
> throughput, which is representative of a load of webserver VMs) which
> took a 60% perf hit.
>
> 10% of this was the raw L1D_FLUSH hit, while 50% of it was actually due
> to the increased IO latency of halving the number of vcpus which could
> be run concurrently.
>
> As for core aware scheduling, even if nothing else, grouping things up
> will get you better cache sharing from the VM's point of view.
>
> As you can probably tell, the answer is far too workload dependent to
> come up with a general rule, but at least having the options available
> will let people experiment.
>
> ~Andrew


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-03-29 15:08 ` [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces Juergen Gross
@ 2019-03-29 18:42   ` Andrew Cooper
  2019-03-30 10:24     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-03-29 18:42 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 29/03/2019 15:08, Juergen Gross wrote:
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 6b5d454630..d1a958143a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -256,6 +256,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
>  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>  {
>      struct domain *d = v->domain;
> +    struct sched_item item = { .vcpu = v };
>  
>      v->processor = processor;
>  
> @@ -267,7 +268,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>      init_timer(&v->poll_timer, poll_timer_fn,
>                 v, v->processor);
>  
> -    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
> +    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, &item,
>                       d->sched_priv);

I realise this is perhaps an over-the-top request, but can we see about
doing more here?

SCHED_OP() is a thoroughly objectionable piece of obfuscation, which
breaks cscope/ctags and also results in especially poor code generation.

Given that we are changing the interface anyway and touching all
codepaths, would you mind also adding static inline wrappers like I
started with 340edc3 ?
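For illustration only, such a wrapper could look roughly like this (a sketch:
hook names and signatures are assumed here, not the final interface):

/* One static inline per hook: callers get type checking, and cscope/ctags
 * can resolve the call, unlike with the SCHED_OP() macro. */
static inline void *sched_alloc_vdata(const struct scheduler *s,
                                      struct sched_item *item, void *dom_priv)
{
    return s->alloc_vdata ? s->alloc_vdata(s, item, dom_priv) : NULL;
}

static inline void sched_free_vdata(const struct scheduler *s, void *priv)
{
    if ( s->free_vdata )
        s->free_vdata(s, priv);
}

The hunk above would then simply read
"v->sched_priv = sched_alloc_vdata(dom_scheduler(d), &item, d->sched_priv);".
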

TBH, I'm even happy to give this a go and give you back the
resulting tree, if you'd prefer.

~Andrew


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 30/49] xen: let vcpu_create() select processor
  2019-03-29 15:09 ` [PATCH RFC 30/49] xen: let vcpu_create() select processor Juergen Gross
@ 2019-03-29 19:17   ` Andrew Cooper
  2019-03-30 10:23     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-03-29 19:17 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Jan Beulich, Roger Pau Monné

On 29/03/2019 15:09, Juergen Gross wrote:
> Today there are two distinct scenarios for vcpu_create(): either for
> creation of idle-domain vcpus (vcpuid == processor) or for creation of
> "normal" domain vcpus (including dom0), where the caller selects the
> initial processor on a round-robin scheme of the allowed processors
> (allowed being based on cpupool and affinities).
>
> Instead of passing the initial processor to vcpu_create() and passing
> on to sched_init_vcpu() let sched_init_vcpu() do the processor
> selection. For supporting dom0 vcpu creation use the node_affinity of
> the domain as a base for selecting the processors. User domains will
> have initially all nodes set, so this is no different behavior compared
> to today.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Good riddance to the parameter!  This will definitely simplify some of my
further domcreate changes.

> index d9836779d1..d5294b0d26 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1986,12 +1986,11 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>      }
>  #endif
>  
> -    for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
> +    for ( i = 1; i < d->max_vcpus; i++ )
>      {
> -        cpu = cpumask_cycle(cpu, &cpu_online_map);
> -        if ( vcpu_create(d, i, cpu) == NULL )
> +        if ( vcpu_create(d, i) == NULL )
>          {
> -            printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
> +            printk("Failed to allocate dom0 vcpu %d\n", i);

Mind adjusting this to d0v%u as it is changing anyway?
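
I.e. something along these lines (sketch only; assumes i is, or is made,
unsigned to match the format specifier):

            printk("Failed to allocate d0v%u\n", i);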

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ae2a6d0323..9b5527c1eb 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -318,14 +318,40 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
>      return NULL;
>  }
>  
> -int sched_init_vcpu(struct vcpu *v, unsigned int processor)
> +static unsigned int sched_select_initial_cpu(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    nodeid_t node;
> +    cpumask_t cpus;
> +
> +    cpumask_clear(&cpus);
> +    for_each_node_mask ( node, d->node_affinity )
> +        cpumask_or(&cpus, &cpus, &node_to_cpumask(node));
> +    cpumask_and(&cpus, &cpus, cpupool_domain_cpumask(d));
> +    if ( cpumask_empty(&cpus) )
> +        cpumask_copy(&cpus, cpupool_domain_cpumask(d));
> +
> +    if ( v->vcpu_id == 0 )
> +        return cpumask_first(&cpus);
> +
> +    /* We can rely on previous vcpu being available. */

Only if you ASSERT(!is_idle_domain(d)), which is safe given the sole caller.

idle->vcpu[] can be sparse in some corner cases.
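
Folding the assertion in, the helper would look roughly like this (a sketch of
the posted function plus the suggested ASSERT, not the committed code):

static unsigned int sched_select_initial_cpu(struct vcpu *v)
{
    struct domain *d = v->domain;
    nodeid_t node;
    cpumask_t cpus;

    /* Idle vcpus never come through here, and idle->vcpu[] may be sparse. */
    ASSERT(!is_idle_domain(d));

    cpumask_clear(&cpus);
    for_each_node_mask ( node, d->node_affinity )
        cpumask_or(&cpus, &cpus, &node_to_cpumask(node));
    cpumask_and(&cpus, &cpus, cpupool_domain_cpumask(d));
    if ( cpumask_empty(&cpus) )
        cpumask_copy(&cpus, cpupool_domain_cpumask(d));

    if ( v->vcpu_id == 0 )
        return cpumask_first(&cpus);

    /* We can rely on the previous vcpu being available. */
    return cpumask_cycle(d->vcpu[v->vcpu_id - 1]->processor, &cpus);
}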

Ideally with both of these suggestions, Acked-by: Andrew Cooper
<andrew.cooper3@citrix.com>

> +    return cpumask_cycle(d->vcpu[v->vcpu_id - 1]->processor, &cpus);
> +}
> +
> +int sched_init_vcpu(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
>      struct sched_item *item;
> +    unsigned int processor;
>  
>      if ( (item = sched_alloc_item(v)) == NULL )
>          return 1;
>  
> +    if ( is_idle_domain(d) || d->is_pinned )
> +        processor = v->vcpu_id;
> +    else
> +        processor = sched_select_initial_cpu(v);
> +
>      sched_set_res(item, per_cpu(sched_res, processor));
>  
>      /* Initialise the per-vcpu timers. */



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
  2019-03-29 15:09 ` [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers Juergen Gross
@ 2019-03-29 19:36   ` Andrew Cooper
  2019-03-30 10:22     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-03-29 19:36 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 29/03/2019 15:09, Juergen Gross wrote:
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 9b5527c1eb..0b5e5e566b 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -347,7 +347,7 @@ int sched_init_vcpu(struct vcpu *v)
>      if ( (item = sched_alloc_item(v)) == NULL )
>          return 1;
>  
> -    if ( is_idle_domain(d) || d->is_pinned )
> +    if ( is_idle_domain(d) )
>          processor = v->vcpu_id;
>      else
>          processor = sched_select_initial_cpu(v);

This looks like a spurious change.  I also don't see an obvious other patch
that it might fit into.

As for the field itself, it is also fairly objectionable.  It is only
ever set for the hardware domain, and only if dom0_vcpus_pin is used,
but the actual pinning information is also reflected in dom0's hard
affinity mask.

In practice, all this flag does is permit the use of VCPUOP_get_physid,
disallow the use of vcpu_set_hard_affinity(), and allow dom0 to attempt
to actually write to MSR_AMD64_NB_CFG, MSR_FAM10H_MMIO_CONF_BASE,
MSR_IA32_UCODE_REV, MSR_IA32_THERM_CONTROL and
MSR_IA32_ENERGY_PERF_BIAS, rather than having the write silently discarded.

Dom0's use of those MSRs is dubious at best, and disabled by default,
*and* when active, also cross-checks with the hard affinity mask.  Does
anyone use dom0_vcpus_pin in production?

I think there is quite a lot of value in getting rid of d->is_pinned and
is_pinned_vcpu() entirely, which will remove an extreme
corner-case-x86-ism out of the common code.
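
One possible shape for that, sketched with a made-up helper name (the point
being that "pinned" becomes a property of the hard affinity mask rather than
a separate domain flag):

/* Hypothetical replacement for is_pinned_vcpu(): a vcpu counts as pinned
 * iff its hard affinity allows exactly its current processor. */
static inline bool vcpu_pinned_to_processor(const struct vcpu *v)
{
    return cpumask_weight(v->cpu_hard_affinity) == 1 &&
           cpumask_test_cpu(v->processor, v->cpu_hard_affinity);
}

The MSR paths that currently key off the flag could then use such a check,
matching the cross-check against the hard affinity mask that already happens
today.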

~Andrew


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-03-29 15:09 ` [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item Juergen Gross
@ 2019-03-29 21:33   ` Andrew Cooper
  2019-03-30  9:59     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-03-29 21:33 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, Jan Beulich, Roger Pau Monné



On 29/03/2019 15:09, Juergen Gross wrote:
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 8d579e2cf9..5d8f3255cb 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -15,6 +15,7 @@
>  #include <xen/lib.h>
>  #include <xen/errno.h>
>  #include <xen/sched.h>
> +#include <xen/sched-if.h>
>  #include <xen/domain.h>
>  #include <xen/smp.h>
>  #include <xen/delay.h>

I'm afraid that this feels like a step in the wrong direction.

sched-if.h is (per the comments) supposed to be the schedulers'
private.h, with the intention that struct scheduler doesn't leak into the
rest of the codebase.  It also holds the logic for taking scheduler locks,
etc., and lastly cpumask_scratch, which really is unsafe to use outside of
the scheduler (and has come up in several recent patch series).

Sadly,

andrewcoop@andrewcoop:/local/xen.git/xen$ git grep sched-if
arch/x86/acpi/cpu_idle.c:41:#include <xen/sched-if.h>
arch/x86/cpu/mcheck/mce.c:13:#include <xen/sched-if.h>
arch/x86/cpu/mcheck/mctelem.c:21:#include <xen/sched-if.h>
arch/x86/dom0_build.c:12:#include <xen/sched-if.h>
arch/x86/setup.c:6:#include <xen/sched-if.h>
arch/x86/smpboot.c:28:#include <xen/sched-if.h>
common/cpupool.c:19:#include <xen/sched-if.h>
common/domain.c:13:#include <xen/sched-if.h>
common/domctl.c:14:#include <xen/sched-if.h>
common/sched_arinc653.c:29:#include <xen/sched-if.h>
common/sched_credit.c:18:#include <xen/sched-if.h>
common/sched_credit2.c:21:#include <xen/sched-if.h>
common/sched_null.c:32:#include <xen/sched-if.h>
common/sched_rt.c:23:#include <xen/sched-if.h>
common/schedule.c:26:#include <xen/sched-if.h>
include/asm-x86/cpuidle.h:7:#include <xen/sched-if.h>

and this change is making the situation worse.

If at all possible, I'd prefer to see about disentangling the bits which
actually need external use, and putting them in sched.h, and making
sched-if.h properly private to the schedulers.  I actually even started
a cleanup series which moved all of the scheduler infrastructure into
common/sched/, but found a disappointing quantity of sched-if.h being
referenced externally.

~Andrew


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 18:16 ` Dario Faggioli
@ 2019-03-30  9:55   ` Juergen Gross
  2019-04-11  0:34     ` [Xen-devel] " Dario Faggioli
  1 sibling, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-30  9:55 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jan Beulich,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Julien Grall, Paul Durrant, Josh Whitehead,
	Meng Xu, Jun Nakajima, Ian Jackson, Roger Pau Monné

On 29/03/2019 19:16, Dario Faggioli wrote:
> Even if I've only skimmed through it... cool series! :-D
> 
> On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
>>
>> I have done some very basic performance testing: on a 4 cpu system
>> (2 cores with 2 threads each) I did a "make -j 4" for building the
>> Xen
>> hypervisor. With This test has been run on dom0, once with no other
>> guest active and once with another guest with 4 vcpus running the
>> same
>> test. The results are (always elapsed time, system time, user time):
>>
>> sched_granularity=thread, no other guest: 116.10 177.65 207.84
>> sched_granularity=core,   no other guest: 114.04 175.47 207.45
>> sched_granularity=thread, other guest:    202.30 334.21 384.63
>> sched_granularity=core,   other guest:    207.24 293.04 371.37
>>
> So, just to be sure I'm reading this properly,
> "sched_granularity=thread" means no co-scheduling of any sort is in
> effect, right? Basically the patch series is applied, but "not used",
> correct?

Yes.

> If yes, these are interesting, and promising, numbers. :-)
> 
>> All tests have been performed with credit2, the other schedulers are
>> untested up to now.
>>
> Just as a heads-up for people (as Juergen knows this already :-D), I'm
> planning to run some performance evaluation of these patches.
> 
> I've got an 8 CPU system (4 cores, 2 threads each, no NUMA) and a 16
> CPU system (2 sockets/NUMA nodes, 4 cores each, 2 threads each) on
> which I should be able to get some bench suite running relatively easily
> and (hopefully) quickly.
> 
> I'm planning to evaluate:
> - vanilla (i.e., without this series), SMT enabled in BIOS
> - vanilla (i.e., without this series), SMT disabled in BIOS
> - patched (i.e., with this series), granularity=thread
> - patched (i.e., with this series), granularity=core
> 
> I'll start with no overcommitment, and then move to 2x
> overcommitment (as you did above).

Thanks, I appreciate that!


Juergen



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-03-29 21:33   ` Andrew Cooper
@ 2019-03-30  9:59     ` Juergen Gross
  2019-04-01  5:59       ` Juergen Gross
  2019-04-01  8:01       ` Jan Beulich
  0 siblings, 2 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-30  9:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, Jan Beulich, Roger Pau Monné

On 29/03/2019 22:33, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 8d579e2cf9..5d8f3255cb 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -15,6 +15,7 @@
>>  #include <xen/lib.h>
>>  #include <xen/errno.h>
>>  #include <xen/sched.h>
>> +#include <xen/sched-if.h>
>>  #include <xen/domain.h>
>>  #include <xen/smp.h>
>>  #include <xen/delay.h>
> 
> I'm afraid that this feels like a step in the wrong direction.
> 
> sched-if.h is (per the comments) supposed to be the schedulers'
> private.h, with the intention that struct scheduler doesn't leak into the
> rest of the codebase.  It also holds the logic for taking scheduler locks,
> etc., and lastly cpumask_scratch, which really is unsafe to use outside of
> the scheduler (and has come up in several recent patch series).
> 
> Sadly,
> 
> andrewcoop@andrewcoop:/local/xen.git/xen$ git grep sched-if
> arch/x86/acpi/cpu_idle.c:41:#include <xen/sched-if.h>
> arch/x86/cpu/mcheck/mce.c:13:#include <xen/sched-if.h>
> arch/x86/cpu/mcheck/mctelem.c:21:#include <xen/sched-if.h>
> arch/x86/dom0_build.c:12:#include <xen/sched-if.h>
> arch/x86/setup.c:6:#include <xen/sched-if.h>
> arch/x86/smpboot.c:28:#include <xen/sched-if.h>
> common/cpupool.c:19:#include <xen/sched-if.h>
> common/domain.c:13:#include <xen/sched-if.h>
> common/domctl.c:14:#include <xen/sched-if.h>
> common/sched_arinc653.c:29:#include <xen/sched-if.h>
> common/sched_credit.c:18:#include <xen/sched-if.h>
> common/sched_credit2.c:21:#include <xen/sched-if.h>
> common/sched_null.c:32:#include <xen/sched-if.h>
> common/sched_rt.c:23:#include <xen/sched-if.h>
> common/schedule.c:26:#include <xen/sched-if.h>
> include/asm-x86/cpuidle.h:7:#include <xen/sched-if.h>
> 
> and this change is making the situation worse.
> 
> If at all possible, I'd prefer to see about disentangling the bits which
> actually need external use, and putting them in sched.h, and making
> sched-if.h properly private to the schedulers.  I actually even started
> a cleanup series which moved all of the scheduler infrastructure into
> common/sched/, but found a disappointing quantity of sched-if.h being
> referenced externally.

I can add something like that to my series if you want. So:

- moving schedule.c, sched_*.c and cpupool.c to common/sched/
- move stuff from sched-if.h to sched.h if needed outside of
  common/sched/
- move sched-if.h to common/sched/


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
  2019-03-29 19:36   ` Andrew Cooper
@ 2019-03-30 10:22     ` Juergen Gross
  2019-04-01  8:10       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-30 10:22 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 29/03/2019 20:36, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 9b5527c1eb..0b5e5e566b 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -347,7 +347,7 @@ int sched_init_vcpu(struct vcpu *v)
>>      if ( (item = sched_alloc_item(v)) == NULL )
>>          return 1;
>>  
>> -    if ( is_idle_domain(d) || d->is_pinned )
>> +    if ( is_idle_domain(d) )
>>          processor = v->vcpu_id;
>>      else
>>          processor = sched_select_initial_cpu(v);
> 
> This looks like a spurious change.  I also don't see an obvious other patch
> that it might fit into.

That should be in patch 30.

> As for the field itself, it is also fairly objectionable.  It is only
> ever set for the hardware domain, and only if dom0_vcpus_pin is used,
> but the actual pinning information is also reflected in dom0's hard
> affinity mask.

Right.

> In practice, all this flag does is permit the use of VCPUOP_get_physid,
> disallow the use of vcpu_set_hard_affinity(), and allow dom0 to attempt
> to actually write to MSR_AMD64_NB_CFG, MSR_FAM10H_MMIO_CONF_BASE,
> MSR_IA32_UCODE_REV, MSR_IA32_THERM_CONTROL and
> MSR_IA32_ENERGY_PERF_BIAS, rather than having the write silently discarded.
> 
> Dom0's use of those MSRs is dubious at best, and disabled by default,
> *and* when active, also cross-checks with the hard affinity mask.  Does
> anyone use dom0_vcpus_pin in production?

I have seen it on customer systems.

> I think there is quite a lot of value in getting rid of d->is_pinned and
> is_pinned_vcpu() entirely, which will remove an extreme
> corner-case-x86-ism out of the common code.

Right. I don't see a reason why we need anything else but the hard
affinity for this option.

Allowing vcpus to be re-pinned (or unpinned) should be fine. And use of
said MSRs could be restricted to the correct hard affinity being
active.


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 30/49] xen: let vcpu_create() select processor
  2019-03-29 19:17   ` Andrew Cooper
@ 2019-03-30 10:23     ` Juergen Gross
  0 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-03-30 10:23 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Jan Beulich, Roger Pau Monné

On 29/03/2019 20:17, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> Today there are two distinct scenarios for vcpu_create(): either for
>> creation of idle-domain vcpus (vcpuid == processor) or for creation of
>> "normal" domain vcpus (including dom0), where the caller selects the
>> initial processor on a round-robin scheme of the allowed processors
>> (allowed being based on cpupool and affinities).
>>
>> Instead of passing the initial processor to vcpu_create() and passing
>> on to sched_init_vcpu() let sched_init_vcpu() do the processor
>> selection. For supporting dom0 vcpu creation use the node_affinity of
>> the domain as a base for selecting the processors. User domains will
>> have initially all nodes set, so this is no different behavior compared
>> to today.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Good riddance to the parameter!  This will definitely simplify some of my
> further domcreate changes.
> 
>> index d9836779d1..d5294b0d26 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -1986,12 +1986,11 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>>      }
>>  #endif
>>  
>> -    for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
>> +    for ( i = 1; i < d->max_vcpus; i++ )
>>      {
>> -        cpu = cpumask_cycle(cpu, &cpu_online_map);
>> -        if ( vcpu_create(d, i, cpu) == NULL )
>> +        if ( vcpu_create(d, i) == NULL )
>>          {
>> -            printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
>> +            printk("Failed to allocate dom0 vcpu %d\n", i);
> 
> Mind adjusting this to d0v%u as it is changing anyway?

Okay.

> 
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index ae2a6d0323..9b5527c1eb 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -318,14 +318,40 @@ static struct sched_item *sched_alloc_item(struct vcpu *v)
>>      return NULL;
>>  }
>>  
>> -int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>> +static unsigned int sched_select_initial_cpu(struct vcpu *v)
>> +{
>> +    struct domain *d = v->domain;
>> +    nodeid_t node;
>> +    cpumask_t cpus;
>> +
>> +    cpumask_clear(&cpus);
>> +    for_each_node_mask ( node, d->node_affinity )
>> +        cpumask_or(&cpus, &cpus, &node_to_cpumask(node));
>> +    cpumask_and(&cpus, &cpus, cpupool_domain_cpumask(d));
>> +    if ( cpumask_empty(&cpus) )
>> +        cpumask_copy(&cpus, cpupool_domain_cpumask(d));
>> +
>> +    if ( v->vcpu_id == 0 )
>> +        return cpumask_first(&cpus);
>> +
>> +    /* We can rely on previous vcpu being available. */
> 
> Only if you ASSERT(!is_idle_domain(d)), which is safe given the sole caller.
> 
> idle->vcpu[] can be sparse in some corner cases.
> 
> Ideally with both of these suggestions, Acked-by: Andrew Cooper
> <andrew.cooper3@citrix.com>

Thanks.


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-03-29 18:42   ` Andrew Cooper
@ 2019-03-30 10:24     ` Juergen Gross
  2019-04-01  6:06       ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-03-30 10:24 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 29/03/2019 19:42, Andrew Cooper wrote:
> On 29/03/2019 15:08, Juergen Gross wrote:
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 6b5d454630..d1a958143a 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -256,6 +256,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
>>  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>  {
>>      struct domain *d = v->domain;
>> +    struct sched_item item = { .vcpu = v };
>>  
>>      v->processor = processor;
>>  
>> @@ -267,7 +268,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>      init_timer(&v->poll_timer, poll_timer_fn,
>>                 v, v->processor);
>>  
>> -    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
>> +    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, &item,
>>                       d->sched_priv);
> 
> I realise this is perhaps an over-the-top request, but can we see about
> doing more here?
> 
> SCHED_OP() is a thoroughly objectionable piece of obfuscation, which
> breaks cscope/ctags and also results in especially poor code generation.
> 
> Given that we are changing the interface anyway and touching all
> codepaths, would you mind also adding static inline wrappers like I
> started with 340edc3 ?

Okay, I'll do that.

> TBH, I'm even happy to give this a go and give you back the
> resulting tree, if you'd prefer.

I think it is easier to do it myself, as I'm touching nearly all of
the call sites anyway.


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-03-30  9:59     ` Juergen Gross
@ 2019-04-01  5:59       ` Juergen Gross
  2019-04-01  8:05         ` Jan Beulich
  2019-04-01  8:01       ` Jan Beulich
  1 sibling, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  5:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Dario Faggioli,
	Julien Grall, Meng Xu, Jan Beulich, Roger Pau Monné

On 30/03/2019 10:59, Juergen Gross wrote:
> On 29/03/2019 22:33, Andrew Cooper wrote:
>> On 29/03/2019 15:09, Juergen Gross wrote:
>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>> index 8d579e2cf9..5d8f3255cb 100644
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -15,6 +15,7 @@
>>>  #include <xen/lib.h>
>>>  #include <xen/errno.h>
>>>  #include <xen/sched.h>
>>> +#include <xen/sched-if.h>
>>>  #include <xen/domain.h>
>>>  #include <xen/smp.h>
>>>  #include <xen/delay.h>
>>
>> I'm afraid that this feels like a step in the wrong direction.
>>
>> sched-if.h is (per the comments) supposed to be the schedulers'
>> private.h, with the intention that struct scheduler doesn't leak into the
>> rest of the codebase.  It also holds the logic for taking scheduler locks,
>> etc., and lastly cpumask_scratch, which really is unsafe to use outside of
>> the scheduler (and has come up in several recent patch series).
>>
>> Sadly,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git grep sched-if
>> arch/x86/acpi/cpu_idle.c:41:#include <xen/sched-if.h>
>> arch/x86/cpu/mcheck/mce.c:13:#include <xen/sched-if.h>
>> arch/x86/cpu/mcheck/mctelem.c:21:#include <xen/sched-if.h>
>> arch/x86/dom0_build.c:12:#include <xen/sched-if.h>
>> arch/x86/setup.c:6:#include <xen/sched-if.h>
>> arch/x86/smpboot.c:28:#include <xen/sched-if.h>
>> common/cpupool.c:19:#include <xen/sched-if.h>
>> common/domain.c:13:#include <xen/sched-if.h>
>> common/domctl.c:14:#include <xen/sched-if.h>
>> common/sched_arinc653.c:29:#include <xen/sched-if.h>
>> common/sched_credit.c:18:#include <xen/sched-if.h>
>> common/sched_credit2.c:21:#include <xen/sched-if.h>
>> common/sched_null.c:32:#include <xen/sched-if.h>
>> common/sched_rt.c:23:#include <xen/sched-if.h>
>> common/schedule.c:26:#include <xen/sched-if.h>
>> include/asm-x86/cpuidle.h:7:#include <xen/sched-if.h>
>>
>> and this change is making the situation worse.
>>
>> If at all possible, I'd prefer to see about disentangling the bits which
>> actually need external use, and putting them in sched.h, and making
>> sched-if.h properly private to the schedulers.  I actually even started
>> a cleanup series which moved all of the scheduler infrastructure into
>> common/sched/, but found a disappointing quantity of sched-if.h being
>> referenced externally.
> 
> I can add something like that to my series if you want. So:
> 
> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
> - move stuff from sched-if.h to sched.h if needed outside of
>   common/sched/
> - move sched-if.h to common/sched/

A question especially to the scheduler maintainers and "the REST": should
we move the scheduler stuff to xen/common/sched/, or would xen/sched/ be
more appropriate?

Maybe it would be worthwhile to move e.g. the context switching from
xen/arch/*/domain.c to xen/sched/context_<arch>.c? I think this code is
rather scheduler related and moving it to the sched directory might help
hiding some scheduler internals from other sources, especially with my
core scheduling series. IMO this would make the xen/sched/ directory the
preferred one.


Juergen



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-03-30 10:24     ` Juergen Gross
@ 2019-04-01  6:06       ` Juergen Gross
  2019-04-01  7:05         ` Dario Faggioli
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  6:06 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu,
	Jan Beulich

On 30/03/2019 11:24, Juergen Gross wrote:
> On 29/03/2019 19:42, Andrew Cooper wrote:
>> On 29/03/2019 15:08, Juergen Gross wrote:
>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>> index 6b5d454630..d1a958143a 100644
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -256,6 +256,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
>>>  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>  {
>>>      struct domain *d = v->domain;
>>> +    struct sched_item item = { .vcpu = v };
>>>  
>>>      v->processor = processor;
>>>  
>>> @@ -267,7 +268,7 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>      init_timer(&v->poll_timer, poll_timer_fn,
>>>                 v, v->processor);
>>>  
>>> -    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, v,
>>> +    v->sched_priv = SCHED_OP(dom_scheduler(d), alloc_vdata, &item,
>>>                       d->sched_priv);
>>
>> I realise this is perhaps an over-the-top request, but can we see about
>> doing more here?
>>
>> SCHED_OP() is a thoroughly objectionable piece of obfuscation, which
>> breaks cscope/ctags and also results in especially poor code generation.
>>
>> Given that we are changing the interface anyway and touching all
>> codepaths, would you mind also adding static inline wrappers like I
>> started with 340edc3 ?
> 
> Okay, I'll do that.
> 
>> TBH, I'm even happy to give this a go and give you back the
>> resulting tree, if you'd prefer.
> 
> I think it is easier to do it myself, as I'm touching nearly all of
> the call sites anyway.

And another thought I had: with RETPOLINE indirect jumps are even more
expensive. Would it be a good idea to remove the function pointers from
struct scheduler and generate the inline wrappers at build time? The
wrappers could then call the related specific scheduler function based
on the scheduler Id using a chain of if ... else if ... statements. It
would prefer the default scheduler over the others and test only for
configured schedulers. Scheduler registration could be done the same way
removing the need for an extra link section.
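
A rough sketch of what one generated wrapper could look like (purely
illustrative -- the per-scheduler function names are made up, and the
sched_id/Kconfig dispatch just follows the idea above):

/* Hypothetical build-time generated wrapper: direct calls, default
 * scheduler tested first, only configured schedulers compiled in. */
static inline void sched_item_wake(const struct scheduler *s,
                                   struct sched_item *item)
{
    if ( s->sched_id == XEN_SCHEDULER_CREDIT2 )
        csched2_item_wake(s, item);
#ifdef CONFIG_SCHED_CREDIT
    else if ( s->sched_id == XEN_SCHEDULER_CREDIT )
        csched_item_wake(s, item);
#endif
#ifdef CONFIG_SCHED_RTDS
    else if ( s->sched_id == XEN_SCHEDULER_RTDS )
        rt_item_wake(s, item);
#endif
    else
        ASSERT_UNREACHABLE();
}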


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
                   ` (52 preceding siblings ...)
  2019-03-29 18:16 ` Dario Faggioli
@ 2019-04-01  6:41 ` Jan Beulich
       [not found] ` <5CA1B285020000780022361D@suse.com>
  54 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  6:41 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monne

>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
> Via boot parameter sched_granularity=core (or sched_granularity=socket)
> it is possible to change the scheduling granularity from thread (the
> default) to either whole cores or even sockets.

One further general question came to mind: How about also having
"sched-granularity=thread" (or "...=none") to retain current
behavior, at least to have an easy way to compare effects if
wanted? But perhaps also to allow to deal with potential resources
wasting configurations like having mostly VMs with e.g. an odd
number of vCPU-s.

The other question of course is whether the terms thread, core,
and socket are generic enough to be used in architecture
independent code. Even on x86 it already leaves out / unclear
where / how e.g. AMD's compute units would be classified. I
don't have any good suggestion for abstraction, so possibly
the terms used may want to become arch-specific.

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
       [not found] ` <5CA1B285020000780022361D@suse.com>
@ 2019-04-01  6:49   ` Juergen Gross
  2019-04-01  7:10     ` Dario Faggioli
  2019-04-01  7:13     ` Jan Beulich
  0 siblings, 2 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  6:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Paul Durrant,
	Joshua Whitehead, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monne

On 01/04/2019 08:41, Jan Beulich wrote:
>>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
>> Via boot parameter sched_granularity=core (or sched_granularity=socket)
>> it is possible to change the scheduling granularity from thread (the
>> default) to either whole cores or even sockets.
> 
> One further general question came to mind: How about also having
> "sched-granularity=thread" (or "...=none") to retain current
> behavior, at least to have an easy way to compare effects if
> wanted? But perhaps also to allow to deal with potential resources
> wasting configurations like having mostly VMs with e.g. an odd
> number of vCPU-s.

Fine with me.

> The other question of course is whether the terms thread, core,
> and socket are generic enough to be used in architecture
> independent code. Even on x86 it already leaves out / unclear
> where / how e.g. AMD's compute units would be classified. I
> don't have any good suggestion for abstraction, so possibly
> the terms used may want to become arch-specific.

I followed the already known terms from the credit2_runqueue
parameter. I think they should match. Which would call for
"sched-granularity=cpu" instead of "thread".


Juergen


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-04-01  6:06       ` Juergen Gross
@ 2019-04-01  7:05         ` Dario Faggioli
  2019-04-01  8:19           ` Andrew Cooper
  0 siblings, 1 reply; 111+ messages in thread
From: Dario Faggioli @ 2019-04-01  7:05 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich



On Mon, 2019-04-01 at 08:06 +0200, Juergen Gross wrote:
> On 30/03/2019 11:24, Juergen Gross wrote:
> > I think it is easier to do it myself, as I'm touching nearly all
> > of
> > the call sites anyway.
> 
> And another thought I had: with RETPOLINE indirect jumps are even
> more
> expensive. Would it be a good idea to remove the function pointers
> from
> struct scheduler and generate the inline wrappers at build time?
>
Yep, I was thinking about doing something like that already,
independently from this feature/series.

At least something that special case the configured default scheduler,
and let its hooks be called without indirect jumps (i.e., similarly to
what's being done in Linux, in quite a few places, these days).

> The
> wrappers could then call the related specific scheduler function
> based
> on the scheduler Id using a chain of if ... else if ... statements. 
>
I guess we'd have to see how the final code will look, but I like the
idea, and I think it's well worth a try.

Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-04-01  6:49   ` Juergen Gross
@ 2019-04-01  7:10     ` Dario Faggioli
  2019-04-01  7:15       ` Juergen Gross
  2019-04-01  7:13     ` Jan Beulich
  1 sibling, 1 reply; 111+ messages in thread
From: Dario Faggioli @ 2019-04-01  7:10 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Ian Jackson, Roger Pau Monne



On Mon, 2019-04-01 at 08:49 +0200, Juergen Gross wrote:
> On 01/04/2019 08:41, Jan Beulich wrote:
> > One further general question came to mind: How about also having
> > "sched-granularity=thread" (or "...=none") to retain current
> > behavior, at least to have an easy way to compare effects if
> > wanted? But perhaps also to allow to deal with potential resources
> > wasting configurations like having mostly VMs with e.g. an odd
> > number of vCPU-s.
> 
> Fine with me.
> 
Mmm... I'm still in the process of looking at the patches, so there
might be something I'm missing, but, from the descriptions and from
talking to you (Juergen), I was assuming that to be the case already...
isn't it so?

> > The other question of course is whether the terms thread, core,
> > and socket are generic enough to be used in architecture
> > independent code. Even on x86 it already leaves out / unclear
> > where / how e.g. AMD's compute units would be classified. I
> > don't have any good suggestion for abstraction, so possibly
> > the terms used may want to become arch-specific.
> 
> I followed the already known terms from the credit2_runqueue
> parameter. I think they should match. Which would call for
> "sched-granularity=cpu" instead of "thread".
> 
Yep, I'd go for cpu. Both for, as you said, consistency and also
because I can envision "granularity=thread" being mistaken/interpreted
as a form of "thread aware co-scheduling" (i.e., what
"granularity=core" actually does! :-O)

Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-04-01  6:49   ` Juergen Gross
  2019-04-01  7:10     ` Dario Faggioli
@ 2019-04-01  7:13     ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  7:13 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Juergen Gross
  Cc: Tim Deegan, Kevin Tian, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Roger Pau Monne

>>> On 01.04.19 at 08:49, <jgross@suse.com> wrote:
> On 01/04/2019 08:41, Jan Beulich wrote:
>>>>> On 29.03.19 at 16:08, <jgross@suse.com> wrote:
>>> Via boot parameter sched_granularity=core (or sched_granularity=socket)
>>> it is possible to change the scheduling granularity from thread (the
>>> default) to either whole cores or even sockets.
>> 
>> One further general question came to mind: How about also having
>> "sched-granularity=thread" (or "...=none") to retain current
>> behavior, at least to have an easy way to compare effects if
>> wanted? But perhaps also to allow to deal with potential resources
>> wasting configurations like having mostly VMs with e.g. an odd
>> number of vCPU-s.
> 
> Fine with me.
> 
>> The other question of course is whether the terms thread, core,
>> and socket are generic enough to be used in architecture
>> independent code. Even on x86 it already leaves out / unclear
>> where / how e.g. AMD's compute units would be classified. I
>> don't have any good suggestion for abstraction, so possibly
>> the terms used may want to become arch-specific.
> 
> I followed the already known terms from the credit2_runqueue
> parameter. I think they should match. Which would call for
> "sched-granularity=cpu" instead of "thread".

"cpu" is fine of course. I wonder though whether the other two
were a good choice for "credit2_runqueue". Stefano, Julien -
is this terminology at least half way suitable for Arm?

Jan




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH RFC 00/49] xen: add core scheduling support
  2019-04-01  7:10     ` Dario Faggioli
@ 2019-04-01  7:15       ` Juergen Gross
  0 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  7:15 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Robert VanVossen,
	Julien Grall, Paul Durrant, Joshua Whitehead, Meng Xu,
	Jun Nakajima, xen-devel, Ian Jackson, Roger Pau Monne

On 01/04/2019 09:10, Dario Faggioli wrote:
> On Mon, 2019-04-01 at 08:49 +0200, Juergen Gross wrote:
>> On 01/04/2019 08:41, Jan Beulich wrote:
>>> One further general question came to mind: How about also having
>>> "sched-granularity=thread" (or "...=none") to retain current
>>> behavior, at least to have an easy way to compare effects if
>>> wanted? But perhaps also to allow to deal with potential resources
>>> wasting configurations like having mostly VMs with e.g. an odd
>>> number of vCPU-s.
>>
>> Fine with me.
>>
> Mmm... I'm still in the process of looking at the patches, so there
> might be something I'm missing, but, from the descriptions and from
> talking to you (Juergen), I was assuming that to be the case already...
> isn't it so?

Yes, it is.

I understood Jan to ask for a special parameter value for that.


Juergen


* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-03-30  9:59     ` Juergen Gross
  2019-04-01  5:59       ` Juergen Gross
@ 2019-04-01  8:01       ` Jan Beulich
  2019-04-01  8:33         ` Andrew Cooper
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  8:01 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

>>> On 30.03.19 at 10:59, <jgross@suse.com> wrote:
> On 29/03/2019 22:33, Andrew Cooper wrote:
>> If at all possible, I'd prefer to see about disentangling the bits which
>> actually need external use, and putting them in sched.h, and making
>> sched-if.h properly private to the schedulers.  I actually even started
>> a cleanup series which moved all of the scheduler infrastructure into
>> common/sched/, but found a disappointing quantity of sched-if.h being
>> referenced externally.
> 
> I can add something like that to my series if you want. So:
> 
> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
> - move stuff from sched-if.h to sched.h if needed outside of
>   common/sched/
> - move sched-if.h to common/sched/

Before further general code re-work gets added to this already
long series, may I ask what the backporting intentions are?

Jan




* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  5:59       ` Juergen Gross
@ 2019-04-01  8:05         ` Jan Beulich
  2019-04-01  8:26           ` Andrew Cooper
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  8:05 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

>>> On 01.04.19 at 07:59, <jgross@suse.com> wrote:
> On 30/03/2019 10:59, Juergen Gross wrote:
>> On 29/03/2019 22:33, Andrew Cooper wrote:
>>> If at all possible, I'd prefer to see about disentangling the bits which
>>> actually need external use, and putting them in sched.h, and making
>>> sched-if.h properly private to the schedulers.  I actually even started
>>> a cleanup series which moved all of the scheduler infrastructure into
>>> common/sched/, but found a disappointing quantity of sched-if.h being
>>> referenced externally.
>> 
>> I can add something like that to my series if you want. So:
>> 
>> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
>> - move stuff from sched-if.h to sched.h if needed outside of
>>   common/sched/
>> - move sched-if.h to common/sched/
> 
> Questions to especially the scheduler maintainers and "the REST": should
> we move the scheduler stuff to xen/common/sched/ or would /xen/sched/ be
> more appropriate?
> 
> Maybe it would be worthwhile to move e.g. the context switching from
> xen/arch/*/domain.c to xen/sched/context_<arch>.c? I think this code is
> rather scheduler related and moving it to the sched directory might help
> hiding some scheduler internals from other sources, especially with my
> core scheduling series. IMO this would make the xen/sched/ directory the
> preferred one.

FWIW, I don't really mind such a move as long as it won't then result
in having to expose various arch-internals just to make them usable
from xen/sched/context_<arch>.c (or whatever it's going to be named -
the name is a little longish for my taste).

But may I recommend not to do too many things all in one go?

Jan
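
For orientation, the kind of layout being discussed could end up looking
roughly like the tree below. This is purely illustrative; the file names
and the exact split are assumptions, not something agreed in this thread:

xen/common/sched/
    Makefile
    schedule.c        (generic scheduler code, from xen/common/)
    cpupool.c
    sched_credit.c
    sched_credit2.c
    sched_null.c
    sched_rt.c
    sched_arinc653.c
    sched-if.h        (private to this directory; anything genuinely
                       needed elsewhere moves to xen/include/xen/sched.h)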




* Re: [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
  2019-03-30 10:22     ` Juergen Gross
@ 2019-04-01  8:10       ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  8:10 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Dario Faggioli, Julien Grall, Joshua Whitehead, Meng Xu,
	xen-devel

>>> On 30.03.19 at 11:22, <jgross@suse.com> wrote:
> On 29/03/2019 20:36, Andrew Cooper wrote:
>> In practice, all this flag does is permit the use of VCPUOP_get_physid,
>> disallow the use of vcpu_set_hard_affinity(), and allow dom0 to attempt
>> to actually write to MSR_AMD64_NB_CFG, MSR_FAM10H_MMIO_CONF_BASE,
>> MSR_IA32_UCODE_REV, MSR_IA32_THERM_CONTROL and
>> MSR_IA32_ENERGY_PERF_BIAS, rather than having the write silently discarded.
>> 
>> Dom0's use of those MSRs is dubious at best, and disabled by default,
>> *and* when active, also cross-checks with the hard affinity mask.  Does
>> anyone use dom0_vcpus_pin in production?
> 
> I have seen it on customer systems.

Same here, but I've never seen it used for a good reason.

>> I think there is quite a lot of value in getting rid of d->is_pinned and
>> is_pinned_vcpu() entirely, with will remove an extreme
>> corner-case-x86-ism out of the common code.

I think its origin was "cpufreq=dom0-kernel", which I think should go
away with it then.

Jan




* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-04-01  7:05         ` Dario Faggioli
@ 2019-04-01  8:19           ` Andrew Cooper
  2019-04-01  8:49             ` Juergen Gross
  2019-04-01 15:15             ` Dario Faggioli
  0 siblings, 2 replies; 111+ messages in thread
From: Andrew Cooper @ 2019-04-01  8:19 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich

On 01/04/2019 08:05, Dario Faggioli wrote:
> On Mon, 2019-04-01 at 08:06 +0200, Juergen Gross wrote:
>> On 30/03/2019 11:24, Juergen Gross wrote:
>>> I think its is easier to do it myself, as I'm touching nearly all
>>> of
>>> the call sites anyway.
>> And another thought I had: with RETPOLINE indirect jumps are even
>> more
>> expensive. Would it be a good idea to remove the function pointers
>> from
>> struct scheduler and generate the inline wrappers at build time?
>>
> Yep, I was thinking about doing something like that already,
> independently from this feature/series.
>
> At least something that special case the configured default scheduler,
> and let its hooks be called without indirect jumps (i.e., similarly to
> what's being done in Linux, in quite a few places, these days).
>
>> The
>> wrappers could then call the related specific scheduler function
>> based
>> on the scheduler Id using a chain of if ... else if ... statements. 
>>
> I guess we'd have to see how the final code will look, but I like the
> idea, and I think it's well worth a try.

Jan has a series in progress which does do some manual devirtualisation
across Xen.

The scheduler is harder though - we've got the default scheduler which
is overwhelmingly likely to be the target of the call, but not always
guaranteed.

Normally, the result is put together with PGO rather than manually,
because the effects are quite subtle.

The base case which might be good enough for Xen is:

if ( sched == default )
    sched_foo();
else
    sched->foo();

which for the common case of the default cpupool only, or multiple
groups with the same scheduler, will always take the direct path rather
than the indirect path.

Beyond that, the best length of the if/else chain can only reasonably be
determined with profiling.  It depends on the relative frequencies of
each call, and blindly doing an if/else chain to the end of the
scheduler list will probably give worse performance for anyone using the
final scheduler than a retpoline would.  Furthermore, on future fixed
hardware, using indirect calls will become the quicker option again.

I think it's useful to consider potential optimisations, but I'd advise
against trying to merge everything into this series.

~Andrew
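
Spelled out a little further, a generated wrapper of that shape might
look like the sketch below. The struct, symbol and hook names are
placeholders, not code from this series or from the devirtualisation
work mentioned above:

/* Direct call for the compiled-in default scheduler, indirect call
 * for anything else. */
static inline void sched_wake_item(const struct scheduler *ops,
                                   struct sched_item *item)
{
    if ( ops == &sched_default_ops )    /* the overwhelmingly common case */
        sched_default_wake(ops, item);  /* direct call, no retpoline */
    else
        ops->wake(ops, item);           /* rare case: indirect call */
}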


* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  8:05         ` Jan Beulich
@ 2019-04-01  8:26           ` Andrew Cooper
  2019-04-01  8:41             ` Jan Beulich
  2019-04-01  8:45             ` Juergen Gross
  0 siblings, 2 replies; 111+ messages in thread
From: Andrew Cooper @ 2019-04-01  8:26 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

On 01/04/2019 09:05, Jan Beulich wrote:
>>>> On 01.04.19 at 07:59, <jgross@suse.com> wrote:
>> On 30/03/2019 10:59, Juergen Gross wrote:
>>> On 29/03/2019 22:33, Andrew Cooper wrote:
>>>> If at all possible, I'd prefer to see about disentangling the bits which
>>>> actually need external use, and putting them in sched.h, and making
>>>> sched-if.h properly private to the schedulers.  I actually even started
>>>> a cleanup series which moved all of the scheduler infrastructure into
>>>> common/sched/, but found a disappointing quantity of sched-if.h being
>>>> referenced externally.
>>> I can add something like that to my series if you want. So:
>>>
>>> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
>>> - move stuff from sched-if.h to sched.h if needed outside of
>>>   common/sched/
>>> - move sched-if.h to common/sched/
>> Questions to especially the scheduler maintainers and "the REST": should
>> we move the scheduler stuff to xen/common/sched/ or would /xen/sched/ be
>> more appropriate?
>>
>> Maybe it would be worthwhile to move e.g. the context switching from
>> xen/arch/*/domain.c to xen/sched/context_<arch>.c? I think this code is
>> rather scheduler related and moving it to the sched directory might help
>> hiding some scheduler internals from other sources, especially with my
>> core scheduling series. IMO this would make the xen/sched/ directory the
>> preferred one.
> FWIW, I don't really mind such a move as long as it won't result in
> then having to expose various arch-internals just to make them
> usable from xen/sched/context_<arch>.c (or whatever it's going
> to be named - the name is a little longish for my taste).
>
> But may I recommend not to do too many things all in one go?

While I did suggest this originally, I too am a little wary about the
scope creep.

If moving everything is too much, can we at least try not to make the
sched-if.h situation any worse than it currently is?

~Andrew


* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  8:01       ` Jan Beulich
@ 2019-04-01  8:33         ` Andrew Cooper
  2019-04-01  8:44           ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-04-01  8:33 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

On 01/04/2019 09:01, Jan Beulich wrote:
>>>> On 30.03.19 at 10:59, <jgross@suse.com> wrote:
>> On 29/03/2019 22:33, Andrew Cooper wrote:
>>> If at all possible, I'd prefer to see about disentangling the bits which
>>> actually need external use, and putting them in sched.h, and making
>>> sched-if.h properly private to the schedulers.  I actually even started
>>> a cleanup series which moved all of the scheduler infrastructure into
>>> common/sched/, but found a disappointing quantity of sched-if.h being
>>> referenced externally.
>> I can add something like that to my series if you want. So:
>>
>> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
>> - move stuff from sched-if.h to sched.h if needed outside of
>>   common/sched/
>> - move sched-if.h to common/sched/
> Before further general code re-work gets added to this already
> long series, may I ask what the backporting intentions are?

I don't see this as a viable backport candidate.  It is a complete
rewrite of the scheduler interface, along with some substantial changes
on the context switch path.

People wanting to use it will almost certainly be safer moving to Xen 4.13.

~Andrew


* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  8:26           ` Andrew Cooper
@ 2019-04-01  8:41             ` Jan Beulich
  2019-04-01  8:45             ` Juergen Gross
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2019-04-01  8:41 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

>>> On 01.04.19 at 10:26, <andrew.cooper3@citrix.com> wrote:
> While I did suggest this originally, I too am a little wary about the
> scope creep.
> 
> If moving everything is too much, can we at least try to not make the
> sched-if.h situation any worse than it currently is.

+1

Jan




* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  8:33         ` Andrew Cooper
@ 2019-04-01  8:44           ` Juergen Gross
  0 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  8:44 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

On 01/04/2019 10:33, Andrew Cooper wrote:
> On 01/04/2019 09:01, Jan Beulich wrote:
>>>>> On 30.03.19 at 10:59, <jgross@suse.com> wrote:
>>> On 29/03/2019 22:33, Andrew Cooper wrote:
>>>> If at all possible, I'd prefer to see about disentangling the bits which
>>>> actually need external use, and putting them in sched.h, and making
>>>> sched-if.h properly private to the schedulers.  I actually even started
>>>> a cleanup series which moved all of the scheduler infrastructure into
>>>> common/sched/, but found a disappointing quantity of sched-if.h being
>>>> referenced externally.
>>> I can add something like that to my series if you want. So:
>>>
>>> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
>>> - move stuff from sched-if.h to sched.h if needed outside of
>>>   common/sched/
>>> - move sched-if.h to common/sched/
>> Before further general code re-work gets added to this already
>> long series, may I ask what the backporting intentions are?
> 
> I don't see this as a viable backport candidate.  It is a complete
> rewrite of the scheduler interface, along with some substantial changes
> on the context switch path.
> 
> People wanting to use it will almost certainly be safer moving to Xen 4.13.

I think so, too. OTOH doing the core scheduling series first and the
cleanup later won't have any negative impact IMO.


Juergen


* Re: [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item
  2019-04-01  8:26           ` Andrew Cooper
  2019-04-01  8:41             ` Jan Beulich
@ 2019-04-01  8:45             ` Juergen Gross
  1 sibling, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  8:45 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Meng Xu, xen-devel, Roger Pau Monne

On 01/04/2019 10:26, Andrew Cooper wrote:
> On 01/04/2019 09:05, Jan Beulich wrote:
>>>>> On 01.04.19 at 07:59, <jgross@suse.com> wrote:
>>> On 30/03/2019 10:59, Juergen Gross wrote:
>>>> On 29/03/2019 22:33, Andrew Cooper wrote:
>>>>> If at all possible, I'd prefer to see about disentangling the bits which
>>>>> actually need external use, and putting them in sched.h, and making
>>>>> sched-if.h properly private to the schedulers.  I actually even started
>>>>> a cleanup series which moved all of the scheduler infrastructure into
>>>>> common/sched/, but found a disappointing quantity of sched-if.h being
>>>>> referenced externally.
>>>> I can add something like that to my series if you want. So:
>>>>
>>>> - moving schedule.c, sched_*.c and cpupool.c to common/sched/
>>>> - move stuff from sched-if.h to sched.h if needed outside of
>>>>   common/sched/
>>>> - move sched-if.h to common/sched/
>>> Questions to especially the scheduler maintainers and "the REST": should
>>> we move the scheduler stuff to xen/common/sched/ or would /xen/sched/ be
>>> more appropriate?
>>>
>>> Maybe it would be worthwhile to move e.g. the context switching from
>>> xen/arch/*/domain.c to xen/sched/context_<arch>.c? I think this code is
>>> rather scheduler related and moving it to the sched directory might help
>>> hiding some scheduler internals from other sources, especially with my
>>> core scheduling series. IMO this would make the xen/sched/ directory the
>>> preferred one.
>> FWIW, I don't really mind such a move as long as it won't result in
>> then having to expose various arch-internals just to make them
>> usable from xen/sched/context_<arch>.c (or whatever it's going
>> to be named - the name is a little longish for my taste).
>>
>> But may I recommend not to do too many things all in one go?
> 
> While I did suggest this originally, I too am a little wary about the
> scope creep.
> 
> If moving everything is too much, can we at least try to not make the
> sched-if.h situation any worse than it currently is.

Okay, I'll move the sched_item stuff into sched.h then. In case a later
cleanup succeeds in making it scheduler private it can be moved away
again.


Juergen


* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-04-01  8:19           ` Andrew Cooper
@ 2019-04-01  8:49             ` Juergen Gross
  2019-04-01 15:15             ` Dario Faggioli
  1 sibling, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  8:49 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich

On 01/04/2019 10:19, Andrew Cooper wrote:
> On 01/04/2019 08:05, Dario Faggioli wrote:
>> On Mon, 2019-04-01 at 08:06 +0200, Juergen Gross wrote:
>>> On 30/03/2019 11:24, Juergen Gross wrote:
>>>> I think its is easier to do it myself, as I'm touching nearly all
>>>> of
>>>> the call sites anyway.
>>> And another thought I had: with RETPOLINE indirect jumps are even
>>> more
>>> expensive. Would it be a good idea to remove the function pointers
>>> from
>>> struct scheduler and generate the inline wrappers at build time?
>>>
>> Yep, I was thinking about doing something like that already,
>> independently from this feature/series.
>>
>> At least something that special case the configured default scheduler,
>> and let its hooks be called without indirect jumps (i.e., similarly to
>> what's being done in Linux, in quite a few places, these days).
>>
>>> The
>>> wrappers could then call the related specific scheduler function
>>> based
>>> on the scheduler Id using a chain of if ... else if ... statements. 
>>>
>> I guess we'd have to see how the final code will look, but I like the
>> idea, and I think it's well worth a try.
> 
> Jan has a series in progress which does do some manual devirtualisation
> across Xen.
> 
> The scheduler is harder though - we've got the default scheduler which
> is overwhelmingly likely to be the target of the call, but not always
> guaranteed.
> 
> Normally, the result is put together with PGO rather than manually,
> because the effects are quite subtle.
> 
> The base case which might be good enough for Xen is:
> 
> if ( sched == default )
>     sched_foo();
> else
>     sched->foo();
> 
> which for the common case of the default cpupool only, or multiple
> groups with the same scheduler, will always take the direct path rather
> than the indirect path.
> 
> Beyond that, the best length of the if/else chain can only reasonably be
> determined with profiling.  It depends on the relative frequencies of
> each call, and blindly doing an if/else chain to the end of the
> scheduler list will probably make worse performance if you're using the
> final scheduler than using a retpoline would.  Furthermore, on future
> fixed hardware, using indirect calls will become the quicker option again.
> 
> I think its useful to consider optimisations potential optimisations,
> but I'd advise against trying to merge everything into this series.

Fine with me.


Juergen


* Re: [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity
  2019-03-29 15:09 ` [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity Juergen Gross
@ 2019-04-01  8:50   ` Andrew Cooper
  2019-04-01  9:47     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Andrew Cooper @ 2019-04-01  8:50 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Julien Grall,
	Jan Beulich, Roger Pau Monné

On 29/03/2019 15:09, Juergen Gross wrote:
> Make sure the number of vcpus is always a multiple of the scheduling
> granularity. Note that we don't support a scheduling granularity above
> one on ARM.

I'm afraid that I don't think this is a clever move.  In turn, this
brings into question the approach to idle handling.

Firstly, with a proposed socket granularity, this would be 128 on some
systems which exist today.  Furthermore, consider the case where
cpupool0 has a granularity of 1, and a second pool has a granularity of
2.  A domain can be created with an odd number of vcpus and operate in
pool0 fine, but can't now be moved to pool1.

If at all possible, I think it would be better to try and reuse the idle
cpus for holes like this.  Seeing as you've been playing with this code
a lot, what is your assessment?

~Andrew
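
For reference, the rounding being questioned boils down to something
like the helper below (hypothetical code, not the patch itself). With a
socket granularity of 128, a 3-vCPU guest would be padded to 128 vCPUs,
which is the kind of waste pointed out above:

/* Round a requested vCPU count up to a multiple of the scheduling
 * granularity of the target cpupool. */
static unsigned int round_up_vcpus(unsigned int requested,
                                   unsigned int granularity)
{
    return ((requested + granularity - 1) / granularity) * granularity;
}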


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-03-29 15:08 ` [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
@ 2019-04-01  9:21   ` Julien Grall
  2019-04-01  9:40     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2019-04-01  9:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi,

On 3/29/19 3:08 PM, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to execute it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
> stop_machine() context is fine, as we just need to hold the domain RCU
> lock and need the scheduler percpu data to be still allocated.
> 
> Add another hook for CPU_DOWN_PREPARE to bail out early in case
> cpu_disable_scheduler() would fail. This will avoid crashes in rare
> cases for cpu hotplug or suspend.
> 
> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
> incarnation.

It is not obvious why the smp_mb() is superfluous. Can you please
provide more details on why it is not necessary?

Cheers,

-- 
Julien Grall


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01  9:21   ` Julien Grall
@ 2019-04-01  9:40     ` Juergen Gross
  2019-04-01 10:29       ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  9:40 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

On 01/04/2019 11:21, Julien Grall wrote:
> Hi,
> 
> On 3/29/19 3:08 PM, Juergen Gross wrote:
>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>> There is no need to execute it on the cpu just being disabled, so use
>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>> stop_machine() context is fine, as we just need to hold the domain RCU
>> lock and need the scheduler percpu data to be still allocated.
>>
>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>> cases for cpu hotplug or suspend.
>>
>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>> incarnation.
> 
> This is not obvious why the smp_mb() is superfluous. Can you please
> provide more details on why this is not necessary?

cpumask_clear_cpu() should already have the needed semantics, no?
It is based on clear_bit() which is defined to be atomic.


Juergen


* Re: [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity
  2019-04-01  8:50   ` Andrew Cooper
@ 2019-04-01  9:47     ` Juergen Gross
  2019-04-02  7:49       ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01  9:47 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Julien Grall,
	Jan Beulich, Roger Pau Monné

On 01/04/2019 10:50, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> Make sure the number of vcpus is always a multiple of the scheduling
>> granularity. Note that we don't support a scheduling granularity above
>> one on ARM.
> 
> I'm afraid that I don't think this is a clever move.  In turn, this
> brings into question the approach to idle handling.
> 
> Firstly, with a proposed socket granularity, this would be 128 on some
> systems which exist today.  Furthermore, consider the case where
> cpupool0 has a granularity of 1, and a second pool has a granularity of
> 2.  A domain can be created with an odd number of vcpus and operate in
> pool0 fine, but can't now be moved to pool1.

For now granularity is the same for all pools, but I plan to enhance
that in future.

The answer to that problem might be either to allow for later addition
of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
number), or to really disallow moving such a domain between pools.

> If at all possible, I think it would be better to try and reuse the idle
> cpus for holes like this.  Seeing as you've been playing with this code
> a lot, what is your assessment?

This would be rather complicated. I'd either need to switch vcpus
dynamically in schedule items, or I'd need to special case the idle
vcpus in _lots_ of places.


Juergen


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01  9:40     ` Juergen Gross
@ 2019-04-01 10:29       ` Julien Grall
  2019-04-01 10:37         ` Juergen Gross
  2019-04-16 19:34           ` [Xen-devel] " Stefano Stabellini
  0 siblings, 2 replies; 111+ messages in thread
From: Julien Grall @ 2019-04-01 10:29 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi,

On 4/1/19 10:40 AM, Juergen Gross wrote:
> On 01/04/2019 11:21, Julien Grall wrote:
>> Hi,
>>
>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>> There is no need to execute it on the cpu just being disabled, so use
>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>> stop_machine() context is fine, as we just need to hold the domain RCU
>>> lock and need the scheduler percpu data to be still allocated.
>>>
>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>> cases for cpu hotplug or suspend.
>>>
>>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>>> incarnation.
>>
>> This is not obvious why the smp_mb() is superfluous. Can you please
>> provide more details on why this is not necessary?
> 
> cpumask_clear_cpu() should already have the needed semantics, no?
> It is based on clear_bit() which is defined to be atomic.

Atomicity does not mean the store/load cannot be re-ordered by the CPU.
You would need a barrier to prevent re-ordering.

cpumask_clear_cpu() and clear_bit() do not contain any barrier, so
stores/loads can be re-ordered.

I see we have a similar smp_mb() barrier in __cpu_die(). Sadly, there is
no documentation in the code explaining why the barrier is there. The
logs don't help either.

The barrier here will ensure that the loads/stores related to disabling
the CPU are seen before any load/store happening after the return,
although I am not sure why this is necessary.

Stefano, do you remember the rationale?

Cheers,

-- 
Julien Grall


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 10:29       ` Julien Grall
@ 2019-04-01 10:37         ` Juergen Gross
  2019-04-01 13:21           ` Julien Grall
  2019-04-16 19:34           ` [Xen-devel] " Stefano Stabellini
  1 sibling, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01 10:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

On 01/04/2019 12:29, Julien Grall wrote:
> Hi,
> 
> On 4/1/19 10:40 AM, Juergen Gross wrote:
>> On 01/04/2019 11:21, Julien Grall wrote:
>>> Hi,
>>>
>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>> There is no need to execute it on the cpu just being disabled, so use
>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>> stop_machine() context is fine, as we just need to hold the domain RCU
>>>> lock and need the scheduler percpu data to be still allocated.
>>>>
>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>> cases for cpu hotplug or suspend.
>>>>
>>>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>>>> incarnation.
>>>
>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>> provide more details on why this is not necessary?
>>
>> cpumask_clear_cpu() should already have the needed semantics, no?
>> It is based on clear_bit() which is defined to be atomic.
> 
> atomicity does not mean the store/load cannot be re-ordered by the CPU.
> You would need a barrier to prevent re-ordering.
> 
> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
> store/load can be re-ordered.

Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
there suggests the sequence of setting the blocked bit and doing the
test is important for avoiding a race...


Juergen

> 
> I see we have similar smp_mb() barrier in __cpu_die(). Sadly, there are
> no documentation in the code why the barrier is here. The logs don't
> help either.
> 
> The barrier here will ensure that the load/store related to disabling
> the CPU are seen before any load/store happening after the return.
> Although, I am not sure why this is necessary.
> 
> Stefano, Do you remember the rationale?
> 
> Cheers,
> 



* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 10:37         ` Juergen Gross
@ 2019-04-01 13:21           ` Julien Grall
  2019-04-01 13:33             ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2019-04-01 13:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi Juergen,

On 4/1/19 11:37 AM, Juergen Gross wrote:
> On 01/04/2019 12:29, Julien Grall wrote:
>> Hi,
>>
>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>>> There is no need to execute it on the cpu just being disabled, so use
>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>>> stop_machine() context is fine, as we just need to hold the domain RCU
>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>
>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>>> cases for cpu hotplug or suspend.
>>>>>
>>>>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>>>>> incarnation.
>>>>
>>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>>> provide more details on why this is not necessary?
>>>
>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>> It is based on clear_bit() which is defined to be atomic.
>>
>> atomicity does not mean the store/load cannot be re-ordered by the CPU.
>> You would need a barrier to prevent re-ordering.
>>
>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>> store/load can be re-ordered.
> 
> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
> there suggests the sequence of setting the blocked bit and doing the
> test is important for avoiding a race...

Hmmm... looking at the other usage (such as in do_poll), on non-x86
platforms there is an smp_mb() between set_bit(...) and checking the
event, with a similar comment above it.

I don't know the scheduler code well enough to know why the barrier is
needed. But for consistency, it seems to me the smp_mb() would be
required in vcpu_block() as well.

Also, it is quite interesting that the barrier is not present for x86.
If I understand correctly the comment on top of set_bit/clear_bit, it
could be re-ordered as well. So we seem to be relying on the underlying
implementation of set_bit/clear_bit.

Wouldn't it make sense to try to make the semantics uniform? Maybe by
introducing a new helper?

Cheers,

-- 
Julien Grall
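
The pattern being compared is, in simplified form, roughly the
following (a paraphrased sketch, not a verbatim quote of
common/schedule.c):

struct vcpu *v = current;

set_bit(_VPF_blocked, &v->pause_flags);

/* do_poll() has an explicit smp_mb() at this point on non-x86 (on x86
 * the locked set_bit() already acts as one); vcpu_block() relies on the
 * check below seeing the flag without such an explicit barrier. */
smp_mb();

if ( local_events_need_delivery() )
    clear_bit(_VPF_blocked, &v->pause_flags);   /* event already pending */
else
    raise_softirq(SCHEDULE_SOFTIRQ);            /* really go to sleep */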


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 13:21           ` Julien Grall
@ 2019-04-01 13:33             ` Juergen Gross
  2019-04-01 14:01               ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01 13:33 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

On 01/04/2019 15:21, Julien Grall wrote:
> Hi Juergen,
> 
> On 4/1/19 11:37 AM, Juergen Gross wrote:
>> On 01/04/2019 12:29, Julien Grall wrote:
>>> Hi,
>>>
>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>>>> There is no need to execute it on the cpu just being disabled, so use
>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>>>> stop_machine() context is fine, as we just need to hold the domain
>>>>>> RCU
>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>
>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>>>> cases for cpu hotplug or suspend.
>>>>>>
>>>>>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>>>>>> incarnation.
>>>>>
>>>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>>>> provide more details on why this is not necessary?
>>>>
>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>> It is based on clear_bit() which is defined to be atomic.
>>>
>>> atomicity does not mean the store/load cannot be re-ordered by the CPU.
>>> You would need a barrier to prevent re-ordering.
>>>
>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>> store/load can be re-ordered.
>>
>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>> there suggests the sequence of setting the blocked bit and doing the
>> test is important for avoiding a race...
> 
> Hmmm... looking at the other usage (such as in do_poll), on non-x86
> platform, there is a smp_mb() between set_bit(...) and checking the
> event with a similar comment above.
> 
> I don't know enough the scheduler code to know why the barrier is
> needed. But for consistency, it seems to me the smp_mb() would be
> required in vcpu_block() as well.
> 
> Also, it is quite interesting that the barrier is not presence for x86.
> If I understand correctly the comment on top of set_bit/clear_bit, it
> could as well be re-ordered. So we seem to relying on the underlying
> implementation of set_bit/clear_bit.

On x86 reads and writes can't be reordered with locked operations (SDM
Vol 3 8.2.2). So the barrier is really not needed AFAIU.

include/asm-x86/bitops.h:

 * clear_bit() is atomic and may not be reordered.

> Wouldn't it make sense to try to uniformize the semantics? Maybe by
> introducing a new helper?

Or adding the barrier on ARM for the atomic operations?


Juergen


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 13:33             ` Juergen Gross
@ 2019-04-01 14:01               ` Julien Grall
  2019-04-01 14:23                 ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2019-04-01 14:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi,

On 4/1/19 2:33 PM, Juergen Gross wrote:
> On 01/04/2019 15:21, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 4/1/19 11:37 AM, Juergen Gross wrote:
>>> On 01/04/2019 12:29, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>>>>> There is no need to execute it on the cpu just being disabled, so use
>>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>>>>> stop_machine() context is fine, as we just need to hold the domain
>>>>>>> RCU
>>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>>
>>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>>>>> cases for cpu hotplug or suspend.
>>>>>>>
>>>>>>> While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
>>>>>>> incarnation.
>>>>>>
>>>>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>>>>> provide more details on why this is not necessary?
>>>>>
>>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>>> It is based on clear_bit() which is defined to be atomic.
>>>>
>>>> atomicity does not mean the store/load cannot be re-ordered by the CPU.
>>>> You would need a barrier to prevent re-ordering.
>>>>
>>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>>> store/load can be re-ordered.
>>>
>>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>>> there suggests the sequence of setting the blocked bit and doing the
>>> test is important for avoiding a race...
>>
>> Hmmm... looking at the other usage (such as in do_poll), on non-x86
>> platform, there is a smp_mb() between set_bit(...) and checking the
>> event with a similar comment above.
>>
>> I don't know enough the scheduler code to know why the barrier is
>> needed. But for consistency, it seems to me the smp_mb() would be
>> required in vcpu_block() as well.
>>
>> Also, it is quite interesting that the barrier is not presence for x86.
>> If I understand correctly the comment on top of set_bit/clear_bit, it
>> could as well be re-ordered. So we seem to relying on the underlying
>> implementation of set_bit/clear_bit.
> 
> On x86 reads and writes can't be reordered with locked operations (SDM
> Vol 3 8.2.2). So the barrier is really not needed AFAIU.
> 
> include/asm-x86/bitops.h:
> 
>   * clear_bit() is atomic and may not be reordered.

I interpreted the "may not" as: you should not rely on the re-ordering
not happening.

In places where re-ordering should not happen (e.g. test_and_set_bit) we
use the wording "cannot".

> 
>> Wouldn't it make sense to try to uniformize the semantics? Maybe by
>> introducing a new helper?
> 
> Or adding the barrier on ARM for the atomic operations?

On what basis?  Why should we impact every user in order to fix a bug in
the scheduler?

Cheers,

-- 
Julien Grall


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 14:01               ` Julien Grall
@ 2019-04-01 14:23                 ` Juergen Gross
  2019-04-01 15:15                   ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01 14:23 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

On 01/04/2019 16:01, Julien Grall wrote:
> Hi,
> 
> On 4/1/19 2:33 PM, Juergen Gross wrote:
>> On 01/04/2019 15:21, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 4/1/19 11:37 AM, Juergen Gross wrote:
>>>> On 01/04/2019 12:29, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>>>>>> There is no need to execute it on the cpu just being disabled,
>>>>>>>> so use
>>>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>>>>>> stop_machine() context is fine, as we just need to hold the domain
>>>>>>>> RCU
>>>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>>>
>>>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>>>>>> cases for cpu hotplug or suspend.
>>>>>>>>
>>>>>>>> While at it remove a superfluous smp_mb() in the ARM
>>>>>>>> __cpu_disable()
>>>>>>>> incarnation.
>>>>>>>
>>>>>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>>>>>> provide more details on why this is not necessary?
>>>>>>
>>>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>>>> It is based on clear_bit() which is defined to be atomic.
>>>>>
>>>>> atomicity does not mean the store/load cannot be re-ordered by the
>>>>> CPU.
>>>>> You would need a barrier to prevent re-ordering.
>>>>>
>>>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>>>> store/load can be re-ordered.
>>>>
>>>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>>>> there suggests the sequence of setting the blocked bit and doing the
>>>> test is important for avoiding a race...
>>>
>>> Hmmm... looking at the other usage (such as in do_poll), on non-x86
>>> platform, there is a smp_mb() between set_bit(...) and checking the
>>> event with a similar comment above.
>>>
>>> I don't know enough the scheduler code to know why the barrier is
>>> needed. But for consistency, it seems to me the smp_mb() would be
>>> required in vcpu_block() as well.
>>>
>>> Also, it is quite interesting that the barrier is not presence for x86.
>>> If I understand correctly the comment on top of set_bit/clear_bit, it
>>> could as well be re-ordered. So we seem to relying on the underlying
>>> implementation of set_bit/clear_bit.
>>
>> On x86 reads and writes can't be reordered with locked operations (SDM
>> Vol 3 8.2.2). So the barrier is really not needed AFAIU.
>>
>> include/asm-x86/bitops.h:
>>
>>   * clear_bit() is atomic and may not be reordered.
> 
> I interpreted the "may not" as you should not rely on the re-ordering to
> not happen.
> 
> In place were re-ordering should not happen (e.g test_and_set_bit) we
> use the wording "cannot".

The SDM is very clear here:

"Reads or writes cannot be reordered with I/O instructions, locked
 instructions, or serializing instructions."

>>> Wouldn't it make sense to try to uniformize the semantics? Maybe by
>>> introducing a new helper?
>>
>> Or adding the barrier on ARM for the atomic operations?
> 
> On which basis?  Why should we impact every users for fixing a bug in
> the scheduler?

I'm assuming there are more places like this either in common code or
code copied verbatim from arch/x86 to arch/arm with that problem.

So I take it you'd rather let me add that smp_mb() in __cpu_disable()
again.


Juergen


* Re: [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  2019-04-01  8:19           ` Andrew Cooper
  2019-04-01  8:49             ` Juergen Gross
@ 2019-04-01 15:15             ` Dario Faggioli
  1 sibling, 0 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-04-01 15:15 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Robert VanVossen,
	Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich


On Mon, 2019-04-01 at 09:19 +0100, Andrew Cooper wrote:
> On 01/04/2019 08:05, Dario Faggioli wrote:
> > On Mon, 2019-04-01 at 08:06 +0200, Juergen Gross wrote:
> > > The
> > > wrappers could then call the related specific scheduler function
> > > based
> > > on the scheduler Id using a chain of if ... else if ...
> > > statements. 
> > > 
> > I guess we'd have to see how the final code will look, but I like
> > the
> > idea, and I think it's well worth a try.
> 
> Normally, the result is put together with PGO rather than manually,
> because the effects are quite subtle.
> 
> The base case which might be good enough for Xen is:
> 
> if ( sched == default )
>     sched_foo();
> else
>     sched->foo();
> 
Yep, and this was exactly what I had in mind, before a full 'if..else'
was mentioned here. And if that's as far as it's sane to get, I'm fine
with that.

> which for the common case of the default cpupool only, or multiple
> groups with the same scheduler, will always take the direct path
> rather
> than the indirect path.
> 
Yeah, and as far as I've seen, using the default scheduler and pretty
much ignoring cpupools is common enough (and I'm not saying it's too
great a thing! :-/)

> Beyond that, the best length of the if/else chain can only reasonably
> be
> determined with profiling.  It depends on the relative frequencies of
> each call, and blindly doing an if/else chain to the end of the
> scheduler list will probably make worse performance if you're using
> the
> final scheduler than using a retpoline would.  
>
Yeah, makes sense.

And anyway...

> I think its useful to consider optimisations potential optimisations,
> but I'd advise against trying to merge everything into this series.
> 
...yes, let's keep this for later.

Regards,
Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 14:23                 ` Juergen Gross
@ 2019-04-01 15:15                   ` Julien Grall
  2019-04-01 16:00                     ` Juergen Gross
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2019-04-01 15:15 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi,

On 4/1/19 3:23 PM, Juergen Gross wrote:
> On 01/04/2019 16:01, Julien Grall wrote:
>> Hi,
>>
>> On 4/1/19 2:33 PM, Juergen Gross wrote:
>>> On 01/04/2019 15:21, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 4/1/19 11:37 AM, Juergen Gross wrote:
>>>>> On 01/04/2019 12:29, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>>>>>>> There is no need to execute it on the cpu just being disabled,
>>>>>>>>> so use
>>>>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
>>>>>>>>> stop_machine() context is fine, as we just need to hold the domain
>>>>>>>>> RCU
>>>>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>>>>
>>>>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in rare
>>>>>>>>> cases for cpu hotplug or suspend.
>>>>>>>>>
>>>>>>>>> While at it remove a superfluous smp_mb() in the ARM
>>>>>>>>> __cpu_disable()
>>>>>>>>> incarnation.
>>>>>>>>
>>>>>>>> This is not obvious why the smp_mb() is superfluous. Can you please
>>>>>>>> provide more details on why this is not necessary?
>>>>>>>
>>>>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>>>>> It is based on clear_bit() which is defined to be atomic.
>>>>>>
>>>>>> atomicity does not mean the store/load cannot be re-ordered by the
>>>>>> CPU.
>>>>>> You would need a barrier to prevent re-ordering.
>>>>>>
>>>>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>>>>> store/load can be re-ordered.
>>>>>
>>>>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>>>>> there suggests the sequence of setting the blocked bit and doing the
>>>>> test is important for avoiding a race...
>>>>
>>>> Hmmm... looking at the other usage (such as in do_poll), on non-x86
>>>> platform, there is a smp_mb() between set_bit(...) and checking the
>>>> event with a similar comment above.
>>>>
>>>> I don't know enough the scheduler code to know why the barrier is
>>>> needed. But for consistency, it seems to me the smp_mb() would be
>>>> required in vcpu_block() as well.
>>>>
>>>> Also, it is quite interesting that the barrier is not presence for x86.
>>>> If I understand correctly the comment on top of set_bit/clear_bit, it
>>>> could as well be re-ordered. So we seem to relying on the underlying
>>>> implementation of set_bit/clear_bit.
>>>
>>> On x86 reads and writes can't be reordered with locked operations (SDM
>>> Vol 3 8.2.2). So the barrier is really not needed AFAIU.
>>>
>>> include/asm-x86/bitops.h:
>>>
>>>    * clear_bit() is atomic and may not be reordered.
>>
>> I interpreted the "may not" as you should not rely on the re-ordering to
>> not happen.
>>
>> In place were re-ordering should not happen (e.g test_and_set_bit) we
>> use the wording "cannot".
> 
> The SDM is very clear here:
> 
> "Reads or writes cannot be reordered with I/O instructions, locked
>   instructions, or serializing instructions."

This is what the specification says, not the intended semantics. A
helper may have more relaxed semantics to accommodate other
architectures.

I believe this is the case here: the semantics are more relaxed than the
implementation, so you don't have to impose a barrier on architectures
with a more relaxed memory ordering.

> 
>>>> Wouldn't it make sense to try to uniformize the semantics? Maybe by
>>>> introducing a new helper?
>>>
>>> Or adding the barrier on ARM for the atomic operations?
>>
>> On which basis?  Why should we impact every users for fixing a bug in
>> the scheduler?
> 
> I'm assuming there are more places like this either in common code or
> code copied verbatim from arch/x86 to arch/arm with that problem.

Adding it in the *_set helpers is just the poor man's fix. If we do that, 
it is going to stick for a long time and impact performance.

Instead we should fix the scheduler code (and hopefully only that) where 
the ordering is necessary.

> 
> So I take it you'd rather let me add that smp_mb() in __cpu_disable()
> again.

Removing/adding barriers should be accompanied by a proper justification 
in the commit message. Additionally, new barriers should have a comment 
explaining what they are for.

In this case, I don't know what the correct answer is. It feels to me we 
should keep it until we have a better understanding of this code. But 
then it raises the question whether a barrier would also be necessary 
after calling cpu_disable_scheduler().

Cheers,

-- 
Julien Grall


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 15:15                   ` Julien Grall
@ 2019-04-01 16:00                     ` Juergen Gross
  2019-04-01 17:17                       ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Juergen Gross @ 2019-04-01 16:00 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

On 01/04/2019 17:15, Julien Grall wrote:
> Hi,
> 
> On 4/1/19 3:23 PM, Juergen Gross wrote:
>> On 01/04/2019 16:01, Julien Grall wrote:
>>> Hi,
>>>
>>> On 4/1/19 2:33 PM, Juergen Gross wrote:
>>>> On 01/04/2019 15:21, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 4/1/19 11:37 AM, Juergen Gross wrote:
>>>>>> On 01/04/2019 12:29, Julien Grall wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>>>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>>>>>> cpu_disable_scheduler() is being called from __cpu_disable()
>>>>>>>>>> today.
>>>>>>>>>> There is no need to execute it on the cpu just being disabled,
>>>>>>>>>> so use
>>>>>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call
>>>>>>>>>> out of
>>>>>>>>>> stop_machine() context is fine, as we just need to hold the
>>>>>>>>>> domain
>>>>>>>>>> RCU
>>>>>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>>>>>
>>>>>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in
>>>>>>>>>> rare
>>>>>>>>>> cases for cpu hotplug or suspend.
>>>>>>>>>>
>>>>>>>>>> While at it remove a superfluous smp_mb() in the ARM
>>>>>>>>>> __cpu_disable()
>>>>>>>>>> incarnation.
>>>>>>>>>
>>>>>>>>> This is not obvious why the smp_mb() is superfluous. Can you
>>>>>>>>> please
>>>>>>>>> provide more details on why this is not necessary?
>>>>>>>>
>>>>>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>>>>>> It is based on clear_bit() which is defined to be atomic.
>>>>>>>
>>>>>>> atomicity does not mean the store/load cannot be re-ordered by the
>>>>>>> CPU.
>>>>>>> You would need a barrier to prevent re-ordering.
>>>>>>>
>>>>>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>>>>>> store/load can be re-ordered.
>>>>>>
>>>>>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>>>>>> there suggests the sequence of setting the blocked bit and doing the
>>>>>> test is important for avoiding a race...
>>>>>
>>>>> Hmmm... looking at the other usage (such as in do_poll), on non-x86
>>>>> platform, there is a smp_mb() between set_bit(...) and checking the
>>>>> event with a similar comment above.
>>>>>
>>>>> I don't know enough the scheduler code to know why the barrier is
>>>>> needed. But for consistency, it seems to me the smp_mb() would be
>>>>> required in vcpu_block() as well.
>>>>>
>>>>> Also, it is quite interesting that the barrier is not presence for
>>>>> x86.
>>>>> If I understand correctly the comment on top of set_bit/clear_bit, it
>>>>> could as well be re-ordered. So we seem to relying on the underlying
>>>>> implementation of set_bit/clear_bit.
>>>>
>>>> On x86 reads and writes can't be reordered with locked operations (SDM
>>>> Vol 3 8.2.2). So the barrier is really not needed AFAIU.
>>>>
>>>> include/asm-x86/bitops.h:
>>>>
>>>>    * clear_bit() is atomic and may not be reordered.
>>>
>>> I interpreted the "may not" as you should not rely on the re-ordering to
>>> not happen.
>>>
>>> In place were re-ordering should not happen (e.g test_and_set_bit) we
>>> use the wording "cannot".
>>
>> The SDM is very clear here:
>>
>> "Reads or writes cannot be reordered with I/O instructions, locked
>>   instructions, or serializing instructions."
> 
> This is what the specification says, not the intended semantics. A helper
> may have more relaxed semantics to accommodate other architectures.
> 
> I believe this is the case here: the semantics are more relaxed than the
> implementation, so you don't have to impose a barrier on architectures
> with a more relaxed memory ordering.
> 
>>
>>>>> Wouldn't it make sense to try to uniformize the semantics? Maybe by
>>>>> introducing a new helper?
>>>>
>>>> Or adding the barrier on ARM for the atomic operations?
>>>
>>> On which basis?  Why should we impact every users for fixing a bug in
>>> the scheduler?
>>
>> I'm assuming there are more places like this either in common code or
>> code copied verbatim from arch/x86 to arch/arm with that problem.
> 
> Adding it in the *_set helpers is just the poor man's fix. If we do that,
> it is going to stick for a long time and impact performance.
> 
> Instead we should fix the scheduler code (and hopefully only that) where
> the ordering is necessary.

I believe that should be a patch on its own. Are you doing that?

>> So I take it you'd rather let me add that smp_mb() in __cpu_disable()
>> again.
> 
> Removing/adding barriers should be accompanied by a proper justification
> in the commit message. Additionally, new barriers should have a comment
> explaining what they are for.
> 
> In this case, I don't know what the correct answer is. It feels to me we
> should keep it until we have a better understanding of this code. But

Okay.

> then it raises the question whether a barrier would also be necessary
> after calling cpu_disable_scheduler().

That one is quite easy: all paths of cpu_disable_scheduler() are doing
an unlock operation at the end, so the barrier is already there.
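
For illustration, a simplified sketch of what is meant (hypothetical helper
name and a single lock for brevity; the real function iterates over domains
and per-vcpu scheduler locks):

static int example_cpu_disable_scheduler(unsigned int cpu)
{
    spinlock_t *lock = example_sched_lock(cpu); /* hypothetical helper */
    int ret = 0;

    spin_lock(lock);
    /* ... move vcpus away from 'cpu' and update the relevant cpumasks ... */
    spin_unlock(lock); /* release semantics: publishes the updates above */

    return ret;
}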


Juergen


* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
  2019-04-01 16:00                     ` Juergen Gross
@ 2019-04-01 17:17                       ` Julien Grall
  0 siblings, 0 replies; 111+ messages in thread
From: Julien Grall @ 2019-04-01 17:17 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Roger Pau Monné

Hi,

On 4/1/19 5:00 PM, Juergen Gross wrote:
> On 01/04/2019 17:15, Julien Grall wrote:
>> Hi,
>>
>> On 4/1/19 3:23 PM, Juergen Gross wrote:
>>> On 01/04/2019 16:01, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 4/1/19 2:33 PM, Juergen Gross wrote:
>>>>> On 01/04/2019 15:21, Julien Grall wrote:
>>>>>> Hi Juergen,
>>>>>>
>>>>>> On 4/1/19 11:37 AM, Juergen Gross wrote:
>>>>>>> On 01/04/2019 12:29, Julien Grall wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 4/1/19 10:40 AM, Juergen Gross wrote:
>>>>>>>>> On 01/04/2019 11:21, Julien Grall wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 3/29/19 3:08 PM, Juergen Gross wrote:
>>>>>>>>>>> cpu_disable_scheduler() is being called from __cpu_disable()
>>>>>>>>>>> today.
>>>>>>>>>>> There is no need to execute it on the cpu just being disabled,
>>>>>>>>>>> so use
>>>>>>>>>>> the CPU_DEAD case of the cpu notifier chain. Moving the call
>>>>>>>>>>> out of
>>>>>>>>>>> stop_machine() context is fine, as we just need to hold the
>>>>>>>>>>> domain
>>>>>>>>>>> RCU
>>>>>>>>>>> lock and need the scheduler percpu data to be still allocated.
>>>>>>>>>>>
>>>>>>>>>>> Add another hook for CPU_DOWN_PREPARE to bail out early in case
>>>>>>>>>>> cpu_disable_scheduler() would fail. This will avoid crashes in
>>>>>>>>>>> rare
>>>>>>>>>>> cases for cpu hotplug or suspend.
>>>>>>>>>>>
>>>>>>>>>>> While at it remove a superfluous smp_mb() in the ARM
>>>>>>>>>>> __cpu_disable()
>>>>>>>>>>> incarnation.
>>>>>>>>>>
>>>>>>>>>> This is not obvious why the smp_mb() is superfluous. Can you
>>>>>>>>>> please
>>>>>>>>>> provide more details on why this is not necessary?
>>>>>>>>>
>>>>>>>>> cpumask_clear_cpu() should already have the needed semantics, no?
>>>>>>>>> It is based on clear_bit() which is defined to be atomic.
>>>>>>>>
>>>>>>>> atomicity does not mean the store/load cannot be re-ordered by the
>>>>>>>> CPU.
>>>>>>>> You would need a barrier to prevent re-ordering.
>>>>>>>>
>>>>>>>> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
>>>>>>>> store/load can be re-ordered.
>>>>>>>
>>>>>>> Uh, couldn't this lead to problems, e.g. in vcpu_block()? The comment
>>>>>>> there suggests the sequence of setting the blocked bit and doing the
>>>>>>> test is important for avoiding a race...
>>>>>>
>>>>>> Hmmm... looking at the other usage (such as in do_poll), on non-x86
>>>>>> platform, there is a smp_mb() between set_bit(...) and checking the
>>>>>> event with a similar comment above.
>>>>>>
>>>>>> I don't know enough the scheduler code to know why the barrier is
>>>>>> needed. But for consistency, it seems to me the smp_mb() would be
>>>>>> required in vcpu_block() as well.
>>>>>>
>>>>>> Also, it is quite interesting that the barrier is not presence for
>>>>>> x86.
>>>>>> If I understand correctly the comment on top of set_bit/clear_bit, it
>>>>>> could as well be re-ordered. So we seem to relying on the underlying
>>>>>> implementation of set_bit/clear_bit.
>>>>>
>>>>> On x86 reads and writes can't be reordered with locked operations (SDM
>>>>> Vol 3 8.2.2). So the barrier is really not needed AFAIU.
>>>>>
>>>>> include/asm-x86/bitops.h:
>>>>>
>>>>>     * clear_bit() is atomic and may not be reordered.
>>>>
>>>> I interpreted the "may not" as you should not rely on the re-ordering to
>>>> not happen.
>>>>
>>>> In place were re-ordering should not happen (e.g test_and_set_bit) we
>>>> use the wording "cannot".
>>>
>>> The SDM is very clear here:
>>>
>>> "Reads or writes cannot be reordered with I/O instructions, locked
>>>    instructions, or serializing instructions."
>>
>> This is what the specification says, not the intended semantics. A helper
>> may have more relaxed semantics to accommodate other architectures.
>>
>> I believe this is the case here: the semantics are more relaxed than the
>> implementation, so you don't have to impose a barrier on architectures
>> with a more relaxed memory ordering.
>>
>>>
>>>>>> Wouldn't it make sense to try to uniformize the semantics? Maybe by
>>>>>> introducing a new helper?
>>>>>
>>>>> Or adding the barrier on ARM for the atomic operations?
>>>>
>>>> On which basis?  Why should we impact every users for fixing a bug in
>>>> the scheduler?
>>>
>>> I'm assuming there are more places like this either in common code or
>>> code copied verbatim from arch/x86 to arch/arm with that problem.
>>
>> Adding it in the *_set helpers is just the poor man's fix. If we do that,
>> it is going to stick for a long time and impact performance.
>>
>> Instead we should fix the scheduler code (and hopefully only that) where
>> the ordering is necessary.
> 
> I believe that should be a patch on its own. Are you doing that?

I will try to have a look tomorrow.

> 
>>> So I take it you'd rather let me add that smp_mb() in __cpu_disable()
>>> again.
>>
>> Removing/adding barriers should be accompanied by a proper justification
>> in the commit message. Additionally, new barriers should have a comment
>> explaining what they are for.
>>
>> In this case, I don't know what the correct answer is. It feels to me we
>> should keep it until we have a better understanding of this code. But
> 
> Okay.
> 
>> then it raises the question whether a barrier would also be necessary
>> after calling cpu_disable_scheduler().
> 
> That one is quite easy: all paths of cpu_disable_scheduler() are doing
> an unlock operation at the end, so the barrier is already there.

Oh, nothing to worry about then :). Thank you for looking at it.

Cheers,

-- 
Julien Grall


* Re: [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity
  2019-04-01  9:47     ` Juergen Gross
@ 2019-04-02  7:49       ` Juergen Gross
  0 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-02  7:49 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Julien Grall,
	Jan Beulich, Roger Pau Monné

On 01/04/2019 11:47, Juergen Gross wrote:
> On 01/04/2019 10:50, Andrew Cooper wrote:
>> On 29/03/2019 15:09, Juergen Gross wrote:
>>> Make sure the number of vcpus is always a multiple of the scheduling
>>> granularity. Note that we don't support a scheduling granularity above
>>> one on ARM.
>>
>> I'm afraid that I don't think this is a clever move.  In turn, this
>> brings into question the approach to idle handling.
>>
>> Firstly, with a proposed socket granularity, this would be 128 on some
>> systems which exist today.  Furthermore, consider the case where
>> cpupool0 has a granularity of 1, and a second pool has a granularity of
>> 2.  A domain can be created with an odd number of vcpus and operate in
>> pool0 fine, but can't now be moved to pool1.
> 
> For now granularity is the same for all pools, but I plan to enhance
> that in future.
> 
> The answer to that problem might be either to allow for later addition
> of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
> number), or to really disallow moving such a domain between pools.
> 
>> If at all possible, I think it would be better to try and reuse the idle
>> cpus for holes like this.  Seeing as you've been playing with this code
>> a lot, what is your assessment?
> 
> This would be rather complicated. I'd either need to switch vcpus
> dynamically in schedule items, or I'd need to special case the idle
> vcpus in _lots_ of places.

I have thought more about this and maybe I have found a way to make that
less intrusive than I thought in the beginning.

I'll give it a try...
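
For context, a sketch of the rounding the patch is about (illustrative
helper, not the series' exact code); the alternative floated above is to
keep the requested count as is and plug the holes with idle vcpus instead:

/* Round a domain's vcpu count up to a multiple of the scheduling granularity. */
static unsigned int example_round_vcpus(unsigned int max_vcpus,
                                        unsigned int granularity)
{
    return ((max_vcpus + granularity - 1) / granularity) * granularity;
}

/* e.g. 3 vcpus with granularity 2 (core scheduling, 2 threads per core) -> 4 */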


Juergen


* Re: [PATCH RFC 00/49] xen: add core scheduling support
@ 2019-04-11  0:34     ` Dario Faggioli
  0 siblings, 0 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-04-11  0:34 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Julien Grall, Paul Durrant, Josh Whitehead,
	Meng Xu, Jan Beulich, Ian Jackson, Roger Pau Monné



On Fri, 2019-03-29 at 19:16 +0100, Dario Faggioli wrote:
> On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
> > I have done some very basic performance testing: on a 4 cpu system
> > (2 cores with 2 threads each) I did a "make -j 4" for building the
> > Xen
> > hypervisor. With This test has been run on dom0, once with no other
> > guest active and once with another guest with 4 vcpus running the
> > same
> > test.
> Just as an heads up for people (as Juergen knows this already :-D),
> I'm
> planning to run some performance evaluation of this patches.
> 
> I've got an 8 CPUs system (4 cores, 2 threads each, no-NUMA) and an
> 16
> CPUs system (2 sockets/NUMA nodes, 4 cores each, 2 threads each) on
> which I should be able to get some bench suite running relatively
> easy
> and (hopefully) quick.
> 
> I'm planning to evaluate:
> - vanilla (i.e., without this series), SMT enabled in BIOS
> - vanilla (i.e., without this series), SMT disabled in BIOS
> - patched (i.e., with this series), granularity=thread
> - patched (i.e., with this series), granularity=core
> 
> I'll do start with no overcommitment, and then move to 2x
> overcommitment (as you did above).
> 
I've got the first set of results. It's fewer than I wanted/expected to
have at this point in time, but still...

Also, it's Phoronix again. I don't especially love it, but I'm still
working on convincing our own internal automated benchmarking tool
(which I like a lot more :-) ) to be a good friend of Xen. :-P

It's a not too big set of tests, done in the following conditions:
- hardware: Intel Xeon E5620; 2 NUMA nodes, 4 cores and 2 threads each
- slow disk (old rotational HDD)
- benchmarks run in dom0
- CPU, memory and some disk IO benchmarks
- all Spec&Melt mitigations disabled both at Xen and dom0 kernel level
- cpufreq governor = performance, max_cstate = C1
- *non* debug hypervisor

In just one sentence, what I'd say is "So far so good" :-D

https://openbenchmarking.org/result/1904105-SP-1904100DA38

1) 'Xen dom0, SMT On, vanilla' is staging *without* this series even 
    applied
2) 'Xen dom0, SMT on, patched, sched_granularity=thread' is with this 
    series applied, but scheduler behavior as right now
3) 'Xen dom0, SMT on, patched, sched_granularity=core' is with this 
    series applied, and core-scheduling enabled
4) 'Xen dom0, SMT Off, vanilla' is staging *without* this series 
    applied, and SMT turned off in BIOS (i.e., we only have 8 CPUs)

So, comparing 1 and 4, we see, for each specific benchmark, what is the
cost of disabling SMT (or vice-versa, the gain of using SMT).

Comparing 1 and 2, we see the overhead introduced by this series, when
it is not used to achieve core-scheduling.

Comparing 1 and 3, we see the differences between what we have right now,
and what we'll have with core-scheduling enabled, as it is implemented
in this series.

Some of the things we can see from the results:
- disabling SMT (i.e., 1 vs 4) is not always bad, but it is bad 
  overall, i.e., if you look at how many tests are better and at how 
  many are slower, with SMT off (and also, by how much). Of course, 
  this can be considered true for these specific benchmarks, on this 
  specific hardware and with this configuration
- the overhead introduced by this series is, overall, pretty small, 
  apart from not more than a couple of exceptions (e.g., Stream Triad 
  or zstd compression). OTOH, there seem to be cases where this series 
  improves performance (e.g., Stress-NG Socket Activity)
- the performance we achieve with core-scheduling is more than 
  acceptable
- between core-scheduling and disabling SMT, core-scheduling wins and
  I wouldn't even call it a match :-P

Of course, other thoughts, comments, alternative analysis are welcome.

As said above, this is less than what I wanted to have, and in fact I'm
running more stuff.

I have a much more comprehensive set of benchmarks running these days.
It being "much more comprehensive", however, also means it takes
more time.

I have a newer and faster (both CPU and disk) machine, but I need to
re-purpose it for benchmarking purposes.

At least now that the old Xeon NUMA box is done with this first round,
I can use it for:
- running the tests inside a "regular" PV domain
- running the tests inside more than one PV domain, i.e. with some 
  degree of overcommitment

I'll push out results as soon as I have them.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH RFC 00/49] xen: add core scheduling support
@ 2019-04-11  7:16       ` Juergen Gross
  0 siblings, 0 replies; 111+ messages in thread
From: Juergen Gross @ 2019-04-11  7:16 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Julien Grall, Paul Durrant, Josh Whitehead,
	Meng Xu, Jan Beulich, Ian Jackson, Roger Pau Monné

On 11/04/2019 02:34, Dario Faggioli wrote:
> On Fri, 2019-03-29 at 19:16 +0100, Dario Faggioli wrote:
>> On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
>>> I have done some very basic performance testing: on a 4 cpu system
>>> (2 cores with 2 threads each) I did a "make -j 4" for building the
>>> Xen
>>> hypervisor. With This test has been run on dom0, once with no other
>>> guest active and once with another guest with 4 vcpus running the
>>> same
>>> test.
>> Just as an heads up for people (as Juergen knows this already :-D),
>> I'm
>> planning to run some performance evaluation of this patches.
>>
>> I've got an 8 CPUs system (4 cores, 2 threads each, no-NUMA) and an
>> 16
>> CPUs system (2 sockets/NUMA nodes, 4 cores each, 2 threads each) on
>> which I should be able to get some bench suite running relatively
>> easy
>> and (hopefully) quick.
>>
>> I'm planning to evaluate:
>> - vanilla (i.e., without this series), SMT enabled in BIOS
>> - vanilla (i.e., without this series), SMT disabled in BIOS
>> - patched (i.e., with this series), granularity=thread
>> - patched (i.e., with this series), granularity=core
>>
>> I'll do start with no overcommitment, and then move to 2x
>> overcommitment (as you did above).
>>
> I've got the first set of results. It's fewer than I wanted/expected to
> have at this point in time, but still...
> 
> Also, it's Phoronix again. I don't especially love it, but I'm still
> working on convincing our own internal automated benchmarking tool
> (which I like a lot more :-) ) to be a good friend of Xen. :-P

I think the Phoronix tests as such are not that bad, it's the way they
are used by Phoronix which is completely idiotic.

> It's a not too big set of tests, done in the following conditions:
> - hardware: Intel Xeon E5620; 2 NUMA nodes, 4 cores and 2 threads each
> - slow disk (old rotational HDD)
> - benchmarks run in dom0
> - CPU, memory and some disk IO benchmarks
> - all Spec&Melt mitigations disabled both at Xen and dom0 kernel level
> - cpufreq governor = performance, max_cstate = C1
> - *non* debug hypervisor
> 
> In just one sentence, what I'd say is "So far so good" :-D
> 
> https://openbenchmarking.org/result/1904105-SP-1904100DA38

Thanks for doing that!


Juergen



* Re: [PATCH RFC 00/49] xen: add core scheduling support
@ 2019-04-11 13:28         ` Dario Faggioli
  0 siblings, 0 replies; 111+ messages in thread
From: Dario Faggioli @ 2019-04-11 13:28 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jan Beulich,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Robert VanVossen, Julien Grall, Paul Durrant, Josh Whitehead,
	Meng Xu, Jun Nakajima, Ian Jackson, Roger Pau Monné



On Thu, 2019-04-11 at 09:16 +0200, Juergen Gross wrote:
> On 11/04/2019 02:34, Dario Faggioli wrote:
> > Also, it's Phoronix again. I don't especially love it, but I'm
> > still
> > working on convincing our own internal automated benchmarking tool
> > (which I like a lot more :-) ) to be a good friend of Xen. :-P
> 
> I think the Phoronix tests as such are not that bad, it's the way they
> are used by Phoronix which is completely idiotic.
> 
Sure, that is the main problem.

About the suite itself: the fact that it is kind of a black box can be
a very good thing, but also a not-so-good one.

Opaqueness is, AFAIUI, among its design goals, so I can't possibly
complain about that. And in fact, that is what makes it so easy and
quick to play with. :-)

If you want to tweak the configuration of a benchmark, or change how
it is run, beyond the config options that are pre-defined for each
benchmark (e.g., do stuff like adding `numactl blabla` "in front" of
some), that is a lot less obvious or easy. And yes, this is somewhat
the case for most, if not all, benchmarking suites, but I find
Phoronix makes this _particularly_ tricky.

Anyway... :-D :-D

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier
@ 2019-04-16 19:34           ` Stefano Stabellini
  0 siblings, 0 replies; 111+ messages in thread
From: Stefano Stabellini @ 2019-04-16 19:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Dario Faggioli, Jan Beulich, xen-devel,
	Roger Pau Monné

On Mon, 1 Apr 2019, Julien Grall wrote:
> Hi,
> 
> On 4/1/19 10:40 AM, Juergen Gross wrote:
> > On 01/04/2019 11:21, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 3/29/19 3:08 PM, Juergen Gross wrote:
> > > > cpu_disable_scheduler() is being called from __cpu_disable() today.
> > > > There is no need to execute it on the cpu just being disabled, so use
> > > > the CPU_DEAD case of the cpu notifier chain. Moving the call out of
> > > > stop_machine() context is fine, as we just need to hold the domain RCU
> > > > lock and need the scheduler percpu data to be still allocated.
> > > > 
> > > > Add another hook for CPU_DOWN_PREPARE to bail out early in case
> > > > cpu_disable_scheduler() would fail. This will avoid crashes in rare
> > > > cases for cpu hotplug or suspend.
> > > > 
> > > > While at it remove a superfluous smp_mb() in the ARM __cpu_disable()
> > > > incarnation.
> > > 
> > > This is not obvious why the smp_mb() is superfluous. Can you please
> > > provide more details on why this is not necessary?
> > 
> > cpumask_clear_cpu() should already have the needed semantics, no?
> > It is based on clear_bit() which is defined to be atomic.
> 
> atomicity does not mean the store/load cannot be re-ordered by the CPU. You
> would need a barrier to prevent re-ordering.
> 
> cpumask_clear_cpu() and clear_bit() does not contain any barrier, so
> store/load can be re-ordered.
> 
> I see we have similar smp_mb() barrier in __cpu_die(). Sadly, there are no
> documentation in the code why the barrier is here. The logs don't help either.
> 
> The barrier here will ensure that the load/store related to disabling the CPU
> are seen before any load/store happening after the return. Although, I am not
> sure why this is necessary.
> 
> Stefano, Do you remember the rationale?

/me doing some archeology

I am pretty sure it was meant to accompany the cpumask_clear_cpu call. I
think we should keep it in __cpu_disable right after cpumask_clear_cpu.
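
In other words, keep the arrangement sketched below (simplified, the real
__cpu_disable() does more work around these two lines):

void __cpu_disable(void)
{
    unsigned int cpu = smp_processor_id();

    /* ... migrate IRQs away from this CPU, etc. ... */

    /* Take this CPU out of the online map ... */
    cpumask_clear_cpu(cpu, &cpu_online_map);

    /*
     * ... and make sure that update is visible before anything that
     * follows, since cpumask_clear_cpu() itself implies no barrier here.
     */
    smp_mb();
}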



end of thread, other threads:[~2019-04-16 19:35 UTC | newest]

Thread overview: 111+ messages
2019-03-29 15:08 [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 01/49] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
2019-04-01  9:21   ` Julien Grall
2019-04-01  9:40     ` Juergen Gross
2019-04-01 10:29       ` Julien Grall
2019-04-01 10:37         ` Juergen Gross
2019-04-01 13:21           ` Julien Grall
2019-04-01 13:33             ` Juergen Gross
2019-04-01 14:01               ` Julien Grall
2019-04-01 14:23                 ` Juergen Gross
2019-04-01 15:15                   ` Julien Grall
2019-04-01 16:00                     ` Juergen Gross
2019-04-01 17:17                       ` Julien Grall
2019-04-16 19:34         ` Stefano Stabellini
2019-04-16 19:34           ` [Xen-devel] " Stefano Stabellini
2019-03-29 15:08 ` [PATCH RFC 02/49] xen: add helper for calling notifier_call_chain() to common/cpu.c Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 03/49] xen: add new cpu notifier action CPU_RESUME_FAILED Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 04/49] xen: don't free percpu areas during suspend Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 05/49] xen/cpupool: simplify suspend/resume handling Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 06/49] xen/sched: don't disable scheduler on cpus during suspend Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 07/49] xen/sched: fix credit2 smt idle handling Juergen Gross
2019-03-29 18:22   ` Dario Faggioli
2019-03-29 15:08 ` [PATCH RFC 08/49] xen/sched: use new sched_item instead of vcpu in scheduler interfaces Juergen Gross
2019-03-29 18:42   ` Andrew Cooper
2019-03-30 10:24     ` Juergen Gross
2019-04-01  6:06       ` Juergen Gross
2019-04-01  7:05         ` Dario Faggioli
2019-04-01  8:19           ` Andrew Cooper
2019-04-01  8:49             ` Juergen Gross
2019-04-01 15:15             ` Dario Faggioli
2019-03-29 15:08 ` [PATCH RFC 09/49] xen/sched: alloc struct sched_item for each vcpu Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 10/49] xen/sched: move per-vcpu scheduler private data pointer to sched_item Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 11/49] xen/sched: build a linked list of struct sched_item Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 12/49] xen/sched: introduce struct sched_resource Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 13/49] xen/sched: let pick_cpu return a scheduler resource Juergen Gross
2019-03-29 15:08 ` [PATCH RFC 14/49] xen/sched: switch schedule_data.curr to point at sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 15/49] xen/sched: move per cpu scheduler private data into struct sched_resource Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 16/49] xen/sched: switch vcpu_schedule_lock to item_schedule_lock Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 17/49] xen/sched: move some per-vcpu items to struct sched_item Juergen Gross
2019-03-29 21:33   ` Andrew Cooper
2019-03-30  9:59     ` Juergen Gross
2019-04-01  5:59       ` Juergen Gross
2019-04-01  8:05         ` Jan Beulich
2019-04-01  8:26           ` Andrew Cooper
2019-04-01  8:41             ` Jan Beulich
2019-04-01  8:45             ` Juergen Gross
2019-04-01  8:01       ` Jan Beulich
2019-04-01  8:33         ` Andrew Cooper
2019-04-01  8:44           ` Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 18/49] xen/sched: add scheduler helpers hiding vcpu Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 19/49] xen/sched: add domain pointer to struct sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 20/49] xen/sched: add id " Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 21/49] xen/sched: rename scheduler related perf counters Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 22/49] xen/sched: switch struct task_slice from vcpu to sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 23/49] xen/sched: move is_running indicator to struct sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 24/49] xen/sched: make null scheduler vcpu agnostic Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 25/49] xen/sched: make rt " Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 26/49] xen/sched: make credit " Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 27/49] xen/sched: make credit2 " Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 28/49] xen/sched: make arinc653 " Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 29/49] xen: add sched_item_pause_nosync() and sched_item_unpause() Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 30/49] xen: let vcpu_create() select processor Juergen Gross
2019-03-29 19:17   ` Andrew Cooper
2019-03-30 10:23     ` Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 31/49] xen/sched: use sched_resource cpu instead smp_processor_id in schedulers Juergen Gross
2019-03-29 19:36   ` Andrew Cooper
2019-03-30 10:22     ` Juergen Gross
2019-04-01  8:10       ` Jan Beulich
2019-03-29 15:09 ` [PATCH RFC 32/49] xen/sched: switch schedule() from vcpus to sched_items Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 33/49] xen/sched: switch sched_move_irqs() to take sched_item as parameter Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 34/49] xen: switch from for_each_vcpu() to for_each_sched_item() Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 35/49] xen/sched: add runstate counters to struct sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 36/49] xen/sched: rework and rename vcpu_force_reschedule() Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 37/49] xen/sched: Change vcpu_migrate_*() to operate on schedule item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 38/49] xen/sched: move struct task_slice into struct sched_item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 39/49] xen/sched: add code to sync scheduling of all vcpus of a sched item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 40/49] xen/sched: add support for multiple vcpus per sched item where missing Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 41/49] x86: make loading of GDT at context switch more modular Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 42/49] xen/sched: add support for guest vcpu idle Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 43/49] xen/sched: modify cpupool_domain_cpumask() to be an item mask Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 44/49] xen: round up max vcpus to scheduling granularity Juergen Gross
2019-04-01  8:50   ` Andrew Cooper
2019-04-01  9:47     ` Juergen Gross
2019-04-02  7:49       ` Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 45/49] xen/sched: support allocating multiple vcpus into one sched item Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 46/49] xen/sched: add a scheduler_percpu_init() function Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 47/49] xen/sched: support core scheduling in continue_running() Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 48/49] xen/sched: make vcpu_wake() core scheduling aware Juergen Gross
2019-03-29 15:09 ` [PATCH RFC 49/49] xen/sched: add scheduling granularity enum Juergen Gross
2019-03-29 15:37 ` [PATCH RFC 00/49] xen: add core scheduling support Juergen Gross
2019-03-29 15:39 ` Jan Beulich
     [not found] ` <5C9E3C3D0200007800222FB0@suse.com>
2019-03-29 15:46   ` Juergen Gross
2019-03-29 16:56     ` Dario Faggioli
2019-03-29 17:00       ` Juergen Gross
2019-03-29 17:29         ` Dario Faggioli
2019-03-29 17:39         ` Rian Quinn
2019-03-29 17:48           ` Andrew Cooper
2019-03-29 18:35             ` Rian Quinn
2019-03-29 18:16 ` Dario Faggioli
2019-03-30  9:55   ` Juergen Gross
2019-04-11  0:34   ` Dario Faggioli
2019-04-11  0:34     ` [Xen-devel] " Dario Faggioli
2019-04-11  7:16     ` Juergen Gross
2019-04-11  7:16       ` [Xen-devel] " Juergen Gross
2019-04-11 13:28       ` Dario Faggioli
2019-04-11 13:28         ` [Xen-devel] " Dario Faggioli
2019-04-01  6:41 ` Jan Beulich
     [not found] ` <5CA1B285020000780022361D@suse.com>
2019-04-01  6:49   ` Juergen Gross
2019-04-01  7:10     ` Dario Faggioli
2019-04-01  7:15       ` Juergen Gross
2019-04-01  7:13     ` Jan Beulich
