[PATCH] SEDF: avoid gathering vCPU-s on pCPU0

* [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
@ 2013-03-01 15:35 Jan Beulich
  2013-03-01 15:54 ` Keir Fraser
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jan Beulich @ 2013-03-01 15:35 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2233 bytes --]

The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc
 
     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }
 
 /*
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     vcpu_schedule_lock_irq(v);
 
     cpumask_copy(v->cpu_affinity, affinity);
-    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
+    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
+         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
         set_bit(_VPF_migrating, &v->pause_flags);
 
     vcpu_schedule_unlock_irq(v);




[-- Attachment #2: sedf-spread-load.patch --]
[-- Type: text/plain, Size: 2268 bytes --]

SEDF: avoid gathering vCPU-s on pCPU0

The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc
 
     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }
 
 /*
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     vcpu_schedule_lock_irq(v);
 
     cpumask_copy(v->cpu_affinity, affinity);
-    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
+    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
+         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
         set_bit(_VPF_migrating, &v->pause_flags);
 
     vcpu_schedule_unlock_irq(v);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread