[RFC PATCH v1 8/8] xen: sched: Credit2 group-scheduling: anti-starvation measures

From: Dario Faggioli <dfaggioli@suse.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@citrix.com>
Subject: [RFC PATCH v1 8/8] xen: sched: Credit2 group-scheduling: anti-starvation measures
Date: Fri, 12 Oct 2018 19:44:44 +0200
Message-ID: <153936628484.22652.13344399217340851105.stgit@wayrath>
In-Reply-To: <153936590062.22652.12114301510794181099.stgit@wayrath>

With group scheduling enabled, if a vcpu of, say, domain A, is already
running on a CPU, the other CPUs of the group can only run vcpus of
that same domain. And in fact, we scan the runqueue and look for one.

But then what can happen is that vcpus of domain A take turns at
switching between idle/blocked and running, and manage to keep the
vcpus of every other domain out of a group of CPUs for a long time,
or even indefinitely (impacting fairness, or causing starvation).

To avoid this, let's limit how deep we go along the runqueue in search
of a vcpu of domain A. That is, if we don't find any whose credits are
within a certain threshold of those of the vcpu at the top of the
runqueue, give up and keep the CPU idle.
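
As a rough illustration of the rule above (not the actual Xen code), here
is a minimal, self-contained sketch. It assumes the runqueue is an array
sorted by decreasing credit; struct toy_vcpu, pick_candidate() and the
CREDIT_THRESHOLD value are hypothetical names and numbers made up for this
example, with CREDIT_THRESHOLD merely standing in for CSCHED2_MIN_TIMER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold, standing in for CSCHED2_MIN_TIMER. */
#define CREDIT_THRESHOLD 500

/* Hypothetical, simplified view of a vcpu sitting in the runqueue. */
struct toy_vcpu {
    int domid;   /* domain the vcpu belongs to */
    int credit;  /* remaining credits */
};

/*
 * Scan a runqueue sorted by decreasing credit, looking for a vcpu of
 * group_domid (the domain currently running in the CPU group).
 *
 * The scan stops as soon as an entry is more than CREDIT_THRESHOLD
 * credits behind the head of the runqueue. Returns the index of the
 * chosen vcpu, or -1 if the CPU should be left idle.
 */
static int pick_candidate(const struct toy_vcpu *runq, size_t n,
                          int group_domid)
{
    assert(n == 0 || runq != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Too far behind the head: give up, do not search deeper. */
        if (runq[i].credit < runq[0].credit - CREDIT_THRESHOLD)
            break;

        /* Within the threshold and of the right domain: run it. */
        if (runq[i].domid == group_domid)
            return (int)i;
    }

    return -1;
}
```

With a runqueue holding credits {1000, 800, 300} for domains {2, 1, 1} and
domain 1 owning the group, the entry with 800 credits is picked. If the
only vcpu of domain 1 had 400 credits instead, the scan would stop before
reaching it and the CPU would stay idle.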

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
---
TODO:
- for now, CSCHED2_MIN_TIMER is what's used as threshold, but this can
  use some tuning (e.g., it probably wants to be adaptive, depending on
  how wide the coscheduling group of CPUs is, etc.)
---
 xen/common/sched_credit2.c |   32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index d2b4c907dc..a23c8f18d6 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3476,7 +3476,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                unsigned int *skipped)
 {
     struct list_head *iter, *temp;
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_vcpu *first_svc, *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     struct csched2_grpsched_data *gscd = c2gscd(cpu);
     bool yield = false, soft_aff_preempt = false;
@@ -3568,11 +3568,28 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * Of course, we also default to idle also if scurr is not runnable.
      */
     if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
+
         snext = scurr;
     else
         snext = csched2_vcpu(idle_vcpu[cpu]);
 
  check_runq:
+    /*
+     * To retain fairness, and avoid starvation issues, we don't let
+     * group scheduling make us run vcpus which are too far behind (i.e.,
+     * have far fewer credits than) the vcpu at the head of the runqueue.
+     *
+     * XXX Just use MIN_TIMER as the threshold, for now.
+     */
+    first_svc = list_entry(rqd->runq.next, struct csched2_vcpu, runq_elem);
+    if ( grpsched_enabled() && !is_idle_vcpu(scurr->vcpu) &&
+         !list_empty(&rqd->runq) )
+    {
+        ASSERT(gscd->sdom != NULL);
+        if ( scurr->credit < first_svc->credit - CSCHED2_MIN_TIMER )
+            snext = csched2_vcpu(idle_vcpu[cpu]);
+    }
+
     list_for_each_safe( iter, temp, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -3637,6 +3654,19 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             continue;
         }
 
+        /*
+         * As stated above, let's not go too far and risk picking up
+         * a vcpu which has far fewer credits than the one we would
+         * have picked if group scheduling was not enabled.
+         *
+         * There's a risk that this means leaving the CPU idle (if we don't
+         * find vcpus that satisfy this rule, and also the group scheduling
+         * constraints)... but that's what coscheduling is all about!
+         */
+        if ( grpsched_enabled() && gscd->sdom != NULL &&
+             svc->credit < first_svc->credit - CSCHED2_MIN_TIMER )
+            break;
+
         /*
          * If the one in the runqueue has more credit than current (or idle,
          * if current is not runnable), or if current is yielding, and also

