[pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used

* [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used
@ 2021-08-03 17:36 Dario Faggioli
  2021-08-03 17:42 ` Dario Faggioli
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dario Faggioli @ 2021-08-03 17:36 UTC (permalink / raw)
  To: xen-devel, xen-devel; +Cc: George Dunlap, George Dunlap, Jan Beulich

Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.

In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
unit.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
This is necessary to completely fix the bug that was described in and
addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit
from the runq if there is one").

It should, therefore, be backported and applied to all the branches to
which that commit has been. About backports, it should be
straigthforward to do that until 4.13.

For 4.12 and earlier, it's trickier, but the fix is still necessary.
Actually, both 07b0eb5d0ef0 and this patch should be backported to that
branch!

I will provide the backports myself in a email that I'll send as a
reply to this one.

Regards,
Dario
---
 xen/common/sched/credit2.c |   85 +++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index ebb09ea43a..f9b95db313 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                         (unsigned char *)&d);
         }
 
-        /* Skip non runnable units that we (temporarily) have in the runq */
-        if ( unlikely(!unit_runnable_state(svc->unit)) )
-            continue;
-
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
-            continue;
-
         /*
-         * If an unit is meant to be picked up by another processor, and such
-         * processor has not scheduled yet, leave it in the runqueue for him.
+         * If the unit in the runqueue has more credit than current (or than
+         * idle, if current is not runnable) or if current is yielding, we may
+         * want to pick it up.
          */
-        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
-             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+        if ( (yield || svc->credit > snext->credit) )
         {
-            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
-            continue;
-        }
+            /* Skip non runnable units that we (temporarily) have in the runq */
+            if ( unlikely(!unit_runnable_state(svc->unit)) )
+                continue;
 
-        /*
-         * If this is on a different processor, don't pull it unless
-         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
-         */
-        if ( sched_unit_master(svc->unit) != cpu
-             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
-        {
-            SCHED_STAT_CRANK(migrate_resisted);
-            continue;
-        }
+            /* Only consider vcpus that are allowed to run on this processor. */
+            if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
+                continue;
 
-        /*
-         * If the one in the runqueue has more credit than current (or idle,
-         * if current is not runnable), or if current is yielding, and also
-         * if the one in runqueue either is not capped, or is capped but has
-         * some budget, then choose it.
-         */
-        if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
-            snext = svc;
+            /*
+             * If an unit is meant to be picked up by another processor, and such
+             * processor has not scheduled yet, leave it in the runqueue for him.
+             */
+            if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+                 cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+            {
+                SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+                continue;
+            }
 
-        /* In any case, if we got this far, break. */
-        break;
+            /*
+             * If this is on a different processor, don't pull it unless
+             * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+             */
+            if ( sched_unit_master(svc->unit) != cpu
+                 && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+            {
+                SCHED_STAT_CRANK(migrate_resisted);
+                continue;
+            }
+
+            /*
+             * If we are here, we are almost sure we want to pick the unit in
+             * the runqueue. Last thing we need to check is that it either is
+             * is not capped, or if it is, it has some budget.
+             *
+             * Note that cap & budget should really be the last thing we do
+             * check. In fact, unit_grab_budget() will reserve some budget for
+             * this unit, from the per-domain budget pool, and we want to do
+             * that only if we are sure that we'll then run the unit, consume
+             * some of it, and return the leftover (if any) in the usual way.
+             */
+            if ( has_cap(svc) && !unit_grab_budget(svc) )
+                continue;
+
+            /* If we got this far, we are done. */
+            snext = svc;
+            break;
+        }
     }
 
     if ( unlikely(tb_init_done) )




^ permalink raw reply related	[flat|nested] 7+ messages in thread