xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] credit2: make sure we pick a runnable unit from the runq if there is one
@ 2021-05-28 15:12 Dario Faggioli
  2021-06-07 12:10 ` George Dunlap
  0 siblings, 1 reply; 3+ messages in thread
From: Dario Faggioli @ 2021-05-28 15:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Michał Leszczyński, Dion Kant, George Dunlap,
	Jan Beulich, Michał Leszczyński, Dion Kant

A !runnable unit (temporarily) present in the runq may cause us to
stop scanning the runq itself too early. Of course, we don't run any
non-runnable vCPUs, but we end the scan and we fallback to picking
the idle unit. In other word, this prevent us to find there and pick
the actual unit that we're meant to start running (which might be
further ahead in the runq).

Depending on the vCPU pinning configuration, this may lead to such
unit to be stuck in the runq for long time, causing malfunctioning
inside the guest.

Fix this by checking runnable/non-runnable status up-front, in the runq
scanning function.

Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Reported-by: Dion Kant <g.w.kant@hunenet.nl>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Michał Leszczyński <michal.leszczynski@cert.pl>
Cc: Dion Kant <g.w.kant@hunenet.nl>
---
This is a bugfix and it solves the following problems, reported in
various ways:
* https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
* https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
* https://bugzilla.opensuse.org/show_bug.cgi?id=1179246

Hence, it should be backported, I'd say as far as possible... At least
to all the releases that have Credit2 as the default scheduler.

I will look further into this, and I think I can provide the backports
myself.

I'd like to send a *huge* thank you to Dion Kant who arranged for me to
be able to use a box where it was particularly easy to reproduce the
bug, and that was for all the time that it took me to finally be able to
work on this properly and nail it! :-)
---
 xen/common/sched/credit2.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index eb5e5a78c5..f5c1e5b944 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,6 +3463,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                         (unsigned char *)&d);
         }
 
+        /* Skip non runnable units that we (temporarily) have in the runq */
+        if ( unlikely(!unit_runnable_state(svc->unit)) )
+            continue;
+
         /* Only consider vcpus that are allowed to run on this processor. */
         if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
             continue;
@@ -3496,8 +3500,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) &&
-             unit_runnable_state(svc->unit) )
+             (!has_cap(svc) || unit_grab_budget(svc)) )
             snext = svc;
 
         /* In any case, if we got this far, break. */




^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-21 16:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-28 15:12 [PATCH] credit2: make sure we pick a runnable unit from the runq if there is one Dario Faggioli
2021-06-07 12:10 ` George Dunlap
2021-06-21 16:12   ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).