* Re: [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used
2021-08-03 17:36 [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used Dario Faggioli
@ 2021-08-03 17:42 ` Dario Faggioli
2021-08-03 17:44 ` [PATCH] " Dario Faggioli
2021-08-04 7:37 ` Jan Beulich
2 siblings, 0 replies; 7+ messages in thread
From: Dario Faggioli @ 2021-08-03 17:42 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Jan Beulich
[-- Attachment #1: Type: text/plain, Size: 7492 bytes --]
Err... of course, the "pli" and "bla" stuff between the [] are a
leftover of some experiments that I had to do with `stg email`, due to
mail handling changes, and should really not have been there in this
email...
Sorry. :-/
On Tue, 2021-08-03 at 19:36 +0200, Dario Faggioli wrote:
> Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from
> the
> runq if there is one") did not fix completely the problem of
> potentially
> selecting a scheduling unit that will then not be able to run.
>
> In fact, in case caps are used and the unit we are currently looking
> at, during the runqueue scan, does not have budget to be executed, we
> should continue looking instead than giving up and picking the idle
> unit.
>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is necessary to completely fix the bug that was described in and
> addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable
> unit
> from the runq if there is one").
>
> It should, therefore, be backported and applied to all the branches
> to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.
>
> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to
> that
> branch!
>
> I will provide the backports myself in a email that I'll send as a
> reply to this one.
>
> Regards,
> Dario
> ---
> xen/common/sched/credit2.c | 85 +++++++++++++++++++++++++---------
> ----------
> 1 file changed, 49 insertions(+), 36 deletions(-)
>
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index ebb09ea43a..f9b95db313 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
> (unsigned char *)&d);
> }
>
> - /* Skip non runnable units that we (temporarily) have in the
> runq */
> - if ( unlikely(!unit_runnable_state(svc->unit)) )
> - continue;
> -
> - /* Only consider vcpus that are allowed to run on this
> processor. */
> - if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> - continue;
> -
> /*
> - * If an unit is meant to be picked up by another processor,
> and such
> - * processor has not scheduled yet, leave it in the runqueue
> for him.
> + * If the unit in the runqueue has more credit than current
> (or than
> + * idle, if current is not runnable) or if current is
> yielding, we may
> + * want to pick it up.
> */
> - if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> - cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> + if ( (yield || svc->credit > snext->credit) )
> {
> - SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> - continue;
> - }
> + /* Skip non runnable units that we (temporarily) have in
> the runq */
> + if ( unlikely(!unit_runnable_state(svc->unit)) )
> + continue;
>
> - /*
> - * If this is on a different processor, don't pull it unless
> - * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> - */
> - if ( sched_unit_master(svc->unit) != cpu
> - && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit
> )
> - {
> - SCHED_STAT_CRANK(migrate_resisted);
> - continue;
> - }
> + /* Only consider vcpus that are allowed to run on this
> processor. */
> + if ( !cpumask_test_cpu(cpu, svc->unit-
> >cpu_hard_affinity) )
> + continue;
>
> - /*
> - * If the one in the runqueue has more credit than current
> (or idle,
> - * if current is not runnable), or if current is yielding,
> and also
> - * if the one in runqueue either is not capped, or is capped
> but has
> - * some budget, then choose it.
> - */
> - if ( (yield || svc->credit > snext->credit) &&
> - (!has_cap(svc) || unit_grab_budget(svc)) )
> - snext = svc;
> + /*
> + * If an unit is meant to be picked up by another
> processor, and such
> + * processor has not scheduled yet, leave it in the
> runqueue for him.
> + */
> + if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu
> &&
> + cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> + {
> + SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> + continue;
> + }
>
> - /* In any case, if we got this far, break. */
> - break;
> + /*
> + * If this is on a different processor, don't pull it
> unless
> + * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> + */
> + if ( sched_unit_master(svc->unit) != cpu
> + && snext->credit + CSCHED2_MIGRATE_RESIST > svc-
> >credit )
> + {
> + SCHED_STAT_CRANK(migrate_resisted);
> + continue;
> + }
> +
> + /*
> + * If we are here, we are almost sure we want to pick
> the unit in
> + * the runqueue. Last thing we need to check is that it
> either is
> + * is not capped, or if it is, it has some budget.
> + *
> + * Note that cap & budget should really be the last
> thing we do
> + * check. In fact, unit_grab_budget() will reserve some
> budget for
> + * this unit, from the per-domain budget pool, and we
> want to do
> + * that only if we are sure that we'll then run the
> unit, consume
> + * some of it, and return the leftover (if any) in the
> usual way.
> + */
> + if ( has_cap(svc) && !unit_grab_budget(svc) )
> + continue;
> +
> + /* If we got this far, we are done. */
> + snext = svc;
> + break;
> + }
> }
>
> if ( unlikely(tb_init_done) )
>
>
>
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
2021-08-03 17:36 [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used Dario Faggioli
2021-08-03 17:42 ` Dario Faggioli
@ 2021-08-03 17:44 ` Dario Faggioli
2021-08-04 7:37 ` Jan Beulich
2 siblings, 0 replies; 7+ messages in thread
From: Dario Faggioli @ 2021-08-03 17:44 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Jan Beulich
[-- Attachment #1.1: Type: text/plain, Size: 915 bytes --]
On Tue, 2021-08-03 at 19:36 +0200, Dario Faggioli wrote:
> It should, therefore, be backported and applied to all the branches
> to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.
>
> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to
> that
> branch!
>
> I will provide the backports myself in a email that I'll send as a
> reply to this one.
>
And here they are, attached, the various backports, as they are
described in the paragraph above.
Hope this helps.
Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
[-- Attachment #1.2: xen-credit2-avoid-picking-idle_4.15.patch --]
[-- Type: text/x-patch, Size: 5111 bytes --]
commit 64a39b82c19c706985da91e41ba09d984b561e8a
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Mon Jun 21 18:25:34 2021 +0200
Xen: credit2: avoid picking a spurious idle unit when caps are used
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.
In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
unit.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index ebb09ea43a..f9b95db313 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
- /* Skip non runnable units that we (temporarily) have in the runq */
- if ( unlikely(!unit_runnable_state(svc->unit)) )
- continue;
-
- /* Only consider vcpus that are allowed to run on this processor. */
- if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
- continue;
-
/*
- * If an unit is meant to be picked up by another processor, and such
- * processor has not scheduled yet, leave it in the runqueue for him.
+ * If the unit in the runqueue has more credit than current (or than
+ * idle, if current is not runnable) or if current is yielding, we may
+ * want to pick it up.
*/
- if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
- cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ if ( (yield || svc->credit > snext->credit) )
{
- SCHED_STAT_CRANK(deferred_to_tickled_cpu);
- continue;
- }
+ /* Skip non runnable units that we (temporarily) have in the runq */
+ if ( unlikely(!unit_runnable_state(svc->unit)) )
+ continue;
- /*
- * If this is on a different processor, don't pull it unless
- * its credit is at least CSCHED2_MIGRATE_RESIST higher.
- */
- if ( sched_unit_master(svc->unit) != cpu
- && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
- {
- SCHED_STAT_CRANK(migrate_resisted);
- continue;
- }
+ /* Only consider vcpus that are allowed to run on this processor. */
+ if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
+ continue;
- /*
- * If the one in the runqueue has more credit than current (or idle,
- * if current is not runnable), or if current is yielding, and also
- * if the one in runqueue either is not capped, or is capped but has
- * some budget, then choose it.
- */
- if ( (yield || svc->credit > snext->credit) &&
- (!has_cap(svc) || unit_grab_budget(svc)) )
- snext = svc;
+ /*
+ * If an unit is meant to be picked up by another processor, and such
+ * processor has not scheduled yet, leave it in the runqueue for him.
+ */
+ if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+ cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ {
+ SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+ continue;
+ }
- /* In any case, if we got this far, break. */
- break;
+ /*
+ * If this is on a different processor, don't pull it unless
+ * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+ */
+ if ( sched_unit_master(svc->unit) != cpu
+ && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+ {
+ SCHED_STAT_CRANK(migrate_resisted);
+ continue;
+ }
+
+ /*
+ * If we are here, we are almost sure we want to pick the unit in
+ * the runqueue. Last thing we need to check is that it either is
+ * is not capped, or if it is, it has some budget.
+ *
+ * Note that cap & budget should really be the last thing we do
+ * check. In fact, unit_grab_budget() will reserve some budget for
+ * this unit, from the per-domain budget pool, and we want to do
+ * that only if we are sure that we'll then run the unit, consume
+ * some of it, and return the leftover (if any) in the usual way.
+ */
+ if ( has_cap(svc) && !unit_grab_budget(svc) )
+ continue;
+
+ /* If we got this far, we are done. */
+ snext = svc;
+ break;
+ }
}
if ( unlikely(tb_init_done) )
[-- Attachment #1.3: xen-credit2-avoid-picking-idle_4.14.patch --]
[-- Type: text/x-patch, Size: 5111 bytes --]
commit 1790a997476ae0bafcb4c83145b2c1395d57bfb5
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Mon Jun 21 18:25:34 2021 +0200
Xen: credit2: avoid picking a spurious idle unit when caps are used
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.
In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
unit.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index ebb09ea43a..f9b95db313 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
- /* Skip non runnable units that we (temporarily) have in the runq */
- if ( unlikely(!unit_runnable_state(svc->unit)) )
- continue;
-
- /* Only consider vcpus that are allowed to run on this processor. */
- if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
- continue;
-
/*
- * If an unit is meant to be picked up by another processor, and such
- * processor has not scheduled yet, leave it in the runqueue for him.
+ * If the unit in the runqueue has more credit than current (or than
+ * idle, if current is not runnable) or if current is yielding, we may
+ * want to pick it up.
*/
- if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
- cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ if ( (yield || svc->credit > snext->credit) )
{
- SCHED_STAT_CRANK(deferred_to_tickled_cpu);
- continue;
- }
+ /* Skip non runnable units that we (temporarily) have in the runq */
+ if ( unlikely(!unit_runnable_state(svc->unit)) )
+ continue;
- /*
- * If this is on a different processor, don't pull it unless
- * its credit is at least CSCHED2_MIGRATE_RESIST higher.
- */
- if ( sched_unit_master(svc->unit) != cpu
- && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
- {
- SCHED_STAT_CRANK(migrate_resisted);
- continue;
- }
+ /* Only consider vcpus that are allowed to run on this processor. */
+ if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
+ continue;
- /*
- * If the one in the runqueue has more credit than current (or idle,
- * if current is not runnable), or if current is yielding, and also
- * if the one in runqueue either is not capped, or is capped but has
- * some budget, then choose it.
- */
- if ( (yield || svc->credit > snext->credit) &&
- (!has_cap(svc) || unit_grab_budget(svc)) )
- snext = svc;
+ /*
+ * If an unit is meant to be picked up by another processor, and such
+ * processor has not scheduled yet, leave it in the runqueue for him.
+ */
+ if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+ cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ {
+ SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+ continue;
+ }
- /* In any case, if we got this far, break. */
- break;
+ /*
+ * If this is on a different processor, don't pull it unless
+ * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+ */
+ if ( sched_unit_master(svc->unit) != cpu
+ && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+ {
+ SCHED_STAT_CRANK(migrate_resisted);
+ continue;
+ }
+
+ /*
+ * If we are here, we are almost sure we want to pick the unit in
+ * the runqueue. Last thing we need to check is that it either is
+ * is not capped, or if it is, it has some budget.
+ *
+ * Note that cap & budget should really be the last thing we do
+ * check. In fact, unit_grab_budget() will reserve some budget for
+ * this unit, from the per-domain budget pool, and we want to do
+ * that only if we are sure that we'll then run the unit, consume
+ * some of it, and return the leftover (if any) in the usual way.
+ */
+ if ( has_cap(svc) && !unit_grab_budget(svc) )
+ continue;
+
+ /* If we got this far, we are done. */
+ snext = svc;
+ break;
+ }
}
if ( unlikely(tb_init_done) )
[-- Attachment #1.4: xen-credit2-avoid-picking-idle_4.13.patch --]
[-- Type: text/x-patch, Size: 5111 bytes --]
commit 79ebf54e08f68819ec400364da734a53f2a10417
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Mon Jun 21 18:25:34 2021 +0200
Xen: credit2: avoid picking a spurious idle unit when caps are used
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.
In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
unit.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 8ba741e379..f38760a750 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3368,48 +3368,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
- /* Skip non runnable units that we (temporarily) have in the runq */
- if ( unlikely(!unit_runnable_state(svc->unit)) )
- continue;
-
- /* Only consider vcpus that are allowed to run on this processor. */
- if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
- continue;
-
/*
- * If an unit is meant to be picked up by another processor, and such
- * processor has not scheduled yet, leave it in the runqueue for him.
+ * If the unit in the runqueue has more credit than current (or than
+ * idle, if current is not runnable) or if current is yielding, we may
+ * want to pick it up.
*/
- if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
- cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ if ( (yield || svc->credit > snext->credit) )
{
- SCHED_STAT_CRANK(deferred_to_tickled_cpu);
- continue;
- }
+ /* Skip non runnable units that we (temporarily) have in the runq */
+ if ( unlikely(!unit_runnable_state(svc->unit)) )
+ continue;
- /*
- * If this is on a different processor, don't pull it unless
- * its credit is at least CSCHED2_MIGRATE_RESIST higher.
- */
- if ( sched_unit_master(svc->unit) != cpu
- && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
- {
- SCHED_STAT_CRANK(migrate_resisted);
- continue;
- }
+ /* Only consider vcpus that are allowed to run on this processor. */
+ if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
+ continue;
- /*
- * If the one in the runqueue has more credit than current (or idle,
- * if current is not runnable), or if current is yielding, and also
- * if the one in runqueue either is not capped, or is capped but has
- * some budget, then choose it.
- */
- if ( (yield || svc->credit > snext->credit) &&
- (!has_cap(svc) || unit_grab_budget(svc)) )
- snext = svc;
+ /*
+ * If an unit is meant to be picked up by another processor, and such
+ * processor has not scheduled yet, leave it in the runqueue for him.
+ */
+ if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+ cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ {
+ SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+ continue;
+ }
- /* In any case, if we got this far, break. */
- break;
+ /*
+ * If this is on a different processor, don't pull it unless
+ * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+ */
+ if ( sched_unit_master(svc->unit) != cpu
+ && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+ {
+ SCHED_STAT_CRANK(migrate_resisted);
+ continue;
+ }
+
+ /*
+ * If we are here, we are almost sure we want to pick the unit in
+ * the runqueue. Last thing we need to check is that it either is
+ * is not capped, or if it is, it has some budget.
+ *
+ * Note that cap & budget should really be the last thing we do
+ * check. In fact, unit_grab_budget() will reserve some budget for
+ * this unit, from the per-domain budget pool, and we want to do
+ * that only if we are sure that we'll then run the unit, consume
+ * some of it, and return the leftover (if any) in the usual way.
+ */
+ if ( has_cap(svc) && !unit_grab_budget(svc) )
+ continue;
+
+ /* If we got this far, we are done. */
+ snext = svc;
+ break;
+ }
}
if ( unlikely(tb_init_done) )
[-- Attachment #1.5: xen-credit2-avoid-picking-idle_patch1_4.12.patch --]
[-- Type: text/x-patch, Size: 1741 bytes --]
commit baccc50fcb7cec6f5ec84473dad28847b65b65e8
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Fri May 28 17:12:48 2021 +0200
credit2: make sure we pick a runnable unit from the runq if there is one
A !runnable unit (temporarily) present in the runq may cause us to
stop scanning the runq itself too early. Of course, we don't run any
non-runnable vCPUs, but we end the scan and we fallback to picking
the idle unit. In other word, this prevent us to find there and pick
the actual unit that we're meant to start running (which might be
further ahead in the runq).
Depending on the vCPU pinning configuration, this may lead to such
unit to be stuck in the runq for long time, causing malfunctioning
inside the guest.
Fix this by checking runnable/non-runnable status up-front, in the runq
scanning function.
Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Reported-by: Dion Kant <g.w.kant@hunenet.nl>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index d6ebd126de..d89e340905 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3361,6 +3361,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
+ /* Skip non runnable vcpus that we (temporarily) have in the runq */
+ if ( unlikely(!vcpu_runnable(svc->vcpu)) )
+ continue;
+
/* Only consider vcpus that are allowed to run on this processor. */
if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
continue;
[-- Attachment #1.6: xen-credit2-avoid-picking-idle_patch2_4.12.patch --]
[-- Type: text/x-patch, Size: 5081 bytes --]
commit 0c56ff1edfaffb83ff96aaa3024d37741a349288
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Mon Jun 21 18:25:34 2021 +0200
Xen: credit2: avoid picking a spurious idle vcpu when caps are used
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable vcpu from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling vcpu that will then not be able to run.
In fact, in case caps are used and the vcpu we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
vcpu.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index d89e340905..16b8c2928e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3361,48 +3361,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
- /* Skip non runnable vcpus that we (temporarily) have in the runq */
- if ( unlikely(!vcpu_runnable(svc->vcpu)) )
- continue;
-
- /* Only consider vcpus that are allowed to run on this processor. */
- if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
- continue;
-
/*
- * If a vcpu is meant to be picked up by another processor, and such
- * processor has not scheduled yet, leave it in the runqueue for him.
+ * If the vcpu in the runqueue has more credit than current (or than
+ * idle, if current is not runnable) or if current is yielding, we may
+ * want to pick it up.
*/
- if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
- cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ if ( (yield || svc->credit > snext->credit) )
{
- SCHED_STAT_CRANK(deferred_to_tickled_cpu);
- continue;
- }
+ /* Skip non runnable vcpus that we (temporarily) have in the runq */
+ if ( unlikely(!vcpu_runnable(svc->vcpu)) )
+ continue;
- /*
- * If this is on a different processor, don't pull it unless
- * its credit is at least CSCHED2_MIGRATE_RESIST higher.
- */
- if ( svc->vcpu->processor != cpu
- && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
- {
- SCHED_STAT_CRANK(migrate_resisted);
- continue;
- }
+ /* Only consider vcpus that are allowed to run on this processor. */
+ if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+ continue;
- /*
- * If the one in the runqueue has more credit than current (or idle,
- * if current is not runnable), or if current is yielding, and also
- * if the one in runqueue either is not capped, or is capped but has
- * some budget, then choose it.
- */
- if ( (yield || svc->credit > snext->credit) &&
- (!has_cap(svc) || vcpu_grab_budget(svc)) )
- snext = svc;
+ /*
+ * If a vcpu is meant to be picked up by another processor, and such
+ * processor has not scheduled yet, leave it in the runqueue for him.
+ */
+ if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+ cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+ {
+ SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+ continue;
+ }
- /* In any case, if we got this far, break. */
- break;
+ /*
+ * If this is on a different processor, don't pull it unless
+ * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+ */
+ if ( svc->vcpu->processor != cpu
+ && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+ {
+ SCHED_STAT_CRANK(migrate_resisted);
+ continue;
+ }
+
+ /*
+ * If we are here, we are almost sure we want to pick the vcpu in
+ * the runqueue. Last thing we need to check is that it either is
+ * is not capped, or if it is, it has some budget.
+ *
+ * Note that cap & budget should really be the last thing we do
+ * check. In fact, vcpu_grab_budget() will reserve some budget for
+ * this vcpu, from the per-domain budget pool, and we want to do
+ * that only if we are sure that we'll then run the unit, consume
+ * some of it, and return the leftover (if any) in the usual way.
+ */
+ if ( has_cap(svc) && !vcpu_grab_budget(svc) )
+ continue;
+
+ /* If we got this far, we are done. */
+ snext = svc;
+ break;
+ }
}
if ( unlikely(tb_init_done) )
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
2021-08-03 17:36 [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used Dario Faggioli
2021-08-03 17:42 ` Dario Faggioli
2021-08-03 17:44 ` [PATCH] " Dario Faggioli
@ 2021-08-04 7:37 ` Jan Beulich
2021-08-04 13:28 ` Dario Faggioli
2 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2021-08-04 7:37 UTC (permalink / raw)
To: Dario Faggioli; +Cc: George Dunlap, xen-devel
On 03.08.2021 19:36, Dario Faggioli wrote:
> Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
> runq if there is one") did not fix completely the problem of potentially
> selecting a scheduling unit that will then not be able to run.
>
> In fact, in case caps are used and the unit we are currently looking
> at, during the runqueue scan, does not have budget to be executed, we
> should continue looking instead than giving up and picking the idle
> unit.
>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
Minor remark: Generally I think the order of tags should follow the
timeline: Suggestions (or bug reports) come before patch creation,
which in turns comes before reviewing / acking of a patch.
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
Since George is on leave and since I was Cc-ed, I thought I'd make an
attempt at reviewing this. The more that ...
> ---
> This is necessary to completely fix the bug that was described in and
> addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit
> from the runq if there is one").
>
> It should, therefore, be backported and applied to all the branches to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.
... for 4.13.4 it would of course be nice to have it in. Things look
plausible overall, but I've got one question which - despite concerning
code you only move - may play into the underlying issue.
> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to that
> branch!
Depends on what you target with this remark: For downstreams - yes. The
stable upstream branch, otoh, is out of general support, and since this
is not a security fix it is not going to be applied to that tree.
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
> (unsigned char *)&d);
> }
>
> - /* Skip non runnable units that we (temporarily) have in the runq */
> - if ( unlikely(!unit_runnable_state(svc->unit)) )
> - continue;
> -
> - /* Only consider vcpus that are allowed to run on this processor. */
> - if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> - continue;
> -
> /*
> - * If an unit is meant to be picked up by another processor, and such
> - * processor has not scheduled yet, leave it in the runqueue for him.
> + * If the unit in the runqueue has more credit than current (or than
> + * idle, if current is not runnable) or if current is yielding, we may
> + * want to pick it up.
> */
> - if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> - cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> + if ( (yield || svc->credit > snext->credit) )
The "credit" field is plain "int", i.e. signed. Idle domain's vCPU-s
don't get INT_MIN credit afaict (there's only one use of INT_MIN
throughout the entire file). Hence I can't see why in principle a
vCPU of an ordinary domain couldn't have equal or less credit than
the CPU's idle vCPU. I therefore wonder whether "yield" shouldn't be
set to true whenever snext != scurr (or - see the top of the
function - is_idle_unit() returns true), to make sure this if()'s
body gets entered in such a case.
As a nit, there's no need for the inner parentheses (anymore).
> {
> - SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> - continue;
> - }
> + /* Skip non runnable units that we (temporarily) have in the runq */
> + if ( unlikely(!unit_runnable_state(svc->unit)) )
> + continue;
>
> - /*
> - * If this is on a different processor, don't pull it unless
> - * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> - */
> - if ( sched_unit_master(svc->unit) != cpu
> - && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> - {
> - SCHED_STAT_CRANK(migrate_resisted);
> - continue;
> - }
> + /* Only consider vcpus that are allowed to run on this processor. */
> + if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> + continue;
>
> - /*
> - * If the one in the runqueue has more credit than current (or idle,
> - * if current is not runnable), or if current is yielding, and also
> - * if the one in runqueue either is not capped, or is capped but has
> - * some budget, then choose it.
> - */
> - if ( (yield || svc->credit > snext->credit) &&
> - (!has_cap(svc) || unit_grab_budget(svc)) )
> - snext = svc;
> + /*
> + * If an unit is meant to be picked up by another processor, and such
Nit: As you move/re-indent (and hence touch) this, would you mind also
replacing "an" by "a"? I'm less certain about "such" and ...
> + * processor has not scheduled yet, leave it in the runqueue for him.
... "him", but to me "that" and "it" respectively would seem more
suitable.
> + */
> + if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> + cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> + {
> + SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> + continue;
> + }
>
> - /* In any case, if we got this far, break. */
> - break;
> + /*
> + * If this is on a different processor, don't pull it unless
> + * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> + */
> + if ( sched_unit_master(svc->unit) != cpu
> + && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
Again, despite the code just getting moved/re-indented, please correct
style here ("&&" to be moved to the end of the earlier line) as you
touch the code.
Otoh I'm having trouble seeing why all of this code movement / re-
indentation is necessary in the first place: If the initial if() was
inverted to
if ( !yield && svc->credit <= snext->credit )
continue;
less code churn would result afaict, and then the backports likely also
would become less involved (plus my stylistic remarks might evaporate,
as the affected code may then remain untouched altogether).
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread