* [PATCH 0/7] Soft affinity for Credit2
@ 2017-06-16 14:13 Dario Faggioli
  2017-06-16 14:13 ` [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Anshul Makkar, Ian Jackson

Hello,

This series implements soft-affinity for Credit2. Or, at least, it implements
most of it.

In fact, these patches introduce soft-affinity support in the scheduler,
everywhere it is needed, with the exception of the load balancer.  This is
because I think load balancing should be reorganized and reworked a bit, to
make soft-affinity (and other enhancements we may come up with in future) fit
well inside it.

And since this "first part" of soft-affinity support, and the "load balancing
part" of it, are 100% orthogonal (thanks a lot again and again, George, for
doing such a good job in Credit2 design! :-D), I decided to go ahead submitting
this one, to parallelize things a bit.

Note that I sent something like this already, a while ago:

 https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02183.html
 [patches 16 to 24]

However, although some of the patches are indeed very similar, or even the same
(although I think that applies only to patch 1!), others are different.
This is because the code base has changed since then, but also because there were
things _I_ wanted to change.  For instance, for what is patch 4 in this series,
I followed George's suggestions from back then, and reworked pretty much the
whole logic.  Others, I slightly tweaked, for various reasons.

I'm, therefore, not sending this series as a v2 of that one, but as a new
submission. I've dropped all the Acks and Reviews (again, with the only
exception of patch 1, which is truly identical). Where there is a need to fish
back some context from the old thread, I've put links in the '---' area of the
changelogs of the specific patches.

Note, finally, that the last 2 patches are not strictly related to
soft-affinity, but it felt ok to include them in the series.

There's a branch here:
 git://xenbits.xen.org/people/dariof/xen.git rel/sched/credit2-soft-aff
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/credit2-soft-aff
 https://travis-ci.org/fdario/xen/builds/243672460

Thanks and Regards,
Dario
---
Dario Faggioli (7):
      xen: sched: factor affinity helpers out of sched_credit.c
      xen/tools: credit2: soft-affinity awareness in runq_tickle()
      xen: credit2: soft-affinity awareness in fallback_cpu()
      xen: credit2: soft-affinity awareness in csched2_cpu_pick()
      xen: credit2: kick away vcpus not running within their soft-affinity
      xen: credit2: optimize runq_candidate() a little bit
      xen: credit2: try to avoid tickling cpus subject to ratelimiting

 tools/xentrace/formats     |    2 
 tools/xentrace/xenalyze.c  |    7 -
 xen/common/sched_credit.c  |   97 +--------
 xen/common/sched_credit2.c |  486 +++++++++++++++++++++++++++++++++-----------
 xen/include/xen/sched-if.h |   64 ++++++
 5 files changed, 448 insertions(+), 208 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
@ 2017-06-16 14:13 ` Dario Faggioli
  2017-06-23 10:02   ` Anshul Makkar
  2017-06-16 14:13 ` [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, Justin T. Weaver, George Dunlap

In fact, we want to be able to use them from any scheduler.

While there, make the moved code use 'v' for the struct vcpu *
variable, as should be done everywhere.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit.c  |   97 +++++++-------------------------------------
 xen/include/xen/sched-if.h |   64 +++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+), 82 deletions(-)
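
For reference, here is a minimal, standalone sketch of the two-step balancing
pattern that the moved helpers implement (plain C, not Xen code: cpumasks are
modelled as simple bitmasks and all values are invented). The idea is exactly
the one used by the callers in the hunks below: try the soft && hard mask
first, and fall back to the hard mask.

  #include <stdio.h>

  #define BALANCE_SOFT_AFFINITY 0
  #define BALANCE_HARD_AFFINITY 1
  #define for_each_affinity_balance_step(step) \
      for ( (step) = 0; (step) <= BALANCE_HARD_AFFINITY; (step)++ )

  int main(void)
  {
      unsigned long hard = 0xf0, soft = 0x0c, idlers = 0x30;
      int step;

      for_each_affinity_balance_step( step )
      {
          unsigned long mask = (step == BALANCE_SOFT_AFFINITY) ? (soft & hard)
                                                               : hard;

          /* No useful soft affinity: skip straight to the hard step. */
          if ( step == BALANCE_SOFT_AFFINITY && mask == 0 )
              continue;

          if ( mask & idlers )
          {
              printf("step %d: suitable idlers 0x%lx\n", step, mask & idlers);
              break;
          }
      }
      return 0;
  }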

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index efdf6bf..53773df 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -136,27 +136,6 @@
 #define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
 #define TRC_CSCHED_STEAL_CHECK   TRC_SCHED_CLASS_EVT(CSCHED, 11)
 
-
-/*
- * Hard and soft affinity load balancing.
- *
- * Idea is each vcpu has some pcpus that it prefers, some that it does not
- * prefer but is OK with, and some that it cannot run on at all. The first
- * set of pcpus are the ones that are both in the soft affinity *and* in the
- * hard affinity; the second set of pcpus are the ones that are in the hard
- * affinity but *not* in the soft affinity; the third set of pcpus are the
- * ones that are not in the hard affinity.
- *
- * We implement a two step balancing logic. Basically, every time there is
- * the need to decide where to run a vcpu, we first check the soft affinity
- * (well, actually, the && between soft and hard affinity), to see if we can
- * send it where it prefers to (and can) run on. However, if the first step
- * does not find any suitable and free pcpu, we fall back checking the hard
- * affinity.
- */
-#define CSCHED_BALANCE_SOFT_AFFINITY    0
-#define CSCHED_BALANCE_HARD_AFFINITY    1
-
 /*
  * Boot parameters
  */
@@ -331,52 +310,6 @@ runq_remove(struct csched_vcpu *svc)
     __runq_remove(svc);
 }
 
-#define for_each_csched_balance_step(step) \
-    for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ )
-
-
-/*
- * Hard affinity balancing is always necessary and must never be skipped.
- * But soft affinity need only be considered when it has a functionally
- * different effect than other constraints (such as hard affinity, cpus
- * online, or cpupools).
- *
- * Soft affinity only needs to be considered if:
- * * The cpus in the cpupool are not a subset of soft affinity
- * * The hard affinity is not a subset of soft affinity
- * * There is an overlap between the soft affinity and the mask which is
- *   currently being considered.
- */
-static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
-                                           const cpumask_t *mask)
-{
-    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
-                           vc->cpu_soft_affinity) &&
-           !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
-           cpumask_intersects(vc->cpu_soft_affinity, mask);
-}
-
-/*
- * Each csched-balance step uses its own cpumask. This function determines
- * which one (given the step) and copies it in mask. For the soft affinity
- * balancing step, the pcpus that are not part of vc's hard affinity are
- * filtered out from the result, to avoid running a vcpu where it would
- * like, but is not allowed to!
- */
-static void
-csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask)
-{
-    if ( step == CSCHED_BALANCE_SOFT_AFFINITY )
-    {
-        cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity);
-
-        if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, vc->cpu_hard_affinity);
-    }
-    else /* step == CSCHED_BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, vc->cpu_hard_affinity);
-}
-
 static void burn_credits(struct csched_vcpu *svc, s_time_t now)
 {
     s_time_t delta;
@@ -441,18 +374,18 @@ static inline void __runq_tickle(struct csched_vcpu *new)
          * Soft and hard affinity balancing loop. For vcpus without
          * a useful soft affinity, consider hard affinity only.
          */
-        for_each_csched_balance_step( balance_step )
+        for_each_affinity_balance_step( balance_step )
         {
             int new_idlers_empty;
 
-            if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-                 && !__vcpu_has_soft_affinity(new->vcpu,
-                                              new->vcpu->cpu_hard_affinity) )
+            if ( balance_step == BALANCE_SOFT_AFFINITY
+                 && !has_soft_affinity(new->vcpu,
+                                       new->vcpu->cpu_hard_affinity) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
-            csched_balance_cpumask(new->vcpu, balance_step,
-                                   cpumask_scratch_cpu(cpu));
+            affinity_balance_cpumask(new->vcpu, balance_step,
+                                     cpumask_scratch_cpu(cpu));
             cpumask_and(cpumask_scratch_cpu(cpu),
                         cpumask_scratch_cpu(cpu), &idle_mask);
             new_idlers_empty = cpumask_empty(cpumask_scratch_cpu(cpu));
@@ -463,7 +396,7 @@ static inline void __runq_tickle(struct csched_vcpu *new)
              * hard affinity as well, before taking final decisions.
              */
             if ( new_idlers_empty
-                 && balance_step == CSCHED_BALANCE_SOFT_AFFINITY )
+                 && balance_step == BALANCE_SOFT_AFFINITY )
                 continue;
 
             /*
@@ -789,7 +722,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     online = cpupool_domain_cpumask(vc->domain);
     cpumask_and(&cpus, vc->cpu_hard_affinity, online);
 
-    for_each_csched_balance_step( balance_step )
+    for_each_affinity_balance_step( balance_step )
     {
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -809,12 +742,12 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * cpus and, if the result is empty, we just skip the soft affinity
          * balancing step all together.
          */
-        if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-             && !__vcpu_has_soft_affinity(vc, &cpus) )
+        if ( balance_step == BALANCE_SOFT_AFFINITY
+             && !has_soft_affinity(vc, &cpus) )
             continue;
 
         /* Pick an online CPU from the proper affinity mask */
-        csched_balance_cpumask(vc, balance_step, &cpus);
+        affinity_balance_cpumask(vc, balance_step, &cpus);
         cpumask_and(&cpus, &cpus, online);
 
         /* If present, prefer vc's current processor */
@@ -1710,11 +1643,11 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * or counter.
          */
         if ( vc->is_running ||
-             (balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-              && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity)) )
+             (balance_step == BALANCE_SOFT_AFFINITY
+              && !has_soft_affinity(vc, vc->cpu_hard_affinity)) )
             continue;
 
-        csched_balance_cpumask(vc, balance_step, cpumask_scratch);
+        affinity_balance_cpumask(vc, balance_step, cpumask_scratch);
         if ( __csched_vcpu_is_migrateable(vc, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
@@ -1774,7 +1707,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
      *  1. any "soft-affine work" to steal first,
      *  2. if not finding anything, any "hard-affine work" to steal.
      */
-    for_each_csched_balance_step( bstep )
+    for_each_affinity_balance_step( bstep )
     {
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index c32ee7a..c4a4935 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -208,4 +208,68 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
     return d->cpupool->cpu_valid;
 }
 
+/*
+ * Hard and soft affinity load balancing.
+ *
+ * Idea is each vcpu has some pcpus that it prefers, some that it does not
+ * prefer but is OK with, and some that it cannot run on at all. The first
+ * set of pcpus are the ones that are both in the soft affinity *and* in the
+ * hard affinity; the second set of pcpus are the ones that are in the hard
+ * affinity but *not* in the soft affinity; the third set of pcpus are the
+ * ones that are not in the hard affinity.
+ *
+ * We implement a two step balancing logic. Basically, every time there is
+ * the need to decide where to run a vcpu, we first check the soft affinity
+ * (well, actually, the && between soft and hard affinity), to see if we can
+ * send it where it prefers to (and can) run on. However, if the first step
+ * does not find any suitable and free pcpu, we fall back checking the hard
+ * affinity.
+ */
+#define BALANCE_SOFT_AFFINITY    0
+#define BALANCE_HARD_AFFINITY    1
+
+#define for_each_affinity_balance_step(step) \
+    for ( (step) = 0; (step) <= BALANCE_HARD_AFFINITY; (step)++ )
+
+/*
+ * Hard affinity balancing is always necessary and must never be skipped.
+ * But soft affinity need only be considered when it has a functionally
+ * different effect than other constraints (such as hard affinity, cpus
+ * online, or cpupools).
+ *
+ * Soft affinity only needs to be considered if:
+ * * The cpus in the cpupool are not a subset of soft affinity
+ * * The hard affinity is not a subset of soft affinity
+ * * There is an overlap between the soft affinity and the mask which is
+ *   currently being considered.
+ */
+static inline int has_soft_affinity(const struct vcpu *v,
+                                    const cpumask_t *mask)
+{
+    return !cpumask_subset(cpupool_domain_cpumask(v->domain),
+                           v->cpu_soft_affinity) &&
+           !cpumask_subset(v->cpu_hard_affinity, v->cpu_soft_affinity) &&
+           cpumask_intersects(v->cpu_soft_affinity, mask);
+}
+
+/*
+ * This function copies in mask the cpumask that should be used for a
+ * particular affinity balancing step. For the soft affinity one, the pcpus
+ * that are not part of vc's hard affinity are filtered out from the result,
+ * to avoid running a vcpu where it would like, but is not allowed to!
+ */
+static inline void
+affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+{
+    if ( step == BALANCE_SOFT_AFFINITY )
+    {
+        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+
+        if ( unlikely(cpumask_empty(mask)) )
+            cpumask_copy(mask, v->cpu_hard_affinity);
+    }
+    else /* step == BALANCE_HARD_AFFINITY */
+        cpumask_copy(mask, v->cpu_hard_affinity);
+}
+
 #endif /* __XEN_SCHED_IF_H__ */



* [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle()
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
  2017-06-16 14:13 ` [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
@ 2017-06-16 14:13 ` Dario Faggioli
  2017-06-23 10:35   ` Anshul Makkar
  2017-07-25 11:47   ` George Dunlap
  2017-06-16 14:13 ` [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu() Dario Faggioli
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Anshul Makkar, Ian Jackson, George Dunlap

Soft-affinity support is usually implemented by means
of a two step "balancing loop", where:
- during the first step, we consider soft-affinity
  (if the vcpu has one);
- during the second (if we get to it), we consider
  hard-affinity.

In runq_tickle(), we need to do that for checking
whether we can execute the waking vCPU on a pCPU
that is idle. In fact, we want to be sure that, if
there is an idle pCPU in the vCPU's soft affinity,
we'll use it.

If there are no such idle pCPUs, though, and we
have to check non-idle ones, we can avoid the loop
and do both hard and soft-affinity in one pass.

In fact, we can scan the runqueue and compute a
"score" for the vCPU that is running on each pCPU.
The idea is, since we may have to preempt someone:
- try to make sure that the waking vCPU will run
  inside its soft-affinity,
- try to preempt someone that is running outside
  of its own soft-affinity.

The value of the score is added to a trace record,
so xenalyze's code and tools/xentrace/formats are
updated accordingly.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
This is *very* different from what was first submitted here:
 https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02200.html

In fact, I reworked it starting from George's idea and draft:
 https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03171.html

But then ended up changing that quite a bit as well. So, George, I'm more than
happy for this patch to have a 'Signed-off-by: George Dunlap', and in fact, I
had it here until 5 minutes before actually sending the series.

But since I altered the code significantly, I could not be sure you'd be happy
about that, and hence decided to remove it and ask you. Are you ok with it, or
would you prefer some other tag (something like 'Idea-by:', etc.)? Let me know. :-)
---
 tools/xentrace/formats     |    2 
 tools/xentrace/xenalyze.c  |    7 +
 xen/common/sched_credit2.c |  214 +++++++++++++++++++++++++++++---------------
 3 files changed, 146 insertions(+), 77 deletions(-)
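
To illustrate the effect of the scoring classes, here is a small standalone
sketch (plain C, not Xen code: CREDIT_INIT is a stand-in for
CSCHED2_CREDIT_INIT, all credit figures are invented, and the
migrate-resistance adjustment is left out) showing how the soft-affinity
bonuses dominate the raw credit difference:

  #include <stdio.h>
  #include <stdbool.h>

  #define CREDIT_INIT 10500000L   /* arbitrary value, just for the example */

  static long score(long new_credit, long cur_credit,
                    bool cpu_in_new_soft, bool cpu_in_cur_soft)
  {
      long s = new_credit - cur_credit;

      if ( s > 0 )
      {
          if ( cpu_in_new_soft )
              s += CREDIT_INIT;    /* new would run inside its soft-affinity */
          if ( !cpu_in_cur_soft )
              s += CREDIT_INIT;    /* cur is running outside of its own */
      }
      return s;
  }

  int main(void)
  {
      /* cpu A: small credit gap, but in new's soft-affinity and not in
       * cur's, so it gets the 2x bonus... */
      printf("A: %ld\n", score(9000000, 8500000, true, false));
      /* ...and wins over cpu B, despite B's much bigger credit gap. */
      printf("B: %ld\n", score(9000000, 2000000, false, true));
      return 0;
  }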

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index 8b31780..2684611 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -53,7 +53,7 @@
 0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
 0x00022203  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit burn    [ dom:vcpu = 0x%(1)08x, credit = %(2)d, delta = %(3)d ]
 0x00022204  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_add
-0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d ]
+0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d, score = %(3)d ]
 0x00022206  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle         [ cpu = %(1)d ]
 0x00022207  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_reset   [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d, mult = %(4)d ]
 0x00022208  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:sched_tasklet
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index fa608ad..01315e6 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7692,11 +7692,12 @@ void sched_process(struct pcpu_info *p)
             if(opt.dump_all) {
                 struct {
                     unsigned int vcpuid:16, domid:16;
-                    int credit;
+                    int credit, score;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s csched2:tickle_check d%uv%u, credit = %d\n",
-                       ri->dump_header, r->domid, r->vcpuid, r->credit);
+                printf(" %s csched2:tickle_check d%uv%u, credit = %d, score = %d\n\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->credit, r->score);
             }
             break;
         case TRC_SCHED_CLASS_EVT(CSCHED2, 6): /* TICKLE            */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 126417c..c749d4e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1132,6 +1132,73 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * Score to preempt the target cpu.  Return a negative number if the
+ * credit isn't high enough; if it is, favor a preemption on cpu in
+ * this order:
+ * - cpu is in new's soft-affinity, not in cur's soft-affinity
+ *   (2 x CSCHED2_CREDIT_INIT score bonus);
+ * - cpu is in new's soft-affinity and cur's soft-affinity, or
+ *   cpu is not in new's soft-affinity, nor in cur's soft-affinity
+ *   (1x CSCHED2_CREDIT_INIT score bonus);
+ * - cpu is not in new's soft-affinity, while it is in cur's soft-affinity
+ *   (no bonus).
+ *
+ * Within the same class, the highest difference of credit.
+ */
+static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+                             struct csched2_vcpu *new, unsigned int cpu)
+{
+    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    s_time_t score;
+
+    /*
+     * We are dealing with cpus that are marked non-idle (i.e., that are not
+     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * if taking care of tasklets. In that case, we want to leave it alone.
+     */
+    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+        return -1;
+
+    burn_credits(rqd, cur, now);
+
+    score = new->credit - cur->credit;
+    if ( new->vcpu->processor != cpu )
+        score -= CSCHED2_MIGRATE_RESIST;
+
+    /*
+     * If score is positive, it means new has enough credits (i.e.,
+     * new->credit > cur->credit+CSCHED2_MIGRATE_RESIST).
+     *
+     * Let's compute the bonuses for soft-affinities.
+     */
+    if ( score > 0 )
+    {
+        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+    }
+
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            int credit, score;
+        } d;
+        d.dom = cur->vcpu->domain->domain_id;
+        d.vcpu = cur->vcpu->vcpu_id;
+        d.credit = cur->credit;
+        d.score = score;
+        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
+    return score;
+}
+
+/*
  * Check what processor it is best to 'wake', for picking up a vcpu that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
@@ -1151,11 +1218,11 @@ static void
 runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
-    s_time_t lowest = (1<<30);
-    unsigned int cpu = new->vcpu->processor;
+    s_time_t max = 0;
+    unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
     cpumask_t mask;
-    struct csched2_vcpu * cur;
 
     ASSERT(new->rqd == rqd);
 
@@ -1175,109 +1242,110 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(new->vcpu->domain));
-
-    /*
-     * First of all, consider idle cpus, checking if we can just
-     * re-use the pcpu where we were running before.
-     *
-     * If there are cores where all the siblings are idle, consider
-     * them first, honoring whatever the spreading-vs-consolidation
-     * SMT policy wants us to do.
-     */
-    if ( unlikely(sched_smt_power_savings) )
-        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
-    else
-        cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
+    for_each_affinity_balance_step( bs )
     {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
+        /* Just skip first step, if we don't have a soft affinity */
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
+            continue;
+
+        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+
+        /*
+         * First of all, consider idle cpus, checking if we can just
+         * re-use the pcpu where we were running before.
+         *
+         * If there are cores where all the siblings are idle, consider
+         * them first, honoring whatever the spreading-vs-consolidation
+         * SMT policy wants us to do.
+         */
+        if ( unlikely(sched_smt_power_savings) )
+        {
+            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
+            cpumask_and(&mask, &mask, online);
+        }
+        else
+            cpumask_and(&mask, &rqd->smt_idle, online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
+
+        /*
+         * If there are no fully idle cores, check all idlers, after
+         * having filtered out pcpus that have been tickled but haven't
+         * gone through the scheduler yet.
+         */
+        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
     }
 
     /*
-     * If there are no fully idle cores, check all idlers, after
-     * having filtered out pcpus that have been tickled but haven't
-     * gone through the scheduler yet.
+     * Note that, if we are here, it means we have done the hard-affinity
+     * balancing step of the loop, and hence what we have in cpumask_scratch
+     * is what we put there for last, i.e., new's vcpu_hard_affinity & online
+     * which is exactly what we need for the next part of the function.
      */
-    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
-    {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
 
     /*
      * Otherwise, look for the non-idle (and non-tickled) processors with
      * the lowest credit, among the ones new is allowed to run on. Again,
      * the cpu were it was running on would be the best candidate.
+     *
+     * For deciding which cpu to tickle, we use tickle_score(), which will
+     * factor in both new's soft-affinity, and the soft-affinity of the
+     * vcpu running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        cur = csched2_vcpu(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        s_time_t score = tickle_score(rqd, now, new, cpu);
 
-        if ( cur->credit < new->credit )
+        if ( score > max )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
+            max = score;
             ipid = cpu;
-            goto tickle;
+
+            /* If this is in new's soft affinity, just take it */
+            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            {
+                SCHED_STAT_CRANK(tickled_busy_cpu);
+                goto tickle;
+            }
         }
     }
 
     for_each_cpu(i, &mask)
     {
+        s_time_t score;
+
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        cur = csched2_vcpu(curr_on_cpu(i));
-
-        /*
-         * Even if the cpu is not in rqd->idle, it may be running the
-         * idle vcpu, if it's doing tasklet work. Just skip it.
-         */
-        if ( is_idle_vcpu(cur->vcpu) )
-            continue;
-
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
+        score = tickle_score(rqd, now, new, i);
 
-        if ( cur->credit < lowest )
+        if ( score > max )
         {
+            max = score;
             ipid = i;
-            lowest = cur->credit;
-        }
-
-        if ( unlikely(tb_init_done) )
-        {
-            struct {
-                unsigned vcpu:16, dom:16;
-                int credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
         }
     }
 
-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+    if ( ipid == -1 )
     {
         SCHED_STAT_CRANK(tickled_no_cpu);
         return;



* [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
  2017-06-16 14:13 ` [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
  2017-06-16 14:13 ` [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2017-06-16 14:13 ` Dario Faggioli
  2017-07-25 10:19   ` George Dunlap
  2017-06-16 14:14 ` [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver

By, basically, moving all the logic of the function
inside the usual two-step (soft-affinity step and
hard-affinity step) loop.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
George, you gave your Reviewed-by to:
 https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html

which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
 https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html

I changed the cpu_pick() part a lot, and that's why I decided to split the
patch in two.  As far as fallback_cpu() goes, though, what's done in this patch
is exactly what was being done in the original one.

So, of course I'm dropping the Rev-by, but I thought it could have been useful
to mention this. :-)
---
 xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c749d4e..54f6e21 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) just one valid pcpu from svc's soft affinity;
+ *  4) svc's current pcpu, if it is part of svc's hard affinity;
+ *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  6) just one valid pcpu from svc's hard affinity
+ *
+ * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
+ * note that at least 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     struct vcpu *v = svc->vcpu;
-    int cpu = v->processor;
+    unsigned int bs;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    for_each_affinity_balance_step( bs )
+    {
+        int cpu = v->processor;
+
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(v, v->cpu_hard_affinity) )
+            continue;
 
-    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
-        return cpu;
+        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    cpupool_domain_cpumask(v->domain));
 
-    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
-                                   &svc->rqd->active)) )
-    {
-        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
-                    cpumask_scratch_cpu(cpu));
-        return cpumask_first(cpumask_scratch_cpu(cpu));
-    }
+        /*
+         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+            return cpu;
 
-    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
+        /*
+         * This is cases 2 or 5 (depending on bs): v->processor isn't there
+         * any longer, check if we at least can stay in our current runq.
+         */
+        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                       &svc->rqd->active)) )
+        {
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                        &svc->rqd->active);
+            return cpumask_first(cpumask_scratch_cpu(cpu));
+        }
 
-    return cpumask_first(cpumask_scratch_cpu(cpu));
+        /*
+         * This is cases 3 or 6 (depending on bs): last stand, just one valid
+         * pcpu from our soft affinity, if we have one and if there's any. In
+         * fact, if we are doing soft-affinity, it is possible that we fail,
+         * which means we stay in the loop and look for hard affinity. OTOH,
+         * if we are at the hard-affinity balancing step, it's guaranteed that
+         * there is at least one valid cpu, and therefore we are sure that we
+         * return it, and never really exit the loop.
+         */
+        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
+               bs == BALANCE_SOFT_AFFINITY);
+        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    BUG_ON(1);
 }
 
 /*



* [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick()
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
                   ` (2 preceding siblings ...)
  2017-06-16 14:13 ` [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu() Dario Faggioli
@ 2017-06-16 14:14 ` Dario Faggioli
  2017-07-25 10:54   ` George Dunlap
  2017-07-25 11:04   ` George Dunlap
  2017-06-16 14:14 ` [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, Justin T. Weaver, George Dunlap

We want to find the runqueue with the least average load,
and to do that, we scan through all the runqueues.

It is, therefore, enough that, during such a scan:
- we identify the runqueue with the least load, among
  the ones that have pcpus that are part of the soft
  affinity of the vcpu we're calling pick on;
- we identify the same, but for hard affinity.

At this point, we can decide whether to go for the
runqueue with the least load among the ones with some
soft-affinity, or overall.

Therefore, at the price of some code reshuffling, we
can avoid the loop.

(Also, kill a spurious ';' in the definition of MAX_LOAD.)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |  117 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 20 deletions(-)
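
The core of the change is a single scan that keeps track of two minima at
once. Here is a standalone sketch of that pattern (plain C, not Xen code:
runqueues, loads and affinity flags are all invented):

  #include <stdio.h>
  #include <stdbool.h>
  #include <limits.h>

  struct runq { long avgload; bool in_hard; bool in_soft; };

  int main(void)
  {
      /* Three runqueues; only the second one has soft-affine cpus. */
      struct runq rqs[] = {
          { 300, true, false }, { 120, true, true }, { 80, true, false },
      };
      long min_load = LONG_MAX, min_s_load = LONG_MAX;
      int min_rqi = -1, min_s_rqi = -1;

      for ( int i = 0; i < 3; i++ )
      {
          if ( !rqs[i].in_hard )               /* no hard-affine cpus: skip */
              continue;
          if ( rqs[i].in_soft && rqs[i].avgload < min_s_load )
          {
              min_s_load = rqs[i].avgload;     /* best runq we'd want */
              min_s_rqi = i;
          }
          if ( rqs[i].avgload < min_load )
          {
              min_load = rqs[i].avgload;       /* best runq we can use */
              min_rqi = i;
          }
      }
      /* Prefer the soft-affine candidate, if we found one (prints 1 here,
       * even though runqueue 2 has the lowest load overall). */
      printf("%d\n", min_s_rqi != -1 ? min_s_rqi : min_rqi);
      return 0;
  }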

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 54f6e21..fb97ff7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1725,14 +1725,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
+    int i, min_rqi = -1, min_s_rqi = -1;
+    unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1781,17 +1783,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
                                      &svc->migrate_rqd->active) )
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
                                     cpumask_scratch_cpu(cpu));
+
             svc->migrate_rqd->pick_bias = new_cpu;
             goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contains cpus in our hard affinity; this represent the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1800,31 +1820,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         rqd = prv->rqd + i;
 
         /*
-         * If checking a different runqueue, grab the lock, check hard
-         * affinity, read the avg, and then release the lock.
+         * If none of the cpus of this runqueue is in svc's hard-affinity,
+         * skip the runqueue.
+         *
+         * Note that, in case svc's hard-affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that we end up skipping svc's current runqueue.
+         */
+        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
+            continue;
+
+        /*
+         * If checking a different runqueue, grab the lock, read the avg,
+         * and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
          * but subtract our own load from the runqueue load to simulate
          * impartiality.
-         *
-         * Note that, if svc's hard affinity has changed, this is the
-         * first time when we see such change, so it is indeed possible
-         * that none of the cpus in svc's current runqueue is in our
-         * (new) hard affinity!
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
+            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = rqd->b_avgload;
-
+            rqd_avgload = rqd->b_avgload;
             spin_unlock(&rqd->lock);
         }
 
+        /*
+         * if svc has a soft-affinity, and some cpus of rqd are part of it,
+         * see if we need to update the "soft-affinity minimum".
+         */
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload )
+        {
+            cpumask_t mask;
+
+            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
+            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            {
+                min_s_avgload = rqd_avgload;
+                min_s_rqi = i;
+            }
+        }
+        /* In any case, keep the "hard-affinity minimum" updated too. */
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1832,17 +1872,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, to obtain the soft-affinity mask, we "just" put what we
+         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * ok because:
+         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         *   a non-empty intersection (because has_soft is true);
+         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         *   in cpumask_scratch, we do save a lot doing like this.
+         *
+         * It's kind of like open coding affinity_balance_cpumask() but, in
+         * this specific case, calling that would mean a lot of (unnecessary)
+         * cpumask operations.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    vc->cpu_soft_affinity);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft-affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         *
+         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpupool_domain_cpumask() in it, so it's enough that we filter
+         * with the cpus of the runq.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(ops, new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                &prv->rqd[min_rqi].active);
     new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
                             cpumask_scratch_cpu(cpu));
     prv->rqd[min_rqi].pick_bias = new_cpu;



* [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
                   ` (3 preceding siblings ...)
  2017-06-16 14:14 ` [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
@ 2017-06-16 14:14 ` Dario Faggioli
  2017-07-25 11:06   ` George Dunlap
  2017-06-16 14:14 ` [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
  2017-06-16 14:14 ` [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
  6 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

If, during scheduling, we realize that the current vcpu
is running outside of its own soft-affinity, it would be
preferable to send it somewhere else.

Of course, that may not be possible, and if we're too
strict, we risk having vcpus sit in runqueues, even if
there are idle pcpus (violating work-conservingness).
In fact, what if there are no pcpus, in the soft-
affinity mask of the vcpu in question, where it can
run?

To make sure we don't fall into the trap described
above, we only actually de-schedule the vcpu if there
are idle and not already tickled cpus in its soft
affinity where it can run immediately.

If there is (at least) one such cpu, we let current
be preempted, so that csched2_context_saved() will put
it back in the runq, and runq_tickle() will wake (one
of) those cpus.

If there is not even one, we let current run where it is,
as running outside its soft-affinity is still better than
not running at all.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)
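
The decision boils down to a few cpumask operations. Here is a standalone
sketch of the check that computes soft_aff_preempt in the hunk below (plain C,
not Xen code: cpumasks are modelled as bitmasks and all values are invented):

  #include <stdio.h>
  #include <stdbool.h>

  int main(void)
  {
      unsigned long soft = 0x0c, idle = 0x08, tickled = 0x00, online = 0xff;
      unsigned int cpu = 5;            /* the cpu we are scheduling on */
      bool soft_aff_preempt = false;

      /* Only think of preempting if this cpu is outside the soft affinity. */
      if ( !(soft & (1UL << cpu)) )
      {
          /* Is any soft-affine cpu idle, not already tickled, and online? */
          soft_aff_preempt = (soft & idle & ~tickled & online) != 0;
      }

      /* Prints 1: kick current, and runq_tickle() will move it. */
      printf("%d\n", soft_aff_preempt);
      return 0;
  }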

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index fb97ff7..5d8f25c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2637,6 +2637,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    bool soft_aff_preempt = false;
 
     *skipped = 0;
 
@@ -2670,8 +2671,43 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         return scurr;
     }
 
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
+    /* If scurr has a soft-affinity, let's check whether cpu is part of it */
+    if ( !is_idle_vcpu(scurr->vcpu) &&
+         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    {
+        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+                                 cpumask_scratch);
+        if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
+        {
+            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+
+            /* Ok, is any of the pcpus in scurr soft-affinity idle? */
+            cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
+            cpumask_andnot(cpumask_scratch, cpumask_scratch, &rqd->tickled);
+            soft_aff_preempt = cpumask_intersects(cpumask_scratch, online);
+        }
+    }
+
+    /*
+     * If scurr is runnable, and this cpu is in its soft-affinity, default to
+     * it. We also default to it, even if cpu is not in its soft-affinity, if
+     * there aren't any idle and not tickled cpu in its soft-affinity. In
+     * fact, we don't want to risk leaving scurr in the runq and this cpu idle
+     * only because scurr is running outside of its soft-affinity.
+     *
+     * On the other hand, if cpu is not in scurr's soft-affinity, and there
+     * looks to be better options, go for them. That happens by defaulting to
+     * idle here, which means scurr will be preempted, put back in runq, and
+     * one of those idle and not tickled cpus from its soft-affinity will be
+     * tickled to pick it up.
+     *
+     * Finally, if scurr does not have a valid soft-affinity, we also let it
+     * continue to run here (in fact, soft_aff_preempt will still be false,
+     * in this case).
+     *
+     * Of course, we also default to idle also if scurr is not runnable.
+     */
+    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_vcpu(idle_vcpu[cpu]);



* [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
                   ` (4 preceding siblings ...)
  2017-06-16 14:14 ` [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
@ 2017-06-16 14:14 ` Dario Faggioli
  2017-07-25 11:25   ` George Dunlap
  2017-06-16 14:14 ` [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
  6 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

By factoring all the checks to see whether current
is the idle vcpu into one (at the top), and marking
it as unlikely().

In fact, if current is idle, all the logic for
dealing with yielding, context switching rate
limiting and soft-affinity is just pure overhead,
and we'd better rush to check the runq and pick
some vcpu up.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 5d8f25c..bbda790 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2634,13 +2634,20 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                unsigned int *skipped)
 {
     struct list_head *iter;
-    struct csched2_vcpu *snext = NULL;
+    struct csched2_vcpu *snext = csched2_vcpu(idle_vcpu[cpu]);
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
-    bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
-    bool soft_aff_preempt = false;
+    bool yield = false, soft_aff_preempt = false;
 
     *skipped = 0;
 
+    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    {
+        snext = scurr;
+        goto check_runq;
+    }
+
+    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
@@ -2650,8 +2657,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
-         vcpu_runnable(scurr->vcpu) &&
+    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
@@ -2672,8 +2678,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( !is_idle_vcpu(scurr->vcpu) &&
-         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    if ( has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
     {
         affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
@@ -2709,9 +2714,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      */
     if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
-    else
-        snext = csched2_vcpu(idle_vcpu[cpu]);
 
+ check_runq:
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);



* [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting
  2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
                   ` (5 preceding siblings ...)
  2017-06-16 14:14 ` [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
@ 2017-06-16 14:14 ` Dario Faggioli
  2017-07-25 11:31   ` George Dunlap
  6 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-06-16 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar

With context switching ratelimiting enabled, the following
pattern is quite common in a scheduling trace:

     0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
     0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
     0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
 [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
     [...]
 [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
     0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
 [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
 [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running

Basically, what happens is that runq_tickle() realizes
d0v13 should preempt d2v7, running on cpu 12, as it
has higher credits (10135529 vs. 2619231). It therefore
tickles cpu 12 [1], which, in turn, schedules [2].

But --surprise surprise-- d2v7 has run for less than the
ratelimit interval [3], and hence it is _not_ preempted,
and continues to run. This indeed looks fine. Actually,
this is what ratelimiting is there for. Note, however,
that:
 1) we interrupted cpu 12 for nothing;
 2) what if, say on cpu 8, there is a vcpu that has:
    + less credit than d0v13 (so d0v13 can well
      preempt it),
    + more credit than d2v7 (that's why it was not
      selected to be preempted),
    + run for more than the ratelimiting interval
      (so it can really be scheduled out)?

With this patch, if we are in case 2), we realize
that tickling cpu 12 would be pointless, and we continue
looking, eventually finding and tickling cpu 8.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)
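
For the ratelimit check itself, here is a small standalone illustration
(plain C, not Xen code: the 50us tolerance mirrors
CSCHED2_RATELIMIT_TICKLE_TOLERANCE from the hunk below, while the other
numbers are invented):

  #include <stdio.h>
  #include <stdbool.h>

  #define MICROSECS(us) ((long)(us) * 1000L)   /* ns, for the example */
  #define TOLERANCE     MICROSECS(50)

  static bool is_preemptable(long ran_for, long ratelimit)
  {
      if ( ratelimit <= TOLERANCE )
          return true;                 /* ratelimiting (practically) off */
      /* Close enough to the end of its ratelimit slot: worth tickling. */
      return ran_for > ratelimit - TOLERANCE;
  }

  int main(void)
  {
      long ratelimit = MICROSECS(1000);

      printf("%d\n", is_preemptable(MICROSECS(970), ratelimit));  /* 1 */
      printf("%d\n", is_preemptable(MICROSECS(300), ratelimit));  /* 0 */
      return 0;
  }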

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index bbda790..c45bc03 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -160,6 +160,8 @@
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
 /* Max timer: Maximum time a guest can be run for. */
@@ -1167,6 +1169,23 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * What we want to know is whether svc, which we assume to be running on some
+ * pcpu, can be interrupted and preempted (which, so far, basically means
+ * whether or not it has already run for more than the ratelimit, to which we
+ * apply some tolerance).
+ */
+static inline bool is_preemptable(const struct csched2_vcpu *svc,
+                                    s_time_t now, s_time_t ratelimit)
+{
+    if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
+        return true;
+
+    ASSERT(svc->vcpu->is_running);
+    return now - svc->vcpu->runstate.state_entry_time >
+           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
+}
+
+/*
  * Score to preempt the target cpu.  Return a negative number if the
  * credit isn't high enough; if it is, favor a preemption on cpu in
  * this order:
@@ -1180,10 +1199,12 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  *
  * Within the same class, the highest difference of credit.
  */
-static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
                              struct csched2_vcpu *new, unsigned int cpu)
 {
+    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
     /*
@@ -1191,7 +1212,8 @@ static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
      * in rqd->idle). However, some of them may be running their idle vcpu,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+         !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
@@ -1348,7 +1370,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        s_time_t score = tickle_score(rqd, now, new, cpu);
+        s_time_t score = tickle_score(ops, now, new, cpu);
 
         if ( score > max )
         {
@@ -1371,7 +1393,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        score = tickle_score(rqd, now, new, i);
+        score = tickle_score(ops, now, new, i);
 
         if ( score > max )
         {



* Re: [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c
  2017-06-16 14:13 ` [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
@ 2017-06-23 10:02   ` Anshul Makkar
  2017-06-23 13:33     ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: Anshul Makkar @ 2017-06-23 10:02 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Justin T. Weaver, George Dunlap

On 16/06/2017 15:13, Dario Faggioli wrote:
> In fact, we want to be able to use them from any scheduler.
>
> While there, make the moved code use 'v' for struct_vcpu*
> variable, like it should be done everywhere.
>
> No functional change intended.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
> ---
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |   97 +++++++-------------------------------------
>  xen/include/xen/sched-if.h |   64 +++++++++++++++++++++++++++++
>  2 files changed, 79 insertions(+), 82 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index efdf6bf..53773df 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -136,27 +136,6 @@
*
> + * Hard affinity balancing is always necessary and must never be skipped.
> + * But soft affinity need only be considered when it has a functionally
> + * different effect than other constraints (such as hard affinity, cpus
> + * online, or cpupools).
> + *
> + * Soft affinity only needs to be considered if:
> + * * The cpus in the cpupool are not a subset of soft affinity
do you mean that the cpus in a cpupool are not a subset of the soft affinity
of a vcpu which is in a runnable state?
> + * * The hard affinity is not a subset of soft affinity
> + * * There is an overlap between the soft affinity and the mask which is
> + *   currently being considered.
> + */
>
reviewed !! looks good.

Anshul


* Re: [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle()
  2017-06-16 14:13 ` [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2017-06-23 10:35   ` Anshul Makkar
  2017-07-25 11:47   ` George Dunlap
  1 sibling, 0 replies; 25+ messages in thread
From: Anshul Makkar @ 2017-06-23 10:35 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Wei Liu, Ian Jackson, George Dunlap

On 16/06/2017 15:13, Dario Faggioli wrote:
> Soft-affinity support is usually implemented by means
> of a two step "balancing loop", where:
> - during the first step, we consider soft-affinity
>   (if the vcpu has one);
> - during the second (if we get to it), we consider
>   hard-affinity.
>
> In runq_tickle(), we need to do that for checking
> whether we can execute the waking vCPU on a pCPU
> that is idle. In fact, we want to be sure that, if
> there is an idle pCPU in the vCPU's soft affinity,
> we'll use it.
>
> If there are no such idle pCPUs, though, and we
> have to check non-idle ones, we can avoid the loop
> and do both hard and soft-affinity in one pass.
>
> In fact, we can just scan the runqueue and compute a
> "score" for each vCPU which is running on each pCPU.
> The idea is, since we may have to preempt someone:
> - try to make sure that the waking vCPU will run
>   inside its soft-affinity,
> - try to preempt someone that is running outside
>   of its own soft-affinity.
>
> The value of the score is added to a trace record,
> so xenalyze's code and tools/xentrace/formats are
> updated accordingly.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
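
For reference, the scoring idea described above boils down to something like
the following (purely illustrative: the names below are made up, the real
logic is in runq_tickle() in the actual patch):

    score = new->credit - cur->credit;    /* base: difference in credit     */
    if ( vcpu_in_soft_aff(new, cpu) )
        score += SOFT_AFF_BONUS;          /* new would run in its soft aff. */
    if ( !vcpu_in_soft_aff(cur, cpu) )
        score += SOFT_AFF_BONUS;          /* cur runs outside its soft aff. */

so that, credit being equal, a cpu where both conditions hold is the
preferred target of the tickle.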

reviewed. Looks good.

Anshul




* Re: [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c
  2017-06-23 10:02   ` Anshul Makkar
@ 2017-06-23 13:33     ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-06-23 13:33 UTC (permalink / raw)
  To: Anshul Makkar, xen-devel; +Cc: Justin T. Weaver, George Dunlap



On Fri, 2017-06-23 at 11:02 +0100, Anshul Makkar wrote:
> On 16/06/2017 15:13, Dario Faggioli wrote:
> > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> > index efdf6bf..53773df 100644
> > --- a/xen/common/sched_credit.c
> > +++ b/xen/common/sched_credit.c
> > @@ -136,27 +136,6 @@
> 
> *
> > + * Hard affinity balancing is always necessary and must never be
> > skipped.
> > + * But soft affinity need only be considered when it has a
> > functionally
> > + * different effect than other constraints (such as hard affinity,
> > cpus
> > + * online, or cpupools).
> > + *
> > + * Soft affinity only needs to be considered if:
> > + * * The cpus in the cpupool are not a subset of soft affinity
> 
> do you mean cpus in a cpupool are not in a subset of the soft
> affinity 
> of a vcpu which is a runnable state ?
>
Err... not sure. It's probably the same. IAC, what this means is that,
if a vcpu is in cpupool C, and the cpus of cpupool C are 1,2,3 _and_
the soft affinity of the vcpu is 0,1,2,3,4,5, then it does not make
sense to check soft-affinity.

In fact, the vcpu will only execute on either 1,2 or 3, which are all
part of its soft-affinity anyway.

And the check should be exactly that the cpupool is *a* *subset* of the soft-
affinity. In fact, if the cpupool is 1,2,3 and soft-affinity is 0,1,3,4,5,
then it _does_ make sense to check soft affinity, to try not to run
the vcpu on 2.
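
In code terms, it is basically a subset check. Something like this (just a
sketch, to illustrate the idea; the helper name and the pool_cpus parameter
are made up here):

    /* Is going through the soft-affinity balancing step worth it at all? */
    static bool soft_aff_effective(const struct vcpu *v,
                                   const cpumask_t *pool_cpus)
    {
        /*
         * If every cpu we can use anyway (e.g., the cpus of v's cpupool)
         * is already part of v's soft affinity, checking soft affinity
         * can't change the outcome, so it can be skipped.
         */
        return !cpumask_subset(pool_cpus, v->cpu_soft_affinity);
    }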

Hope this clarifies things.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-06-16 14:13 ` [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu() Dario Faggioli
@ 2017-07-25 10:19   ` George Dunlap
  2017-07-25 10:20     ` George Dunlap
  2017-07-25 16:00     ` Dario Faggioli
  0 siblings, 2 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 10:19 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver

On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> By, basically, moving all the logic of the function
> inside the usual two steps (soft-affinity step and
> hard-affinity step) loop.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> George, you gave your Reviewed-by to:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html
> 
> which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html
> 
> I changed the cpu_pick() part a lot, and that's why I decided to split the
> patch in two.  As far as fallback_cpu(), though, what's done in this patch is
> exactly the same that was being done in the original one.
> 
> So, of course I'm dropping the Rev-by, but I thought it could have been useful
> to mention this. :-)
> ---
>  xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 56 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index c749d4e..54f6e21 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>  }
>  
>  /*
> - * When a hard affinity change occurs, we may not be able to check some
> - * (any!) of the other runqueues, when looking for the best new processor
> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
> - * pick, in order of decreasing preference:
> - *  - svc's current pcpu;
> - *  - another pcpu from svc's current runq;
> - *  - any cpu.
> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
> + * we pick, in order of decreasing preference:
> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
> + *  3) just one valid pcpu from svc's soft affinity;
> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
> + *  6) just one valid pcpu from svc's hard affinity
> + *
> + * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
>      struct vcpu *v = svc->vcpu;
> -    int cpu = v->processor;
> +    unsigned int bs;
>  
> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> -                cpupool_domain_cpumask(v->domain));
> +    for_each_affinity_balance_step( bs )
> +    {
> +        int cpu = v->processor;
> +
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
> +            continue;
>  
> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> -        return cpu;
> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    cpupool_domain_cpumask(v->domain));
>  
> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> -                                   &svc->rqd->active)) )
> -    {
> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> -                    cpumask_scratch_cpu(cpu));
> -        return cpumask_first(cpumask_scratch_cpu(cpu));
> -    }
> +        /*
> +         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
> +         * in our affinity, go for it, for cache betterness.
> +         */
> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> +            return cpu;
>  
> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> +        /*
> +         * This is cases 2 or 5 (depending on bs): v->processor isn't there
> +         * any longer, check if we at least can stay in our current runq.
> +         */
> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                       &svc->rqd->active)) )
> +        {
> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                        &svc->rqd->active);
> +            return cpumask_first(cpumask_scratch_cpu(cpu));
> +        }
>  
> -    return cpumask_first(cpumask_scratch_cpu(cpu));
> +        /*
> +         * This is cases 3 or 6 (depending on bs): last stand, just one valid
> +         * pcpu from our soft affinity, if we have one and if there's any. In
> +         * fact, if we are doing soft-affinity, it is possible that we fail,
> +         * which means we stay in the loop and look for hard affinity. OTOH,
> +         * if we are at the hard-affinity balancing step, it's guaranteed that
> +         * there is at least one valid cpu, and therefore we are sure that we
> +         * return it, and never really exit the loop.
> +         */
> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
> +               bs == BALANCE_SOFT_AFFINITY);
> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));

So just checking my understanding here... at this point we're not taking
into consideration load or idleness or anything else -- we're just
saying, "Is there a cpu in my soft affinity it is *possible* to run on?"
 So on a properly configured system, we should never take the second
iteration of the loop?

> +        if ( likely(cpu < nr_cpu_ids) )
> +            return cpu;
> +    }
> +    BUG_ON(1);

Do we want to BUG() here?  I don't think this constitutes an
unrecoverable error; an ASSERT_UNREACHABLE() plus something random would
be better, wouldn't it?
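
Something along these lines is what I have in mind (just a sketch; pretty
much any cpu from the hard affinity would do as the "something random"):

    ASSERT_UNREACHABLE();
    return cpumask_first(v->cpu_hard_affinity);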

 -George


* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 10:19   ` George Dunlap
@ 2017-07-25 10:20     ` George Dunlap
  2017-07-25 16:00     ` Dario Faggioli
  1 sibling, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 10:20 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver

On 07/25/2017 11:19 AM, George Dunlap wrote:
> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>> By, basically, moving all the logic of the function
>> inside the usual two steps (soft-affinity step and
>> hard-affinity step) loop.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
>> ---
>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> ---
>> George, you gave your Reviewed-by to:
>>  https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html
>>
>> which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
>>  https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html
>>
>> I changed the cpu_pick() part a lot, and that's why I decided to split the
>> patch in two.  As far as fallback_cpu(), though, what's done in this patch is
>> exactly the same that was being done in the original one.
>>
>> So, of course I'm dropping the Rev-by, but I thought it could have been useful
>> to mention this. :-)
>> ---
>>  xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
>>  1 file changed, 56 insertions(+), 21 deletions(-)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index c749d4e..54f6e21 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>>  }
>>  
>>  /*
>> - * When a hard affinity change occurs, we may not be able to check some
>> - * (any!) of the other runqueues, when looking for the best new processor
>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
>> - * pick, in order of decreasing preference:
>> - *  - svc's current pcpu;
>> - *  - another pcpu from svc's current runq;
>> - *  - any cpu.
>> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
>> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
>> + * we pick, in order of decreasing preference:
>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
>> + *  3) just one valid pcpu from svc's soft affinity;
>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
>> + *  6) just one valid pcpu from svc's hard affinity
>> + *
>> + * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
>> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>>   */
>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>  {
>>      struct vcpu *v = svc->vcpu;
>> -    int cpu = v->processor;
>> +    unsigned int bs;
>>  
>> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>> -                cpupool_domain_cpumask(v->domain));
>> +    for_each_affinity_balance_step( bs )
>> +    {
>> +        int cpu = v->processor;
>> +
>> +        if ( bs == BALANCE_SOFT_AFFINITY &&
>> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
>> +            continue;
>>  
>> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>> -        return cpu;
>> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>> +                    cpupool_domain_cpumask(v->domain));
>>  
>> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>> -                                   &svc->rqd->active)) )
>> -    {
>> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
>> -                    cpumask_scratch_cpu(cpu));
>> -        return cpumask_first(cpumask_scratch_cpu(cpu));
>> -    }
>> +        /*
>> +         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
>> +         * in our affinity, go for it, for cache betterness.
>> +         */
>> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>> +            return cpu;
>>  
>> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>> +        /*
>> +         * This is cases 2 or 5 (depending on bs): v->processor isn't there
>> +         * any longer, check if we at least can stay in our current runq.
>> +         */
>> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>> +                                       &svc->rqd->active)) )
>> +        {
>> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>> +                        &svc->rqd->active);
>> +            return cpumask_first(cpumask_scratch_cpu(cpu));
>> +        }
>>  
>> -    return cpumask_first(cpumask_scratch_cpu(cpu));
>> +        /*
>> +         * This is cases 3 or 6 (depending on bs): last stand, just one valid
>> +         * pcpu from our soft affinity, if we have one and if there's any. In
>> +         * fact, if we are doing soft-affinity, it is possible that we fail,
>> +         * which means we stay in the loop and look for hard affinity. OTOH,
>> +         * if we are at the hard-affinity balancing step, it's guaranteed that
>> +         * there is at least one valid cpu, and therefore we are sure that we
>> +         * return it, and never really exit the loop.
>> +         */
>> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>> +               bs == BALANCE_SOFT_AFFINITY);
>> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> 
> So just checking my understanding here... at this point we're not taking
> into consideration load or idleness or anything else -- we're just
> saying, "Is there a cpu in my soft affinity it is *possible* to run on?"
>  So on a properly configured system, we should never take the second
> iteration of the loop?
> 
>> +        if ( likely(cpu < nr_cpu_ids) )
>> +            return cpu;
>> +    }
>> +    BUG_ON(1);
> 
> Do we want to BUG() here?  I don't think this constitutes an
> unrecoverable error; an ASSERT_UNREACHABLE() plus something random would
> be better, wouldn't it?

Oh, should have said, everything else looks good; but apparently I said
that before. :-)

 -George


* Re: [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick()
  2017-06-16 14:14 ` [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
@ 2017-07-25 10:54   ` George Dunlap
  2017-07-25 11:04   ` George Dunlap
  1 sibling, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 10:54 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar, Justin T. Weaver

On 06/16/2017 03:14 PM, Dario Faggioli wrote:
> We want to find the runqueue with the least average load,
> and to do that, we scan through all the runqueues.
> 
> It is, therefore, enough that, during such scan:
> - we identify the runqueue with the least load, among
>   the ones that have pcpus that are part of the soft
>   affinity of the vcpu we're calling pick on;
> - we identify the same, but for hard affinity.
> 
> At this point, we can decide whether to go for the
> runqueue with the least load among the ones with some
> soft-affinity, or overall.
> 
> Therefore, at the price of some code reshuffling, we
> can avoid the loop.
> 
> (Also, kill a spurious ';' in the definition of MAX_LOAD.)

Wow, it is kind of hilarious how that managed to slip its way in there
by always being behind a ';' anyway. :-)

> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |  117 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 97 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 54f6e21..fb97ff7 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1725,14 +1725,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>      vcpu_schedule_unlock_irq(lock, vc);
>  }
>  
> -#define MAX_LOAD (STIME_MAX);
> +#define MAX_LOAD (STIME_MAX)
>  static int
>  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = csched2_priv(ops);
> -    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
> +    int i, min_rqi = -1, min_s_rqi = -1;
> +    unsigned int new_cpu, cpu = vc->processor;
>      struct csched2_vcpu *svc = csched2_vcpu(vc);
> -    s_time_t min_avgload = MAX_LOAD;
> +    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
> +    bool has_soft;
>  
>      ASSERT(!cpumask_empty(&prv->active_queues));
>  
> @@ -1781,17 +1783,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
>                                       &svc->migrate_rqd->active) )
>          {
> +            /*
> +             * If we've been asked to move to migrate_rqd, we should just do
> +             * that, which we actually do by returning one cpu from that runq.
> +             * There is no need to take care of soft affinity, as that will
> +             * happen in runq_tickle().
> +             */
>              cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                          &svc->migrate_rqd->active);
>              new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
>                                      cpumask_scratch_cpu(cpu));
> +
>              svc->migrate_rqd->pick_bias = new_cpu;
>              goto out_up;
>          }
>          /* Fall-through to normal cpu pick */
>      }
>  
> -    /* Find the runqueue with the lowest average load. */
> +    /*
> +     * What we want is:
> +     *  - if we have soft affinity, the runqueue with the lowest average
> +     *    load, among the ones that contain cpus in our soft affinity; this
> +     *    represents the best runq on which we would want to run.
> +     *  - the runqueue with the lowest average load among the ones that
> +     *    contain cpus in our hard affinity; this represents the best runq
> +     *    on which we can run.
> +     *
> +     * Find both runqueues in one pass.
> +     */
> +    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
>      for_each_cpu(i, &prv->active_queues)
>      {
>          struct csched2_runqueue_data *rqd;
> @@ -1800,31 +1820,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          rqd = prv->rqd + i;
>  
>          /*
> -         * If checking a different runqueue, grab the lock, check hard
> -         * affinity, read the avg, and then release the lock.
> +         * If none of the cpus of this runqueue is in svc's hard-affinity,
> +         * skip the runqueue.
> +         *
> +         * Note that, in case svc's hard-affinity has changed, this is the
> +         * first time when we see such change, so it is indeed possible
> +         * that we end up skipping svc's current runqueue.
> +         */
> +        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> +            continue;
> +
> +        /*
> +         * If checking a different runqueue, grab the lock, read the avg,
> +         * and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
>           * impartiality.
> -         *
> -         * Note that, if svc's hard affinity has changed, this is the
> -         * first time when we see such change, so it is indeed possible
> -         * that none of the cpus in svc's current runqueue is in our
> -         * (new) hard affinity!
>           */
>          if ( rqd == svc->rqd )
>          {
> -            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> -                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
> +            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> -                rqd_avgload = rqd->b_avgload;
> -
> +            rqd_avgload = rqd->b_avgload;
>              spin_unlock(&rqd->lock);
>          }
>  
> +        /*
> +         * if svc has a soft-affinity, and some cpus of rqd are part of it,
> +         * see if we need to update the "soft-affinity minimum".
> +         */
> +        if ( has_soft &&
> +             rqd_avgload < min_s_avgload )
> +        {
> +            cpumask_t mask;
> +
> +            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
> +            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
> +            {
> +                min_s_avgload = rqd_avgload;
> +                min_s_rqi = i;
> +            }
> +        }
> +        /* In any case, keep the "hard-affinity minimum" updated too. */
>          if ( rqd_avgload < min_avgload )
>          {
>              min_avgload = rqd_avgload;
> @@ -1832,17 +1872,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          }
>      }
>  
> -    /* We didn't find anyone (most likely because of spinlock contention). */
> -    if ( min_rqi == -1 )
> +    if ( has_soft && min_s_rqi != -1 )
> +    {
> +        /*
> +         * We have soft affinity, and we have a candidate runq, so go for it.
> +         *
> +         * Note that, to obtain the soft-affinity mask, we "just" put what we
> +         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
> +         * ok because:
> +         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
> +         *   a non-empty intersection (because has_soft is true);
> +         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
> +         *   in cpumask_scratch, we do save a lot doing like this.
> +         *
> +         * It's kind of like open coding affinity_balance_cpumask() but, in
> +         * this specific case, calling that would mean a lot of (unnecessary)
> +         * cpumask operations.
> +         */
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    vc->cpu_soft_affinity);
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    &prv->rqd[min_s_rqi].active);
> +    }
> +    else if ( min_rqi != -1 )
>      {
> +        /*
> +         * Either we don't have soft-affinity, or we do, but we did not find
> +         * any suitable runq. But we did find one when considering hard
> +         * affinity, so go for it.
> +         *
> +         * cpumask_scratch already has vc->cpu_hard_affinity &
> +         * cpupool_domain_cpumask() in it, so it's enough that we filter
> +         * with the cpus of the runq.
> +         */
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    &prv->rqd[min_rqi].active);
> +    }
> +    else
> +    {
> +        /*
> +         * We didn't find anyone at all (most likely because of spinlock
> +         * contention).
> +         */
>          new_cpu = get_fallback_cpu(svc);
>          min_rqi = c2r(ops, new_cpu);
>          min_avgload = prv->rqd[min_rqi].b_avgload;
>          goto out_up;
>      }
>  
> -    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> -                &prv->rqd[min_rqi].active);
>      new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
>                              cpumask_scratch_cpu(cpu));
>      prv->rqd[min_rqi].pick_bias = new_cpu;
> 



* Re: [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick()
  2017-06-16 14:14 ` [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
  2017-07-25 10:54   ` George Dunlap
@ 2017-07-25 11:04   ` George Dunlap
  2017-07-25 11:05     ` George Dunlap
  1 sibling, 1 reply; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:04 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar, Justin T. Weaver

On 06/16/2017 03:14 PM, Dario Faggioli wrote:
> We want to find the runqueue with the least average load,
> and to do that, we scan through all the runqueues.
> 
> It is, therefore, enough that, during such scan:
> - we identify the runqueue with the least load, among
>   the ones that have pcpus that are part of the soft
>   affinity of the vcpu we're calling pick on;
> - we identify the same, but for hard affinity.
> 
> At this point, we can decide whether to go for the
> runqueue with the least load among the ones with some
> soft-affinity, or overall.
> 
> Therefore, at the price of some code reshuffling, we
> can avoid the loop.
> 
> (Also, kill a spurious ';' in the definition of MAX_LOAD.)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Looks good:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |  117 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 97 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 54f6e21..fb97ff7 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1725,14 +1725,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>      vcpu_schedule_unlock_irq(lock, vc);
>  }
>  
> -#define MAX_LOAD (STIME_MAX);
> +#define MAX_LOAD (STIME_MAX)
>  static int
>  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = csched2_priv(ops);
> -    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
> +    int i, min_rqi = -1, min_s_rqi = -1;
> +    unsigned int new_cpu, cpu = vc->processor;
>      struct csched2_vcpu *svc = csched2_vcpu(vc);
> -    s_time_t min_avgload = MAX_LOAD;
> +    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
> +    bool has_soft;
>  
>      ASSERT(!cpumask_empty(&prv->active_queues));
>  
> @@ -1781,17 +1783,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
>                                       &svc->migrate_rqd->active) )
>          {
> +            /*
> +             * If we've been asked to move to migrate_rqd, we should just do
> +             * that, which we actually do by returning one cpu from that runq.
> +             * There is no need to take care of soft affinity, as that will
> +             * happen in runq_tickle().
> +             */
>              cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>                          &svc->migrate_rqd->active);
>              new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
>                                      cpumask_scratch_cpu(cpu));
> +
>              svc->migrate_rqd->pick_bias = new_cpu;
>              goto out_up;
>          }
>          /* Fall-through to normal cpu pick */
>      }
>  
> -    /* Find the runqueue with the lowest average load. */
> +    /*
> +     * What we want is:
> +     *  - if we have soft affinity, the runqueue with the lowest average
> +     *    load, among the ones that contain cpus in our soft affinity; this
> +     *    represents the best runq on which we would want to run.
> +     *  - the runqueue with the lowest average load among the ones that
> +     *    contain cpus in our hard affinity; this represents the best runq
> +     *    on which we can run.
> +     *
> +     * Find both runqueues in one pass.
> +     */
> +    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
>      for_each_cpu(i, &prv->active_queues)
>      {
>          struct csched2_runqueue_data *rqd;
> @@ -1800,31 +1820,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          rqd = prv->rqd + i;
>  
>          /*
> -         * If checking a different runqueue, grab the lock, check hard
> -         * affinity, read the avg, and then release the lock.
> +         * If none of the cpus of this runqueue is in svc's hard-affinity,
> +         * skip the runqueue.
> +         *
> +         * Note that, in case svc's hard-affinity has changed, this is the
> +         * first time when we see such change, so it is indeed possible
> +         * that we end up skipping svc's current runqueue.
> +         */
> +        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> +            continue;
> +
> +        /*
> +         * If checking a different runqueue, grab the lock, read the avg,
> +         * and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
>           * impartiality.
> -         *
> -         * Note that, if svc's hard affinity has changed, this is the
> -         * first time when we see such change, so it is indeed possible
> -         * that none of the cpus in svc's current runqueue is in our
> -         * (new) hard affinity!
>           */
>          if ( rqd == svc->rqd )
>          {
> -            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> -                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
> +            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
> -                rqd_avgload = rqd->b_avgload;
> -
> +            rqd_avgload = rqd->b_avgload;
>              spin_unlock(&rqd->lock);
>          }
>  
> +        /*
> +         * if svc has a soft-affinity, and some cpus of rqd are part of it,
> +         * see if we need to update the "soft-affinity minimum".
> +         */
> +        if ( has_soft &&
> +             rqd_avgload < min_s_avgload )
> +        {
> +            cpumask_t mask;
> +
> +            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
> +            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
> +            {
> +                min_s_avgload = rqd_avgload;
> +                min_s_rqi = i;
> +            }
> +        }
> +        /* In any case, keep the "hard-affinity minimum" updated too. */
>          if ( rqd_avgload < min_avgload )
>          {
>              min_avgload = rqd_avgload;
> @@ -1832,17 +1872,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>          }
>      }
>  
> -    /* We didn't find anyone (most likely because of spinlock contention). */
> -    if ( min_rqi == -1 )
> +    if ( has_soft && min_s_rqi != -1 )
> +    {
> +        /*
> +         * We have soft affinity, and we have a candidate runq, so go for it.
> +         *
> +         * Note that, to obtain the soft-affinity mask, we "just" put what we
> +         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
> +         * ok because:
> +         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
> +         *   a non-empty intersection (because has_soft is true);
> +         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
> +         *   in cpumask_scratch, we do save a lot doing like this.
> +         *
> +         * It's kind of like open coding affinity_balance_cpumask() but, in
> +         * this specific case, calling that would mean a lot of (unnecessary)
> +         * cpumask operations.
> +         */
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    vc->cpu_soft_affinity);
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    &prv->rqd[min_s_rqi].active);
> +    }
> +    else if ( min_rqi != -1 )
>      {
> +        /*
> +         * Either we don't have soft-affinity, or we do, but we did not find
> +         * any suitable runq. But we did find one when considering hard
> +         * affinity, so go for it.
> +         *
> +         * cpumask_scratch already has vc->cpu_hard_affinity &
> +         * cpupool_domain_cpumask() in it, so it's enough that we filter
> +         * with the cpus of the runq.
> +         */
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    &prv->rqd[min_rqi].active);
> +    }
> +    else
> +    {
> +        /*
> +         * We didn't find anyone at all (most likely because of spinlock
> +         * contention).
> +         */
>          new_cpu = get_fallback_cpu(svc);
>          min_rqi = c2r(ops, new_cpu);
>          min_avgload = prv->rqd[min_rqi].b_avgload;
>          goto out_up;
>      }
>  
> -    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> -                &prv->rqd[min_rqi].active);
>      new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
>                              cpumask_scratch_cpu(cpu));
>      prv->rqd[min_rqi].pick_bias = new_cpu;
> 



* Re: [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick()
  2017-07-25 11:04   ` George Dunlap
@ 2017-07-25 11:05     ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:05 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel

On Tue, Jul 25, 2017 at 12:04 PM, George Dunlap
<george.dunlap@citrix.com> wrote:
> On 06/16/2017 03:14 PM, Dario Faggioli wrote:
>> We want to find the runqueue with the least average load,
>> and to do that, we scan through all the runqueues.
>>
>> It is, therefore, enough that, during such scan:
>> - we identify the runqueue with the least load, among
>>   the ones that have pcpus that are part of the soft
>>   affinity of the vcpu we're calling pick on;
>> - we identify the same, but for hard affinity.
>>
>> At this point, we can decide whether to go for the
>> runqueue with the least load among the ones with some
>> soft-affinity, or overall.
>>
>> Therefore, at the price of some code reshuffling, we
>> can avoid the loop.
>>
>> (Also, kill a spurious ';' in the definition of MAX_LOAD.)
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
>
> Looks good:
>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Er, sorry, this was supposed to be for patch 5...

 -George


* Re: [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity
  2017-06-16 14:14 ` [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
@ 2017-07-25 11:06   ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:06 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar

On 06/16/2017 03:14 PM, Dario Faggioli wrote:
> If, during scheduling, we realize that the current vcpu
> is running outside of its own soft-affinity, it would be
> preferable to send it somewhere else.
> 
> Of course, that may not be possible, and if we're too
> strict, we risk having vcpus sit in runqueues, even if
> there are idle pcpus (violating work-conservingness).
> In fact, what about there are no pcpus, from the soft
> affinity mask of the vcpu in question, where it can
> run?
> 
> To make sure we don't fall in the above described trap,
> only actually de-schedule the vcpu if there are idle and
> not already tickled cpus from its soft affinity where it
> can run immediately.
> 
> If there is (at least one) of such cpus, we let current
> be preempted, so that csched2_context_saved() will put
> it back in the runq, and runq_tickle() will wake (one
> of) the cpu.
> 
> If there is not even one, we let current run where it is,
> as running outside its soft-affinity is still better than
> not running at all.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

*This* one looks good:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index fb97ff7..5d8f25c 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2637,6 +2637,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
>      bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> +    bool soft_aff_preempt = false;
>  
>      *skipped = 0;
>  
> @@ -2670,8 +2671,43 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>          return scurr;
>      }
>  
> -    /* Default to current if runnable, idle otherwise */
> -    if ( vcpu_runnable(scurr->vcpu) )
> +    /* If scurr has a soft-affinity, let's check whether cpu is part of it */
> +    if ( !is_idle_vcpu(scurr->vcpu) &&
> +         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
> +    {
> +        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
> +                                 cpumask_scratch);
> +        if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
> +        {
> +            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
> +
> +            /* Ok, is any of the pcpus in scurr soft-affinity idle? */
> +            cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
> +            cpumask_andnot(cpumask_scratch, cpumask_scratch, &rqd->tickled);
> +            soft_aff_preempt = cpumask_intersects(cpumask_scratch, online);
> +        }
> +    }
> +
> +    /*
> +     * If scurr is runnable, and this cpu is in its soft-affinity, default to
> +     * it. We also default to it, even if cpu is not in its soft-affinity, if
> +     * there aren't any idle and not tickled cpu in its soft-affinity. In
> +     * fact, we don't want to risk leaving scurr in the runq and this cpu idle
> +     * only because scurr is running outside of its soft-affinity.
> +     *
> +     * On the other hand, if cpu is not in scurr's soft-affinity, and there
> +     * looks to be better options, go for them. That happens by defaulting to
> +     * idle here, which means scurr will be preempted, put back in runq, and
> +     * one of those idle and not tickled cpus from its soft-affinity will be
> +     * tickled to pick it up.
> +     *
> +     * Finally, if scurr does not have a valid soft-affinity, we also let it
> +     * continue to run here (in fact, soft_aff_preempt will still be false,
> +     * in this case).
> +     *
> +     * Of course, we also default to idle if scurr is not runnable.
> +     */
> +    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
>          snext = scurr;
>      else
>          snext = csched2_vcpu(idle_vcpu[cpu]);
> 



* Re: [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit
  2017-06-16 14:14 ` [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
@ 2017-07-25 11:25   ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:25 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar

On 06/16/2017 03:14 PM, Dario Faggioli wrote:
> By factoring into one (at the top) all the checks
> to see whether current is the idle vcpu, and mark
> it as unlikely().
> 
> In fact, if current is idle, all the logic for
> dealing with yielding, context switching rate
> limiting and soft-affinity, is just pure overhead,
> and we better rush checking the runq and pick some
> vcpu up.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Overall looks good -- great idea.  Just one comment...

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |   22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 5d8f25c..bbda790 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2634,13 +2634,20 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>                 unsigned int *skipped)
>  {
>      struct list_head *iter;
> -    struct csched2_vcpu *snext = NULL;
> +    struct csched2_vcpu *snext = csched2_vcpu(idle_vcpu[cpu]);
>      struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
> -    bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> -    bool soft_aff_preempt = false;
> +    bool yield = false, soft_aff_preempt = false;
>  
>      *skipped = 0;
>  
> +    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
> +    {
> +        snext = scurr;

So first of all, since we set snext above, this should (in theory) be
redundant.  However...

> +        goto check_runq;
> +    }
> +
> +    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> +
>      /*
>       * Return the current vcpu if it has executed for less than ratelimit.
>       * Adjuststment for the selected vcpu's credit and decision
> @@ -2650,8 +2657,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>       * In fact, it may be the case that scurr is about to spin, and there's
>       * no point forcing it to do so until rate limiting expires.
>       */
> -    if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> -         vcpu_runnable(scurr->vcpu) &&
> +    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
>           (now - scurr->vcpu->runstate.state_entry_time) <
>            MICROSECS(prv->ratelimit_us) )
>      {
> @@ -2672,8 +2678,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      }
>  
>      /* If scurr has a soft-affinity, let's check whether cpu is part of it */
> -    if ( !is_idle_vcpu(scurr->vcpu) &&
> -         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
> +    if ( has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
>      {
>          affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
>                                   cpumask_scratch);
> @@ -2709,9 +2714,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>       */
>      if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
>          snext = scurr;
> -    else
> -        snext = csched2_vcpu(idle_vcpu[cpu]);

...does moving this up to the declaration buy us anything?

Personally I'm not a fan of the "Set everything at declaration as much
as possible" style (although Jan seems to be).  I think it's easier to
understand what happens here if we leave it as-is rather than having to
go up and see what it is.

I don't feel terribly strongly about it; but I think if we set it at
declaration, we *shouldn't* have the redundant assignment in the "idle
bypass" stanza.  (And if it makes you feel uncomfortable to remove it, I
think that's evidence that "set at declaration" is sub-optimal in this
case.)

Either option (leave unset at declaration and set it explicitly on both
paths, or set it at declaration and leave it unset on all paths except
the 'snext = scurr' here) can have my Reviewed-by.
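
FWIW, the first option would look more or less like this (just sketching,
reusing the lines already in the patch):

    struct csched2_vcpu *snext = NULL;
    ...
    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
    {
        snext = scurr;
        goto check_runq;
    }
    ...
    /* Default to current if runnable (and allowed), idle otherwise. */
    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
        snext = scurr;
    else
        snext = csched2_vcpu(idle_vcpu[cpu]);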

 -George


* Re: [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting
  2017-06-16 14:14 ` [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
@ 2017-07-25 11:31   ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:31 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar

On 06/16/2017 03:14 PM, Dario Faggioli wrote:
> With context switching ratelimiting enabled, the following
> pattern is quite common in a scheduling trace:
> 
>      0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
>      0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
>      0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
>  [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
>      [...]
>  [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
>      0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
>  [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
>  [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running
> 
> Basically, what happens is that runq_tickle() realizes
> d0v13 should preempt d2v7, running on cpu 12, as it
> has higher credits (10135529 vs. 2619231). It therefore
> tickles cpu 12 [1], which, in turn, schedules [2].
> 
> But --surprise surprise-- d2v7 has run for less than the
> ratelimit interval [3], and hence it is _not_ preempted,
> and continues to run. This indeed looks fine. Actually,
> this is what ratelimiting is there for. Note, however,
> that:
>  1) we interrupted cpu 12 for nothing;
>  2) what if, say on cpu 8, there is a vcpu that has:
>     + less credit than d0v13 (so d0v13 can well
>       preempt it),
>     + more credit than d2v7 (that's why it was not
>       selected to be preempted),
>     + run for more than the ratelimiting interval
>       (so it can really be scheduled out)?
> 
> With this patch, if we are in case 2), we'd realize
> that tickling 12 would be pointless, and we'll continue
> looking, eventually finding and tickling 8.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Well spotted:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |   30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index bbda790..c45bc03 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -160,6 +160,8 @@
>  #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
>  /* How much to "compensate" a vcpu for L2 migration. */
>  #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
> +/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
> +#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
>  /* Reset: Value below which credit will be reset. */
>  #define CSCHED2_CREDIT_RESET         0
>  /* Max timer: Maximum time a guest can be run for. */
> @@ -1167,6 +1169,23 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
>  }
>  
>  /*
> + * What we want to know is whether svc, which we assume to be running on some
> + * pcpu, can be interrupted and preempted (which, so far, basically means
> + * whether or not it already run for more than the ratelimit, to which we
> + * apply some tolerance).
> + */
> +static inline bool is_preemptable(const struct csched2_vcpu *svc,
> +                                    s_time_t now, s_time_t ratelimit)
> +{
> +    if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
> +        return true;
> +
> +    ASSERT(svc->vcpu->is_running);
> +    return now - svc->vcpu->runstate.state_entry_time >
> +           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
> +}
> +
> +/*
>   * Score to preempt the target cpu.  Return a negative number if the
>   * credit isn't high enough; if it is, favor a preemption on cpu in
>   * this order:
> @@ -1180,10 +1199,12 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
>   *
>   * Within the same class, the highest difference of credit.
>   */
> -static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
> +static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
>                               struct csched2_vcpu *new, unsigned int cpu)
>  {
> +    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
>      struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
> +    struct csched2_private *prv = csched2_priv(ops);
>      s_time_t score;
>  
>      /*
> @@ -1191,7 +1212,8 @@ static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
>       * in rqd->idle). However, some of them may be running their idle vcpu,
>       * if taking care of tasklets. In that case, we want to leave it alone.
>       */
> -    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
> +    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
> +         !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
>          return -1;
>  
>      burn_credits(rqd, cur, now);
> @@ -1348,7 +1370,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>      cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
>      if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
>      {
> -        s_time_t score = tickle_score(rqd, now, new, cpu);
> +        s_time_t score = tickle_score(ops, now, new, cpu);
>  
>          if ( score > max )
>          {
> @@ -1371,7 +1393,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>          /* Already looked at this one above */
>          ASSERT(i != cpu);
>  
> -        score = tickle_score(rqd, now, new, i);
> +        score = tickle_score(ops, now, new, i);
>  
>          if ( score > max )
>          {
> 
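To put rough numbers on the new tolerance (taking, say, ratelimit_us = 1000,
i.e. a 1ms ratelimit, purely as an example):

    now - state_entry_time = 960us  -->  960 > 1000 - 50 = 950  -->  preemptable
    now - state_entry_time = 940us  -->  940 < 950              -->  not preemptable

That is, a vcpu within 50us of completing its ratelimit interval is already
treated as preemptable, on the reasoning that by the time the tickled cpu
actually gets around to scheduling, the interval will most likely have
elapsed; a vcpu further away from that point is skipped, and runq_tickle()
keeps looking for a better target.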



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle()
  2017-06-16 14:13 ` [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
  2017-06-23 10:35   ` Anshul Makkar
@ 2017-07-25 11:47   ` George Dunlap
  1 sibling, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-07-25 11:47 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Wei Liu, Anshul Makkar, Ian Jackson

On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> Soft-affinity support is usually implemented by means
> of a two step "balancing loop", where:
> - during the first step, we consider soft-affinity
>   (if the vcpu has one);
> - during the second (if we get to it), we consider
>   hard-affinity.
> 
> In runq_tickle(), we need to do that for checking
> whether we can execute the waking vCPU on a pCPU
> that is idle. In fact, we want to be sure that, if
> there is an idle pCPU in the vCPU's soft affinity,
> we'll use it.
> 
> If there are no such idle pCPUs, though, and we
> have to check non-idle ones, we can avoid the loop
> and do both hard and soft-affinity in one pass.
> 
> In fact, we can scan the runqueue and compute a
> "score" for the vCPU that is running on each pCPU.
> The idea is, since we may have to preempt someone:
> - try to make sure that the waking vCPU will run
>   inside its soft-affinity,
> - try to preempt someone that is running outside
>   of its own soft-affinity.
> 
> The value of the score is added to a trace record,
> so xenalyze's code and tools/xentrace/formats are
> updated accordingly.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> This is *very* different from what was first submitted here:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02200.html
> 
> In fact, I reworked it starting from George's idea and draft:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03171.html
> 
> But then ended up changing that quite a bit as well. So, George, I'm more than
> happy for this patch to have a 'Signed-off-by: George Dunlap', and in fact, I
> had it here until 5 minutes before actually sending the series.
> 
> But since I altered the code significantly, I could not be sure you'd be happy
> about that, and hence decided to remove it, and ask you. Are you ok with it, or
> maybe you prefer some other tag (stuff like 'Idea-by:', etc.). Let me know. :-)

Haha -- actually, I'd completely forgotten that I'd suggested that; and
after reviewing it (but before seeing your comment here), I was going to
say I thought using a 'score' was a really good idea. :-)

I think "Suggested-by" would be fine here; the point of sending you a
patch wasn't to use actual code, but to 1) make sure the idea wasn't
crazy, and 2) concisely describe what my idea was.

I can add that and the following on check-in, depending on what we
decide about patch 3:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>
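
For reference, the "two step balancing loop" described in the commit message
boils down to something like the sketch below. It uses the helpers that
patch 1 factors out into sched-if.h (for_each_affinity_balance_step(),
has_soft_affinity(), affinity_balance_cpumask()); v is the waking vcpu, and
pick_idle_cpu_in() is a made-up placeholder for whatever the caller then does
with the per-step mask, so this illustrates the pattern rather than the
actual runq_tickle() code:

    unsigned int bs;

    for_each_affinity_balance_step( bs )
    {
        int cpu;

        /* Skip the soft-affinity step if v has no effective soft affinity. */
        if ( bs == BALANCE_SOFT_AFFINITY &&
             !has_soft_affinity(v, v->cpu_hard_affinity) )
            continue;

        /* Build this step's mask: soft affinity first, hard affinity second. */
        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(v->processor));
        cpumask_and(cpumask_scratch_cpu(v->processor),
                    cpumask_scratch_cpu(v->processor),
                    cpupool_domain_cpumask(v->domain));

        /* E.g., look for an idle pcpu in the mask (hypothetical helper). */
        cpu = pick_idle_cpu_in(cpumask_scratch_cpu(v->processor));
        if ( cpu >= 0 )
            return cpu;
    }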


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 10:19   ` George Dunlap
  2017-07-25 10:20     ` George Dunlap
@ 2017-07-25 16:00     ` Dario Faggioli
  2017-07-25 16:17       ` George Dunlap
  1 sibling, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-07-25 16:00 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver



On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index c749d4e..54f6e21 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
> > cpumask_t *mask)
> >  }
> >  
> >  /*
> > - * When a hard affinity change occurs, we may not be able to check
> > some
> > - * (any!) of the other runqueues, when looking for the best new
> > processor
> > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
> > happens, we
> > - * pick, in order of decreasing preference:
> > - *  - svc's current pcpu;
> > - *  - another pcpu from svc's current runq;
> > - *  - any cpu.
> > + * In csched2_cpu_pick(), it may not be possible to actually look
> > at remote
> > + * runqueues (the trylock-s on their spinlocks can fail!). If that
> > happens,
> > + * we pick, in order of decreasing preference:
> > + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> > + *  2) a pcpu in svc's current runqueue that is also in svc's soft
> > affinity;
> > + *  3) just one valid pcpu from svc's soft affinity;
> > + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> > + *  5) a pcpu in svc's current runqueue that is also in svc's hard
> > affinity;
> > + *  6) just one valid pcpu from svc's hard affinity
> > + *
> > + * Of course, 1, 2 and 3 makes sense only if svc has a soft
> > affinity. Also
> > + * note that at least 6 is guaranteed to _always_ return at least
> > one pcpu.
> >   */
> >  static int get_fallback_cpu(struct csched2_vcpu *svc)
> >  {
> >      struct vcpu *v = svc->vcpu;
> > -    int cpu = v->processor;
> > +    unsigned int bs;
> >  
> > -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> > -                cpupool_domain_cpumask(v->domain));
> > +    for_each_affinity_balance_step( bs )
> > +    {
> > +        int cpu = v->processor;
> > +
> > +        if ( bs == BALANCE_SOFT_AFFINITY &&
> > +             !has_soft_affinity(v, v->cpu_hard_affinity) )
> > +            continue;
> >  
> > -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> > -        return cpu;
> > +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> > +        cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +                    cpupool_domain_cpumask(v->domain));
> >  
> > -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > -                                   &svc->rqd->active)) )
> > -    {
> > -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> > -                    cpumask_scratch_cpu(cpu));
> > -        return cpumask_first(cpumask_scratch_cpu(cpu));
> > -    }
> > +        /*
> > +         * This is cases 1 or 4 (depending on bs): if v->processor 
> > is (still)
> > +         * in our affinity, go for it, for cache betterness.
> > +         */
> > +        if ( likely(cpumask_test_cpu(cpu,
> > cpumask_scratch_cpu(cpu))) )
> > +            return cpu;
> >  
> > -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> > +        /*
> > +         * This is cases 2 or 5 (depending on bs): v->processor
> > isn't there
> > +         * any longer, check if we at least can stay in our
> > current runq.
> > +         */
> > +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > +                                       &svc->rqd->active)) )
> > +        {
> > +            cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +                        &svc->rqd->active);
> > +            return cpumask_first(cpumask_scratch_cpu(cpu));
> > +        }
> >  
> > -    return cpumask_first(cpumask_scratch_cpu(cpu));
> > +        /*
> > +         * This is cases 3 or 6 (depending on bs): last stand,
> > just one valid
> > +         * pcpu from our soft affinity, if we have one and if
> > there's any. In
> > +         * fact, if we are doing soft-affinity, it is possible
> > that we fail,
> > +         * which means we stay in the loop and look for hard
> > affinity. OTOH,
> > +         * if we are at the hard-affinity balancing step, it's
> > guaranteed that
> > +         * there is at least one valid cpu, and therefore we are
> > sure that we
> > +         * return it, and never really exit the loop.
> > +         */
> > +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
> > +               bs == BALANCE_SOFT_AFFINITY);
> > +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> 
> So just checking my understanding here... at this point we're not
> taking
> into consideration load or idleness or anything else -- we're just
> saying, "Is there a cpu in my soft affinity it is *possible* to run
> on?"
>
Exactly. If we are in this function, it means we failed to take the
locks we needed, for making a choice based on load, idleness, etc, but
we need a CPU, so we pick whatever is valid.

For choosing among all the valid ones, we act as explained in
the comment.

>  So on a properly configured system, we should never take the second
> iteration of the loop?
> 
Mmm.. I think you're right. In fact, in a properly configured system,
we'll never go past step 3 (from the comment at the top).

Which is not ideal, or at least not what I had in mind. In fact, I
think it's better to check step 4 (svc->vcpu->processor in hard-
affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
that would mean avoiding a runqueue migration.

What about I basically kill step 3, i.e., if we reach this point during
the soft-affinity step, I just continue to the hard-affinity one?

> > +        if ( likely(cpu < nr_cpu_ids) )
> > +            return cpu;
> > +    }
> > +    BUG_ON(1);
> 
> Do we want to BUG() here?  I don't think this constitutes an
> unrecoverable error; an ASSERT_UNREACHABLE() plus something random
> would
> be better, wouldn't it?
> 
ASSERT_UNREACHABLE() is indeed much better. What do you mean with
"something random"? The value to be assigned to cpu?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 16:00     ` Dario Faggioli
@ 2017-07-25 16:17       ` George Dunlap
  2017-07-25 16:47         ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2017-07-25 16:17 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver

On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>>>
>>> diff --git a/xen/common/sched_credit2.c
>>> b/xen/common/sched_credit2.c
>>> index c749d4e..54f6e21 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
>>> cpumask_t *mask)
>>>  }
>>>  
>>>  /*
>>> - * When a hard affinity change occurs, we may not be able to check
>>> some
>>> - * (any!) of the other runqueues, when looking for the best new
>>> processor
>>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
>>> happens, we
>>> - * pick, in order of decreasing preference:
>>> - *  - svc's current pcpu;
>>> - *  - another pcpu from svc's current runq;
>>> - *  - any cpu.
>>> + * In csched2_cpu_pick(), it may not be possible to actually look
>>> at remote
>>> + * runqueues (the trylock-s on their spinlocks can fail!). If that
>>> happens,
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft
>>> affinity;
>>> + *  3) just one valid pcpu from svc's soft affinity;
>>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard
>>> affinity;
>>> + *  6) just one valid pcpu from svc's hard affinity
>>> + *
>>> + * Of course, 1, 2 and 3 makes sense only if svc has a soft
>>> affinity. Also
>>> + * note that at least 6 is guaranteed to _always_ return at least
>>> one pcpu.
>>>   */
>>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>>  {
>>>      struct vcpu *v = svc->vcpu;
>>> -    int cpu = v->processor;
>>> +    unsigned int bs;
>>>  
>>> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>>> -                cpupool_domain_cpumask(v->domain));
>>> +    for_each_affinity_balance_step( bs )
>>> +    {
>>> +        int cpu = v->processor;
>>> +
>>> +        if ( bs == BALANCE_SOFT_AFFINITY &&
>>> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
>>> +            continue;
>>>  
>>> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>>> -        return cpu;
>>> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>>> +        cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +                    cpupool_domain_cpumask(v->domain));
>>>  
>>> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> -                                   &svc->rqd->active)) )
>>> -    {
>>> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
>>> -                    cpumask_scratch_cpu(cpu));
>>> -        return cpumask_first(cpumask_scratch_cpu(cpu));
>>> -    }
>>> +        /*
>>> +         * This is cases 1 or 4 (depending on bs): if v->processor 
>>> is (still)
>>> +         * in our affinity, go for it, for cache betterness.
>>> +         */
>>> +        if ( likely(cpumask_test_cpu(cpu,
>>> cpumask_scratch_cpu(cpu))) )
>>> +            return cpu;
>>>  
>>> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>>> +        /*
>>> +         * This is cases 2 or 5 (depending on bs): v->processor
>>> isn't there
>>> +         * any longer, check if we at least can stay in our
>>> current runq.
>>> +         */
>>> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> +                                       &svc->rqd->active)) )
>>> +        {
>>> +            cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +                        &svc->rqd->active);
>>> +            return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        }
>>>  
>>> -    return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        /*
>>> +         * This is cases 3 or 6 (depending on bs): last stand,
>>> just one valid
>>> +         * pcpu from our soft affinity, if we have one and if
>>> there's any. In
>>> +         * fact, if we are doing soft-affinity, it is possible
>>> that we fail,
>>> +         * which means we stay in the loop and look for hard
>>> affinity. OTOH,
>>> +         * if we are at the hard-affinity balancing step, it's
>>> guaranteed that
>>> +         * there is at least one valid cpu, and therefore we are
>>> sure that we
>>> +         * return it, and never really exit the loop.
>>> +         */
>>> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>>> +               bs == BALANCE_SOFT_AFFINITY);
>>> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
>>
>> So just checking my understanding here... at this point we're not
>> taking
>> into consideration load or idleness or anything else -- we're just
>> saying, "Is there a cpu in my soft affinity it is *possible* to run
>> on?"
>>
> Exactly. If we are in this function, it means we failed to take the
> locks we needed, for making a choice based on load, idleness, etc, but
> we need a CPU, so we pick whatever is valid.
> 
> For choosing among all the valid ones, we act as explained in
> the comment.
> 
>>  So on a properly configured system, we should never take the second
>> iteration of the loop?
>>
> Mmm.. I think you're right. In fact, in a properly configured system,
> we'll never go past step 3 (from the comment at the top).
> 
> Which is not ideal, or at least not what I had in mind. In fact, I
> think it's better to check step 4 (svc->vcpu->processor in hard-
> affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
> that would mean avoiding a runqueue migration.
> 
> What about I basically kill step 3, i.e., if we reach this point during
> the soft-affinity step, I just continue to the hard-affinity one?

Hmm, well *normally* we would rather have a vcpu running within its soft
affinity, even if that means moving it to another runqueue.  Is your
idea that, the only reason we're in this particular code is because we
couldn't grab the lock we need to make a more informed decision; so
defer if possible to previous decisions, which (we might presume) was
able to make a more informed decision?

>>> +        if ( likely(cpu < nr_cpu_ids) )
>>> +            return cpu;
>>> +    }
>>> +    BUG_ON(1);
>>
>> Do we want to BUG() here?  I don't think this constitutes an
>> unrecoverable error; an ASSERT_UNREACHABLE() plus something random
>> would
>> be better, wouldn't it?
>>
> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> "something random"? The value to be assigned to cpu?

Er, yes, I meant the return value.  Returning 0, or v->processor would
be simple options.  *Really* defensive programming would attempt to
choose something somewhat sensible with the minimal risk of triggering
some other hidden assumptions (say, any cpu on our current runqueue).
But part of me says even thinking too long about it is a waste of time
for something we're 99.99% sure can never happen. :-)

 -George


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 16:17       ` George Dunlap
@ 2017-07-25 16:47         ` Dario Faggioli
  2017-07-25 16:52           ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2017-07-25 16:47 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver



On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> > On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> > > 
> > Mmm.. I think you're right. In fact, in a properly configured
> > system,
> > we'll never go past step 3 (from the comment at the top).
> > 
> > Which is not ideal, or at least not what I had in mind. In fact, I
> > think it's better to check step 4 (svc->vcpu->processor in hard-
> > affinity) and step 5 (a CPU from svc's runqueue in hard affinity),
> > as
> > that would mean avoiding a runqueue migration.
> > 
> > What about I basically kill step 3, i.e., if we reach this point
> > during
> > the soft-affinity step, I just continue to the hard-affinity one?
> 
> Hmm, well *normally* we would rather have a vcpu running within its
> soft
> affinity, even if that means moving it to another runqueue.  
>
Yes, but both *ideally* and *normally*, we just should not be here. :-)

If we did end up here, we're in guessing territory, and, although what
you say about a guest wanting to run within its soft-affinity is
always true, from the guest's own point of view, our job as the scheduler
is to do what would be best for the system as a whole. But we are in a
situation where we could not gather the information to make such a
decision.

> Is your
> idea that, the only reason we're in this particular code is because
> we
> couldn't grab the lock we need to make a more informed decision; so
> defer if possible to previous decisions, which (we might presume) was
> able to make a more informed decision?
> 
Kind of, yes. Basically I think we should "escape" from this situation
as quickly as possible, causing as little trouble as possible to both
ourselves and to others, in the hope that it will go better next time.

Trying to stay in the same runqueue seems to me to fit this
requirement, as:
- as you say, we're here because a previous (presumably well informed)
  decision brought us here so, hopefully, staying here is not too bad, 
  neither for us nor overall;
- staying here is quicker and means less overhead for svc;
- staying here means less overhead overall. In fact, if we decide to 
  change runqueue, we will have to take the remote runqueue lock at 
  some point... And I'd prefer that to be for good reasons.

All that being said, it probably would be good to add a performance
counter, and try to get a sense of how frequently we actually end up in
this function as a fallback.

But in the meantime, yes, I'd try to make svc stay in the runqueue
where it is, in this case, if possible.

> > ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> > "something random"? The value to be assigned to cpu?
> 
> Er, yes, I meant the return value.  Returning 0, or v->processor
> would
> be simple options.  *Really* defensive programming would attempt to
> choose something somewhat sensible with the minimal risk of triggering
> some other hidden assumptions (say, any cpu on our current runqueue).
> But part of me says even thinking too long about it is a waste of
> time
> for something we're 99.99% sure can never happen. :-)
> 
Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
either v->processor (with a comment), or a cpumask_any(something). Of
course the latter is expensive, but it should not be a big problem,
considering we'll never get there (I'll have a look at the generated
assembly, to confirm that).
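
To make it concrete, the get_fallback_cpu() I have in mind is roughly the
sketch below (just a sketch, to be confirmed when I respin; the performance
counter name is made up and would need the usual PERFCOUNTER() declaration,
but SCHED_STAT_CRANK() is what we already use for the other scheduler
counters):

static int get_fallback_cpu(struct csched2_vcpu *svc)
{
    struct vcpu *v = svc->vcpu;
    unsigned int bs;

    /* Keep track of how often we end up here at all (made-up counter name). */
    SCHED_STAT_CRANK(need_fallback_cpu);

    for_each_affinity_balance_step( bs )
    {
        int cpu = v->processor;

        if ( bs == BALANCE_SOFT_AFFINITY &&
             !has_soft_affinity(v, v->cpu_hard_affinity) )
            continue;

        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                    cpupool_domain_cpumask(v->domain));

        /* If v->processor is (still) in our affinity, go for it. */
        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
            return cpu;

        /* Otherwise, try to at least stay in our current runqueue. */
        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
                                       &svc->rqd->active)) )
        {
            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                        &svc->rqd->active);
            return cpumask_first(cpumask_scratch_cpu(cpu));
        }

        /*
         * Old "step 3" killed: during the soft-affinity step, rather than
         * settling for a random cpu of the soft affinity (a runqueue
         * migration anyway), fall through to the hard-affinity step and
         * try to stay put.
         */
        if ( bs == BALANCE_SOFT_AFFINITY )
            continue;

        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
        if ( likely(cpu < nr_cpu_ids) )
            return cpu;
    }

    /* We should never get here; if we somehow do, just stay where we are. */
    ASSERT_UNREACHABLE();
    return v->processor;
}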

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 16:47         ` Dario Faggioli
@ 2017-07-25 16:52           ` George Dunlap
  2017-07-25 17:30             ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2017-07-25 16:52 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver

On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
>> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
>>> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>>>>
>>> Mmm.. I think you're right. In fact, in a properly configured
>>> system,
>>> we'll never go past step 3 (from the comment at the top).
>>>
>>> Which is not ideal, or at least not what I had in mind. In fact, I
>>> think it's better to check step 4 (svc->vcpu->processor in hard-
>>> affinity) and step 5 (a CPU from svc's runqueue in hard affinity),
>>> as
>>> that would mean avoiding a runqueue migration.
>>>
>>> What about I basically kill step 3, i.e., if we reach this point
>>> during
>>> the soft-affinity step, I just continue to the hard-affinity one?
>>
>> Hmm, well *normally* we would rather have a vcpu running within its
>> soft
>> affinity, even if that means moving it to another runqueue.  
>>
> Yes, but both *ideally* and *normally*, we just should not be here. :-)
> 
> If we did end up here, we're in guessing territory, and, although what
> you say about a guest wanting to run within its soft-affinity is
> always true, from the guest's own point of view, our job as the scheduler
> is to do what would be best for the system as a whole. But we are in a
> situation where we could not gather the information to make such a
> decision.
> 
>> Is your
>> idea that, the only reason we're in this particular code is because
>> we
>> couldn't grab the lock we need to make a more informed decision; so
>> defer if possible to previous decisions, which (we might presume) was
>> able to make a more informed decision?
>>
> Kind of, yes. Basically I think we should "escape" from this situation
> as quickly as possible, causing as little trouble as possible to both
> ourselves and to others, in the hope that it will go better next time.
> 
> Trying to stay in the same runqueue seems to me to fit this
> requirement, as:
> - as you say, we're here because a previous (presumably well informed)
>   decision brought us here so, hopefully, staying here is not too bad, 
>   neither for us nor overall;
> - staying here is quicker and means less overhead for svc;
> - staying here means less overhead overall. In fact, if we decide to 
>   change runqueue, we will have to take the remote runqueue lock at 
>   some point... And I'd prefer that to be for good reasons.
> 
> All that being said, it probably would be good to add a performance
> counter, and try to get a sense of how frequently we actually end up in
> this function as a fallback.
> 
> But in the meantime, yes, I'd try to make svc stay in the runqueue
> where it is, in this case, if possible.

Sounds good.  So are you going to respin the series then?

> 
>>> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
>>> "something random"? The value to be assigned to cpu?
>>
>> Er, yes, I meant the return value.  Returning 0, or v->processor
>> would
>> be simple options.  *Really* defensive programming would attempt to
>> choose something somewhat sensible with the minimal risk of triggering
>> some other hidden assumptions (say, any cpu on our current runqueue).
>> But part of me says even thinking too long about it is a waste of
>> time
>> for something we're 99.99% sure can never happen. :-)
>>
> Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
> either v->processor (with a comment), or a cpumask_any(something). Of
> course the latter is expensive, but it should not be a big problem,
> considering we'll never get there (I'll have a look at the generated
> assembly, to confirm that).

OK, thanks.

 -George


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
  2017-07-25 16:52           ` George Dunlap
@ 2017-07-25 17:30             ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-07-25 17:30 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Anshul Makkar, Justin T. Weaver



On Tue, 2017-07-25 at 17:52 +0100, George Dunlap wrote:
> On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> > 
> > All that being said, it probably would be good to add a performance
> > counter, and try to get a sense of how frequently we actually end
> > up in
> > this function as a fallback.
> > 
> > But in the meantime, yes, I'd try to make svc stay in the runqueue
> > where it is, in this case, if possible.
> 
> Sounds good.  So are you going to respin the series then?
> 
Yep, I'll rebase and respin this series. And then rebase the caps
series on top of this and respin it too.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-07-25 17:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-16 14:13 [PATCH 0/7] Soft affinity for Credit2 Dario Faggioli
2017-06-16 14:13 ` [PATCH 1/7] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
2017-06-23 10:02   ` Anshul Makkar
2017-06-23 13:33     ` Dario Faggioli
2017-06-16 14:13 ` [PATCH 2/7] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
2017-06-23 10:35   ` Anshul Makkar
2017-07-25 11:47   ` George Dunlap
2017-06-16 14:13 ` [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu() Dario Faggioli
2017-07-25 10:19   ` George Dunlap
2017-07-25 10:20     ` George Dunlap
2017-07-25 16:00     ` Dario Faggioli
2017-07-25 16:17       ` George Dunlap
2017-07-25 16:47         ` Dario Faggioli
2017-07-25 16:52           ` George Dunlap
2017-07-25 17:30             ` Dario Faggioli
2017-06-16 14:14 ` [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
2017-07-25 10:54   ` George Dunlap
2017-07-25 11:04   ` George Dunlap
2017-07-25 11:05     ` George Dunlap
2017-06-16 14:14 ` [PATCH 5/7] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
2017-07-25 11:06   ` George Dunlap
2017-06-16 14:14 ` [PATCH 6/7] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
2017-07-25 11:25   ` George Dunlap
2017-06-16 14:14 ` [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
2017-07-25 11:31   ` George Dunlap
