* [PATCH v2 0/6] Soft affinity for Credit2
@ 2017-07-27 12:05 Dario Faggioli
  2017-07-27 12:05 ` [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:05 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Ian Jackson, Justin T. Weaver, Wei Liu, Anshul Makkar

Hello,

Take 2 of soft-affinity for Credit2. Not much has changed, actually:
- what was patch 1 has gone in, so the series is shorter; :-)
- in patch 5 ("xen: credit2: optimize runq_candidate() a little bit"), I
  changed things according to George's advice;
- patch 2 ("xen: credit2: soft-affinity awareness in get_fallback_cpu()")
  is the one that changed most. There as well, I've reworked things
  according to what we decided during review.

In fact, all the patches have the tags they need to go in already, *except*
patch 2.

v1 was here:

 https://lists.xen.org/archives/html/xen-devel/2017-06/msg01795.html

A git branch is here:

 git://xenbits.xen.org/people/dariof/xen.git rel/sched/credit2-soft-aff-v2
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/credit2-soft-aff-v2
 https://travis-ci.org/fdario/xen/builds/258112860

Regards,
Dario
---
Dario Faggioli (6):
      xen/tools: credit2: soft-affinity awareness in runq_tickle()
      xen: credit2: soft-affinity awareness in get_fallback_cpu()
      xen: credit2: soft-affinity awareness in csched2_cpu_pick()
      xen: credit2: kick away vcpus not running within their soft-affinity
      xen: credit2: optimize runq_candidate() a little bit
      xen: credit2: try to avoid tickling cpus subject to ratelimiting

 tools/xentrace/formats       |    2 
 tools/xentrace/xenalyze.c    |    7 -
 xen/common/sched_credit2.c   |  506 ++++++++++++++++++++++++++++++++----------
 xen/include/xen/perfc_defn.h |    2 
 4 files changed, 394 insertions(+), 123 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle()
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
@ 2017-07-27 12:05 ` Dario Faggioli
  2017-08-28 14:40   ` George Dunlap
  2017-07-27 12:05 ` [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu() Dario Faggioli
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, George Dunlap, Anshul Makkar

Soft-affinity support is usually implemented by means
of a two step "balancing loop", where:
- during the first step, we consider soft-affinity
  (if the vcpu has one);
- during the second (if we get to it), we consider
  hard-affinity.

In runq_tickle(), we need to do that for checking
whether we can execute the waking vCPU on a pCPU
that is idle. In fact, we want to be sure that, if
there is an idle pCPU in the vCPU's soft affinity,
we'll use it.

If there are no such idle pCPUs, though, and we
have to check non-idle ones, we can avoid the loop
and do both hard- and soft-affinity in one pass.

In fact, we can scan the runqueue and compute a
"score" for each vCPU which is running on each pCPU.
The idea is, since we may have to preempt someone:
- try to make sure that the waking vCPU will run
  inside its soft-affinity,
- try to preempt someone that is running outside
  of its own soft-affinity.

The value of the score is added to a trace record,
so xenalyze's code and tools/xentrace/formats are
updated accordingly.
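
For reference, here is a condensed sketch of that two-step balancing
loop, using the same helpers the patch below relies on (the actual
idle-cpu selection and tickling logic is omitted):

    unsigned int bs;

    for_each_affinity_balance_step( bs )
    {
        /* First step is soft-affinity: skip it if the vcpu has none. */
        if ( bs == BALANCE_SOFT_AFFINITY &&
             !has_soft_affinity(v, v->cpu_hard_affinity) )
            continue;

        /* Fill the scratch mask with soft or hard affinity, as per bs. */
        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));

        /* ... look for a suitable (e.g., idle) pcpu in the mask ... */
    }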

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xentrace/formats     |    2 
 tools/xentrace/xenalyze.c  |    7 +
 xen/common/sched_credit2.c |  214 +++++++++++++++++++++++++++++---------------
 3 files changed, 146 insertions(+), 77 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index c1f584f..f39182a 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -53,7 +53,7 @@
 0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
 0x00022203  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit burn    [ dom:vcpu = 0x%(1)08x, credit = %(2)d, delta = %(3)d ]
 0x00022204  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_add
-0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d ]
+0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d, score = %(3)d ]
 0x00022206  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle         [ cpu = %(1)d ]
 0x00022207  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_reset   [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d, mult = %(4)d ]
 0x00022208  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:sched_tasklet
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 24cce2a..39fc35f 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7692,11 +7692,12 @@ void sched_process(struct pcpu_info *p)
             if(opt.dump_all) {
                 struct {
                     unsigned int vcpuid:16, domid:16;
-                    int credit;
+                    int credit, score;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s csched2:tickle_check d%uv%u, credit = %d\n",
-                       ri->dump_header, r->domid, r->vcpuid, r->credit);
+                printf(" %s csched2:tickle_check d%uv%u, credit = %d, score = %d\n\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->credit, r->score);
             }
             break;
         case TRC_SCHED_CLASS_EVT(CSCHED2, 6): /* TICKLE            */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 29c002a..57e77df 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1146,6 +1146,73 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * Score to preempt the target cpu.  Return a negative number if the
+ * credit isn't high enough; if it is, favor a preemption on cpu in
+ * this order:
+ * - cpu is in new's soft-affinity, not in cur's soft-affinity
+ *   (2 x CSCHED2_CREDIT_INIT score bonus);
+ * - cpu is in new's soft-affinity and cur's soft-affinity, or
+ *   cpu is not in new's soft-affinity, nor in cur's soft-affinity
+ *   (1 x CSCHED2_CREDIT_INIT score bonus);
+ * - cpu is not in new's soft-affinity, while it is in cur's soft-affinity
+ *   (no bonus).
+ *
+ * Within the same class, the highest difference of credit.
+ */
+static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+                             struct csched2_vcpu *new, unsigned int cpu)
+{
+    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    s_time_t score;
+
+    /*
+     * We are dealing with cpus that are marked non-idle (i.e., that are not
+     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * if taking care of tasklets. In that case, we want to leave it alone.
+     */
+    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+        return -1;
+
+    burn_credits(rqd, cur, now);
+
+    score = new->credit - cur->credit;
+    if ( new->vcpu->processor != cpu )
+        score -= CSCHED2_MIGRATE_RESIST;
+
+    /*
+     * If score is positive, it means new has enough credits (i.e.,
+     * new->credit > cur->credit+CSCHED2_MIGRATE_RESIST).
+     *
+     * Let's compute the bonuses for soft-affinities.
+     */
+    if ( score > 0 )
+    {
+        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+    }
+
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            int credit, score;
+        } d;
+        d.dom = cur->vcpu->domain->domain_id;
+        d.vcpu = cur->vcpu->vcpu_id;
+        d.credit = cur->credit;
+        d.score = score;
+        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
+    return score;
+}
+
+/*
  * Check what processor it is best to 'wake', for picking up a vcpu that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
@@ -1165,11 +1232,11 @@ static void
 runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
-    s_time_t lowest = (1<<30);
-    unsigned int cpu = new->vcpu->processor;
+    s_time_t max = 0;
+    unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
     cpumask_t mask;
-    struct csched2_vcpu * cur;
 
     ASSERT(new->rqd == rqd);
 
@@ -1189,109 +1256,110 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(new->vcpu->domain));
-
-    /*
-     * First of all, consider idle cpus, checking if we can just
-     * re-use the pcpu where we were running before.
-     *
-     * If there are cores where all the siblings are idle, consider
-     * them first, honoring whatever the spreading-vs-consolidation
-     * SMT policy wants us to do.
-     */
-    if ( unlikely(sched_smt_power_savings) )
-        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
-    else
-        cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
+    for_each_affinity_balance_step( bs )
     {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
+        /* Just skip first step, if we don't have a soft affinity */
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
+            continue;
+
+        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+
+        /*
+         * First of all, consider idle cpus, checking if we can just
+         * re-use the pcpu where we were running before.
+         *
+         * If there are cores where all the siblings are idle, consider
+         * them first, honoring whatever the spreading-vs-consolidation
+         * SMT policy wants us to do.
+         */
+        if ( unlikely(sched_smt_power_savings) )
+        {
+            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
+            cpumask_and(&mask, &mask, online);
+        }
+        else
+            cpumask_and(&mask, &rqd->smt_idle, online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
+
+        /*
+         * If there are no fully idle cores, check all idlers, after
+         * having filtered out pcpus that have been tickled but haven't
+         * gone through the scheduler yet.
+         */
+        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
     }
 
     /*
-     * If there are no fully idle cores, check all idlers, after
-     * having filtered out pcpus that have been tickled but haven't
-     * gone through the scheduler yet.
+     * Note that, if we are here, it means we have done the hard-affinity
+     * balancing step of the loop, and hence what we have in cpumask_scratch
+     * is what we put there last, i.e., new's vcpu_hard_affinity & online,
+     * which is exactly what we need for the next part of the function.
      */
-    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
-    {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
 
     /*
      * Otherwise, look for the non-idle (and non-tickled) processors with
      * the lowest credit, among the ones new is allowed to run on. Again,
      * the cpu were it was running on would be the best candidate.
+     *
+     * For deciding which cpu to tickle, we use tickle_score(), which will
+     * factor in both new's soft-affinity, and the soft-affinity of the
+     * vcpu running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        cur = csched2_vcpu(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        s_time_t score = tickle_score(rqd, now, new, cpu);
 
-        if ( cur->credit < new->credit )
+        if ( score > max )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
+            max = score;
             ipid = cpu;
-            goto tickle;
+
+            /* If this is in new's soft affinity, just take it */
+            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            {
+                SCHED_STAT_CRANK(tickled_busy_cpu);
+                goto tickle;
+            }
         }
     }
 
     for_each_cpu(i, &mask)
     {
+        s_time_t score;
+
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        cur = csched2_vcpu(curr_on_cpu(i));
-
-        /*
-         * Even if the cpu is not in rqd->idle, it may be running the
-         * idle vcpu, if it's doing tasklet work. Just skip it.
-         */
-        if ( is_idle_vcpu(cur->vcpu) )
-            continue;
-
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
+        score = tickle_score(rqd, now, new, i);
 
-        if ( cur->credit < lowest )
+        if ( score > max )
         {
+            max = score;
             ipid = i;
-            lowest = cur->credit;
-        }
-
-        if ( unlikely(tb_init_done) )
-        {
-            struct {
-                unsigned vcpu:16, dom:16;
-                int credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
         }
     }
 
-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+    if ( ipid == -1 )
     {
         SCHED_STAT_CRANK(tickled_no_cpu);
         return;
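
To make the scoring concrete, here is a worked example with made-up
numbers. Assume CSCHED2_CREDIT_INIT is 10000000 (i.e., MILLISECS(10)),
and that new->vcpu->processor == cpu, so no CSCHED2_MIGRATE_RESIST
penalty applies:

    /*
     * new->credit = 9000000 in all three cases:
     *
     *  cur->credit = 5000000, cpu in new's soft-affinity only:
     *      score = 4000000 + 2 * CSCHED2_CREDIT_INIT = 24000000
     *  cur->credit = 3000000, cpu in both (or neither) soft-affinity:
     *      score = 6000000 + 1 * CSCHED2_CREDIT_INIT = 16000000
     *  cur->credit = 1000000, cpu in cur's soft-affinity only:
     *      score = 8000000 (no bonus)
     */

The per-class bonus is meant to dominate the credit difference, so the
affinity class decides first, and credits break ties within a class.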



* [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu()
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
  2017-07-27 12:05 ` [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2017-07-27 12:05 ` Dario Faggioli
  2017-08-28 14:49   ` George Dunlap
  2017-07-27 12:05 ` [PATCH v2 3/6] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Justin T. Weaver, Anshul Makkar

By, basically, moving all the logic of the function
inside the usual two-step (soft-affinity step and
hard-affinity step) loop.

While there, add two performance counters (in cpu_pick
and in get_fallback_cpu() itself), in order to be able
to tell how frequently it happens that we need to look
for a fallback cpu.
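
For the record, the counters follow Credit2's existing perf-counter
pattern, i.e., a one-line declaration plus a one-line bump (this just
restates what the hunks below do):

    /* In xen/include/xen/perfc_defn.h: declare the counter. */
    PERFCOUNTER(need_fallback_cpu,      "csched2: need_fallback_cpu")

    /* In xen/common/sched_credit2.c: crank it where the event happens. */
    SCHED_STAT_CRANK(need_fallback_cpu);

With a hypervisor built with performance counters enabled, the values
can then be inspected from dom0 with the xenperf utility.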

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v1:
- as discussed during review, only consider hard-affinity for the last stand.
  The idea is not moving the vcpu to a different runqueue because of
  soft-affinity, as a part of finding a fallback cpu;
- as discussed during review, added the performance counters;
- BUG_ON(1) turned into ASSERT_UNREACHABLE(), as suggested during review;
- return something sane and random enough, at the end of the function (in
  case we somehow manage to get there).
---
 xen/common/sched_credit2.c   |  101 +++++++++++++++++++++++++++++++++---------
 xen/include/xen/perfc_defn.h |    2 +
 2 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 57e77df..aa8f169 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -549,36 +549,93 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) svc's current pcpu, if it is part of svc's hard affinity;
+ *  4) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  5) just one valid pcpu from svc's hard affinity
+ *
+ * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
+ * note that at least 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     struct vcpu *v = svc->vcpu;
-    int cpu = v->processor;
+    unsigned int bs;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    SCHED_STAT_CRANK(need_fallback_cpu);
 
-    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
-        return cpu;
-
-    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
-                                   &svc->rqd->active)) )
+    for_each_affinity_balance_step( bs )
     {
-        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
-                    cpumask_scratch_cpu(cpu));
-        return cpumask_first(cpumask_scratch_cpu(cpu));
-    }
+        int cpu = v->processor;
 
-    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(v, v->cpu_hard_affinity) )
+            continue;
 
-    return cpumask_first(cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    cpupool_domain_cpumask(v->domain));
+
+        /*
+         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+            return cpu;
+
+        /*
+         * This is cases 2 or 4 (depending on bs): v->processor isn't there
+         * any longer, check if we at least can stay in our current runq.
+         */
+        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                       &svc->rqd->active)) )
+        {
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                        &svc->rqd->active);
+            return cpumask_first(cpumask_scratch_cpu(cpu));
+        }
+
+        /*
+         * We may well pick any valid pcpu from our soft-affinity, outside
+         * of our current runqueue, but we decide not to. In fact, changing
+         * runqueue is slow, affects load distribution, and is a source of
+         * overhead for the vcpus running on the other runqueue (we need the
+         * lock). So, better do that as a consequence of a well informed
+         * decision (or if we really don't have any other chance, as we will,
+         * at step 6, if we get to there).
+         *
+         * Also, being here, looking for a fallback, is an unfortunate and
+         * infrequent event, while the decision of putting us in the runqueue
+         * where we are was (likely) made taking all the relevant factors
+         * into account. So let's not disrupt that, just for the sake of
+         * soft-affinity, and let's wait here to be able to make (hopefully
+         * soon) another, similarly well informed, decision.
+         */
+        if ( bs == BALANCE_SOFT_AFFINITY )
+            continue;
+
+        /*
+         * This is cases 6: last stand, just one valid pcpu from our hard
+         * affinity. It's guaranteed that there is at least one valid cpu,
+         * and therefore we are sure that we return it, and never really
+         * exit the loop.
+         */
+        ASSERT(bs == BALANCE_HARD_AFFINITY &&
+               !cpumask_empty(cpumask_scratch_cpu(cpu)));
+        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    ASSERT_UNREACHABLE();
+    /*
+     * We can't be here.  But if that somehow happens (in non-debug builds),
+     * at least return something which is both online and in our hard-affinity.
+     */
+    return cpumask_any(cpumask_scratch_cpu(v->processor));
 }
 
 /*
@@ -1715,6 +1772,8 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
+    SCHED_STAT_CRANK(pick_cpu);
+
     /* Locking:
      * - Runqueue lock of vc->processor is already locked
      * - Need to grab prv lock to make sure active runqueues don't
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 53849af..c135bf8 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -66,6 +66,8 @@ PERFCOUNTER(migrate_on_runq,        "csched2: migrate_on_runq")
 PERFCOUNTER(migrate_no_runq,        "csched2: migrate_no_runq")
 PERFCOUNTER(runtime_min_timer,      "csched2: runtime_min_timer")
 PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
+PERFCOUNTER(pick_cpu,               "csched2: pick_cpu")
+PERFCOUNTER(need_fallback_cpu,      "csched2: need_fallback_cpu")
 PERFCOUNTER(migrated,               "csched2: migrated")
 PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
 PERFCOUNTER(credit_reset,           "csched2: credit_reset")



* [PATCH v2 3/6] xen: credit2: soft-affinity awareness in csched2_cpu_pick()
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
  2017-07-27 12:05 ` [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
  2017-07-27 12:05 ` [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu() Dario Faggioli
@ 2017-07-27 12:05 ` Dario Faggioli
  2017-07-27 12:06 ` [PATCH v2 4/6] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Justin T. Weaver, George Dunlap, Anshul Makkar

We want to find the runqueue with the least average load,
and to do that, we scan through all the runqueues.

It is, therefore, enough that, during such a scan:
- we identify the runqueue with the least load, among
  the ones that have pcpus that are part of the soft
  affinity of the vcpu we're calling pick on;
- we identify the same, but for hard affinity.

At this point, we can decide whether to go for the
runqueue with the least load among the ones with some
soft-affinity, or overall.

Therefore, at the price of some code reshuffling, we
can avoid the usual two-step balancing loop.

(Also, kill a spurious ';' in the definition of MAX_LOAD.)
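
The resulting scan is the classic single-pass, two-minima pattern; a
minimal sketch of just that idea (runq_avgload() and
runq_meets_soft_aff() are hypothetical helpers, not the actual code):

    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
    int i, min_rqi = -1, min_s_rqi = -1;

    for_each_cpu(i, &prv->active_queues)
    {
        s_time_t load = runq_avgload(i);        /* hypothetical */

        /* Best runq containing cpus from our soft-affinity. */
        if ( has_soft && runq_meets_soft_aff(i) /* hypothetical */ &&
             load < min_s_avgload )
        {
            min_s_avgload = load;
            min_s_rqi = i;
        }
        /* Best runq overall, within our hard-affinity. */
        if ( load < min_avgload )
        {
            min_avgload = load;
            min_rqi = i;
        }
    }
    /* Prefer min_s_rqi, if we found one; fall back to min_rqi. */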

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
 xen/common/sched_credit2.c |  117 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index aa8f169..8237a0a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1761,14 +1761,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
+    int i, min_rqi = -1, min_s_rqi = -1;
+    unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1819,17 +1821,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
                                      &svc->migrate_rqd->active) )
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
                                     cpumask_scratch_cpu(cpu));
+
             svc->migrate_rqd->pick_bias = new_cpu;
             goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contain cpus in our hard affinity; this represents the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1838,31 +1858,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         rqd = prv->rqd + i;
 
         /*
-         * If checking a different runqueue, grab the lock, check hard
-         * affinity, read the avg, and then release the lock.
+         * If none of the cpus of this runqueue is in svc's hard-affinity,
+         * skip the runqueue.
+         *
+         * Note that, in case svc's hard-affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that we end up skipping svc's current runqueue.
+         */
+        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
+            continue;
+
+        /*
+         * If checking a different runqueue, grab the lock, read the avg,
+         * and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
          * but subtract our own load from the runqueue load to simulate
          * impartiality.
-         *
-         * Note that, if svc's hard affinity has changed, this is the
-         * first time when we see such change, so it is indeed possible
-         * that none of the cpus in svc's current runqueue is in our
-         * (new) hard affinity!
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
+            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = rqd->b_avgload;
-
+            rqd_avgload = rqd->b_avgload;
             spin_unlock(&rqd->lock);
         }
 
+        /*
+         * If svc has a soft-affinity, and some cpus of rqd are part of it,
+         * see if we need to update the "soft-affinity minimum".
+         */
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload )
+        {
+            cpumask_t mask;
+
+            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
+            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            {
+                min_s_avgload = rqd_avgload;
+                min_s_rqi = i;
+            }
+        }
+        /* In any case, keep the "hard-affinity minimum" updated too. */
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1870,17 +1910,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, to obtain the soft-affinity mask, we "just" put what we
+         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * ok because:
+         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         *   a non-empty intersection (because has_soft is true);
+         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         *   in cpumask_scratch, so we save a lot by doing it this way.
+         *
+         * It's kind of like open coding affinity_balance_cpumask() but, in
+         * this specific case, calling that would mean a lot of (unnecessary)
+         * cpumask operations.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    vc->cpu_soft_affinity);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft-affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         *
+         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpupool_domain_cpumask() in it, so it's enough that we filter
+         * with the cpus of the runq.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                &prv->rqd[min_rqi].active);
     new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
                             cpumask_scratch_cpu(cpu));
     prv->rqd[min_rqi].pick_bias = new_cpu;



* [PATCH v2 4/6] xen: credit2: kick away vcpus not running within their soft-affinity
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
                   ` (2 preceding siblings ...)
  2017-07-27 12:05 ` [PATCH v2 3/6] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
@ 2017-07-27 12:06 ` Dario Faggioli
  2017-07-27 12:06 ` [PATCH v2 5/6] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
  2017-07-27 12:06 ` [PATCH v2 6/6] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
  5 siblings, 0 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:06 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar

If, during scheduling, we realize that the current vcpu
is running outside of its own soft-affinity, it would be
preferable to send it somewhere else.

Of course, that may not be possible, and if we're too
strict, we risk having vcpus sit in runqueues, even if
there are idle pcpus (violating work-conservingness).
In fact, what if there are no pcpus in the soft-
affinity mask of the vcpu in question where it can
run at all?

To make sure we don't fall in the above described trap,
only actually de-schedule the vcpu if there are idle and
not already tickled cpus from its soft affinity where it
can run immediately.

If there is (at least) one such cpu, we let current
be preempted, so that csched2_context_saved() will put
it back in the runq, and runq_tickle() will wake (one
of) those cpus.

If there is not even one, we let current run where it is,
as running outside its soft-affinity is still better than
not running at all.
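
Summarizing the resulting policy for a runnable, non-idle scurr (this
is just a restatement of the above):

    /*
     *  cpu in scurr's      idle, non-tickled cpu in
     *  soft-affinity?      scurr's soft-affinity?     outcome
     *  ----------------    ------------------------   --------------------
     *  yes                 (irrelevant)               scurr keeps running
     *  no                  yes                        scurr is preempted,
     *                                                 re-queued, and a cpu
     *                                                 in its soft-affinity
     *                                                 is tickled
     *  no                  no                         scurr keeps running
     *  no soft-affinity    -                          scurr keeps running
     */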

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
 xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 8237a0a..3f10b4b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2675,6 +2675,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
     bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    bool soft_aff_preempt = false;
 
     *skipped = 0;
 
@@ -2708,8 +2709,43 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         return scurr;
     }
 
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
+    /* If scurr has a soft-affinity, let's check whether cpu is part of it */
+    if ( !is_idle_vcpu(scurr->vcpu) &&
+         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    {
+        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+                                 cpumask_scratch);
+        if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
+        {
+            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+
+            /* Ok, is any of the pcpus in scurr's soft-affinity idle? */
+            cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
+            cpumask_andnot(cpumask_scratch, cpumask_scratch, &rqd->tickled);
+            soft_aff_preempt = cpumask_intersects(cpumask_scratch, online);
+        }
+    }
+
+    /*
+     * If scurr is runnable, and this cpu is in its soft-affinity, default to
+     * it. We also default to it, even if cpu is not in its soft-affinity, if
+     * there aren't any idle and not tickled cpus in its soft-affinity. In
+     * fact, we don't want to risk leaving scurr in the runq and this cpu idle
+     * only because scurr is running outside of its soft-affinity.
+     *
+     * On the other hand, if cpu is not in scurr's soft-affinity, and there
+     * looks to be better options, go for them. That happens by defaulting to
+     * idle here, which means scurr will be preempted, put back in runq, and
+     * one of those idle and not tickled cpus from its soft-affinity will be
+     * tickled to pick it up.
+     *
+     * Finally, if scurr does not have a valid soft-affinity, we also let it
+     * continue to run here (in fact, soft_aff_preempt will still be false,
+     * in this case).
+     *
+     * Of course, we also default to idle if scurr is not runnable.
+     */
+    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_vcpu(idle_vcpu[cpu]);



* [PATCH v2 5/6] xen: credit2: optimize runq_candidate() a little bit
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
                   ` (3 preceding siblings ...)
  2017-07-27 12:06 ` [PATCH v2 4/6] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
@ 2017-07-27 12:06 ` Dario Faggioli
  2017-07-27 12:06 ` [PATCH v2 6/6] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
  5 siblings, 0 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:06 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar

By factoring into one check (at the top) all the
tests of whether current is the idle vcpu, and
marking it as unlikely().

In fact, if current is idle, all the logic for
dealing with yielding, context switching rate
limiting and soft-affinity is just pure overhead,
and we'd better rush on to checking the runq and
picking some vcpu up.
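
For context: in Xen, unlikely(x) is the usual GCC branch-prediction
hint, i.e., __builtin_expect(!!(x), 0). The whole optimization is to
hoist the idle case into one early, predicted-not-taken branch, as the
hunk below does:

    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
    {
        snext = scurr;      /* the idle vcpu itself is our candidate... */
        goto check_runq;    /* ...and we go straight to scanning the runq. */
    }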

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
Changes from v1:
- for George: about what you said in
  <d3bf41b5-a152-8290-378f-3ff279b7e3ab@citrix.com>, I went for the "leave
  unset at declaration and set explicitly on both paths" approach, i.e., the
  one you said you preferred (as I also like it better in this case). After
  doing that, I've applied your Reviewed-by, as you said I could.
---
 xen/common/sched_credit2.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3f10b4b..30d9f55 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2674,11 +2674,18 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu));
-    bool yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
-    bool soft_aff_preempt = false;
+    bool yield = false, soft_aff_preempt = false;
 
     *skipped = 0;
 
+    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    {
+        snext = scurr;
+        goto check_runq;
+    }
+
+    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
@@ -2688,8 +2695,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
-         vcpu_runnable(scurr->vcpu) &&
+    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
@@ -2710,8 +2716,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( !is_idle_vcpu(scurr->vcpu) &&
-         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    if ( has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
     {
         affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
@@ -2750,6 +2755,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = csched2_vcpu(idle_vcpu[cpu]);
 
+ check_runq:
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);



* [PATCH v2 6/6] xen: credit2: try to avoid tickling cpus subject to ratelimiting
  2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
                   ` (4 preceding siblings ...)
  2017-07-27 12:06 ` [PATCH v2 5/6] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
@ 2017-07-27 12:06 ` Dario Faggioli
  5 siblings, 0 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-07-27 12:06 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar

With context switching ratelimiting enabled, the following
pattern is quite common in a scheduling trace:

     0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
     0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
     0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
 [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
     [...]
 [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
     0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
 [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
 [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running

Basically, what happens is that runq_tickle() realizes
d0v13 should preempt d2v7, running on cpu 12, as it
has higher credits (10135529 vs. 2619231). It therefore
tickles cpu 12 [1], which, in turn, schedules [2].

But --surprise surprise-- d2v7 has run for less than the
ratelimit interval [3], and hence it is _not_ preempted,
and continues to run. This indeed looks fine. Actually,
this is what ratelimiting is there for. Note, however,
that:
 1) we interrupted cpu 12 for nothing;
 2) what if, say on cpu 8, there is a vcpu that has:
    + less credit than d0v13 (so d0v13 can well
      preempt it),
    + more credit than d2v7 (that's why it was not
      selected to be preempted),
    + run for more than the ratelimiting interval
      (so it can really be scheduled out)?

With this patch, if we are in case 2), we realize
that tickling cpu 12 would be pointless, and continue
looking, eventually finding and tickling cpu 8.
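
Numerically, the check added below (see is_preemptable()) works like
this, with illustrative values:

    /*
     * Preemptable iff:
     *   now - runstate.state_entry_time >
     *                  ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE
     *
     * E.g., with ratelimit = 1000us and the 50us tolerance:
     *   vcpu has run for 900us: 900 > 950 is false --> skip this cpu;
     *   vcpu has run for 960us: 960 > 950 is true  --> score this cpu.
     *
     * The tolerance accounts for the time it takes for the tickle to
     * be delivered and acted upon.
     */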

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
 xen/common/sched_credit2.c |   30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 30d9f55..fab7f2e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -160,6 +160,8 @@
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
 /* Max timer: Maximum time a guest can be run for. */
@@ -1203,6 +1205,23 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * What we want to know is whether svc, which we assume to be running on some
+ * pcpu, can be interrupted and preempted (which, so far, basically means
+ * whether or not it has already run for more than the ratelimit, to which we
+ * apply some tolerance).
+ */
+static inline bool is_preemptable(const struct csched2_vcpu *svc,
+                                    s_time_t now, s_time_t ratelimit)
+{
+    if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
+        return true;
+
+    ASSERT(svc->vcpu->is_running);
+    return now - svc->vcpu->runstate.state_entry_time >
+           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
+}
+
+/*
  * Score to preempt the target cpu.  Return a negative number if the
  * credit isn't high enough; if it is, favor a preemption on cpu in
  * this order:
@@ -1216,10 +1235,12 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  *
  * Within the same class, the highest difference of credit.
  */
-static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
                              struct csched2_vcpu *new, unsigned int cpu)
 {
+    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
     /*
@@ -1227,7 +1248,8 @@ static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
      * in rqd->idle). However, some of them may be running their idle vcpu,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+         !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
@@ -1384,7 +1406,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        s_time_t score = tickle_score(rqd, now, new, cpu);
+        s_time_t score = tickle_score(ops, now, new, cpu);
 
         if ( score > max )
         {
@@ -1407,7 +1429,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        score = tickle_score(rqd, now, new, i);
+        score = tickle_score(ops, now, new, i);
 
         if ( score > max )
         {



* Re: [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle()
  2017-07-27 12:05 ` [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2017-08-28 14:40   ` George Dunlap
  0 siblings, 0 replies; 10+ messages in thread
From: George Dunlap @ 2017-08-28 14:40 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Wei Liu, Ian Jackson, Anshul Makkar

On 07/27/2017 01:05 PM, Dario Faggioli wrote:
> Soft-affinity support is usually implemented by means
> of a two step "balancing loop", where:
> - during the first step, we consider soft-affinity
>   (if the vcpu has one);
> - during the second (if we get to it), we consider
>   hard-affinity.
> 
> In runq_tickle(), we need to do that for checking
> whether we can execute the waking vCPU on a pCPU
> that is idle. In fact, we want to be sure that, if
> there is an idle pCPU in the vCPU's soft affinity,
> we'll use it.
> 
> If there are no such idle pCPUs, though, and we
> have to check non-idle ones, we can avoid the loop
> and do both hard- and soft-affinity in one pass.
> 
> In fact, we can scan the runqueue and compute a
> "score" for each vCPU which is running on each pCPU.
> The idea is, since we may have to preempt someone:
> - try to make sure that the waking vCPU will run
>   inside its soft-affinity,
> - try to preempt someone that is running outside
>   of its own soft-affinity.
> 
> The value of the score is added to a trace record,
> so xenalyze's code and tools/xentrace/formats are
> updated accordingly.
> 
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

The primary focus of this patch is credit2, so the title should probably
be xen/credit2, right?

I can change that on checkin.

 -George


* Re: [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu()
  2017-07-27 12:05 ` [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu() Dario Faggioli
@ 2017-08-28 14:49   ` George Dunlap
  0 siblings, 0 replies; 10+ messages in thread
From: George Dunlap @ 2017-08-28 14:49 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Justin T. Weaver, Anshul Makkar

On 07/27/2017 01:05 PM, Dario Faggioli wrote:
> By, basically, moving all the logic of the function
> inside the usual two steps (soft-affinity step and
> hard-affinity step) loop.
> 
> While there, add two performance counters (in cpu_pick
> and in get_fallback_cpu() itself), in order to be able
> to tell how frequently it happens that we need to look
> for a fallback cpu.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: Anshul Makkar <anshulmakkar@gmail.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Changes from v1:
> - as discussed during review, only consider hard-affinity for the last stand.
>   The idea is not moving the vcpu to a diffrent runqueue because of
>   soft-affinity, as a part of finding a fallback cpu;
> - as discussed during review, added the performance counters;
> - BUG_ON(1) turned into ASSERT_UNREACHABLE(), as suggested during review;
> - return something same and random enough, at the end of the function (in
>   case we somehow manage to get there).
> ---
>  xen/common/sched_credit2.c   |  101 +++++++++++++++++++++++++++++++++---------
>  xen/include/xen/perfc_defn.h |    2 +
>  2 files changed, 82 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 57e77df..aa8f169 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -549,36 +549,93 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>  }
>  
>  /*
> - * When a hard affinity change occurs, we may not be able to check some
> - * (any!) of the other runqueues, when looking for the best new processor
> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
> - * pick, in order of decreasing preference:
> - *  - svc's current pcpu;
> - *  - another pcpu from svc's current runq;
> - *  - any cpu.
> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
> + * we pick, in order of decreasing preference:
> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
> + *  3) svc's current pcpu, if it is part of svc's hard affinity;
> + *  4) a pcpu in svc's current runqueue that is also in svc's hard affinity;
> + *  5) just one valid pcpu from svc's hard affinity
> + *
> + * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.

s/6/5/; ?

>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
>      struct vcpu *v = svc->vcpu;
> -    int cpu = v->processor;
> +    unsigned int bs;
>  
> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> -                cpupool_domain_cpumask(v->domain));
> +    SCHED_STAT_CRANK(need_fallback_cpu);
>  
> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> -        return cpu;
> -
> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> -                                   &svc->rqd->active)) )
> +    for_each_affinity_balance_step( bs )
>      {
> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> -                    cpumask_scratch_cpu(cpu));
> -        return cpumask_first(cpumask_scratch_cpu(cpu));
> -    }
> +        int cpu = v->processor;
>  
> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
> +            continue;
>  
> -    return cpumask_first(cpumask_scratch_cpu(cpu));
> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    cpupool_domain_cpumask(v->domain));
> +
> +        /*
> +         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
> +         * in our affinity, go for it, for cache betterness.
> +         */
> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> +            return cpu;
> +
> +        /*
> +         * This is cases 2 or 4 (depending on bs): v->processor isn't there
> +         * any longer, check if we at least can stay in our current runq.
> +         */
> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                       &svc->rqd->active)) )
> +        {
> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                        &svc->rqd->active);
> +            return cpumask_first(cpumask_scratch_cpu(cpu));
> +        }
> +
> +        /*
> +         * We may well pick any valid pcpu from our soft-affinity, outside
> +         * of our current runqueue, but we decide not to. In fact, changing
> +         * runqueue is slow, affects load distribution, and is a source of
> +         * overhead for the vcpus running on the other runqueue (we need the
> +         * lock). So, better do that as a consequence of a well informed
> +         * decision (or if we really don't have any other chance, as we will,
> +         * at step 6, if we get to there).

s/6/5/;

> +         *
> +         * Also, being here, looking for a fallback, is an unfortunate and
> +         * infrequent event, while the decision of putting us in the runqueue
> +         * where we are was (likely) made taking all the relevant factors
> +         * into account. So let's not disrupt that, just for the sake of
> +         * soft-affinity, and let's wait here to be able to make (hopefully
> +         * soon) another, similarly well informed, decision.
> +         */
> +        if ( bs == BALANCE_SOFT_AFFINITY )
> +            continue;
> +
> +        /*
> +         * This is cases 6: last stand, just one valid pcpu from our hard
> +         * affinity. It's guaranteed that there is at least one valid cpu,
> +         * and therefore we are sure that we return it, and never really
> +         * exit the loop.
> +         */

s/6/5/;

Everything else looks good -- I can fix these up on commit.

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

 -George


* Re: [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle()
@ 2017-08-28 17:58 Dario Faggioli
  0 siblings, 0 replies; 10+ messages in thread
From: Dario Faggioli @ 2017-08-28 17:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: Ian Jackson, Anshul Makkar, Wei Liu, xen-devel


On 28 Aug 2017 16:40, George Dunlap <george.dunlap@citrix.com> wrote:
On 07/27/2017 01:05 PM, Dario Faggioli wrote:
> Soft-affinity support is usually implemented by means
> of a two step "balancing loop", where:
> - during the first step, we consider soft-affinity
>   (if the vcpu has one);
> - during the second (if we get to it), we consider
>   hard-affinity.
>
> In runq_tickle(), we need to do that for checking
> whether we can execute the waking vCPU on a pCPU
> that is idle. In fact, we want to be sure that, if
> there is an idle pCPU in the vCPU's soft affinity,
> we'll use it.
>
> If there are no such idle pCPUs, though, and we
> have to check non-idle ones, we can avoid the loop
> and do both hard- and soft-affinity in one pass.
>
> In fact, we can scan the runqueue and compute a
> "score" for each vCPU which is running on each pCPU.
> The idea is, since we may have to preempt someone:
> - try to make sure that the waking vCPU will run
>   inside its soft-affinity,
> - try to preempt someone that is running outside
>   of its own soft-affinity.
>
> The value of the score is added to a trace record,
> so xenalyze's code and tools/xentrace/formats are
> updated accordingly.
>
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

The primary focus of this patch is credit2, so the title should probably
be xen/credit2, right?

It touches the tools (xenalyze), so I put tools in the tag, but I'm fine with just 'xen/credit2'.


I can change that on checkin.

Sure, go ahead.

Thanks, Dario

PS. Sorry for the (most likely) bad format of this email (sent from my phone)



Thread overview: 10+ messages
2017-07-27 12:05 [PATCH v2 0/6] Soft affinity for Credit2 Dario Faggioli
2017-07-27 12:05 ` [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
2017-08-28 14:40   ` George Dunlap
2017-07-27 12:05 ` [PATCH v2 2/6] xen: credit2: soft-affinity awareness in get_fallback_cpu() Dario Faggioli
2017-08-28 14:49   ` George Dunlap
2017-07-27 12:05 ` [PATCH v2 3/6] xen: credit2: soft-affinity awareness in csched2_cpu_pick() Dario Faggioli
2017-07-27 12:06 ` [PATCH v2 4/6] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
2017-07-27 12:06 ` [PATCH v2 5/6] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
2017-07-27 12:06 ` [PATCH v2 6/6] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
2017-08-28 17:58 [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
