On Wed, 2016-08-31 at 18:10 +0100, anshul makkar wrote: > On 17/08/16 18:18, Dario Faggioli wrote: > > > > Right now, the following scenario can occurr: > >   - upon vcpu v wakeup, v itself is put in the runqueue, > >     and pcpu X is tickled; > >   - pcpu Y schedules (for whatever reason), sees v in > >     the runqueue and picks it up. > > > > This may seem ok (or even a good thing), but it's not. > > In fact, if runq_tickle() decided X is where v should > > run, it did it for a reason (load distribution, SMT > > support, cache hotness, affinity, etc), and we really > > should try as hard as possible to stick to that. > > > > Of course, we can't be too strict, or we risk leaving > > vcpus in the runqueue while there is available CPU > > capacity. So, we only leave v in runqueue --for X to > > pick it up-- if we see that X has been tickled and > > has not scheduled yet, i.e., it will have a real chance > > of actually select and schedule v. > > > > If that is not the case, we schedule it on Y (or, at > > least, we consider that), as running somewhere non-ideal > > is better than not running at all. > > > > The commit also adds performance counters for each of > > the possible situations. > > > > Signed-off-by: Dario Faggioli > > --- > > Cc: George Dunlap > > Cc: Anshul Makkar > > Cc: Jan Beulich > > Cc: Andrew Cooper > > --- > >   xen/common/sched_credit2.c   |   65 > > +++++++++++++++++++++++++++++++++++++++--- > >   xen/include/xen/perfc_defn.h |    3 ++ > >   2 files changed, 64 insertions(+), 4 deletions(-) > > > > diff --git a/xen/common/sched_credit2.c > > b/xen/common/sched_credit2.c > > index 12dfd20..a3d7beb 100644 > > --- a/xen/common/sched_credit2.c > > +++ b/xen/common/sched_credit2.c > > @@ -54,6 +54,7 @@ > >   #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, > > 16) > >   #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, > > 17) > >   #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, > > 19) > > +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, > > 20) > > > >   /* > >    * WARNING: This is still in an experimental phase.  Status and > > work can be found at the > > @@ -398,6 +399,7 @@ struct csched2_vcpu { > >       int credit; > >       s_time_t start_time; /* When we were scheduled (used for > > credit) */ > >       unsigned flags;      /* 16 bits doesn't seem to play well > > with clear_bit() */ > > +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if > > none) */ > > > >       /* Individual contribution to load */ > >       s_time_t load_last_update;  /* Last time average was updated > > */ > > @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, > > struct csched2_vcpu *new, s_time_t now) > >       __cpumask_set_cpu(ipid, &rqd->tickled); > >       smt_idle_mask_clear(ipid, &rqd->smt_idle); > >       cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ); > > + > > +    if ( unlikely(new->tickled_cpu != -1) ) > > +        SCHED_STAT_CRANK(tickled_cpu_overwritten); > > +    new->tickled_cpu = ipid; > >   } > > > >   /* > > @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler > > *ops, struct vcpu *vc, void *dd) > >           ASSERT(svc->sdom != NULL); > >           svc->credit = CSCHED2_CREDIT_INIT; > >           svc->weight = svc->sdom->weight; > > +        svc->tickled_cpu = -1; > >           /* Starting load of 50% */ > >           svc->avgload = 1ULL << (CSCHED2_PRIV(ops)- > > >load_precision_shift - 1); > >           svc->load_last_update = NOW() >> > > LOADAVG_GRANULARITY_SHIFT; > > @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler > > *ops, struct vcpu *vc, void *dd) > >       else > >       { > >           ASSERT(svc->sdom == NULL); > > +        svc->tickled_cpu = svc->vcpu->vcpu_id; > If I understood correctly, tickled_cpu refers to pcpu and not a > vcpu.  > Saving vcpu_id in tickled_cpu looks wrong. > Yes, and in fact, as you can see in the previous hunk, for pretty much all vcpus, tickled_cpu is initialized to -1. Here, we are dealing with the vcpus of the idle domain. And for vcpus of the idle domain, their vcpu id is the same as the id of the pcpu they're associated to. I agree it looks a little bit weird, but it's both correct, and the easiest and cleanest way for initializing this. I guess I can add a comment... Thanks and Regards, Dario -- <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)