From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap <George.Dunlap@eu.citrix.com>
Subject: Re: [PATCH v3 4/4] sched: credit2: consider per-vcpu
	soft affinity
Date: Mon, 20 Apr 2015 16:38:55 +0100
Message-ID: <CAFLBxZZ3Fq0vR4oYdU69=hY-bws-31J+Fw0Lh1Owb+x5GL9sxw@mail.gmail.com>
References: <1427363314-25430-1-git-send-email-jtweaver@hawaii.edu>
	<1427363314-25430-5-git-send-email-jtweaver@hawaii.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1427363314-25430-5-git-send-email-jtweaver@hawaii.edu>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Justin T. Weaver" <jtweaver@hawaii.edu>
Cc: Dario Faggioli <dario.faggioli@citrix.com>, henric@hawaii.edu, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On Thu, Mar 26, 2015 at 9:48 AM, Justin T. Weaver <jtweaver@hawaii.edu> wrote:
>  * choose_cpu
>
> choose_cpu now tries to find the run queue with the most cpus in the given
> vcpu's soft affinity. It uses minimum run queue load as a tie breaker.
[snip]
>  * choose_cpu: added balance loop to find cpu for given vcpu that has most
>    soft cpus (with run queue load being a tie breaker), or if none were found,
>    or not considering soft affinity, pick cpu from runq with least load
[snip]
> @@ -1086,7 +1130,7 @@ static int
>  choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
> -    int i, min_rqi = -1, new_cpu;
> +    int i, rqi = -1, new_cpu, max_soft_cpus = 0, balance_step;
>      struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
>      s_time_t min_avgload;
>

Hey Justin -- sorry for taking so long to get back to this one.

Before getting into the changes to choose_cpu(): it looks like on the
__CSFLAG_runq_migrate_request path (starting with "First check to see
if we're here because someone else suggested a place for us to move"),
we only consider the hard affinity, not the soft affinity.  Is that
intentional?

> @@ -1143,9 +1187,28 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>
>      min_avgload = MAX_LOAD;
>
> -    /* Find the runqueue with the lowest instantaneous load */
> +    /*
> +     * Find the run queue with the most cpus in vc's soft affinity. If there
> +     * is more than one queue with the highest soft affinity cpu count, then
> +     * pick the one with the lowest instantaneous run queue load. If the
> +     * vcpu does not have soft affinity, then only try to find the run queue
> +     * with the lowest instantaneous load.
> +     */
> +    for_each_sched_balance_step( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +            && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
> +            continue;
> +
> +        if ( balance_step == SCHED_BALANCE_HARD_AFFINITY && rqi > -1 )
> +        {
> +            balance_step = SCHED_BALANCE_SOFT_AFFINITY;
> +            break;
> +        }
> +
>          for_each_cpu(i, &prv->active_queues)
>          {
> +            int rqd_soft_cpus = 0;
>              struct csched2_runqueue_data *rqd;
>              s_time_t rqd_avgload = MAX_LOAD;
>
> @@ -1163,35 +1226,61 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>               * so it is possible here that svc does not have hard affinity
>               * with any of the pcpus of svc's currently assigned run queue.
>               */
> +            sched_balance_cpumask(vc, balance_step, csched2_cpumask);
>              if ( rqd == svc->rqd )
>              {
> -                if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                      rqd_avgload = rqd->b_avgload - svc->avgload;
> +                if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +                {
> +                    cpumask_and(csched2_cpumask, csched2_cpumask,
> +                        &rqd->active);
> +                    rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +                }
>              }
>              else if ( spin_trylock(&rqd->lock) )
>              {
> -                if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                      rqd_avgload = rqd->b_avgload;
> +                if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +                {
> +                    cpumask_and(csched2_cpumask, csched2_cpumask,
> +                        &rqd->active);
> +                    rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +                }
>
>                  spin_unlock(&rqd->lock);
>              }
>              else
>                  continue;
>
> -            if ( rqd_avgload < min_avgload )
> +            if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +                && rqd_soft_cpus > 0
> +                && ( rqd_soft_cpus > max_soft_cpus
> +                    ||
> +                   ( rqd_soft_cpus == max_soft_cpus
> +                    && rqd_avgload < min_avgload )) )
> +            {
> +                max_soft_cpus = rqd_soft_cpus;
> +                rqi = i;
> +                min_avgload = rqd_avgload;
> +            }
> +            else if ( balance_step == SCHED_BALANCE_HARD_AFFINITY
> +                     && rqd_avgload < min_avgload )
>              {
> +                rqi = i;
>                  min_avgload = rqd_avgload;
> -                min_rqi=i;
>              }
> +        }
>      }
>
>      /* We didn't find anyone (most likely because of spinlock contention). */
> -    if ( min_rqi == -1 )
> +    if ( rqi == -1 )
>          new_cpu = get_fallback_cpu(svc);
>      else
>      {
> -        cpumask_and(csched2_cpumask, vc->cpu_hard_affinity,
> -            &prv->rqd[min_rqi].active);
> +        sched_balance_cpumask(vc, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &prv->rqd[rqi].active);
>          new_cpu = cpumask_any(csched2_cpumask);
>          BUG_ON(new_cpu >= nr_cpu_ids);
>      }

So the general plan here looks right; but is there really a need to go
through the whole thing twice?  Couldn't we keep track of "rqi with
highest # cpus in soft affinity / lowest avgload" and "rqi with lowest
global avgload" in one pass, and then choose whichever one looks the
best at the end?

I think for closure sake I'm going to send this e-mail, and review the
load balancing step in another mail (which will come later this
evening).

 -George