From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Marcus Granado <Marcus.Granado@eu.citrix.com>,
	Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Anil Madhavapeddy <anil@recoil.org>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <JBeulich@suse.com>,
	Daniel De Graaf <dgdegra@tycho.nsa.gov>,
	Matt Wilson <msw@amazon.com>
Subject: Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
Date: Tue, 9 Oct 2012 17:29:54 +0100
Message-ID: <CAFLBxZb-366Bzkd+Ar3z5p+EfrX6VpesVWKv-PnGhhD7HWLS7g@mail.gmail.com>
In-Reply-To: <ca2fa958879bbffa9bc6.1349446101@Solace>

On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> As vcpu-affinity tells where vcpus can run, node-affinity tells
> where a domain's vcpus prefer to run. Respecting vcpu-affinity is
> the primary concern, but honouring node-affinity will likely
> result in some performance benefit.
>
> This change modifies the vcpu load balancing algorithm (for the
> credit scheduler only), introducing a two-step logic.
> During the first step, we use the node-affinity mask. The aim is
> to give precedence to the CPUs where it is known to be preferable
> for the domain to run. If that fails to find a valid CPU, the
> node-affinity is just ignored and, in the second step, we fall
> back to using cpu-affinity only.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Looking at the load-balancing code makes me think that there is
probably some interesting work to do there in the future; but I think
this patch can go in as it is for now.  So

(Once others' comments are addressed)
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Do scroll down to read my comments on load balancing...
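
A note before the hunk: it uses for_each_csched_balance_step() and
csched_balance_cpumask(), which this patch introduces elsewhere and
which are not visible in the quoted context. The sketch below is my
reading of the intended semantics, not the series' actual code; the
macro shape, the struct field names and the helper body are all
assumptions:

    /* Sketch only: names and details here are illustrative assumptions. */
    #define CSCHED_BALANCE_NODE_AFFINITY 0
    #define CSCHED_BALANCE_CPU_AFFINITY  1
    #define for_each_csched_balance_step(step)                 \
        for ( (step) = CSCHED_BALANCE_NODE_AFFINITY;           \
              (step) <= CSCHED_BALANCE_CPU_AFFINITY; (step)++ )

    /*
     * Fill *mask with the cpus vc may use at this balancing step.
     * Return nonzero if the step is pointless for vc (e.g. its
     * node-affinity adds no constraint beyond its vcpu-affinity),
     * so that the caller can just skip it.
     */
    static int csched_balance_cpumask(const struct vcpu *vc, int step,
                                      cpumask_t *mask)
    {
        if ( step == CSCHED_BALANCE_NODE_AFFINITY )
        {
            /* node_affinity_cpumask is a hypothetical field name. */
            cpumask_and(mask, vc->domain->node_affinity_cpumask,
                        vc->cpu_affinity);
            return cpumask_empty(mask) ||
                   cpumask_equal(mask, vc->cpu_affinity);
        }
        cpumask_copy(mask, vc->cpu_affinity);
        return 0;
    }

With something like that in mind, the two nested loops in the hunk read
naturally: first try to steal a node-affine vcpu, then fall back to
anything the vcpu-affinity allows.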

> @@ -1211,30 +1289,48 @@ csched_runq_steal(int peer_cpu, int cpu,
>       */
>      if ( peer_pcpu != NULL && !is_idle_vcpu(peer_vcpu) )
>      {
> -        list_for_each( iter, &peer_pcpu->runq )
> +        int balance_step;
> +
> +        /*
> +         * Take node-affinity into account. That means, for all the vcpus
> +         * in peer_pcpu's runq, check _first_ if their node-affinity allows
> +         * them to run on cpu. If not, retry the loop considering plain
> +         * vcpu-affinity. Also, notice that as soon as one vcpu is found,
> +         * balancing is considered done, and the vcpu is returned to the
> +         * caller.
> +         */
> +        for_each_csched_balance_step(balance_step)
>          {
> -            speer = __runq_elem(iter);
> +            list_for_each( iter, &peer_pcpu->runq )
> +            {
> +                cpumask_t balance_mask;
>
> -            /*
> -             * If next available VCPU here is not of strictly higher
> -             * priority than ours, this PCPU is useless to us.
> -             */
> -            if ( speer->pri <= pri )
> -                break;
> +                speer = __runq_elem(iter);
>
> -            /* Is this VCPU is runnable on our PCPU? */
> -            vc = speer->vcpu;
> -            BUG_ON( is_idle_vcpu(vc) );
> +                /*
> +                 * If next available VCPU here is not of strictly higher
> +                 * priority than ours, this PCPU is useless to us.
> +                 */
> +                if ( speer->pri <= pri )
> +                    break;
>
> -            if (__csched_vcpu_is_migrateable(vc, cpu))
> -            {
> -                /* We got a candidate. Grab it! */
> -                CSCHED_VCPU_STAT_CRANK(speer, migrate_q);
> -                CSCHED_STAT_CRANK(migrate_queued);
> -                WARN_ON(vc->is_urgent);
> -                __runq_remove(speer);
> -                vc->processor = cpu;
> -                return speer;
> +                /* Is this VCPU runnable on our PCPU? */
> +                vc = speer->vcpu;
> +                BUG_ON( is_idle_vcpu(vc) );
> +
> +                if ( csched_balance_cpumask(vc, balance_step, &balance_mask) )
> +                    continue;

This will have the effect that a vcpu with any node affinity at all
will be stolen before a vcpu with no node affinity: i.e., if you have
a system with 4 nodes, and one vcpu has affinity to nodes 1-2-3,
another has affinity to node 1 only, and another has affinity to all
4, the one with affinity to all 4 will be passed over in the first
round, while either of the first two might be nabbed (depending on
which pcpu they're on).

Furthermore, the effect of this whole thing (if I'm reading it right)
will be to go through *each runqueue* twice, rather than checking all
cpus for vcpus with good node affinity, and then all cpus for vcpus
with good cpumasks.

It seems like it would be better to check:
* This node for node-affine work to steal
* Other nodes for node-affine work to steal
* All nodes for cpu-affine work to steal.

Ideally, the search would terminate fairly quickly with the first set,
meaning that in the common case we never even check other nodes.
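
In rough pseudocode (steal(), cpus_of() and all_cpus_but() are
hypothetical helpers, made up here just to illustrate the ordering):

    /* 1) node-affine vcpus on the pcpus of our own node */
    if ( (speer = steal(cpus_of(our_node), NODE_AFFINITY)) != NULL )
        return speer;
    /* 2) node-affine vcpus on the pcpus of the other nodes */
    if ( (speer = steal(all_cpus_but(our_node), NODE_AFFINITY)) != NULL )
        return speer;
    /* 3) finally, anything our plain vcpu-affinity lets us run */
    return steal(all_cpus, CPU_AFFINITY);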

Going through the cpu list twice means trying to grab the scheduler
lock for each cpu twice; but hopefully that would be made up for by
having a shorter list.

Thoughts?

Like I said, I think this is something to put on our to-do list; this
patch should go in so we can start testing it as soon as possible.

 -George

> +
> +                if (__csched_vcpu_is_migrateable(vc, cpu, &balance_mask))
> +                {
> +                    /* We got a candidate. Grab it! */
> +                    CSCHED_VCPU_STAT_CRANK(speer, migrate_q);
> +                    CSCHED_STAT_CRANK(migrate_queued);
> +                    WARN_ON(vc->is_urgent);
> +                    __runq_remove(speer);
> +                    vc->processor = cpu;
> +                    return speer;
> +                }
>              }
>          }
>      }

