From: George Dunlap
Subject: Re: [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes
Date: Thu, 21 Jun 2012 17:16:01 +0100
Message-ID: <4FE348C1.5030407@eu.citrix.com>
In-Reply-To: <81f18379bb3d4d9397d1.1339779876@Solace>
References: <81f18379bb3d4d9397d1.1339779876@Solace>
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross, Ian Jackson, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 15/06/12 18:04, Dario Faggioli wrote:
> If a domain does not have a VCPU affinity, try to pin it automatically to some
> PCPUs. This is done taking into account the NUMA characteristics of the host.
> In fact, we look for a combination of host's NUMA nodes with enough free memory
> and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes.
>
> Once we know which ones, among all the possible combinations, represent valid
> placement candidates for a domain, use some heuristics for deciding which is
> the best. For instance, smaller candidates are considered to be better, both
> from the domain's point of view (less spreading of memory among nodes) and
> from the point of view of the system as a whole (less memory fragmentation).
> In case of candidates of equal size (i.e., with the same number of nodes),
> the one with the greater amount of free memory wins, as this is also good
> for keeping memory fragmentation under control.
>
> This all happens internally to libxl, and no API for driving the mechanism is
> provided for now. This matches what xend already does.
>
> Signed-off-by: Dario Faggioli

Overall I think this approach is much better. I haven't done an extremely
detailed review, but I do have some comments below.

> +    /*
> +     * Round up and down some of the constraints. For instance, the minimum
> +     * number of cpus a candidate should have must at least be non-negative.
> +     * Regarding the minimum number of NUMA nodes, if not explicitly specified
> +     * (i.e., min_nodes <= 0), we try to figure out a sensible number of nodes
> +     * from where to start generating candidates, if possible (or just start
> +     * from 1 otherwise). The maximum number of nodes should not exceed the
> +     * number of existent NUMA nodes on the host, or the candidate generation
> +     * won't work properly.
> +     */
> +    min_cpus = min_cpus <= 0 ? 0 : min_cpus;

Wouldn't it just make more sense to specify that "min_cpus" (and the other
parameters) have to be >= 0? In any case, this is a very complicated way of
saying "if (min_cpus < 0) min_cpus = 0;".

> +    if (min_nodes <= 0) {
> +        int cpus_per_node;
> +
> +        cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);
> +        if (cpus_per_node == 0)
> +            min_nodes = 1;
> +        else
> +            min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node;
> +    }

Same here; you could just write "if (!min_nodes) { ...".

> +    min_nodes = min_nodes > nr_nodes ? nr_nodes : min_nodes;
> +    if (max_nodes <= 0)
> +        max_nodes = nr_nodes;
> +    else
> +        max_nodes = max_nodes > nr_nodes ? nr_nodes : max_nodes;

    if (max_nodes == 0 || max_nodes > nr_nodes)
        max_nodes = nr_nodes;
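In fact, all of this sanitizing could collapse into something like the
following -- completely untested, just a sketch of what I have in mind,
using the same variable names as in your patch:

    /* Clamp the constraints; non-positive values mean "use the default" */
    if (min_cpus < 0)
        min_cpus = 0;
    if (min_nodes <= 0) {
        int cpus_per_node =
            cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);

        /* Start from however many nodes are needed to fit min_cpus */
        min_nodes = cpus_per_node ?
            (min_cpus + cpus_per_node - 1) / cpus_per_node : 1;
    }
    if (min_nodes > nr_nodes)
        min_nodes = nr_nodes;
    if (max_nodes <= 0 || max_nodes > nr_nodes)
        max_nodes = nr_nodes;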
> +
> +/*
> + * The NUMA placement candidates are reordered according to the following
> + * heuristics:
> + *  - candidates involving fewer nodes come first. In case two (or
> + *    more) candidates span the same number of nodes,
> + *  - candidates with greater amount of free memory come first. In
> + *    case two (or more) candidates differ in their amount of free
> + *    memory by less than 10%,

Interesting idea -- sounds pretty reasonable.

> + *  - candidates with fewer domains insisting on them at the time of
> + *    this call come first.

Do you mean "existing"? I think "assigned to" is probably better.

> + */
> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> +    const libxl__numa_candidate *c1 = (const libxl__numa_candidate*) v1;
> +    const libxl__numa_candidate *c2 = (const libxl__numa_candidate*) v2;
> +    double mem_diff = labs(c1->free_memkb - c2->free_memkb);
> +    double mem_avg = (c1->free_memkb + c2->free_memkb) / 2.0;
> +
> +    if (c1->nr_nodes != c2->nr_nodes)
> +        return c1->nr_nodes - c2->nr_nodes;
> +
> +    if ((mem_diff / mem_avg) * 100.0 < 10.0 &&
> +        c1->nr_domains != c2->nr_domains)
> +        return c1->nr_domains - c2->nr_domains;

I realize this isn't a hot path, but it seems like moving into FP is really
unnecessary. You can just do this:

    if ( ((mem_diff * 100) / mem_avg) < 10 ...

One minor note: I personally think it's a lot more readable to put the '&&'
and '||' on the same line as the next item, rather than the previous item;
i.e.:

    if ( expression_a
         && expression_b )

One more thing: is there a reason why you put get_numa_candidates() in
libxl_internal.h, but not sort_numa_candidates()? It seems like both or
neither should go. :-)

That's all I have for now. I'm OK with the general approach, so here's a
"weak ack"; if a maintainer is happy with the code, he can check it in:

Acked-by: George Dunlap

I'll try to come back and give a more detailed code review if I get a
chance.

 -George
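P.S. For concreteness, here's roughly the all-integer comparison I was
thinking of -- again completely untested, just a sketch, and it assumes
free_memkb is a 64-bit unsigned quantity as elsewhere in libxl:

    static int numa_cmpf(const void *v1, const void *v2)
    {
        const libxl__numa_candidate *c1 = v1;
        const libxl__numa_candidate *c2 = v2;
        /* Cast before subtracting, so the unsigned values can't wrap */
        int64_t mem_diff = llabs((int64_t)c1->free_memkb -
                                 (int64_t)c2->free_memkb);
        int64_t mem_avg = ((int64_t)c1->free_memkb +
                           (int64_t)c2->free_memkb) / 2;

        if (c1->nr_nodes != c2->nr_nodes)
            return c1->nr_nodes - c2->nr_nodes;

        /* Integer-only version of the "within 10% of each other" check
         * (guarding against a zero average, to avoid dividing by zero) */
        if ( mem_avg && (mem_diff * 100) / mem_avg < 10
             && c1->nr_domains != c2->nr_domains )
            return c1->nr_domains - c2->nr_domains;

        /* Otherwise, more free memory wins */
        if (c1->free_memkb != c2->free_memkb)
            return c1->free_memkb > c2->free_memkb ? -1 : 1;
        return 0;
    }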