From: George Dunlap
Subject: Re: [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes
Date: Thu, 21 Jun 2012 17:16:01 +0100
Message-ID: <4FE348C1.5030407@eu.citrix.com>
In-Reply-To: <81f18379bb3d4d9397d1.1339779876@Solace>
References: <81f18379bb3d4d9397d1.1339779876@Solace>
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross, Ian Jackson, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 15/06/12 18:04, Dario Faggioli wrote:
> If a domain does not have a VCPU affinity, try to pin it automatically to some
> PCPUs. This is done taking into account the NUMA characteristics of the host.
> In fact, we look for a combination of host's NUMA nodes with enough free memory
> and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes.
>
> Once we know which ones, among all the possible combinations, represent valid
> placement candidates for a domain, use some heuristics for deciding which is
> the best. For instance, smaller candidates are considered to be better, both
> from the domain's point of view (less spreading of memory among nodes) and
> from the point of view of the system as a whole (less memory fragmentation).
> In case of candidates of equal size (i.e., with the same number of nodes),
> the one with the greater amount of free memory wins, as this is also good
> for keeping memory fragmentation under control.
>
> This all happens internally to libxl, and no API for driving the mechanism is
> provided for now. This matches what xend already does.
>
> Signed-off-by: Dario Faggioli

Overall I think this approach is much better. I haven't done an extremely
detailed review, but I do have some comments below.

> +    /*
> +     * Round up and down some of the constraints. For instance, the minimum
> +     * number of cpus a candidate should have must at least be non-negative.
> +     * Regarding the minimum number of NUMA nodes, if not explicitly specified
> +     * (i.e., min_nodes <= 0), we try to figure out a sensible number of nodes
> +     * from where to start generating candidates, if possible (or just start
> +     * from 1 otherwise). The maximum number of nodes should not exceed the
> +     * number of existent NUMA nodes on the host, or the candidate generation
> +     * won't work properly.
> +     */
> +    min_cpus = min_cpus <= 0 ? 0 : min_cpus;

Wouldn't it just make more sense to specify that "min_cpus" (and the other
parameters) have to be >= 0? In any case, this is a very complicated way of
saying "if (min_cpus < 0) min_cpus = 0;".

> +    if (min_nodes <= 0) {
> +        int cpus_per_node;
> +
> +        cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);
> +        if (cpus_per_node == 0)
> +            min_nodes = 1;
> +        else
> +            min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node;
> +    }

Same here; you could just write "if (!min_nodes) { ...".

> +    min_nodes = min_nodes > nr_nodes ? nr_nodes : min_nodes;
> +    if (max_nodes <= 0)
> +        max_nodes = nr_nodes;
> +    else
> +        max_nodes = max_nodes > nr_nodes ? nr_nodes : max_nodes;

    if (max_nodes == 0 || max_nodes > nr_nodes)
        max_nodes = nr_nodes;
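In fact, all of this sanitizing could collapse into something like the
following -- completely untested, just a sketch of what I have in mind,
using the same variable names as in your patch:

    /* Clamp the constraints; non-positive values mean "use the default" */
    if (min_cpus < 0)
        min_cpus = 0;
    if (min_nodes <= 0) {
        int cpus_per_node =
            cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);

        /* Start from however many nodes are needed to fit min_cpus */
        min_nodes = cpus_per_node ?
            (min_cpus + cpus_per_node - 1) / cpus_per_node : 1;
    }
    if (min_nodes > nr_nodes)
        min_nodes = nr_nodes;
    if (max_nodes <= 0 || max_nodes > nr_nodes)
        max_nodes = nr_nodes;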
> +
> +/*
> + * The NUMA placement candidates are reordered according to the following
> + * heuristics:
> + *  - candidates involving fewer nodes come first. In case two (or
> + *    more) candidates span the same number of nodes,
> + *  - candidates with greater amount of free memory come first. In
> + *    case two (or more) candidates differ in their amount of free
> + *    memory by less than 10%,

Interesting idea -- sounds pretty reasonable.

> + *  - candidates with fewer domains insisting on them at the time of
> + *    this call come first.

Do you mean "existing"? I think "assigned to" is probably better.

> + */
> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> +    const libxl__numa_candidate *c1 = (const libxl__numa_candidate*) v1;
> +    const libxl__numa_candidate *c2 = (const libxl__numa_candidate*) v2;
> +    double mem_diff = labs(c1->free_memkb - c2->free_memkb);
> +    double mem_avg = (c1->free_memkb + c2->free_memkb) / 2.0;
> +
> +    if (c1->nr_nodes != c2->nr_nodes)
> +        return c1->nr_nodes - c2->nr_nodes;
> +
> +    if ((mem_diff / mem_avg) * 100.0 < 10.0 &&
> +        c1->nr_domains != c2->nr_domains)
> +        return c1->nr_domains - c2->nr_domains;

I realize this isn't a hot path, but it seems like moving into FP is really
unnecessary. You can just do this:

    if ( ((mem_diff * 100) / mem_avg) < 10 ...

One minor note: I personally think it's a lot more readable to put the '&&'
and '||' on the same line as the next item, rather than the previous item;
i.e.:

    if ( expression_a
         && expression_b )

One more thing: is there a reason why you put get_numa_candidates() in
libxl_internal.h, but not sort_numa_candidates()? It seems like both or
neither should go. :-)

That's all I have for now. I'm OK with the general approach, so here's a
"weak ack"; if a maintainer is happy with the code, he can check it in:

Acked-by: George Dunlap

I'll try to come back and give a more detailed code review if I get a
chance.

 -George
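P.S. For concreteness, here's roughly the all-integer comparison I was
thinking of -- again completely untested, just a sketch, and it assumes
free_memkb is a 64-bit unsigned quantity as elsewhere in libxl:

    static int numa_cmpf(const void *v1, const void *v2)
    {
        const libxl__numa_candidate *c1 = v1;
        const libxl__numa_candidate *c2 = v2;
        /* Cast before subtracting, so the unsigned values can't wrap */
        int64_t mem_diff = llabs((int64_t)c1->free_memkb -
                                 (int64_t)c2->free_memkb);
        int64_t mem_avg = ((int64_t)c1->free_memkb +
                           (int64_t)c2->free_memkb) / 2;

        if (c1->nr_nodes != c2->nr_nodes)
            return c1->nr_nodes - c2->nr_nodes;

        /* Integer-only version of the "within 10% of each other" check
         * (guarding against a zero average, to avoid dividing by zero) */
        if ( mem_avg && (mem_diff * 100) / mem_avg < 10
             && c1->nr_domains != c2->nr_domains )
            return c1->nr_domains - c2->nr_domains;

        /* Otherwise, more free memory wins */
        if (c1->free_memkb != c2->free_memkb)
            return c1->free_memkb > c2->free_memkb ? -1 : 1;
        return 0;
    }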