From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Magenheimer
Subject: RE: [PATCH 00/11] PV NUMA Guests
Date: Tue, 6 Apr 2010 10:18:09 -0700 (PDT)
Message-ID: <7a461573-c606-4f3b-989d-30626655362d@default>
References: <0aec86b6-f895-4a73-989b-76ee5d5f3874@default m2q940bcfd21004052051o71080a95s8443d4c92918a62f@mail.gmail.com>
To: Dulloor
Cc: xen-devel@lists.xensource.com, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

In general, I am of the opinion that in a virtualized world one gets
best flexibility or best performance, but not both.  There may be a
couple of reasonable points on this "slider selector", but I'm not sure
it will be worth a huge time investment, as real users will not
understand the subtleties of their workloads well enough to choose from
a large number of (perhaps more than two) points on the
performance/flexibility spectrum.

So customers that want the highest performance should be prepared to
pin their guests and not use ballooning, and those that want the
flexibility of migration, ballooning, etc. should expect to see a
performance hit (including NUMA consequences).  But since I don't get
to make that decision, let's look at the combination of NUMA + dynamic
memory utilization...

> Please refer to my previously submitted patch for this
> (http://old.nabble.com/Xen-devel--XEN-PATCH---Linux-PVOPS--ballooning-
> on-numa-domains-td26262334.html).
> I intend to send out a refreshed patch once the basic guest numa is
> checked in.

OK, will wait and take a look at that later.

> We first try to CONFINE a domain and only then proceed to STRIPE or
> SPLIT (if capable) the domain.
> So, in this (automatic) global domain memory allocation scheme, there
> is no possibility of starvation from a memory point of view. Hope I
> got your question right.

The example I'm concerned with is:

1) Domain A is CONFINE'd to node A, and domains B/C/D/etc are not
   CONFINE'd.
2) Domain A uses less than the total memory on node A, and/or balloons
   down so that it uses even less than when launched.
3) Domains B/C/D have an increasing memory need and semi-randomly
   absorb memory from all nodes, including node A.

After (3), free memory is somewhat randomly distributed across all
nodes.  Then:

4) Domain A suddenly has an increasing memory need... but there is not
   enough free memory remaining on node A (in fact, possibly there is
   none at all) to serve its need.  But, by definition of CONFINE,
   domain A is not allowed to use memory other than on node A.

What happens now?  It appears to me that the other domains have
(perhaps even maliciously) starved domain A.  I think this is a dynamic
bin-packing problem which is unsolvable in general form, so the choice
of heuristics is going to be important.

> For the tmem, I was thinking of the ability to specify a set of nodes
> from which the tmem-space memory is preferred, which could be derived
> from the domain's numa enlightenment; but, as you mentioned, the
> full-page copy overhead is less noticeable (at least on my smaller
> NUMA machine).  But the rate would determine if we should do this to
> reduce inter-node traffic.  What do you suggest?  I was looking at
> the data structures too.

Since tmem allocates individual xmalloc-tlsf memory pools per domain,
it should be possible to inform tmem of node preferences, but I don't
know that it will be feasible to truly CONFINE a domain's tmem.  On the
other hand, because of the page copying, affinity by itself may be
sufficient.

> > Also, I will be looking into adding some page-sharing
> > techniques into tmem in the near future.
> > This (and the existing page sharing feature just added to 4.0) may
> > create some other interesting challenges for NUMA-awareness.
> I have just started reading up on the memsharing feature of Xen. I
> would be glad to get your input on NUMA challenges over there.

Note that the tmem patch that does sharing (tmem calls it "page
deduplication") was just accepted into xen-unstable.  Basically, some
memory may belong to more than one domain, so NUMA effects and
performance/memory tradeoffs may get very complicated.

Dan
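
P.S. To make the starvation example in (1)-(4) above concrete, here is
a toy Python model.  The two-node layout, page counts, and allocator
policy are all hypothetical (this is not Xen's actual heap allocator);
it just shows how unconfined domains can drain a CONFINE'd domain's
node:

```python
# Toy model (NOT Xen code) of CONFINE'd vs. unconfined allocation.
# Node sizes and domain names are made-up illustration values.

class Node:
    def __init__(self, name, pages):
        self.name = name
        self.free = pages            # free pages on this NUMA node

class Domain:
    def __init__(self, name, confined_to=None):
        self.name = name
        self.confined_to = confined_to   # Node, or None if unconfined

class Host:
    def __init__(self, nodes):
        self.nodes = nodes

    def alloc(self, domain, pages):
        """A CONFINE'd domain may only take pages from its own node;
        an unconfined domain absorbs free pages from any node."""
        if domain.confined_to is not None:
            node = domain.confined_to
            if node.free < pages:
                return False         # starved: nowhere else to look
            node.free -= pages
            return True
        remaining = pages
        for node in self.nodes:      # semi-random in reality; ordered here
            take = min(node.free, remaining)
            node.free -= take
            remaining -= take
            if remaining == 0:
                return True
        return False

node_a, node_b = Node("A", 1000), Node("B", 1000)
host = Host([node_a, node_b])
dom_a = Domain("A", confined_to=node_a)  # step 1: CONFINE'd to node A
dom_b = Domain("B")                      # unconfined

host.alloc(dom_a, 200)      # step 2: A uses a fraction of its node
host.alloc(dom_b, 1700)     # step 3: B absorbs memory from all nodes
print(node_a.free)          # 0 -- node A fully drained by domain B
print(host.alloc(dom_a, 500))  # step 4: False -- domain A is starved
```

Any heuristic has to pick its poison at that last line: fail the
allocation, silently break the CONFINE guarantee, or migrate another
domain's pages off node A.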