From: Andre Przywara
Subject: Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes
Date: Fri, 20 Jul 2012 13:47:35 +0200
Message-ID: <50094557.8020208@amd.com>
In-Reply-To: <1342777496.19530.271.camel@Solace>
References: <5fa66c8b9093399e5bc3.1342458792@Solace>
 <5007FBCE.6000201@amd.com>
 <1342707771.19530.235.camel@Solace>
 <1342772429.19530.247.camel@Solace>
 <5009161C.2060005@amd.com>
 <1342777496.19530.271.camel@Solace>
To: Dario Faggioli
Cc: Ian Campbell, Stefano Stabellini, George Dunlap, Andrew Cooper,
 Juergen Gross, Ian Jackson, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 07/20/2012 11:44 AM, Dario Faggioli wrote:
> On Fri, 2012-07-20 at 10:26 +0200, Andre Przywara wrote:
>>> I really am not sure what to do here, perhaps treating the two metrics
>>> more evenly? Or maybe even reverse the logic and give nr_domains more
>>> weight?
>>
>> I replaced the 3 with 1 already, that didn't change so much. I think we
>> should kind of reverse the importance of node load, since starving for
>> CPU time is much worse than bad memory latency. I will do some
>> experiments...
>>
> That would be nice. If you happen to have time to put something like
> "3*nrdomains_diff+memfree_diff" and see how it goes, I'll be happy to
> include at least that change, even in next posting.

I did that. The guests are 2 VCPUs / 2 GB RAM each.
The results looked much better. After 16 guests I get:

# xl vcpu-list | sed -e 1d | sort -n -k 7 | tr -s \ | cut -d\ -f7 | uniq -c
     16 any          (Dom0 had max_vcpus=16)
      4 0-7
      4 8-15
      4 16-23
      4 24-31
      4 32-39
      4 40-47
      4 48-55
      4 56-63

This is the number of VCPUs per node; they are distributed equally, too.
Memory looked like this (per node, in MB: total and free):

 0:     17280     9969
 1:      8192     2268
 2:      8192     1690
 3:      8192     1754
 4:     16384    10049
 5:      8192     1879
 6:     16384    10947
 7:     16368     9267

After another 8 guests:

     16 any
      8 0-7
      4 8-15
      4 16-23
      4 24-31
      8 32-39
      4 40-47
      8 48-55
      8 56-63

Still not overcommitted.

 0:     17280     5846
 1:      8192     2266
 2:      8192     1690
 3:      8192     1752
 4:     16384     5925
 5:      8192     1877
 6:     16384     6824
 7:     16368     5142

Finally, with all 32 guests:

     12 0-7
      6 8-15
      4 16-23
      4 24-31
     12 32-39
      4 40-47
     12 48-55
     10 56-63

The bigger nodes are overcommitted, while the others still have free
pCPUs (8 cores per node). But memory dictates this:

 0:     17280     3130
 1:      8192        0
 2:      8192      485
 3:      8192     1747
 4:     16384     1803
 5:      8192     1876
 6:     16384     2701
 7:     16368     3081

And the last domain already took memory from multiple nodes:

(XEN) Domain 105 (total: 523255):
(XEN)     Node 0: 162740
(XEN)     Node 1: 52616
(XEN)     Node 2: 307899
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0

All other domains had all of their 523255 pages on a single node.

# xl vcpu-list 105
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Guest32                            105     0     5   -b-       3.1 0-7
Guest32                            105     1     2   -b-       0.7 0-7

So without any deeper thinking this looks much better than the original
version and is possibly good enough for Xen 4.2. Maybe the automated
testing will find some leftovers.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
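
P.S. To make the weighting we are talking about a bit more concrete, here
is a minimal C sketch of a candidate comparison along the lines of
"3*nrdomains_diff + memfree_diff". The struct, the field names and the
GiB scaling are made up for illustration only; this is not the actual
libxl candidate type or comparison function:

#include <stdint.h>

/* Hypothetical placement candidate -- NOT the real libxl type. */
struct candidate {
    unsigned int nr_domains;  /* domains already placed on the candidate's nodes */
    uint64_t free_memkb;      /* free memory on those nodes, in KiB */
};

/*
 * Weighted comparison: the difference in the number of placed domains
 * counts three times as much as the difference in free memory (scaled to
 * GiB here just so the two terms are of comparable magnitude).  Returns
 * <0 if c1 is the better candidate, >0 if c2 is, 0 on a tie -- i.e. it
 * can be passed to qsort().
 */
static int candidate_cmp(const void *v1, const void *v2)
{
    const struct candidate *c1 = v1, *c2 = v2;
    long nrdomains_diff = (long)c1->nr_domains - (long)c2->nr_domains;
    long memfree_diff = (long)(c2->free_memkb / (1024 * 1024))
                      - (long)(c1->free_memkb / (1024 * 1024));

    /* Lower score wins: fewer domains and/or more free memory. */
    return (int)(3 * nrdomains_diff + memfree_diff);
}

With the domain count weighted like that, free memory mostly acts as a
tie-breaker, which seems to match the numbers above: VCPUs spread evenly
across the nodes until the smaller nodes run out of memory.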