From: Andre Przywara
Subject: Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes
Date: Thu, 19 Jul 2012 14:21:34 +0200
Message-ID: <5007FBCE.6000201@amd.com>
In-Reply-To: <5fa66c8b9093399e5bc3.1342458792@Solace>
To: Dario Faggioli
Cc: Ian Campbell, Stefano Stabellini, George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel

Dario,

sorry for joining the discussion so late, but I was busy with other
things and saw the project in good hands.
Finally I managed to do some testing with these patches.
I took my 8-node machine, alternately equipped with 16 GB and 8 GB per
node. Each node has 8 pCPUs. As a special(i)ty I removed the DIMMs from
nodes 2 and 3 to test Andrew's memory-less node patches, leading to this
configuration:

node:   memsize   memfree   distances
   0:     17280      4518   10,16,16,22,16,22,16,22
   1:      8192      3639   16,10,16,22,22,16,22,16
   2:         0         0   16,16,10,16,16,16,16,22
   3:         0         0   22,22,16,10,16,16,22,16
   4:     16384      4766   16,22,16,16,10,16,16,16
   5:      8192      2882   22,16,16,16,16,10,22,22
   6:     16384      4866   16,22,16,22,16,22,10,16
   7:      8176      2799   22,16,22,16,16,22,16,10

Then I started 32 guests, each with 4 vCPUs and 1 GB of RAM. Since the
code prefers free memory so strongly over free CPUs, the placement was
the following:

node0: guests 2,5,8,11,14,17,20,25,30
node1: guests 21,27
node2: none
node3: none
node4: guests 1,4,7,10,13,16,19,23,29
node5: guests 24,31
node6: guests 3,6,9,12,15,18,22,28
node7: guests 26,32

As you can see, the nodes with more memory are _way_ overloaded, while
the lower-memory ones are underutilized. In fact the first 20 guests
didn't use the other nodes at all.
I don't care so much about the two memory-less nodes, but I'd like to
know how you came to the magic "3" in the formula:

> +
> +    return sign(3*freememkb_diff + nrdomains_diff);
> +}

I haven't done any measurements on this, but I guess that scheduling
36 vCPUs on 8 pCPUs carries a much bigger performance penalty than any
remote NUMA access, which nowadays is much cheaper than it was a few
years ago, thanks to big L3 caches, better predictors and faster
interconnects.

I will now put the memory back in and also try to play a bit with the
amount of memory and the number of vCPUs per guest.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
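
P.S.: To make my question about the weighting more concrete, here is a
standalone sketch (not the libxl code!) of how I read
sign(3*freememkb_diff + nrdomains_diff). I am assuming both diffs get
normalized to [-1,1] before the factor 3 is applied; norm_diff() below
is just my guess at such a normalization, and the free-memory and
domain-count numbers are node0 vs. node1 from the table above.

/*
 * Sketch only: how a weight of 3 on the (assumed normalized) free-memory
 * difference trades off against the number of already-placed domains.
 */
#include <stdio.h>

static int sign(double v)
{
    return v > 0.0 ? 1 : (v < 0.0 ? -1 : 0);
}

/* hypothetical normalization: difference scaled by the larger value */
static double norm_diff(double a, double b)
{
    double max = a > b ? a : b;
    return max > 0.0 ? (a - b) / max : 0.0;
}

int main(void)
{
    /* free memory of node0 vs node1 on my box (from the table above) */
    double free0 = 4518.0, free1 = 3639.0;
    int doms1 = 2;      /* guests already placed on node1 */
    int doms0;

    for (doms0 = 2; doms0 <= 8; doms0++) {
        double memdiff = norm_diff(free0, free1);   /* > 0: node0 has more free memory */
        double domdiff = norm_diff((double)doms1,
                                   (double)doms0);  /* < 0: node0 hosts more domains */
        double score = 3.0 * memdiff + domdiff;     /* > 0: node0 preferred (ties go to node0) */

        printf("node0: %d domains, node1: %d domains -> score %+.2f, favours node%d\n",
               doms0, doms1, score, sign(score) >= 0 ? 0 : 1);
    }
    return 0;
}

With this guessed normalization a ~20% free-memory advantage already
outweighs about two extra placed domains, and the break-even point
shifts quickly with a different normalization or weight, which is why
I'd like to understand where the 3 comes from, especially for boxes
with asymmetric node sizes like mine.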