From: Andre Przywara
Subject: Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes
Date: Fri, 20 Jul 2012 13:47:35 +0200
Message-ID: <50094557.8020208@amd.com>
In-Reply-To: <1342777496.19530.271.camel@Solace>
References: <5fa66c8b9093399e5bc3.1342458792@Solace>
 <5007FBCE.6000201@amd.com>
 <1342707771.19530.235.camel@Solace>
 <1342772429.19530.247.camel@Solace>
 <5009161C.2060005@amd.com>
 <1342777496.19530.271.camel@Solace>
To: Dario Faggioli
Cc: Ian Campbell, Stefano Stabellini, George Dunlap, Andrew Cooper,
 Juergen Gross, Ian Jackson, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 07/20/2012 11:44 AM, Dario Faggioli wrote:
> On Fri, 2012-07-20 at 10:26 +0200, Andre Przywara wrote:
>>> I really am not sure what to do here, perhaps treating the two metrics
>>> more evenly? Or maybe even reverse the logic and give nr_domains more
>>> weight?
>>
>> I replaced the 3 with 1 already, that didn't change so much. I think we
>> should kind of reverse the importance of node load, since starving for
>> CPU time is much worse than bad memory latency. I will do some
>> experiments...
>>
> That would be nice. If you happen to have time to put something like
> "3*nrdomains_diff+memfree_diff" and see how it goes, I'll be happy to
> include at least that change, even in next posting.

I did that. The guests are 2 VCPUs / 2 GB RAM each.
The results looked much better. After 16 guests I get:

# xl vcpu-list | sed -e 1d | sort -n -k 7 | tr -s \ | cut -d\ -f7 | uniq -c
     16 any          (Dom0 had max_vcpus=16)
      4 0-7
      4 8-15
      4 16-23
      4 24-31
      4 32-39
      4 40-47
      4 48-55
      4 56-63

This is the number of VCPUs per node; they are distributed equally, too.
Memory looked like this (per node, in MB: total and free):

 0:     17280     9969
 1:      8192     2268
 2:      8192     1690
 3:      8192     1754
 4:     16384    10049
 5:      8192     1879
 6:     16384    10947
 7:     16368     9267

After another 8 guests:

     16 any
      8 0-7
      4 8-15
      4 16-23
      4 24-31
      8 32-39
      4 40-47
      8 48-55
      8 56-63

Still not overcommitted.

 0:     17280     5846
 1:      8192     2266
 2:      8192     1690
 3:      8192     1752
 4:     16384     5925
 5:      8192     1877
 6:     16384     6824
 7:     16368     5142

Finally, with all 32 guests:

     12 0-7
      6 8-15
      4 16-23
      4 24-31
     12 32-39
      4 40-47
     12 48-55
     10 56-63

The bigger nodes are overcommitted, while the others still have free
pCPUs (8 cores per node). But memory dictates this:

 0:     17280     3130
 1:      8192        0
 2:      8192      485
 3:      8192     1747
 4:     16384     1803
 5:      8192     1876
 6:     16384     2701
 7:     16368     3081

And the last domain already took memory from multiple nodes:

(XEN) Domain 105 (total: 523255):
(XEN)     Node 0: 162740
(XEN)     Node 1: 52616
(XEN)     Node 2: 307899
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0

All other domains had all of their 523255 pages on a single node.

# xl vcpu-list 105
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Guest32                            105     0     5   -b-       3.1 0-7
Guest32                            105     1     2   -b-       0.7 0-7

So without any deeper thinking this looks much better than the original
version and is possibly good enough for Xen 4.2. Maybe the automated
testing will find some leftovers.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
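
P.S. To make the weighting we are talking about a bit more concrete, here
is a minimal C sketch of a candidate comparison along the lines of
"3*nrdomains_diff + memfree_diff". The struct, the field names and the
GiB scaling are made up for illustration only; this is not the actual
libxl candidate type or comparison function:

#include <stdint.h>

/* Hypothetical placement candidate -- NOT the real libxl type. */
struct candidate {
    unsigned int nr_domains;  /* domains already placed on the candidate's nodes */
    uint64_t free_memkb;      /* free memory on those nodes, in KiB */
};

/*
 * Weighted comparison: the difference in the number of placed domains
 * counts three times as much as the difference in free memory (scaled to
 * GiB here just so the two terms are of comparable magnitude).  Returns
 * <0 if c1 is the better candidate, >0 if c2 is, 0 on a tie -- i.e. it
 * can be passed to qsort().
 */
static int candidate_cmp(const void *v1, const void *v2)
{
    const struct candidate *c1 = v1, *c2 = v2;
    long nrdomains_diff = (long)c1->nr_domains - (long)c2->nr_domains;
    long memfree_diff = (long)(c2->free_memkb / (1024 * 1024))
                      - (long)(c1->free_memkb / (1024 * 1024));

    /* Lower score wins: fewer domains and/or more free memory. */
    return (int)(3 * nrdomains_diff + memfree_diff);
}

With the domain count weighted like that, free memory mostly acts as a
tie-breaker, which seems to match the numbers above: VCPUs spread evenly
across the nodes until the smaller nodes run out of memory.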