From: Andre Przywara
Subject: Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic placement of guests on NUMA nodes
Date: Thu, 19 Jul 2012 14:21:34 +0200
Message-ID: <5007FBCE.6000201@amd.com>
In-Reply-To: <5fa66c8b9093399e5bc3.1342458792@Solace>
To: Dario Faggioli
Cc: Ian Campbell, Stefano Stabellini, George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel

Dario,

sorry for joining the discussion so late, but I was busy with other
things and saw the project in good hands.
Finally I managed to do some testing with these patches.
I took my 8-node machine, alternately equipped with 16 GB and 8 GB per
node. Each node has 8 pCPUs. As a special(i)ty I removed the DIMMs from
nodes 2 and 3 to test Andrew's memory-less node patches, leading to this
configuration:

node:   memsize   memfree   distances
   0:     17280      4518   10,16,16,22,16,22,16,22
   1:      8192      3639   16,10,16,22,22,16,22,16
   2:         0         0   16,16,10,16,16,16,16,22
   3:         0         0   22,22,16,10,16,16,22,16
   4:     16384      4766   16,22,16,16,10,16,16,16
   5:      8192      2882   22,16,16,16,16,10,22,22
   6:     16384      4866   16,22,16,22,16,22,10,16
   7:      8176      2799   22,16,22,16,16,22,16,10

Then I started 32 guests, each with 4 vCPUs and 1 GB of RAM. Since the
code prefers free memory so strongly over free CPUs, the placement was
the following:

node0: guests 2,5,8,11,14,17,20,25,30
node1: guests 21,27
node2: none
node3: none
node4: guests 1,4,7,10,13,16,19,23,29
node5: guests 24,31
node6: guests 3,6,9,12,15,18,22,28
node7: guests 26,32

As you can see, the nodes with more memory are _way_ overloaded, while
the lower-memory ones are underutilized. In fact the first 20 guests
didn't use the other nodes at all.
I don't care so much about the two memory-less nodes, but I'd like to
know how you came to the magic "3" in the formula:

> +
> +    return sign(3*freememkb_diff + nrdomains_diff);
> +}

I haven't done any measurements on this, but I guess that scheduling
36 vCPUs on 8 pCPUs carries a much bigger performance penalty than any
remote NUMA access, which nowadays is much cheaper than it was a few
years ago, thanks to big L3 caches, better predictors and faster
interconnects.

I will now put the memory back in and also try to play a bit with the
amount of memory and the number of vCPUs per guest.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
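
P.S.: To make my question about the weighting more concrete, here is a
standalone sketch (not the libxl code!) of how I read
sign(3*freememkb_diff + nrdomains_diff). I am assuming both diffs get
normalized to [-1,1] before the factor 3 is applied; norm_diff() below
is just my guess at such a normalization, and the free-memory and
domain-count numbers are node0 vs. node1 from the table above.

/*
 * Sketch only: how a weight of 3 on the (assumed normalized) free-memory
 * difference trades off against the number of already-placed domains.
 */
#include <stdio.h>

static int sign(double v)
{
    return v > 0.0 ? 1 : (v < 0.0 ? -1 : 0);
}

/* hypothetical normalization: difference scaled by the larger value */
static double norm_diff(double a, double b)
{
    double max = a > b ? a : b;
    return max > 0.0 ? (a - b) / max : 0.0;
}

int main(void)
{
    /* free memory of node0 vs node1 on my box (from the table above) */
    double free0 = 4518.0, free1 = 3639.0;
    int doms1 = 2;      /* guests already placed on node1 */
    int doms0;

    for (doms0 = 2; doms0 <= 8; doms0++) {
        double memdiff = norm_diff(free0, free1);   /* > 0: node0 has more free memory */
        double domdiff = norm_diff((double)doms1,
                                   (double)doms0);  /* < 0: node0 hosts more domains */
        double score = 3.0 * memdiff + domdiff;     /* > 0: node0 preferred (ties go to node0) */

        printf("node0: %d domains, node1: %d domains -> score %+.2f, favours node%d\n",
               doms0, doms1, score, sign(score) >= 0 ? 0 : 1);
    }
    return 0;
}

With this guessed normalization a ~20% free-memory advantage already
outweighs about two extra placed domains, and the break-even point
shifts quickly with a different normalization or weight, which is why
I'd like to understand where the 3 comes from, especially for boxes
with asymmetric node sizes like mine.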