From: Dulloor <dulloor@gmail.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: xen-devel@lists.xensource.com, Keir Fraser <keir.fraser@eu.citrix.com>
Subject: Re: [PATCH 00/11] PV NUMA Guests
Date: Mon, 5 Apr 2010 23:51:57 -0400
Message-ID: <m2q940bcfd21004052051o71080a95s8443d4c92918a62f@mail.gmail.com>
In-Reply-To: <0aec86b6-f895-4a73-989b-76ee5d5f3874@default>

Dan, sorry, I missed one of your previous mails on this topic, so I
have included answers to it here as well.

> Could you comment on if/how these work when memory is more
> dynamically allocated (e.g. via an active balloon driver
> in a guest)?
The balloon driver is also made NUMA-aware and uses the same
enlightenment to derive the guest-node to physical-node mapping.
Please refer to my previously submitted patch for this
(http://old.nabble.com/Xen-devel--XEN-PATCH---Linux-PVOPS--ballooning-on-numa-domains-td26262334.html).
I intend to send out a refreshed patch once the basic guest NUMA
support is checked in.
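To illustrate the idea, here is a minimal, self-contained C sketch
(names are hypothetical, not the actual patch code): the driver looks
up which guest node is backed by the physical node the toolstack wants
memory returned from, and balloons pages from that guest node.

#include <stdio.h>

#define MAX_VNODES 8

/* vnode->pnode map as provided by the NUMA enlightenment. */
struct numa_enlightenment {
    unsigned int nr_vnodes;
    unsigned int vnode_to_pnode[MAX_VNODES];
};

/* Find the guest node backed by the given physical node, if any. */
static int pnode_to_vnode(const struct numa_enlightenment *e,
                          unsigned int pnode)
{
    for (unsigned int v = 0; v < e->nr_vnodes; v++)
        if (e->vnode_to_pnode[v] == pnode)
            return (int)v;
    return -1; /* this domain has no memory on that physical node */
}

/* Balloon out nr_pages pages, preferring the guest node that maps to
 * target_pnode, so the freed memory really lives on that node.  A
 * real driver would allocate the pages from that node's zones and
 * return them via XENMEM_decrease_reservation. */
static void balloon_out(const struct numa_enlightenment *e,
                        unsigned int target_pnode, unsigned long nr_pages)
{
    int v = pnode_to_vnode(e, target_pnode);
    if (v < 0) {
        printf("no vnode on pnode %u, falling back to any node\n",
               target_pnode);
        return;
    }
    printf("ballooning %lu pages from vnode %d (pnode %u)\n",
           nr_pages, v, target_pnode);
}

int main(void)
{
    struct numa_enlightenment e = {
        .nr_vnodes = 2, .vnode_to_pnode = { 0, 1 }
    };
    balloon_out(&e, 1, 512);
    return 0;
}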

> Specifically, I'm wondering if you are running
> multiple domains, all are actively ballooning, and there
> is a mix of guest NUMA policies, how do you ensure that
> non-CONFINE'd domains don't starve a CONFINE'd domain?
We first try to CONFINE a domain and only then proceed to STRIPE or
SPLIT (if the guest is capable) its memory. So, in this (automatic)
global domain-memory allocation scheme, there is no possibility of
starvation from the memory point of view. I hope I got your question
right.
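Roughly, the selection works as in this self-contained C sketch
(illustrative only; the real logic lives in libxc in this series, and
SPLIT capability comes from the guest's numa-guest-support ELF note,
modelled here as a plain boolean):

#include <stdbool.h>
#include <stdio.h>

enum numa_strategy { CONFINE, SPLIT, STRIPE, DEFAULT };

#define MAX_NODES 16

struct node_info {
    unsigned int nr_nodes;
    unsigned long free_pages[MAX_NODES]; /* free memory per node */
};

static enum numa_strategy pick_strategy(const struct node_info *ni,
                                        unsigned long dom_pages,
                                        bool guest_numa_capable)
{
    /* CONFINE: does the whole domain fit on a single node? */
    for (unsigned int n = 0; n < ni->nr_nodes; n++)
        if (ni->free_pages[n] >= dom_pages)
            return CONFINE;

    /* Otherwise spread across nodes: SPLIT tells the guest about
     * the layout, STRIPE hides it from a non-NUMA-capable guest. */
    return guest_numa_capable ? SPLIT : STRIPE;
}

int main(void)
{
    struct node_info ni = { .nr_nodes = 2, .free_pages = { 1000, 1500 } };
    printf("%d\n", pick_strategy(&ni, 2000, true));  /* SPLIT */
    printf("%d\n", pick_strategy(&ni, 800, false));  /* CONFINE */
    return 0;
}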

> I'd be interested in your thoughts on numa-aware tmem
> as well as the other dynamic memory mechanisms in Xen 4.0.
> Tmem is special in that it uses primarily full-page copies
> from/to tmem-space to/from guest-space so, assuming the
> interconnect can pipeline/stream a memcpy, overhead of
> off-node memory vs on-node memory should be less
> noticeable.  However tmem uses large data structures
> (rbtrees and radix-trees) and the lookup process might
> benefit from being NUMA-aware.
For tmem, I was thinking of the ability to specify a set of nodes
from which tmem-space memory is preferentially allocated; this set
could be derived from the domain's NUMA enlightenment. But, as you
mentioned, the full-page copy overhead is less noticeable (at least
on my smaller NUMA machine). The rate of tmem operations would
determine whether this is worth doing to reduce inter-node traffic.
What do you suggest? I was looking at the data structures too.
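As a rough illustration of the node-preference idea (helper names are
hypothetical, and failure/fallback details are ignored), the
allocation could look like:

#include <stdio.h>

#define NODE_MASK(n) (1u << (n))

/* Stand-in for a per-node page allocator. */
static int alloc_page_on_node(unsigned int node)
{
    printf("allocated tmem page on node %u\n", node);
    return 0;
}

/* Allocate a page of tmem-space memory, preferring the nodes in the
 * domain's node mask (derived from its NUMA enlightenment). */
static int tmem_alloc_page(unsigned int dom_node_mask,
                           unsigned int nr_nodes)
{
    for (unsigned int n = 0; n < nr_nodes; n++)
        if (dom_node_mask & NODE_MASK(n))
            return alloc_page_on_node(n); /* local to the domain */
    /* No preferred node: a real allocator would fall back to any
     * node with free memory; node 0 stands in for that here. */
    return alloc_page_on_node(0);
}

int main(void)
{
    tmem_alloc_page(NODE_MASK(1), 4);
    return 0;
}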

> Also, I will be looking into adding some page-sharing
> techniques into tmem in the near future.  This (and the
> existing page sharing feature just added to 4.0) may
> create some other interesting challenges for NUMA-awareness.
I have just started reading up on the memory-sharing feature of Xen.
I would be glad to get your input on the NUMA challenges there.

thanks
dulloor


On Mon, Apr 5, 2010 at 10:52 AM, Dan Magenheimer
<dan.magenheimer@oracle.com> wrote:
> Could you comment on if/how these work when memory is more
> dynamically allocated (e.g. via an active balloon driver
> in a guest)?  Specifically, I'm wondering if you are running
> multiple domains, all are actively ballooning, and there
> is a mix of guest NUMA policies, how do you ensure that
> non-CONFINE'd domains don't starve a CONFINE'd domain?
>
> Thanks,
> Dan
>
>> -----Original Message-----
>> From: Dulloor [mailto:dulloor@gmail.com]
>> Sent: Sunday, April 04, 2010 1:30 PM
>> To: xen-devel@lists.xensource.com
>> Cc: Keir Fraser
>> Subject: [Xen-devel] [PATCH 00/11] PV NUMA Guests
>>
>> This set of patches implements virtual NUMA enlightenment to support
>> NUMA-aware PV guests. In more detail, the patches implement the
>> following:
>>
>> * For NUMA systems, the following memory allocation strategies are
>> implemented:
>> - CONFINE: Confine the VM's memory allocation to a single node. As
>> opposed to the current method of doing this in python, the patch
>> implements this in libxc (along with the other strategies) and with
>> assurance that the memory actually comes from the selected node.
>> - STRIPE: If the VM's memory doesn't fit in a single node and the VM
>> is not compiled with guest NUMA support, the memory is allocated
>> striped across a selected max-set of nodes.
>> - SPLIT: If the VM's memory doesn't fit in a single node and the VM
>> is compiled with guest NUMA support, the memory is allocated split
>> (equally, for now) across the min-set of nodes. The VM is then made
>> aware of this NUMA allocation (virtual NUMA enlightenment).
>> - DEFAULT: This is the existing allocation scheme.
>>
>> * If numa-guest support is compiled into the PV guest, we add
>> numa-guest-support to the Xen features ELF note. The Xen tools use
>> this to determine whether the SPLIT strategy can be applied.
>>
>> * The PV guest uses the virtual NUMA enlightenment to set up its
>> NUMA layout (at the time of initmem_init).
>>
>> Please comment.
>>
>> -dulloor
>>
>> Signed-off-by: Dulloor Rao <dulloor@gatech.edu>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>
