From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Liu
Subject: Re: RFC: vNUMA project
Date: Wed, 12 Nov 2014 12:14:48 +0000
Message-ID: <20141112121448.GB28075@zion.uk.xensource.com>
References: <20141111173606.GC21312@zion.uk.xensource.com> <54624F6A.40002@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <54624F6A.40002@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel
Cc: Dario Faggioli, Wei Liu, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Tue, Nov 11, 2014 at 06:03:22PM +0000, David Vrabel wrote:
> On 11/11/14 17:36, Wei Liu wrote:
> > # What's already implemented?
> >
> > PV vNUMA support in libxl/xl and Linux kernel.
>
> Linux doesn't have vnuma yet, although the last set of patches I saw
> looked fine and were waiting for acks from x86 maintainers I think.
>

What I meant was I have those implemented but not yet posted. ;-)

> > # NUMA-aware ballooning
> >
> > It's agreed that NUMA-aware ballooning should be achieved solely in
> > hypervisor. Everything should happen under the hood without guest
> > knowing vnode to pnode mapping.
> >
> > As far as I can tell, existing guests (Linux and FreeBSD) use
> > XENMEM_populate_physmap to balloon up. There's a hypercall
> > called XENMEM_increase_reservation but it's not used
> > by Linux and FreeBSD.
> >
> > I can think of two options to implement NUMA-aware ballooning:
> >
> > 1. Modify XENMEM_populate_physmap to take into account vNUMA hint
> >    when it tries to allocate a page for guest.
> [...]
> > Option #1 requires less modification to guest, because guest won't
> > need to switch to new hypercall. It's unclear at this point if a guest
> > asks to populate a gpfn that doesn't belong to any vnode, what Xen
> > should do about it. Should it be permissive or strict?
>
> There are XENMEMF flags to request exact node or not -- leave it up to
> the balloon driver. The Linux balloon driver could try exact on all
> nodes before falling back to permissive or just always try inexact.
>
> Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
>

Good idea. It should be easy to make it work.

> >
> > # HVM vNUMA
> >
> > HVM vNUMA is implemented as followed:
> >
> > 1. Libxl generates vNUMA information and passes it to hvmloader.
> > 2. Hvmloader build SRAT table.
> >
> > Note that hvmloader is capable of relocating memory. This means
> > toolstack and guest can have different ideas of the memory layout.
>
> Why can't hvmloader update the vnuma tables after it has relocated memory?
>

Because setvnuma is a domctl which cannot be issued by hvmloader.

Wei.

> David