Re: [PATCH v10 6/9] libxl: build numa nodes memory blocks

From: Elena Ufimtseva <ufimtseva@gmail.com>
To: Ian Campbell <Ian.Campbell@citrix.com>, Wei Liu <wei.liu2@citrix.com>
Cc: Keir Fraser <keir@xen.org>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Matt Wilson <msw@linux.com>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Li Yechen <lccycc123@gmail.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH v10 6/9] libxl: build numa nodes memory blocks
Date: Thu, 4 Sep 2014 23:50:06 -0400	[thread overview]
Message-ID: <CAEr7rXi+eeeQK4oip=-1iHVHB_hXSfpmrXXYQ2riMhAdCBQmVA@mail.gmail.com> (raw)
In-Reply-To: <1409757696.3323.33.camel@kazak.uk.xensource.com>

On Wed, Sep 3, 2014 at 11:21 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-09-03 at 00:24 -0400, Elena Ufimtseva wrote:
>> Create the vmemrange structure based on the
>> PV guests E820 map. Values are in in Megabytes.
>> Also export the E820 filter code e820_sanitize
>> out to be available internally.
>> As Xen can support mutiranges vNUMA nodes, for
>> PV guest it is one range per one node. The other
>> domain types should have their building routines
>> implemented.
>>
>> Changes since v8:
>>     - Added setting of the vnuma memory ranges node ids;
>>
>> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
>> ---
>>  tools/libxl/libxl_internal.h |    9 ++
>>  tools/libxl/libxl_numa.c     |  201 ++++++++++++++++++++++++++++++++++++++++++
>>  tools/libxl/libxl_x86.c      |    3 +-
>>  3 files changed, 212 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index beb052e..63ccb5e 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -3088,6 +3088,15 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
>>      libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
>>  }
>>
>> +bool libxl__vnodemap_is_usable(libxl__gc *gc, libxl_domain_build_info *info);
>> +
>> +int e820_sanitize(libxl_ctx *ctx, struct e820entry src[], uint32_t *nr_entries,
>> +                  unsigned long map_limitkb, unsigned long balloon_kb);
>
> This is x86 specific, so simply exposing it to common libxl code as
> you've done will break on arm.
>
> I'm not sure if this means moving the whole thing to libxl_x86 and
> adding an ARM stub or if there is some common stuff from which the x86
> stuff needs to be abstracted.
>
>> +
>> +int libxl__vnuma_align_mem(libxl__gc *gc, uint32_t domid,
>> +                           struct libxl_domain_build_info *b_info,
>> +                           vmemrange_t *memblks);
>> +
>>  _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
>>                                     const libxl_ms_vm_genid *id);
>>
>> diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
>> index 94ca4fe..c416faf 100644
>> --- a/tools/libxl/libxl_numa.c
>> +++ b/tools/libxl/libxl_numa.c
>> @@ -19,6 +19,10 @@
>>
>>  #include "libxl_internal.h"
>>
>> +#include "libxl_vnuma.h"
>> +
>> +#include "xc_private.h"
>> +
>>  /*
>>   * What follows are helpers for generating all the k-combinations
>>   * without repetitions of a set S with n elements in it. Formally
>> @@ -508,6 +512,203 @@ int libxl__get_numa_candidate(libxl__gc *gc,
>>  }
>>
>>  /*
>> + * Check if we can fit vnuma nodes to numa pnodes
>> + * from vnode_to_pnode array.
>> + */
>> +bool libxl__vnodemap_is_usable(libxl__gc *gc,
>> +                            libxl_domain_build_info *info)
>> +{
>> +    unsigned int i;
>> +    libxl_numainfo *ninfo = NULL;
>> +    unsigned long long *claim;
>> +    unsigned int node;
>> +    uint64_t *sz_array;
>
> I don't think you set this, so please make it const so as to make sure.
>
>> +    int nr_nodes = 0;
>> +
>> +    /* Cannot use specified mapping if not NUMA machine. */
>> +    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
>> +    if (ninfo == NULL)
>> +        return false;
>> +
>> +    sz_array = info->vnuma_mem;
>> +    claim = libxl__calloc(gc, info->vnodes, sizeof(*claim));
>> +    /* Get total memory required on each physical node. */
>> +    for (i = 0; i < info->vnodes; i++)
>> +    {
>> +        node = info->vnuma_vnodemap[i];
>> +
>> +        if (node < nr_nodes)
>> +            claim[node] += (sz_array[i] << 20);
>> +        else
>> +            goto vnodemapout;
>> +   }
>> +   for (i = 0; i < nr_nodes; i++) {
>> +       if (claim[i] > ninfo[i].free)
>> +          /* Cannot complete user request, falling to default. */
>> +          goto vnodemapout;
>> +   }
>> +
>> + vnodemapout:
>> +   return true;
>
> Apart from non-NUMA systems, you always return true. Is that really
> correct right? I'd expect one or the other "goto vnodemapout" to want to
> return false.
>
> "out" is OK as a label name.
>
>
>> +
>> +/*
>> + * For each node, build memory block start and end addresses.
>> + * Substract any memory hole from the range found in e820 map.
>> + * vnode memory size are passed here in megabytes, the result is
>> + * in memory block addresses.
>> + * Linux kernel will adjust numa memory block sizes on its own.
>> + * But we want to provide to the kernel numa block addresses that
>> + * will be the same in kernel and hypervisor.
>
> You shouldn't be making these sorts of guest OS assumptions. This is not
> only Linux specific but also (presumably) specific to the version of
> Linux you happened to be looking at.
>
> Please define the semantics wrt the interface and guarantees which Xen
> wants to provide to all PV guests, not just one particular kernel/
>
>> + */
>> +int libxl__vnuma_align_mem(libxl__gc *gc,
>
> Is "align" in the name here accurate? The doc comment says "build"
> instead, which suggests it is creating things (perhaps aligned as it
> goes).
>
>> +                            uint32_t domid,
>> +                            /* IN: mem sizes in megabytes */
>> +                            libxl_domain_build_info *b_info,
>
> if it is input only then please make it const.
>
>> +                            /* OUT: linux NUMA blocks addresses */
>
> Again -- Linux specific.
>
>> +    libxl_ctx *ctx = libxl__gc_owner(gc);
>
> You can use CTX instead of creating this local thing.
>
> Ian.
>

+ Wei

Looks like it will take a bit more time to make sure this part is in a
good shape.
I will poke around to see how it can be done.

Wei, maybe you will some idea?

-- 
Elena