* [PATCH v2 00/19] Virtual NUMA for PV and HVM
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Hi all

This is version 2 of this series.

This patch series implements virtual NUMA support for both PV and HVM
guests. That is, an administrator can configure via libxl the virtual NUMA
topology that the guest sees.

This covers stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-aware
ballooning, hypervisor side) described in my previous email to xen-devel [0].

This series is broken into several parts:

1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support.
2. libxc/libxl support for PV vNUMA.
3. libxc/libxl/hypervisor support for HVM vNUMA.
4. xl vNUMA configuration documentation and parser.

One significant difference from Elena's work is that this patch series
makes use of multiple vmemranges when there is a memory hole, instead of
shrinking the RAM. This matches the behaviour of real hardware.
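
For illustration, a vnode that spans a memory hole is described by two
vmemranges sharing the same node id, rather than by one shrunk range. A
minimal sketch using the public xen_vmemrange_t type (field names as used
in the patches below; the addresses are made up):

    /* Illustrative only, not code from this series: vnode 1 spans the
     * hole at [3GiB, 4GiB) and is therefore split into two ranges. */
    xen_vmemrange_t v[] = {
        { .start = 0x080000000ULL, .end = 0x0c0000000ULL, /* 2GiB-3GiB */
          .flags = 0, .nid = 1 },
        { .start = 0x100000000ULL, .end = 0x140000000ULL, /* 4GiB-5GiB */
          .flags = 0, .nid = 1 },
    };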

The vNUMA auto placement algorithm is missing at the moment and Dario is
working on it.

This series can be found at:
 git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v2

With this series, the following configuration can be used to enable virtual
NUMA support; it works for both PV and HVM guests.

memory = 6000
vnuma_memory = [3000, 3000]
vnuma_vcpu_map = [0, 1]
vnuma_pnode_map = [0, 0]
vnuma_vdistances = [ [10,30], [30,10] ]
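
In libxl terms, that configuration roughly corresponds to filling the
vnuma_nodes array introduced later in this series (patch 05). A
hypothetical sketch, not code from the series (error handling and
frees omitted):

    static void fill_vnuma_example(libxl_ctx *ctx,
                                   libxl_domain_build_info *b)
    {
        static const uint32_t dist[2][2] = { { 10, 30 }, { 30, 10 } };
        int i, j;

        b->num_vnuma_nodes = 2;
        b->vnuma_nodes = calloc(2, sizeof(*b->vnuma_nodes));

        for (i = 0; i < 2; i++) {
            libxl_vnode_info *v = &b->vnuma_nodes[i];

            v->mem = 3000;                    /* MiB, from vnuma_memory */
            v->pnode = 0;                     /* from vnuma_pnode_map */
            v->num_distances = 2;
            v->distances = calloc(2, sizeof(*v->distances));
            for (j = 0; j < 2; j++)
                v->distances[j] = dist[i][j]; /* from vnuma_vdistances */
            libxl_cpu_bitmap_alloc(ctx, &v->vcpus, b->max_vcpus);
            libxl_bitmap_set(&v->vcpus, i);   /* from vnuma_vcpu_map */
        }
    }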

For example output of guest NUMA information, see [1].

Wei.

[0] <20141111173606.GC21312@zion.uk.xensource.com>
[1] <1416582421-10789-1-git-send-email-wei.liu2@citrix.com>

Changes in v2:
1. Make vnuma_vdistances mandatory.
2. Use nested list to specify distances among nodes.
3. Hvmloader uses hypercall to retrieve vNUMA information.
4. Fix some problems spotted by Jan.

Wei Liu (19):
  xen: dump vNUMA information with debug key "u"
  xen: make two memory hypercalls vNUMA-aware
  libxc: allocate memory with vNUMA information for PV guest
  libxl: add emacs local variables in libxl_{x86,arm}.c
  libxl: introduce vNUMA types
  libxl: add vmemrange to libxl__domain_build_state
  libxl: introduce libxl__vnuma_config_check
  libxl: x86: factor out e820_host_sanitize
  libxl: functions to build vmemranges for PV guest
  libxl: build, check and pass vNUMA info to Xen for PV guest
  xen: handle XENMEMF_get_vnumainfo in compat_memory_op
  hvmloader: retrieve vNUMA information from hypervisor
  hvmloader: construct SRAT
  hvmloader: construct SLIT
  libxc: allocate memory with vNUMA information for HVM guest
  libxl: build, check and pass vNUMA info to Xen for HVM guest
  libxl: disallow memory relocation when vNUMA is enabled
  libxlutil: nested list support
  xl: vNUMA support

 docs/man/xl.cfg.pod.5                   |   39 ++++++
 tools/firmware/hvmloader/Makefile       |    2 +-
 tools/firmware/hvmloader/acpi/acpi2_0.h |   61 +++++++++
 tools/firmware/hvmloader/acpi/build.c   |  109 +++++++++++++++
 tools/firmware/hvmloader/hvmloader.c    |    3 +
 tools/firmware/hvmloader/vnuma.c        |   84 ++++++++++++
 tools/firmware/hvmloader/vnuma.h        |   56 ++++++++
 tools/libxc/include/xc_dom.h            |    5 +
 tools/libxc/include/xenguest.h          |    7 +
 tools/libxc/xc_dom_x86.c                |   72 ++++++++--
 tools/libxc/xc_hvm_build_x86.c          |  224 +++++++++++++++++++-----------
 tools/libxc/xc_private.h                |    2 +
 tools/libxl/Makefile                    |    2 +-
 tools/libxl/libxl_arch.h                |    6 +
 tools/libxl/libxl_arm.c                 |   17 +++
 tools/libxl/libxl_create.c              |    9 ++
 tools/libxl/libxl_dm.c                  |    6 +-
 tools/libxl/libxl_dom.c                 |   99 ++++++++++++++
 tools/libxl/libxl_internal.h            |   18 +++
 tools/libxl/libxl_types.idl             |    9 ++
 tools/libxl/libxl_vnuma.c               |  228 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_x86.c                 |  113 +++++++++++++--
 tools/libxl/libxlu_cfg.c                |  180 ++++++++++++++----------
 tools/libxl/libxlu_cfg_i.h              |   15 +-
 tools/libxl/libxlu_cfg_y.c              |  110 +++++++--------
 tools/libxl/libxlu_cfg_y.h              |    2 +-
 tools/libxl/libxlu_cfg_y.y              |   19 ++-
 tools/libxl/libxlu_internal.h           |   24 +++-
 tools/libxl/libxlutil.h                 |   11 +-
 tools/libxl/xl_cmdimpl.c                |  147 ++++++++++++++++++++
 xen/arch/x86/numa.c                     |   47 ++++++-
 xen/common/compat/memory.c              |   38 ++++++
 xen/common/kernel.c                     |    1 +
 xen/common/memory.c                     |   58 +++++++-
 xen/include/public/features.h           |    3 +
 xen/include/public/memory.h             |    2 +
 xen/include/xlat.lst                    |    2 +
 37 files changed, 1568 insertions(+), 262 deletions(-)
 create mode 100644 tools/firmware/hvmloader/vnuma.c
 create mode 100644 tools/firmware/hvmloader/vnuma.h
 create mode 100644 tools/libxl/libxl_vnuma.c

-- 
1.7.10.4


* [PATCH v2 01/19] xen: dump vNUMA information with debug key "u"
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v2:
1. Use unsigned int for loop vars.
2. Use strlcpy.
3. Properly align output.
---
 xen/arch/x86/numa.c |   47 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 628a40a..3bbc975 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -363,10 +363,13 @@ EXPORT_SYMBOL(node_data);
 static void dump_numa(unsigned char key)
 {
     s_time_t now = NOW();
-    int i;
+    unsigned int i, j, n;
+    int err;
     struct domain *d;
     struct page_info *page;
     unsigned int page_num_node[MAX_NUMNODES];
+    uint64_t mem;
+    struct vnuma_info *vnuma;
 
     printk("'%c' pressed -> dumping numa info (now-0x%X:%08X)\n", key,
            (u32)(now>>32), (u32)now);
@@ -408,6 +411,48 @@ static void dump_numa(unsigned char key)
 
         for_each_online_node ( i )
             printk("    Node %u: %u\n", i, page_num_node[i]);
+
+        if ( !d->vnuma )
+                continue;
+
+        vnuma = d->vnuma;
+        printk("     %u vnodes, %u vcpus:\n", vnuma->nr_vnodes, d->max_vcpus);
+        for ( i = 0; i < vnuma->nr_vnodes; i++ )
+        {
+            err = snprintf(keyhandler_scratch, 12, "%3u",
+                    vnuma->vnode_to_pnode[i]);
+            if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
+                strlcpy(keyhandler_scratch, "???", sizeof(keyhandler_scratch));
+
+            printk("       vnode  %3u - pnode %s\n", i, keyhandler_scratch);
+            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
+            {
+                if ( vnuma->vmemrange[j].nid == i )
+                {
+                    mem = vnuma->vmemrange[j].end - vnuma->vmemrange[j].start;
+                    printk("%16"PRIu64" MB: %#016"PRIx64" - %#016"PRIx64"\n",
+                           mem >> 20,
+                           vnuma->vmemrange[j].start,
+                           vnuma->vmemrange[j].end);
+                }
+            }
+
+            printk("       vcpus: ");
+            for ( j = 0, n = 0; j < d->max_vcpus; j++ )
+            {
+                if ( vnuma->vcpu_to_vnode[j] == i )
+                {
+                    if ( (n + 1) % 8 == 0 )
+                        printk("%3d\n", j);
+                    else if ( !(n % 8) && n != 0 )
+                        printk("%17d ", j);
+                    else
+                        printk("%3d ", j);
+                    n++;
+                }
+            }
+            printk("\n");
+        }
     }
 
     rcu_read_unlock(&domlist_read_lock);
-- 
1.7.10.4


* [PATCH v2 02/19] xen: make two memory hypercalls vNUMA-aware
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Make XENMEM_increase_reservation and XENMEM_populate_physmap
vNUMA-aware.

That is, if a guest requests Xen to allocate memory for a specific
vnode, Xen can translate the vnode to a pnode using that guest's vNUMA
information.

XENMEMF_vnode is introduced for the guest to mark that the node number
is in fact a virtual node number which should be translated by Xen.

XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen
is able to translate a virtual node to a physical node.
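
As an illustration of the intended guest-side use (hypothetical sketch,
not part of this patch: "count", "vnode" and "frame_list" are assumed to
be set up by the caller, and HYPERVISOR_memory_op is the usual guest
hypercall wrapper):

    struct xen_memory_reservation res = {
        .nr_extents   = count,  /* number of GPFNs in frame_list */
        .extent_order = 0,
        .mem_flags    = XENMEMF_vnode | XENMEMF_node(vnode),
        .domid        = DOMID_SELF,
    };
    int rc;

    set_xen_guest_handle(res.extent_start, frame_list);
    /* Only meaningful if XENFEAT_memory_op_vnode_supported is set. */
    rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &res);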

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v2:
1. Return start_extent when vnode translation fails.
2. Expose new feature bit to guest.
3. Fix typo in comment.
---
 xen/common/kernel.c           |    1 +
 xen/common/memory.c           |   58 ++++++++++++++++++++++++++++++++++++++---
 xen/include/public/features.h |    3 +++
 xen/include/public/memory.h   |    2 ++
 4 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index d23c422..9f97f68 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -302,6 +302,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         {
         case 0:
             fi.submap = 0;
+            fi.submap |= (1U << XENFEAT_memory_op_vnode_supported);
             if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) )
                 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
             if ( paging_mode_translate(current->domain) )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 9f21bd3..f78c403 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,6 +692,49 @@ out:
     return rc;
 }
 
+static int translate_vnode_to_pnode(struct domain *d,
+                                    struct xen_memory_reservation *r,
+                                    struct memop_args *a)
+{
+    int rc = 0;
+    unsigned int vnode, pnode;
+
+    /* Note: we don't strictly require unsupported bits to be set to
+     * zero, so old guests that know nothing about vNUMA may still
+     * have the XENMEMF_vnode bit set spuriously.
+     *
+     * To distinguish a spurious vnode request from a real one, check
+     * whether d->vnuma exists.
+     */
+    if ( r->mem_flags & XENMEMF_vnode )
+    {
+        read_lock(&d->vnuma_rwlock);
+        if ( d->vnuma )
+        {
+            vnode = XENMEMF_get_node(r->mem_flags);
+
+            if ( vnode < d->vnuma->nr_vnodes )
+            {
+                pnode = d->vnuma->vnode_to_pnode[vnode];
+
+                a->memflags &=
+                    ~MEMF_node(XENMEMF_get_node(r->mem_flags));
+
+                if ( pnode != NUMA_NO_NODE )
+                {
+                    a->memflags |= MEMF_exact_node;
+                    a->memflags |= MEMF_node(pnode);
+                }
+            }
+            else
+                rc = -EINVAL;
+        }
+        read_unlock(&d->vnuma_rwlock);
+    }
+
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -734,10 +777,6 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             args.memflags = MEMF_bits(address_bits);
         }
 
-        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
-        if ( reservation.mem_flags & XENMEMF_exact_node_request )
-            args.memflags |= MEMF_exact_node;
-
         if ( op == XENMEM_populate_physmap
              && (reservation.mem_flags & XENMEMF_populate_on_demand) )
             args.memflags |= MEMF_populate_on_demand;
@@ -747,6 +786,17 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return start_extent;
         args.domain = d;
 
+        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
+        if ( reservation.mem_flags & XENMEMF_exact_node_request )
+            args.memflags |= MEMF_exact_node;
+
+        rc = translate_vnode_to_pnode(d, &reservation, &args);
+        if ( rc )
+        {
+            rcu_unlock_domain(d);
+            return start_extent;
+        }
+
         rc = xsm_memory_adjust_reservation(XSM_TARGET, current->domain, d);
         if ( rc )
         {
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 16d92aa..2110b04 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -99,6 +99,9 @@
 #define XENFEAT_grant_map_identity        12
  */
 
+/* Guest can use XENMEMF_vnode to specify virtual node for memory op. */
+#define XENFEAT_memory_op_vnode_supported 13
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 595f953..2b5206b 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -55,6 +55,8 @@
 /* Flag to request allocation only from the node specified */
 #define XENMEMF_exact_node_request  (1<<17)
 #define XENMEMF_exact_node(n) (XENMEMF_node(n) | XENMEMF_exact_node_request)
+/* Flag to indicate the node specified is virtual node */
+#define XENMEMF_vnode  (1<<18)
 #endif
 
 struct xen_memory_reservation {
-- 
1.7.10.4


* [PATCH v2 03/19] libxc: allocate memory with vNUMA information for PV guest
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxc/include/xc_dom.h |    5 +++
 tools/libxc/xc_dom_x86.c     |   72 +++++++++++++++++++++++++++++++++++-------
 tools/libxc/xc_private.h     |    2 ++
 3 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 07d7224..577a017 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -167,6 +167,11 @@ struct xc_dom_image {
     struct xc_dom_loader *kernel_loader;
     void *private_loader;
 
+    /* vNUMA information */
+    unsigned int *vnode_to_pnode;
+    uint64_t *vnode_size;
+    unsigned int nr_vnodes;
+
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
     /* allocate up to virt_alloc_end */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bf06fe4..3286232 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -759,7 +759,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
 int arch_setup_meminit(struct xc_dom_image *dom)
 {
     int rc;
-    xen_pfn_t pfn, allocsz, i, j, mfn;
+    xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
+    int i, j;
 
     rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
     if ( rc )
@@ -811,18 +812,67 @@ int arch_setup_meminit(struct xc_dom_image *dom)
         /* setup initial p2m */
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
             dom->p2m_host[pfn] = pfn;
-        
+
+        /* setup dummy vNUMA information if it's not provided */
+        if ( dom->nr_vnodes == 0 )
+        {
+            dom->nr_vnodes = 1;
+            dom->vnode_to_pnode = xc_dom_malloc(dom,
+                                                sizeof(*dom->vnode_to_pnode));
+            dom->vnode_to_pnode[0] = XC_VNUMA_NO_NODE;
+            dom->vnode_size = xc_dom_malloc(dom, sizeof(*dom->vnode_size));
+            dom->vnode_size[0] = ((dom->total_pages << PAGE_SHIFT) >> 20);
+        }
+
+        total = 0;
+        for ( i = 0; i < dom->nr_vnodes; i++ )
+            total += ((dom->vnode_size[i] << 20) >> PAGE_SHIFT);
+        if ( total != dom->total_pages )
+        {
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                         "%s: number of pages requested by vNUMA (0x%"PRIpfn") does not match number of pages configured for domain (0x%"PRIpfn")\n",
+                         __FUNCTION__, total, dom->total_pages);
+            return -EINVAL;
+        }
+
         /* allocate guest memory */
-        for ( i = rc = allocsz = 0;
-              (i < dom->total_pages) && !rc;
-              i += allocsz )
+        pfn_base = 0;
+        for ( i = 0; i < dom->nr_vnodes; i++ )
         {
-            allocsz = dom->total_pages - i;
-            if ( allocsz > 1024*1024 )
-                allocsz = 1024*1024;
-            rc = xc_domain_populate_physmap_exact(
-                dom->xch, dom->guest_domid, allocsz,
-                0, 0, &dom->p2m_host[i]);
+            unsigned int memflags;
+            uint64_t pages;
+
+            memflags = 0;
+            if ( dom->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
+            {
+                memflags |= XENMEMF_exact_node(dom->vnode_to_pnode[i]);
+                memflags |= XENMEMF_exact_node_request;
+            }
+
+            pages = (dom->vnode_size[i] << 20) >> PAGE_SHIFT;
+
+            for ( j = 0; j < pages; j += allocsz )
+            {
+                allocsz = pages - j;
+                if ( allocsz > 1024*1024 )
+                    allocsz = 1024*1024;
+
+                rc = xc_domain_populate_physmap_exact(dom->xch,
+                         dom->guest_domid, allocsz, 0, memflags,
+                         &dom->p2m_host[pfn_base+j]);
+
+                if ( rc )
+                {
+                    if ( dom->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
+                        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                                     "%s: failed to allocate 0x%"PRIx64" pages for vnode %d on pnode %d out of 0x%"PRIpfn"\n",
+                                     __FUNCTION__, pages, i,
+                                     dom->vnode_to_pnode[i], dom->total_pages);
+                    return rc;
+                }
+            }
+
+            pfn_base += pages;
         }
 
         /* Ensure no unclaimed pages are left unused.
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 45b8644..1809674 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -35,6 +35,8 @@
 
 #include <xen/sys/privcmd.h>
 
+#define XC_VNUMA_NO_NODE (~0U)
+
 #if defined(HAVE_VALGRIND_MEMCHECK_H) && !defined(NDEBUG) && !defined(__MINIOS__)
 /* Compile in Valgrind client requests? */
 #include <valgrind/memcheck.h>
-- 
1.7.10.4


* [PATCH v2 04/19] libxl: add emacs local variables in libxl_{x86, arm}.c
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_arm.c |    8 ++++++++
 tools/libxl/libxl_x86.c |    8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 448ac07..34d21f5 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -706,3 +706,11 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
 
     return 0;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 7589060..9ceb373 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -324,3 +324,11 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
 {
     return 0;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v2 05/19] libxl: introduce vNUMA types
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_types.idl |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..75855fb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -353,6 +353,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
     ])
 
+libxl_vnode_info = Struct("vnode_info", [
+    ("mem", uint64), # memory size of this node, in MiB
+    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
+    ("pnode", uint32), # physical node of this node
+    ("vcpus", libxl_bitmap), # vcpus in this node
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -373,6 +380,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("disable_migrate", libxl_defbool),
     ("cpuid",           libxl_cpuid_policy_list),
     ("blkdev_start",    string),
+
+    ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
     
     ("device_model_version", libxl_device_model_version),
     ("device_model_stubdomain", libxl_defbool),
-- 
1.7.10.4


* [PATCH v2 06/19] libxl: add vmemrange to libxl__domain_build_state
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

The vmemrange interface is not yet exported to libxl users. Vmemranges
are generated during domain build.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_internal.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a38f695..6632459 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -971,6 +971,9 @@ typedef struct {
     libxl__file_reference pv_ramdisk;
     const char * pv_cmdline;
     bool pvh_enabled;
+
+    xen_vmemrange_t *vmemranges;
+    uint32_t num_vmemranges;
 } libxl__domain_build_state;
 
 _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid,
-- 
1.7.10.4


* [PATCH v2 07/19] libxl: introduce libxl__vnuma_config_check
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

This vNUMA function (and future ones) is placed in a new file called
libxl_vnuma.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/Makefile         |    2 +-
 tools/libxl/libxl_internal.h |    5 ++
 tools/libxl/libxl_vnuma.c    |  138 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_vnuma.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index df08c8a..9fcdfb1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl
 LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
-			libxl_json.o libxl_aoutils.o libxl_numa.o \
+			libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6632459..88da7ba 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3392,6 +3392,11 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
     libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
 }
 
+/* Check if vNUMA config is valid. Returns 0 if valid. */
+int libxl__vnuma_config_check(libxl__gc *gc,
+                              const libxl_domain_build_info *b_info,
+                              const libxl__domain_build_state *state);
+
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
 
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
new file mode 100644
index 0000000..d92376d
--- /dev/null
+++ b/tools/libxl/libxl_vnuma.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (C) 2014      Citrix Ltd.
+ * Author Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+#include <stdlib.h>
+
+/* Sort vmemranges in ascending order by "start" */
+static int compare_vmemrange(const void *a, const void *b)
+{
+    const xen_vmemrange_t *x = a, *y = b;
+    if (x->start < y->start)
+        return -1;
+    if (x->start > y->start)
+        return 1;
+    return 0;
+}
+
+/* Check if vNUMA configuration is valid:
+ *  1. all pnodes inside vnode_to_pnode array are valid
+ *  2. one vcpu belongs to and only belongs to one vnode
+ *  3. each vmemrange is valid and doesn't overlap with each other
+ */
+int libxl__vnuma_config_check(libxl__gc *gc,
+			      const libxl_domain_build_info *b_info,
+                              const libxl__domain_build_state *state)
+{
+    int i, j, rc = ERROR_INVAL, nr_nodes;
+    libxl_numainfo *ninfo = NULL;
+    uint64_t total_ram = 0;
+    libxl_bitmap cpumap;
+    libxl_vnode_info *p;
+
+    libxl_bitmap_init(&cpumap);
+
+    /* Check pnode specified is valid */
+    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
+    if (!ninfo) {
+        LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "libxl_get_numainfo failed");
+        goto out;
+    }
+
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        uint32_t pnode;
+
+        p = &b_info->vnuma_nodes[i];
+        pnode = p->pnode;
+
+        /* The pnode specified is not valid? */
+        if (pnode >= nr_nodes) {
+            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                       "Invalid pnode %d specified",
+                       pnode);
+            goto out;
+        }
+
+        total_ram += p->mem;
+    }
+
+    if (total_ram != (b_info->max_memkb >> 10)) {
+        LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                   "Total ram in vNUMA configuration 0x%"PRIx64" while maxmem specified 0x%"PRIx64,
+                   total_ram, (b_info->max_memkb >> 10));
+        goto out;
+    }
+
+    /* Check vcpu mapping */
+    libxl_cpu_bitmap_alloc(CTX, &cpumap, b_info->max_vcpus);
+    libxl_bitmap_set_none(&cpumap);
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        p = &b_info->vnuma_nodes[i];
+        libxl_for_each_set_bit(j, p->vcpus) {
+            if (!libxl_bitmap_test(&cpumap, j))
+                libxl_bitmap_set(&cpumap, j);
+            else {
+                LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                           "Trying to assign vcpu %d to vnode %d while it's already assigned to another vnode",
+                           j, i);
+                goto out;
+            }
+        }
+    }
+
+    for (i = 0; i < b_info->max_vcpus; i++) {
+        if (!libxl_bitmap_test(&cpumap, i)) {
+            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                       "Vcpu %d is not assigned to any vnode", i);
+            goto out;
+        }
+    }
+
+    /* Check vmemranges */
+    qsort(state->vmemranges, state->num_vmemranges, sizeof(xen_vmemrange_t),
+          compare_vmemrange);
+
+    for (i = 0; i < state->num_vmemranges; i++) {
+        if (state->vmemranges[i].end < state->vmemranges[i].start) {
+                LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                           "Vmemrange end < start");
+                goto out;
+        }
+    }
+
+    for (i = 0; i < state->num_vmemranges - 1; i++) {
+        if (state->vmemranges[i].end > state->vmemranges[i+1].start) {
+            LIBXL__LOG(CTX, LIBXL__LOG_ERROR,
+                       "Vmemranges overlapped, 0x%"PRIx64"-0x%"PRIx64", 0x%"PRIx64"-0x%"PRIx64,
+                       state->vmemranges[i].start, state->vmemranges[i].end,
+                       state->vmemranges[i+1].start, state->vmemranges[i+1].end);
+            goto out;
+        }
+    }
+
+    rc = 0;
+out:
+    if (ninfo) libxl_numainfo_dispose(ninfo);
+    libxl_bitmap_dispose(&cpumap);
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v2 08/19] libxl: x86: factor out e820_host_sanitize
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

This function gets the machine E820 map and sanitizes it according to
the PV guest configuration.

It will be used in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_x86.c |   31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9ceb373..e959e37 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -207,6 +207,27 @@ static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
     return 0;
 }
 
+static int e820_host_sanitize(libxl__gc *gc,
+                              libxl_domain_build_info *b_info,
+                              struct e820entry map[],
+                              uint32_t *nr)
+{
+    int rc;
+
+    rc = xc_get_machine_memory_map(CTX->xch, map, E820MAX);
+    if (rc < 0) {
+        errno = rc;
+        return ERROR_FAIL;
+    }
+
+    *nr = rc;
+
+    rc = e820_sanitize(CTX, map, nr, b_info->target_memkb,
+                       (b_info->max_memkb - b_info->target_memkb) +
+                       b_info->u.pv.slack_memkb);
+    return rc;
+}
+
 static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
         libxl_domain_config *d_config)
 {
@@ -223,15 +244,7 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
     if (!libxl_defbool_val(b_info->u.pv.e820_host))
         return ERROR_INVAL;
 
-    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
-    if (rc < 0) {
-        errno = rc;
-        return ERROR_FAIL;
-    }
-    nr = rc;
-    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-                       (b_info->max_memkb - b_info->target_memkb) +
-                       b_info->u.pv.slack_memkb);
+    rc = e820_host_sanitize(gc, b_info, map, &nr);
     if (rc)
         return ERROR_FAIL;
 
-- 
1.7.10.4


* [PATCH v2 09/19] libxl: functions to build vmemranges for PV guest
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

One vmemrange is generated for each vnode. For guests that care about
the machine E820 map, vmemranges are further split to accommodate memory
holes: when the RAM region a vnode is being laid over ends before the
vnode is fully populated, the vnode continues as another vmemrange
starting at the next RAM region.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_arch.h     |    6 ++++
 tools/libxl/libxl_arm.c      |    9 +++++
 tools/libxl/libxl_internal.h |    5 +++
 tools/libxl/libxl_vnuma.c    |   34 +++++++++++++++++++
 tools/libxl/libxl_x86.c      |   74 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 128 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d3bc136..e249048 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -27,4 +27,10 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
 int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
                                       libxl_domain_build_info *info,
                                       struct xc_dom_image *dom);
+
+/* build vNUMA vmemrange with arch specific information */
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *b_info,
+                                      libxl__domain_build_state *state);
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 34d21f5..1f1bc24 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -707,6 +707,15 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
     return 0;
 }
 
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *info,
+                                      libxl__domain_build_state *state)
+{
+    /* Don't touch anything. */
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 88da7ba..1678715 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3397,6 +3397,11 @@ int libxl__vnuma_config_check(libxl__gc *gc,
                               const libxl_domain_build_info *b_info,
                               const libxl__domain_build_state *state);
 
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+                                    uint32_t domid,
+                                    libxl_domain_build_info *b_info,
+                                    libxl__domain_build_state *state);
+
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
 
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index d92376d..b8d9268 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -14,6 +14,7 @@
  */
 #include "libxl_osdeps.h" /* must come before any other headers */
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 #include <stdlib.h>
 
 /* Sort vmemranges in ascending order with "start" */
@@ -129,6 +130,39 @@ out:
     return rc;
 }
 
+/* Build vmemranges for PV guest */
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+                                    uint32_t domid,
+                                    libxl_domain_build_info *b_info,
+                                    libxl__domain_build_state *state)
+{
+    int i;
+    uint64_t next;
+    xen_vmemrange_t *v = NULL;
+
+    assert(state->vmemranges == NULL);
+
+    /* Generate one vmemrange for each virtual node. */
+    next = 0;
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+        v = libxl__realloc(gc, v, sizeof(*v) * (i+1));
+
+        v[i].start = next;
+        v[i].end = next + (p->mem << 20); /* mem is in MiB */
+        v[i].flags = 0;
+        v[i].nid = i;
+
+        next = v[i].end;
+    }
+
+    state->vmemranges = v;
+    state->num_vmemranges = i;
+
+    return libxl__arch_vnuma_build_vmemrange(gc, domid, b_info, state);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index e959e37..2018afc 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -338,6 +338,80 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
     return 0;
 }
 
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *b_info,
+                                      libxl__domain_build_state *state)
+{
+    int i, x, n, rc;
+    uint32_t nr_e820;
+    struct e820entry map[E820MAX];
+    xen_vmemrange_t *v;
+
+    /* Only touch vmemranges if it's PV guest and e820_host is true */
+    if (!(b_info->type == LIBXL_DOMAIN_TYPE_PV &&
+          libxl_defbool_val(b_info->u.pv.e820_host))) {
+        rc = 0;
+        goto out;
+    }
+
+    rc = e820_host_sanitize(gc, b_info, map, &nr_e820);
+    if (rc) goto out;
+
+    /* Ditch old vmemranges and start with host E820 map. Note, memory
+     * was gc allocated.
+     */
+    state->vmemranges = NULL;
+    state->num_vmemranges = 0;
+
+    n = 0; /* E820 counter */
+    x = 0;
+    v = NULL;
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+        uint64_t remaining = (p->mem << 20);
+
+        while (remaining > 0) {
+            if (n >= nr_e820) {
+                rc = ERROR_FAIL;
+                goto out;
+            }
+
+            /* Skip non RAM region */
+            if (map[n].type != E820_RAM) {
+                n++;
+                continue;
+            }
+
+            v = libxl__realloc(gc, v, sizeof(xen_vmemrange_t) * (x+1));
+
+            if (map[n].size >= remaining) {
+                v[x].start = map[n].addr;
+                v[x].end = map[n].addr + remaining;
+                map[n].addr += remaining;
+                map[n].size -= remaining;
+                remaining = 0;
+            } else {
+                v[x].start = map[n].addr;
+                v[x].end = map[n].addr + map[n].size;
+                remaining -= map[n].size;
+                n++;
+            }
+
+            v[x].flags = 0;
+            v[x].nid = i;
+            x++;
+        }
+    }
+
+    state->vmemranges = v;
+    state->num_vmemranges = x;
+
+    rc = 0;
+out:
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.7.10.4


* [PATCH v2 10/19] libxl: build, check and pass vNUMA info to Xen for PV guest
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl_dom.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 74ea84b..7339bbc 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -512,6 +512,51 @@ retry_transaction:
     return 0;
 }
 
+static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
+                          const libxl_domain_build_info *info,
+                          const libxl__domain_build_state *state)
+{
+    int rc = 0;
+    int i, nr_vdistance;
+    unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL;
+
+    vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus,
+                                  sizeof(unsigned int));
+    vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes,
+                                   sizeof(unsigned int));
+
+    nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes;
+    vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int));
+
+    for (i = 0; i < info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *v = &info->vnuma_nodes[i];
+        int bit;
+
+        /* vnode to pnode mapping */
+        vnode_to_pnode[i] = v->pnode;
+
+        /* vcpu to vnode mapping */
+        libxl_for_each_set_bit(bit, v->vcpus)
+            vcpu_to_vnode[bit] = i;
+
+        /* node distances */
+        assert(info->num_vnuma_nodes == v->num_distances);
+        memcpy(vdistance + (i * info->num_vnuma_nodes),
+               v->distances,
+               v->num_distances * sizeof(unsigned int));
+    }
+
+    if ( xc_domain_setvnuma(CTX->xch, domid, info->num_vnuma_nodes,
+                            state->num_vmemranges, info->max_vcpus,
+                            state->vmemranges, vdistance,
+                            vcpu_to_vnode, vnode_to_pnode) < 0 ) {
+        LOGE(ERROR, "xc_domain_setvnuma failed");
+        rc = ERROR_FAIL;
+    }
+
+    return rc;
+}
+
 int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state)
 {
@@ -569,6 +614,32 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     dom->xenstore_domid = state->store_domid;
     dom->claim_enabled = libxl_defbool_val(info->claim_mode);
 
+    if (info->num_vnuma_nodes != 0) {
+        int i;
+
+        ret = libxl__vnuma_build_vmemrange_pv(gc, domid, info, state);
+        if (ret) {
+            LOGE(ERROR, "cannot build vmemranges");
+            goto out;
+        }
+        ret = libxl__vnuma_config_check(gc, info, state);
+        if (ret) goto out;
+
+        ret = set_vnuma_info(gc, domid, info, state);
+        if (ret) goto out;
+
+        dom->nr_vnodes = info->num_vnuma_nodes;
+        dom->vnode_to_pnode = xc_dom_malloc(dom, sizeof(*dom->vnode_to_pnode) *
+                                            dom->nr_vnodes);
+        dom->vnode_size = xc_dom_malloc(dom, sizeof(*dom->vnode_size) *
+                                        dom->nr_vnodes);
+
+        for (i = 0; i < dom->nr_vnodes; i++) {
+            dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
+            dom->vnode_size[i] = info->vnuma_nodes[i].mem;
+        }
+    }
+
     if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
         LOGE(ERROR, "xc_dom_boot_xen_init failed");
         goto out;
-- 
1.7.10.4


* [PATCH v2 11/19] xen: handle XENMEMF_get_vnumainfo in compat_memory_op
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, Jan Beulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/compat/memory.c |   38 ++++++++++++++++++++++++++++++++++++++
 xen/include/xlat.lst       |    2 ++
 2 files changed, 40 insertions(+)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 06c90be..58cb847 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -15,6 +15,7 @@ CHECK_TYPE(domid);
 #undef xen_domid_t
 
 CHECK_mem_access_op;
+CHECK_vmemrange;
 
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
@@ -32,12 +33,14 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             struct xen_add_to_physmap *atp;
             struct xen_add_to_physmap_batch *atpb;
             struct xen_remove_from_physmap *xrfp;
+            struct xen_vnuma_topology_info *vnuma;
         } nat;
         union {
             struct compat_memory_reservation rsrv;
             struct compat_memory_exchange xchg;
             struct compat_add_to_physmap atp;
             struct compat_add_to_physmap_batch atpb;
+            struct compat_vnuma_topology_info vnuma;
         } cmp;
 
         set_xen_guest_handle(nat.hnd, COMPAT_ARG_XLAT_VIRT_BASE);
@@ -273,6 +276,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+        case XENMEM_get_vnumainfo:
+	{
+            enum XLAT_vnuma_topology_info_vdistance vdistance =
+                XLAT_vnuma_topology_info_vdistance_h;
+            enum XLAT_vnuma_topology_info_vcpu_to_vnode vcpu_to_vnode =
+                XLAT_vnuma_topology_info_vcpu_to_vnode_h;
+            enum XLAT_vnuma_topology_info_vmemrange vmemrange =
+                XLAT_vnuma_topology_info_vmemrange_h;
+
+            if ( copy_from_guest(&cmp.vnuma, compat, 1) )
+                return -EFAULT;
+
+#define XLAT_vnuma_topology_info_HNDL_vdistance_h(_d_, _s_)		\
+            guest_from_compat_handle((_d_)->vdistance.h, (_s_)->vdistance.h)
+#define XLAT_vnuma_topology_info_HNDL_vcpu_to_vnode_h(_d_, _s_)		\
+            guest_from_compat_handle((_d_)->vcpu_to_vnode.h, (_s_)->vcpu_to_vnode.h)
+#define XLAT_vnuma_topology_info_HNDL_vmemrange_h(_d_, _s_)		\
+            guest_from_compat_handle((_d_)->vmemrange.h, (_s_)->vmemrange.h)
+
+            XLAT_vnuma_topology_info(nat.vnuma, &cmp.vnuma);
+
+#undef XLAT_vnuma_topology_info_HNDL_vdistance_h
+#undef XLAT_vnuma_topology_info_HNDL_vcpu_to_vnode_h
+#undef XLAT_vnuma_topology_info_HNDL_vmemrange_h
+            break;
+	}
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
@@ -398,6 +428,14 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_remove_from_physmap:
             break;
 
+        case XENMEM_get_vnumainfo:
+            cmp.vnuma.nr_vnodes = nat.vnuma->nr_vnodes;
+            cmp.vnuma.nr_vcpus = nat.vnuma->nr_vcpus;
+            cmp.vnuma.nr_vmemranges = nat.vnuma->nr_vmemranges;
+            if ( __copy_to_guest(compat, &cmp.vnuma, 1) )
+                rc = -EFAULT;
+            break;
+
         default:
             domain_crash(current->domain);
             split = 0;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 41b3e35..9c9fd9a 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -64,6 +64,8 @@
 ?	mem_access_op		memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+?	vmemrange			memory.h
+!	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
 ?	physdev_get_free_pirq		physdev.h
 ?	physdev_irq			physdev.h
-- 
1.7.10.4


* [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor
From: Wei Liu @ 2014-12-01 15:33 UTC
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Hvmloader issues the XENMEM_get_vnumainfo hypercall and stores the
retrieved information in scratch space for later use.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
 tools/firmware/hvmloader/Makefile    |    2 +-
 tools/firmware/hvmloader/hvmloader.c |    3 ++
 tools/firmware/hvmloader/vnuma.c     |   84 ++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/vnuma.h     |   56 +++++++++++++++++++++++
 4 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 tools/firmware/hvmloader/vnuma.c
 create mode 100644 tools/firmware/hvmloader/vnuma.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 46a79c5..a5af34b 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -29,7 +29,7 @@ LOADADDR = 0x100000
 CFLAGS += $(CFLAGS_xeninclude)
 
 OBJS  = hvmloader.o mp_tables.o util.o smbios.o 
-OBJS += smp.o cacheattr.o xenbus.o
+OBJS += smp.o cacheattr.o xenbus.o vnuma.o
 OBJS += e820.o pci.o pir.o ctype.o
 OBJS += hvm_param.o
 ifeq ($(debug),y)
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 7b0da38..2aa2f76 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -26,6 +26,7 @@
 #include "pci_regs.h"
 #include "apic_regs.h"
 #include "acpi/acpi2_0.h"
+#include "vnuma.h"
 #include <xen/version.h>
 #include <xen/hvm/params.h>
 
@@ -261,6 +262,8 @@ int main(void)
 
     init_hypercalls();
 
+    init_vnuma_info();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/vnuma.c b/tools/firmware/hvmloader/vnuma.c
new file mode 100644
index 0000000..cd26f4f
--- /dev/null
+++ b/tools/firmware/hvmloader/vnuma.c
@@ -0,0 +1,84 @@
+/*
+ * vnuma.c: obtain vNUMA information from hypervisor
+ *
+ * Copyright (c) 2014 Wei Liu, Citrix Systems (R&D) Ltd.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "util.h"
+#include "hypercall.h"
+#include "vnuma.h"
+#include <errno.h>
+
+unsigned int nr_vnodes, nr_vmemranges;
+unsigned int *vcpu_to_vnode, *vdistance;
+xen_vmemrange_t *vmemrange;
+
+void init_vnuma_info(void)
+{
+    int rc, retry = 0;
+    struct xen_vnuma_topology_info vnuma_topo = {
+        .domid = DOMID_SELF,
+    };
+
+    vcpu_to_vnode = scratch_alloc(sizeof(uint32_t) * hvm_info->nr_vcpus, 0);
+    vdistance = scratch_alloc(sizeof(uint32_t) * MAX_VDISTANCE, 0);
+    vmemrange = scratch_alloc(sizeof(xen_vmemrange_t) * MAX_VMEMRANGES, 0);
+
+    vnuma_topo.nr_vnodes = MAX_VNODES;
+    vnuma_topo.nr_vcpus = hvm_info->nr_vcpus;
+    vnuma_topo.nr_vmemranges = MAX_VMEMRANGES;
+
+    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
+    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
+    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);
+
+    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
+    while ( rc == -EAGAIN && retry < 10 )
+    {
+        rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
+        retry++;
+    }
+    if ( rc < 0 )
+    {
+        printf("Failed to retrieve vNUMA information, rc = %d\n", rc);
+        goto out;
+    }
+
+    nr_vnodes = vnuma_topo.nr_vnodes;
+    nr_vmemranges = vnuma_topo.nr_vmemranges;
+
+out:
+    return;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/firmware/hvmloader/vnuma.h b/tools/firmware/hvmloader/vnuma.h
new file mode 100644
index 0000000..0779b83
--- /dev/null
+++ b/tools/firmware/hvmloader/vnuma.h
@@ -0,0 +1,56 @@
+/******************************************************************************
+ * vnuma.h
+ *
+ * Copyright (c) 2014, Wei Liu
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __HVMLOADER_VNUMA_H__
+#define __HVMLOADER_VNUMA_H__
+
+#include <xen/memory.h>
+
+#define MAX_VNODES     64
+#define MAX_VDISTANCE  (MAX_VNODES * MAX_VNODES)
+#define MAX_VMEMRANGES (MAX_VNODES * 2)
+
+extern unsigned int nr_vnodes, nr_vmemranges;
+extern unsigned int *vcpu_to_vnode, *vdistance;
+extern xen_vmemrange_t *vmemrange;
+
+void init_vnuma_info(void);
+
+#endif /* __HVMLOADER_VNUMA_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (11 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-09 16:53   ` Jan Beulich
  2014-12-01 15:33 ` [PATCH v2 14/19] hvmloader: construct SLIT Wei Liu
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v2:
1. Remove explicit zero initializers.
2. Adapt to new vNUMA retrieval routine.
3. Move SRAT very late in secondary table build.
---
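
For illustration, a standalone sketch of how the SRAT payload added below
is sized: one processor affinity entry per vcpu plus one memory affinity
entry per vmemrange, mirroring the computation in construct_srat(). The
byte sizes assume the packed struct layouts from acpi2_0.h; the counts are
taken from the cover letter's two-node example.

    /* Sketch: SRAT sizing for 2 vcpus, 2 vnodes, 2 vmemranges. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int nr_vcpus = 2, nr_vmemranges = 2;
        /* Assumed packed sizes: 36-byte ACPI header plus 12 bytes of
         * SRAT-specific fields; 16 bytes per processor affinity entry;
         * 40 bytes per memory affinity entry. */
        unsigned int size = (36 + 12) + 16 * nr_vcpus + 40 * nr_vmemranges;

        printf("SRAT: %u processor + %u memory entries, %u bytes\n",
               nr_vcpus, nr_vmemranges, size);
        return 0;
    }
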
 tools/firmware/hvmloader/acpi/acpi2_0.h |   53 +++++++++++++++++++++++
 tools/firmware/hvmloader/acpi/build.c   |   71 +++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
index 7b22d80..6169213 100644
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
@@ -364,6 +364,57 @@ struct acpi_20_madt_intsrcovr {
 };
 
 /*
+ * System Resource Affinity Table header definition (SRAT)
+ */
+struct acpi_20_srat {
+    struct acpi_header header;
+    uint32_t table_revision;
+    uint32_t reserved2[2];
+};
+
+#define ACPI_SRAT_TABLE_REVISION 1
+
+/*
+ * System Resource Affinity Table structure types.
+ */
+#define ACPI_PROCESSOR_AFFINITY 0x0
+#define ACPI_MEMORY_AFFINITY    0x1
+struct acpi_20_srat_processor {
+    uint8_t type;
+    uint8_t length;
+    uint8_t domain;
+    uint8_t apic_id;
+    uint32_t flags;
+    uint8_t sapic_id;
+    uint8_t domain_hi[3];
+    uint32_t reserved;
+};
+
+/*
+ * Local APIC Affinity Flags.  All other bits are reserved and must be 0.
+ */
+#define ACPI_LOCAL_APIC_AFFIN_ENABLED (1 << 0)
+
+struct acpi_20_srat_memory {
+    uint8_t type;
+    uint8_t length;
+    uint32_t domain;
+    uint16_t reserved;
+    uint64_t base_address;
+    uint64_t mem_length;
+    uint32_t reserved2;
+    uint32_t flags;
+    uint64_t reserved3;
+};
+
+/*
+ * Memory Affinity Flags.  All other bits are reserved and must be 0.
+ */
+#define ACPI_MEM_AFFIN_ENABLED (1 << 0)
+#define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
+#define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
+
+/*
  * Table Signatures.
  */
 #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
@@ -375,6 +426,7 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_TCPA_SIGNATURE ASCII32('T','C','P','A')
 #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
+#define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 
 /*
  * Table revision numbers.
@@ -388,6 +440,7 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_HPET_REVISION 0x01
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
+#define ACPI_2_0_SRAT_REVISION 0x01
 
 #pragma pack ()
 
diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
index 1431296..917b2c9 100644
--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -23,6 +23,7 @@
 #include "ssdt_pm.h"
 #include "../config.h"
 #include "../util.h"
+#include "../vnuma.h"
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/params.h>
 
@@ -203,6 +204,65 @@ static struct acpi_20_waet *construct_waet(void)
     return waet;
 }
 
+static struct acpi_20_srat *construct_srat(void)
+{
+    struct acpi_20_srat *srat;
+    struct acpi_20_srat_processor *processor;
+    struct acpi_20_srat_memory *memory;
+    unsigned int size;
+    void *p;
+    int i;
+    uint64_t mem;
+
+    size = sizeof(*srat) + sizeof(*processor) * hvm_info->nr_vcpus +
+        sizeof(*memory) * nr_vmemranges;
+
+    p = mem_alloc(size, 16);
+    if (!p) return NULL;
+
+    srat = p;
+    memset(srat, 0, sizeof(*srat));
+    srat->header.signature    = ACPI_2_0_SRAT_SIGNATURE;
+    srat->header.revision     = ACPI_2_0_SRAT_REVISION;
+    fixed_strcpy(srat->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(srat->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    srat->header.oem_revision = ACPI_OEM_REVISION;
+    srat->header.creator_id   = ACPI_CREATOR_ID;
+    srat->header.creator_revision = ACPI_CREATOR_REVISION;
+    srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
+
+    processor = (struct acpi_20_srat_processor *)(srat + 1);
+    for ( i = 0; i < hvm_info->nr_vcpus; i++ )
+    {
+        memset(processor, 0, sizeof(*processor));
+        processor->type     = ACPI_PROCESSOR_AFFINITY;
+        processor->length   = sizeof(*processor);
+        processor->domain   = vcpu_to_vnode[i];
+        processor->apic_id  = LAPIC_ID(i);
+        processor->flags    = ACPI_LOCAL_APIC_AFFIN_ENABLED;
+        processor++;
+    }
+
+    memory = (struct acpi_20_srat_memory *)processor;
+    for ( i = 0; i < nr_vmemranges; i++ )
+    {
+        mem = vmemrange[i].end - vmemrange[i].start;
+        memset(memory, 0, sizeof(*memory));
+        memory->type          = ACPI_MEMORY_AFFINITY;
+        memory->length        = sizeof(*memory);
+        memory->domain        = vmemrange[i].nid;
+        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
+        memory->base_address  = vmemrange[i].start;
+        memory->mem_length    = mem;
+        memory++;
+    }
+
+    srat->header.length = size;
+    set_checksum(srat, offsetof(struct acpi_header, checksum), size);
+
+    return srat;
+}
+
 static int construct_passthrough_tables(unsigned long *table_ptrs,
                                         int nr_tables)
 {
@@ -257,6 +317,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
     struct acpi_20_hpet *hpet;
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
+    struct acpi_20_srat *srat;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -346,6 +407,16 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
         }
     }
 
+    /* SRAT */
+    if ( nr_vnodes > 0 )
+    {
+        srat = construct_srat();
+        if (srat)
+            table_ptrs[nr_tables++] = (unsigned long)srat;
+        else
+            printf("Failed to build SRAT, skipping...\n");
+    }
+
     /* Load any additional tables passed through. */
     nr_tables += construct_passthrough_tables(table_ptrs, nr_tables);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 14/19] hvmloader: construct SLIT
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (12 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 13/19] hvmloader: construct SRAT Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-09 16:57   ` Jan Beulich
  2014-12-01 15:33 ` [PATCH v2 15/19] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v2:
1. Adapt to new vNUMA retrieval routine.
2. Move SLIT very late in secondary table build.
---
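
For illustration, a standalone sketch of the SLIT layout built below: the
body is an N x N byte matrix in row-major order, so the distance from
vnode i to vnode j sits at entry[i * nr_vnodes + j]. The values are the
cover letter's example vnuma_vdistances = [ [10,30], [30,10] ], flattened
the same way construct_slit() copies vdistance[].

    /* Sketch: row-major SLIT entries for two vnodes. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        unsigned int nr_vnodes = 2, i, j;
        uint8_t entry[] = { 10, 30, 30, 10 }; /* flattened vdistance */

        for ( i = 0; i < nr_vnodes; i++ )
            for ( j = 0; j < nr_vnodes; j++ )
                printf("vnode %u -> vnode %u: distance %u\n",
                       i, j, entry[i * nr_vnodes + j]);
        return 0;
    }
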
 tools/firmware/hvmloader/acpi/acpi2_0.h |    8 +++++++
 tools/firmware/hvmloader/acpi/build.c   |   40 ++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
index 6169213..d698095 100644
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
@@ -414,6 +414,12 @@ struct acpi_20_srat_memory {
 #define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
 #define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
 
+struct acpi_20_slit {
+    struct acpi_header header;
+    uint64_t localities;
+    uint8_t entry[0];
+};
+
 /*
  * Table Signatures.
  */
@@ -427,6 +433,7 @@ struct acpi_20_srat_memory {
 #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
+#define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
 
 /*
  * Table revision numbers.
@@ -441,6 +448,7 @@ struct acpi_20_srat_memory {
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
+#define ACPI_2_0_SLIT_REVISION 0x01
 
 #pragma pack ()
 
diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
index 917b2c9..51c737e 100644
--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -263,6 +263,38 @@ static struct acpi_20_srat *construct_srat(void)
     return srat;
 }
 
+static struct acpi_20_slit *construct_slit(void)
+{
+    struct acpi_20_slit *slit;
+    unsigned int num, size;
+    int i;
+
+    num = nr_vnodes * nr_vnodes;
+    size = sizeof(*slit) + num * sizeof(uint8_t);
+
+    slit = mem_alloc(size, 16);
+    if (!slit) return NULL;
+
+    memset(slit, 0, size);
+    slit->header.signature    = ACPI_2_0_SLIT_SIGNATURE;
+    slit->header.revision     = ACPI_2_0_SLIT_REVISION;
+    fixed_strcpy(slit->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(slit->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    slit->header.oem_revision = ACPI_OEM_REVISION;
+    slit->header.creator_id   = ACPI_CREATOR_ID;
+    slit->header.creator_revision = ACPI_CREATOR_REVISION;
+
+    for ( i = 0; i < num; i++ )
+        slit->entry[i] = vdistance[i];
+
+    slit->localities = nr_vnodes;
+
+    slit->header.length = size;
+    set_checksum(slit, offsetof(struct acpi_header, checksum), size);
+
+    return slit;
+}
+
 static int construct_passthrough_tables(unsigned long *table_ptrs,
                                         int nr_tables)
 {
@@ -318,6 +350,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
     struct acpi_20_srat *srat;
+    struct acpi_20_slit *slit;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -407,7 +440,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
         }
     }
 
-    /* SRAT */
+    /* SRAT and SLIT */
     if ( nr_vnodes > 0 )
     {
         srat = construct_srat();
@@ -415,6 +448,11 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
             table_ptrs[nr_tables++] = (unsigned long)srat;
         else
             printf("Failed to build SRAT, skipping...\n");
+        slit = construct_slit();
+        if (slit)
+            table_ptrs[nr_tables++] = (unsigned long)slit;
+        else
+            printf("Failed to build SLIT, skipping...\n");
     }
 
     /* Load any additional tables passed through. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 15/19] libxc: allocate memory with vNUMA information for HVM guest
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (13 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 14/19] hvmloader: construct SLIT Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-01 15:33 ` [PATCH v2 16/19] libxl: build, check and pass vNUMA info to Xen " Wei Liu
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

It also returns the low memory end, high memory end and MMIO start to
the caller.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
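
For illustration, a standalone sketch of how the three new out parameters
relate, mirroring the low/high memory split performed in build_hvm_info().
The 6000 MiB guest size comes from the cover letter; the 256 MiB MMIO hole
ending at 4 GiB is an assumed default, not something this patch dictates.

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        uint64_t mem_size    = 6000ULL << 20;           /* cover letter */
        uint64_t mmio_size   = 256ULL << 20;            /* assumed */
        uint64_t mmio_start  = (1ULL << 32) - mmio_size;
        uint64_t lowmem_end  = mem_size, highmem_end = 0;

        /* Memory that would overlap the MMIO hole goes above 4 GiB. */
        if ( lowmem_end > mmio_start )
        {
            highmem_end = (1ULL << 32) + (lowmem_end - mmio_start);
            lowmem_end = mmio_start;
        }

        printf("lowmem_end  = 0x%" PRIx64 "\n", lowmem_end);
        printf("highmem_end = 0x%" PRIx64 "\n", highmem_end);
        printf("mmio_start  = 0x%" PRIx64 "\n", mmio_start);
        return 0;
    }
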
 tools/libxc/include/xenguest.h |    7 ++
 tools/libxc/xc_hvm_build_x86.c |  224 ++++++++++++++++++++++++++--------------
 2 files changed, 151 insertions(+), 80 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 40bbac8..dca0375 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -230,6 +230,13 @@ struct xc_hvm_build_args {
     struct xc_hvm_firmware_module smbios_module;
     /* Whether to use claim hypercall (1 - enable, 0 - disable). */
     int claim_enabled;
+    unsigned int nr_vnodes;
+    unsigned int *vnode_to_pnode;
+    uint64_t *vnode_size;
+    /* Out parameters */
+    uint64_t lowmem_end;
+    uint64_t highmem_end;
+    uint64_t mmio_start;
 };
 
 /**
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index c81a25b..54d3dc8 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           struct xc_hvm_build_args *args)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
     hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
+    args->lowmem_end = lowmem_end;
+    args->highmem_end = highmem_end;
+    args->mmio_start = mmio_start;
+
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
         sum += ((uint8_t *)hvm_info)[i];
@@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
                        char *image, unsigned long image_size)
 {
     xen_pfn_t *page_array = NULL;
-    unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
+    unsigned long i, j, nr_pages = args->mem_size >> PAGE_SHIFT;
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
     uint64_t mmio_start = (1ull << 32) - args->mmio_size;
     uint64_t mmio_size = args->mmio_size;
@@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
     xen_capabilities_info_t caps;
     unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
         stat_1gb_pages = 0;
-    int pod_mode = 0;
+    unsigned int memflags = 0;
     int claim_enabled = args->claim_enabled;
     xen_pfn_t special_array[NR_SPECIAL_PAGES];
     xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
-
-    if ( nr_pages > target_pages )
-        pod_mode = XENMEMF_populate_on_demand;
+    uint64_t dummy_vnode_size;
+    unsigned int dummy_vnode_to_pnode;
+    uint64_t total;
 
     memset(&elf, 0, sizeof(elf));
     if ( elf_init(&elf, image, image_size) != 0 )
@@ -276,6 +281,37 @@ static int setup_guest(xc_interface *xch,
     v_start = 0;
     v_end = args->mem_size;
 
+    if ( nr_pages > target_pages )
+        memflags |= XENMEMF_populate_on_demand;
+
+    if ( args->nr_vnodes == 0 )
+    {
+        /* Build dummy vnode information */
+        args->nr_vnodes = 1;
+        dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
+        dummy_vnode_size = args->mem_size >> 20;
+        args->vnode_size = &dummy_vnode_size;
+        args->vnode_to_pnode = &dummy_vnode_to_pnode;
+    }
+    else
+    {
+        if ( nr_pages > target_pages )
+        {
+            PERROR("Cannot enable vNUMA and PoD at the same time");
+            goto error_out;
+        }
+    }
+
+    total = 0;
+    for ( i = 0; i < args->nr_vnodes; i++ )
+        total += (args->vnode_size[i] << 20);
+    if ( total != args->mem_size )
+    {
+        PERROR("Memory size requested by vNUMA (0x%"PRIx64") does not match the memory size configured for the domain (0x%"PRIx64")",
+               total, args->mem_size);
+        goto error_out;
+    }
+
     if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
     {
         PERROR("Could not get Xen capabilities");
@@ -320,7 +356,7 @@ static int setup_guest(xc_interface *xch,
         }
     }
 
-    if ( pod_mode )
+    if ( memflags & XENMEMF_populate_on_demand )
     {
         /*
          * Subtract VGA_HOLE_SIZE from target_pages for the VGA
@@ -349,103 +385,128 @@ static int setup_guest(xc_interface *xch,
      * ensure that we can be preempted and hence dom0 remains responsive.
      */
     rc = xc_domain_populate_physmap_exact(
-        xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
+        xch, dom, 0xa0, 0, memflags, &page_array[0x00]);
     cur_pages = 0xc0;
     stat_normal_pages = 0xc0;
 
-    while ( (rc == 0) && (nr_pages > cur_pages) )
+    for ( i = 0; i < args->nr_vnodes; i++ )
     {
-        /* Clip count to maximum 1GB extent. */
-        unsigned long count = nr_pages - cur_pages;
-        unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
-
-        if ( count > max_pages )
-            count = max_pages;
-
-        cur_pfn = page_array[cur_pages];
+        unsigned int new_memflags = memflags;
+        uint64_t pages, finished;
 
-        /* Take care the corner cases of super page tails */
-        if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-             (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
-            count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
-        else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-                  (count > SUPERPAGE_1GB_NR_PFNS) )
-            count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
-
-        /* Attemp to allocate 1GB super page. Because in each pass we only
-         * allocate at most 1GB, we don't have to clip super page boundaries.
-         */
-        if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
-             /* Check if there exists MMIO hole in the 1GB memory range */
-             !check_mmio_hole(cur_pfn << PAGE_SHIFT,
-                              SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
-                              mmio_start, mmio_size) )
+        if ( args->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
         {
-            long done;
-            unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
-            xen_pfn_t sp_extents[nr_extents];
-
-            for ( i = 0; i < nr_extents; i++ )
-                sp_extents[i] = page_array[cur_pages+(i<<SUPERPAGE_1GB_SHIFT)];
-
-            done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_1GB_SHIFT,
-                                              pod_mode, sp_extents);
-
-            if ( done > 0 )
-            {
-                stat_1gb_pages += done;
-                done <<= SUPERPAGE_1GB_SHIFT;
-                cur_pages += done;
-                count -= done;
-            }
+            new_memflags |= XENMEMF_exact_node(args->vnode_to_pnode[i]);
+            new_memflags |= XENMEMF_exact_node_request;
         }
 
-        if ( count != 0 )
+        pages = (args->vnode_size[i] << 20) >> PAGE_SHIFT;
+        /* Consider the VGA hole as belonging to node 0 */
+        if ( i == 0 )
+            finished = 0xc0;
+        else
+            finished = 0;
+
+        while ( (rc == 0) && (pages > finished) )
         {
-            /* Clip count to maximum 8MB extent. */
-            max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
+            /* Clip count to maximum 1GB extent. */
+            unsigned long count = pages - finished;
+            unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
+
             if ( count > max_pages )
                 count = max_pages;
-            
-            /* Clip partial superpage extents to superpage boundaries. */
-            if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                 (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
-                count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
-            else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                      (count > SUPERPAGE_2MB_NR_PFNS) )
-                count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
-
-            /* Attempt to allocate superpage extents. */
-            if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+
+            cur_pfn = page_array[cur_pages];
+
+            /* Take care of the corner cases of superpage tails */
+            if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                 (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
+                count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
+            else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                      (count > SUPERPAGE_1GB_NR_PFNS) )
+                count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
+
+            /* Attempt to allocate 1GB super page. Because in each pass we only
+             * allocate at most 1GB, we don't have to clip super page boundaries.
+             */
+            if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
+                 /* Check if there exists MMIO hole in the 1GB memory range */
+                 !check_mmio_hole(cur_pfn << PAGE_SHIFT,
+                                  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
+                                  mmio_start, mmio_size) )
             {
                 long done;
-                unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+                unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
                 xen_pfn_t sp_extents[nr_extents];
 
-                for ( i = 0; i < nr_extents; i++ )
-                    sp_extents[i] = page_array[cur_pages+(i<<SUPERPAGE_2MB_SHIFT)];
+                for ( j = 0; j < nr_extents; j++ )
+                    sp_extents[j] = page_array[cur_pages+(j<<SUPERPAGE_1GB_SHIFT)];
 
-                done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_2MB_SHIFT,
-                                                  pod_mode, sp_extents);
+                done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_1GB_SHIFT,
+                                                  new_memflags, sp_extents);
 
                 if ( done > 0 )
                 {
-                    stat_2mb_pages += done;
-                    done <<= SUPERPAGE_2MB_SHIFT;
+                    stat_1gb_pages += done;
+                    done <<= SUPERPAGE_1GB_SHIFT;
                     cur_pages += done;
+                    finished += done;
                     count -= done;
                 }
             }
-        }
 
-        /* Fall back to 4kB extents. */
-        if ( count != 0 )
-        {
-            rc = xc_domain_populate_physmap_exact(
-                xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
-            cur_pages += count;
-            stat_normal_pages += count;
+            if ( count != 0 )
+            {
+                /* Clip count to maximum 8MB extent. */
+                max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
+                if ( count > max_pages )
+                    count = max_pages;
+
+                /* Clip partial superpage extents to superpage boundaries. */
+                if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                     (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
+                    count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
+                else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                          (count > SUPERPAGE_2MB_NR_PFNS) )
+                    count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
+
+                /* Attempt to allocate superpage extents. */
+                if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+                {
+                    long done;
+                    unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+                    xen_pfn_t sp_extents[nr_extents];
+
+                    for ( j = 0; j < nr_extents; j++ )
+                        sp_extents[j] = page_array[cur_pages+(j<<SUPERPAGE_2MB_SHIFT)];
+
+                    done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_2MB_SHIFT,
+                                                      new_memflags, sp_extents);
+
+                    if ( done > 0 )
+                    {
+                        stat_2mb_pages += done;
+                        done <<= SUPERPAGE_2MB_SHIFT;
+                        cur_pages += done;
+                        finished += done;
+                        count -= done;
+                    }
+                }
+            }
+
+            /* Fall back to 4kB extents. */
+            if ( count != 0 )
+            {
+                rc = xc_domain_populate_physmap_exact(
+                    xch, dom, count, 0, new_memflags, &page_array[cur_pages]);
+                cur_pages += count;
+                finished += count;
+                stat_normal_pages += count;
+            }
         }
+
+        if ( rc != 0 )
+            break;
     }
 
     if ( rc != 0 )
@@ -469,7 +530,7 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
@@ -608,6 +669,9 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
             args.acpi_module.guest_addr_out;
         hvm_args->smbios_module.guest_addr_out = 
             args.smbios_module.guest_addr_out;
+        hvm_args->lowmem_end = args.lowmem_end;
+        hvm_args->highmem_end = args.highmem_end;
+        hvm_args->mmio_start = args.mmio_start;
     }
 
     free(image);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 16/19] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (14 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 15/19] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-01 15:33 ` [PATCH v2 17/19] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Libxc is more involved in building vmemranges in the HVM case. The
building of vmemranges is placed after xc_hvm_build returns, because it
relies on the memory hole information provided by xc_hvm_build.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
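
For illustration, a standalone sketch of the hole-splitting loop in
libxl__vnuma_build_vmemrange_hvm() below, fed with assumed inputs: two
3000 MiB vnodes and a memory hole at [0xf0000000, 4 GiB). Node 0 fits
entirely below the hole; node 1 is split into one vmemrange below the
hole and one above 4 GiB, which is how a single vnode can end up with
multiple vmemranges.

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        uint64_t vnode_mb[] = { 3000, 3000 };   /* assumed vnode sizes */
        uint64_t hole_start = 0xf0000000ULL;    /* assumed hole */
        uint64_t hole_end   = 1ULL << 32;
        uint64_t next = 0;
        unsigned int i;

        for ( i = 0; i < 2; i++ )
        {
            uint64_t remaining = vnode_mb[i] << 20;

            while ( remaining > 0 )
            {
                uint64_t count = remaining;

                /* Skip over the hole; clip ranges that would cross it. */
                if ( next >= hole_start && next < hole_end )
                    next = hole_end;
                if ( next < hole_start && next + remaining >= hole_start )
                    count = hole_start - next;

                printf("vnode %u: [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
                       i, next, next + count);
                remaining -= count;
                next += count;
            }
        }
        return 0;
    }
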
 tools/libxl/libxl_create.c   |    9 +++++++
 tools/libxl/libxl_dom.c      |   28 +++++++++++++++++++++
 tools/libxl/libxl_internal.h |    5 ++++
 tools/libxl/libxl_vnuma.c    |   56 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 98 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1198225..11b20d2 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -845,6 +845,15 @@ static void initiate_domain_create(libxl__egc *egc,
         goto error_out;
     }
 
+    /* Disallow PoD and vNUMA to be enabled at the same time because the
+     * PoD pool is not vNUMA-aware yet.
+     */
+    if (pod_enabled && d_config->b_info.num_vnuma_nodes) {
+        ret = ERROR_INVAL;
+        LOG(ERROR, "Cannot enable PoD and vNUMA at the same time");
+        goto error_out;
+    }
+
     ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info);
     if (ret) goto error_out;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 7339bbc..3fe3092 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -884,12 +884,40 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (info->num_vnuma_nodes != 0) {
+        int i;
+
+        args.nr_vnodes = info->num_vnuma_nodes;
+        args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) *
+                                            args.nr_vnodes);
+        args.vnode_size = libxl__malloc(gc, sizeof(*args.vnode_size) *
+                                        args.nr_vnodes);
+        for (i = 0; i < args.nr_vnodes; i++) {
+            args.vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
+            args.vnode_size[i] = info->vnuma_nodes[i].mem;
+        }
+
+        /* Consider video RAM as belonging to node 0 */
+        args.vnode_size[0] -= (info->video_memkb >> 10);
+    }
+
     ret = xc_hvm_build(ctx->xch, domid, &args);
     if (ret) {
         LOGEV(ERROR, ret, "hvm building failed");
         goto out;
     }
 
+    if (info->num_vnuma_nodes != 0) {
+        ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+        if (ret) {
+            LOGEV(ERROR, ret, "hvm build vmemranges failed");
+            goto out;
+        }
+        ret = libxl__vnuma_config_check(gc, info, state);
+        if (ret) goto out;
+        ret = set_vnuma_info(gc, domid, info, state);
+        if (ret) goto out;
+    }
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1678715..13c05c4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3401,6 +3401,11 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
                                     uint32_t domid,
                                     libxl_domain_build_info *b_info,
                                     libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
+                                     uint32_t domid,
+                                     libxl_domain_build_info *b_info,
+                                     libxl__domain_build_state *state,
+                                     struct xc_hvm_build_args *args);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index b8d9268..08a7639 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -163,6 +163,62 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
     return libxl__arch_vnuma_build_vmemrange(gc, domid, b_info, state);
 }
 
+/* Build vmemranges for HVM guest */
+int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
+                                     uint32_t domid,
+                                     libxl_domain_build_info *b_info,
+                                     libxl__domain_build_state *state,
+                                     struct xc_hvm_build_args *args)
+{
+    uint64_t hole_start, hole_end, next;
+    int i, x;
+    xen_vmemrange_t *v;
+
+    /* Derive vmemranges from vnode size and memory hole.
+     *
+     * Guest physical address space layout:
+     * [0, hole_start) [hole_start, hole_end) [hole_end, highmem_end)
+     */
+    hole_start = args->lowmem_end < args->mmio_start ?
+        args->lowmem_end : args->mmio_start;
+    hole_end = (args->mmio_start + args->mmio_size) > (1ULL << 32) ?
+        (args->mmio_start + args->mmio_size) : (1ULL << 32);
+
+    assert(state->vmemranges == NULL);
+
+    next = 0;
+    x = 0;
+    v = NULL;
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+        uint64_t remaining = (p->mem << 20);
+
+        while (remaining > 0) {
+            uint64_t count = remaining;
+
+            if (next >= hole_start && next < hole_end)
+                next = hole_end;
+            if ((next < hole_start) && (next + remaining >= hole_start))
+                count = hole_start - next;
+
+            v = libxl__realloc(gc, v, sizeof(xen_vmemrange_t) * (x + 1));
+            v[x].start = next;
+            v[x].end = next + count;
+            v[x].flags = 0;
+            v[x].nid = i;
+
+            x++;
+            remaining -= count;
+            next += count;
+        }
+    }
+
+    state->vmemranges = v;
+    state->num_vmemranges = x;
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 17/19] libxl: disallow memory relocation when vNUMA is enabled
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (15 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 16/19] libxl: build, check and pass vNUMA info to Xen " Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-01 15:33 ` [PATCH v2 18/19] libxlutil: nested list support Wei Liu
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Disallow memory relocation when vNUMA is enabled, because relocated
memory ends up off-node. Furthermore, even if we dynamically expanded
node coverage in hvmloader, low memory and high memory might reside on
different physical nodes, so blindly relocating low memory to high
memory would give us a sub-optimal configuration.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_dm.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 3e191c3..8631343 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1345,13 +1345,15 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
         /* Disable relocating memory to make the MMIO hole larger
-         * unless we're running qemu-traditional */
+         * unless we're running qemu-traditional and vNUMA is not
+         * configured. */
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc,
                                        "%s/hvmloader/allow-memory-relocate",
                                        path),
                         "%d",
-                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
+                        !b_info->num_vnuma_nodes);
         free(path);
     }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 18/19] libxlutil: nested list support
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (16 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 17/19] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-01 15:33 ` [PATCH v2 19/19] xl: vNUMA support Wei Liu
  2014-12-08  9:58 ` [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
  19 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

This is done with three major changes:
1. Rework internal representation of setting.
2. Extend grammar of parser.
3. Introduce new APIs.

New APIs introduced:
1. xlu_cfg_value_type
2. xlu_cfg_value_get_string
3. xlu_cfg_value_get_list
4. xlu_cfg_get_listitem2

The existing APIs work as before.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
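
For illustration, a sketch of how a caller might walk a nested list such
as vnuma_vdistances = [ [10,30], [30,10] ] with the new APIs. The
xlu_cfg_init()/xlu_cfg_destroy() setup comes from the existing libxlutil.h
and is assumed here, not added by this patch.

    #include <stdio.h>
    #include <string.h>
    #include "libxlutil.h"

    static void dump_vdistances(const XLU_Config *cfg)
    {
        XLU_ConfigList *list;
        const XLU_ConfigList *inner;
        const XLU_ConfigValue *row, *item;
        int nr_rows, i, j;

        if (xlu_cfg_get_list(cfg, "vnuma_vdistances", &list, &nr_rows, 1))
            return;

        for (i = 0; i < nr_rows; i++) {
            row = xlu_cfg_get_listitem2(list, i);
            if (!row || xlu_cfg_value_type(row) != XLU_LIST)
                continue;                     /* expect a nested list */
            inner = xlu_cfg_value_get_list(row);
            for (j = 0; (item = xlu_cfg_get_listitem2(inner, j)); j++)
                if (xlu_cfg_value_type(item) == XLU_STRING)
                    printf("d[%d][%d] = %s\n", i, j,
                           xlu_cfg_value_get_string(item));
        }
    }

    int main(void)
    {
        const char *text = "vnuma_vdistances = [ [10,30], [30,10] ]\n";
        XLU_Config *cfg = xlu_cfg_init(stderr, "<inline>");

        if (!cfg || xlu_cfg_readdata(cfg, text, strlen(text)))
            return 1;
        dump_vdistances(cfg);
        xlu_cfg_destroy(cfg);
        return 0;
    }
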
 tools/libxl/libxlu_cfg.c      |  180 +++++++++++++++++++++++++----------------
 tools/libxl/libxlu_cfg_i.h    |   15 +++-
 tools/libxl/libxlu_cfg_y.c    |  110 +++++++++++--------------
 tools/libxl/libxlu_cfg_y.h    |    2 +-
 tools/libxl/libxlu_cfg_y.y    |   19 +++--
 tools/libxl/libxlu_internal.h |   24 ++++--
 tools/libxl/libxlutil.h       |   11 ++-
 7 files changed, 208 insertions(+), 153 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index 22adcb0..db26c34 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -132,13 +132,9 @@ int xlu_cfg_readdata(XLU_Config *cfg, const char *data, int length) {
 }
 
 void xlu__cfg_set_free(XLU_ConfigSetting *set) {
-    int i;
-
     if (!set) return;
+    xlu__cfg_value_free(set->value);
     free(set->name);
-    for (i=0; i<set->nvalues; i++)
-        free(set->values[i]);
-    free(set->values);
     free(set);
 }
 
@@ -173,7 +169,7 @@ static int find_atom(const XLU_Config *cfg, const char *n,
     set= find(cfg,n);
     if (!set) return ESRCH;
 
-    if (set->avalues!=1) {
+    if (set->value->type!=XLU_STRING) {
         if (!dont_warn)
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is"
@@ -185,13 +181,30 @@ static int find_atom(const XLU_Config *cfg, const char *n,
     return 0;
 }
 
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value)
+{
+    return value->type;
+}
+
+const char *xlu_cfg_value_get_string(const XLU_ConfigValue *value)
+{
+    assert(value->type == XLU_STRING);
+    return value->u.string;
+}
+
+const XLU_ConfigList *xlu_cfg_value_get_list(const XLU_ConfigValue *value)
+{
+    assert(value->type == XLU_LIST);
+    return &value->u.list;
+}
+
 int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
                        const char **value_r, int dont_warn) {
     XLU_ConfigSetting *set;
     int e;
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-    *value_r= set->values[0];
+    *value_r= set->value->u.string;
     return 0;
 }
 
@@ -202,7 +215,7 @@ int xlu_cfg_replace_string(const XLU_Config *cfg, const char *n,
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
     free(*value_r);
-    *value_r= strdup(set->values[0]);
+    *value_r= strdup(set->value->u.string);
     return 0;
 }
 
@@ -214,7 +227,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
     char *ep;
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-    errno= 0; l= strtol(set->values[0], &ep, 0);
+    errno= 0; l= strtol(set->value->u.string, &ep, 0);
     e= errno;
     if (errno) {
         e= errno;
@@ -226,7 +239,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
                     cfg->config_source, set->lineno, n, strerror(e));
         return e;
     }
-    if (*ep || ep==set->values[0]) {
+    if (*ep || ep==set->value->u.string) {
         if (!dont_warn)
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is not a valid number\n",
@@ -253,7 +266,7 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
                      XLU_ConfigList **list_r, int *entries_r, int dont_warn) {
     XLU_ConfigSetting *set;
     set= find(cfg,n);  if (!set) return ESRCH;
-    if (set->avalues==1) {
+    if (set->value->type!=XLU_LIST) {
         if (!dont_warn) {
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is a single value"
@@ -262,8 +275,8 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
         }
         return EINVAL;
     }
-    if (list_r) *list_r= set;
-    if (entries_r) *entries_r= set->nvalues;
+    if (list_r) *list_r= &set->value->u.list;
+    if (entries_r) *entries_r= set->value->u.list.nvalues;
     return 0;
 }
 
@@ -290,72 +303,29 @@ int xlu_cfg_get_list_as_string_list(const XLU_Config *cfg, const char *n,
     return 0;
 }
 
-const char *xlu_cfg_get_listitem(const XLU_ConfigList *set, int entry) {
-    if (entry < 0 || entry >= set->nvalues) return 0;
-    return set->values[entry];
+const char *xlu_cfg_get_listitem(const XLU_ConfigList *list, int entry) {
+    if (entry < 0 || entry >= list->nvalues) return 0;
+    if (list->values[entry]->type != XLU_STRING) return 0;
+    return list->values[entry]->u.string;
 }
 
-
-XLU_ConfigSetting *xlu__cfg_set_mk(CfgParseContext *ctx,
-                                   int alloc, char *atom) {
-    XLU_ConfigSetting *set= 0;
-
-    if (ctx->err) goto x;
-    assert(!!alloc == !!atom);
-
-    set= malloc(sizeof(*set));
-    if (!set) goto xe;
-
-    set->name= 0; /* tbd */
-    set->avalues= alloc;
-
-    if (!alloc) {
-        set->nvalues= 0;
-        set->values= 0;
-    } else {
-        set->values= malloc(sizeof(*set->values) * alloc);
-        if (!set->values) goto xe;
-
-        set->nvalues= 1;
-        set->values[0]= atom;
-    }
-    return set;
-
- xe:
-    ctx->err= errno;
- x:
-    free(set);
-    free(atom);
-    return 0;
-}
-
-void xlu__cfg_set_add(CfgParseContext *ctx, XLU_ConfigSetting *set,
-                      char *atom) {
-    if (ctx->err) return;
-
-    assert(atom);
-
-    if (set->nvalues >= set->avalues) {
-        int new_avalues;
-        char **new_values;
-
-        if (set->avalues > INT_MAX / 100) { ctx->err= ERANGE; return; }
-        new_avalues= set->avalues * 4;
-        new_values= realloc(set->values,
-                            sizeof(*new_values) * new_avalues);
-        if (!new_values) { ctx->err= errno; free(atom); return; }
-        set->values= new_values;
-        set->avalues= new_avalues;
-    }
-    set->values[set->nvalues++]= atom;
+const XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+                                             int entry)
+{
+    if (entry < 0 || entry >= list->nvalues) return NULL;
+    return list->values[entry];
 }
 
 void xlu__cfg_set_store(CfgParseContext *ctx, char *name,
-                        XLU_ConfigSetting *set, int lineno) {
+                        XLU_ConfigValue *val, int lineno) {
+    XLU_ConfigSetting *set;
     if (ctx->err) return;
 
     assert(name);
+    set = malloc(sizeof(*set));
+    assert(set);
     set->name= name;
+    set->value = val;
     set->lineno= lineno;
     set->next= ctx->cfg->settings;
     ctx->cfg->settings= set;
@@ -473,6 +443,76 @@ void xlu__cfg_yyerror(YYLTYPE *loc, CfgParseContext *ctx, char const *msg) {
     if (!ctx->err) ctx->err= EINVAL;
 }
 
+XLU_ConfigValue *xlu__cfg_make_string(CfgParseContext *ctx,
+                                      char *src)
+{
+    XLU_ConfigValue *ret;
+
+    ret = malloc(sizeof(*ret));
+    assert(ret);
+    ret->type = XLU_STRING;
+    ret->u.string = src;
+
+    return ret;
+}
+
+XLU_ConfigValue *xlu__cfg_make_list(CfgParseContext *ctx,
+                                    XLU_ConfigValue *val)
+{
+    XLU_ConfigValue *ret;
+
+    ret = malloc(sizeof(*ret));
+    assert(ret);
+    ret->type = XLU_LIST;
+    ret->u.list.nvalues = 1;
+    ret->u.list.values = malloc(sizeof(*ret->u.list.values));
+    ret->u.list.values[0] = val;
+
+    return ret;
+}
+
+XLU_ConfigValue *xlu__cfg_list_append(CfgParseContext *ctx,
+                                      XLU_ConfigValue *list,
+                                      XLU_ConfigValue *val)
+{
+    XLU_ConfigValue **new_val;
+
+    assert(list->type == XLU_LIST);
+
+    new_val = realloc(list->u.list.values,
+                      sizeof(*new_val) * (list->u.list.nvalues+1));
+    if (!new_val) {
+        ctx->err = errno;
+        xlu__cfg_value_free(val);
+        return list;
+    }
+
+    list->u.list.values = new_val;
+    list->u.list.values[list->u.list.nvalues] = val;
+    list->u.list.nvalues++;
+
+    return list;
+}
+
+void xlu__cfg_value_free(XLU_ConfigValue *val)
+{
+    int i;
+
+    if (!val) return;
+
+    switch (val->type) {
+    case XLU_STRING:
+        free(val->u.string);
+        break;
+    case XLU_LIST:
+        for (i = 0; i < val->u.list.nvalues; i++) {
+            xlu__cfg_value_free(val->u.list.values[i]);
+        }
+        free(val->u.list.values);
+    }
+    free(val);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlu_cfg_i.h b/tools/libxl/libxlu_cfg_i.h
index 54d033c..89cef9a 100644
--- a/tools/libxl/libxlu_cfg_i.h
+++ b/tools/libxl/libxlu_cfg_i.h
@@ -23,10 +23,8 @@
 #include "libxlu_cfg_y.h"
 
 void xlu__cfg_set_free(XLU_ConfigSetting *set);
-XLU_ConfigSetting *xlu__cfg_set_mk(CfgParseContext*, int alloc, char *atom);
-void xlu__cfg_set_add(CfgParseContext*, XLU_ConfigSetting *set, char *atom);
 void xlu__cfg_set_store(CfgParseContext*, char *name,
-                        XLU_ConfigSetting *set, int lineno);
+                        XLU_ConfigValue *val, int lineno);
 
 char *xlu__cfgl_strdup(CfgParseContext*, const char *src);
 char *xlu__cfgl_dequote(CfgParseContext*, const char *src);
@@ -36,7 +34,18 @@ void xlu__cfgl_lexicalerror(CfgParseContext*, char const *msg);
 
 void xlu__cfgl_likely_python(CfgParseContext *ctx);
 
+XLU_ConfigValue *xlu__cfg_make_string(CfgParseContext*, char *src);
 
+XLU_ConfigValue *xlu__cfg_make_list(CfgParseContext*,
+                                    XLU_ConfigValue *val);
+
+
+
+XLU_ConfigValue *xlu__cfg_list_append(CfgParseContext *ctx,
+                                      XLU_ConfigValue *list,
+                                      XLU_ConfigValue *val);
+
+void xlu__cfg_value_free(struct XLU_ConfigValue *val);
 
 /* Why oh why does bison not declare this in its autogenerated .h ? */
 int xlu__cfg_yyparse(CfgParseContext *ctx);
diff --git a/tools/libxl/libxlu_cfg_y.c b/tools/libxl/libxlu_cfg_y.c
index 4437e05..0a035ef 100644
--- a/tools/libxl/libxlu_cfg_y.c
+++ b/tools/libxl/libxlu_cfg_y.c
@@ -126,7 +126,7 @@ typedef union YYSTYPE
 #line 25 "libxlu_cfg_y.y"
 
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 
 
 
@@ -377,14 +377,14 @@ union yyalloc
 /* YYFINAL -- State number of the termination state.  */
 #define YYFINAL  3
 /* YYLAST -- Last index in YYTABLE.  */
-#define YYLAST   24
+#define YYLAST   26
 
 /* YYNTOKENS -- Number of terminals.  */
 #define YYNTOKENS  12
 /* YYNNTS -- Number of nonterminals.  */
 #define YYNNTS  11
 /* YYNRULES -- Number of rules.  */
-#define YYNRULES  22
+#define YYNRULES  21
 /* YYNRULES -- Number of states.  */
 #define YYNSTATES  30
 
@@ -433,8 +433,8 @@ static const yytype_uint8 yytranslate[] =
 static const yytype_uint8 yyprhs[] =
 {
        0,     0,     3,     5,     8,     9,    12,    15,    17,    20,
-      24,    26,    28,    30,    35,    37,    39,    40,    42,    46,
-      49,    55,    56
+      24,    26,    28,    30,    32,    34,    36,    41,    42,    45,
+      51,    52
 };
 
 /* YYRHS -- A `-1'-separated list of the rules' RHS.  */
@@ -443,17 +443,17 @@ static const yytype_int8 yyrhs[] =
       13,     0,    -1,    14,    -1,    14,    16,    -1,    -1,    14,
       15,    -1,    16,    17,    -1,    17,    -1,     1,     6,    -1,
        3,     7,    18,    -1,     6,    -1,     8,    -1,    19,    -1,
-       9,    22,    20,    10,    -1,     4,    -1,     5,    -1,    -1,
-      21,    -1,    21,    11,    22,    -1,    19,    22,    -1,    21,
-      11,    22,    19,    22,    -1,    -1,    22,     6,    -1
+      20,    -1,     4,    -1,     5,    -1,     9,    22,    21,    10,
+      -1,    -1,    18,    22,    -1,    21,    11,    22,    18,    22,
+      -1,    -1,    22,     6,    -1
 };
 
 /* YYRLINE[YYN] -- source line where rule number YYN was defined.  */
 static const yytype_uint8 yyrline[] =
 {
        0,    47,    47,    48,    50,    51,    53,    54,    55,    57,
-      59,    60,    62,    63,    65,    66,    68,    69,    70,    72,
-      73,    75,    77
+      59,    60,    62,    63,    65,    66,    68,    70,    71,    72,
+      74,    76
 };
 #endif
 
@@ -464,7 +464,7 @@ static const char *const yytname[] =
 {
   "$end", "error", "$undefined", "IDENT", "STRING", "NUMBER", "NEWLINE",
   "'='", "';'", "'['", "']'", "','", "$accept", "file", "stmts", "stmt",
-  "assignment", "endstmt", "value", "atom", "valuelist", "values", "nlok", 0
+  "assignment", "endstmt", "value", "atom", "list", "values", "nlok", 0
 };
 #endif
 
@@ -482,16 +482,16 @@ static const yytype_uint16 yytoknum[] =
 static const yytype_uint8 yyr1[] =
 {
        0,    12,    13,    13,    14,    14,    15,    15,    15,    16,
-      17,    17,    18,    18,    19,    19,    20,    20,    20,    21,
-      21,    22,    22
+      17,    17,    18,    18,    19,    19,    20,    21,    21,    21,
+      22,    22
 };
 
 /* YYR2[YYN] -- Number of symbols composing right hand side of rule YYN.  */
 static const yytype_uint8 yyr2[] =
 {
        0,     2,     1,     2,     0,     2,     2,     1,     2,     3,
-       1,     1,     1,     4,     1,     1,     0,     1,     3,     2,
-       5,     0,     2
+       1,     1,     1,     1,     1,     1,     4,     0,     2,     5,
+       0,     2
 };
 
 /* YYDEFACT[STATE-NAME] -- Default reduction number in state STATE-NUM.
@@ -500,32 +500,32 @@ static const yytype_uint8 yyr2[] =
 static const yytype_uint8 yydefact[] =
 {
        4,     0,     0,     1,     0,     0,    10,    11,     5,     3,
-       7,     8,     0,     6,    14,    15,    21,     9,    12,    16,
-      22,    21,     0,    17,    19,    13,    21,    18,    21,    20
+       7,     8,     0,     6,    14,    15,    20,     9,    12,    13,
+      17,    21,    20,     0,    18,    16,    20,     0,    20,    19
 };
 
 /* YYDEFGOTO[NTERM-NUM].  */
 static const yytype_int8 yydefgoto[] =
 {
-      -1,     1,     2,     8,     9,    10,    17,    18,    22,    23,
-      19
+      -1,     1,     2,     8,     9,    10,    17,    18,    19,    23,
+      20
 };
 
 /* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing
    STATE-NUM.  */
-#define YYPACT_NINF -18
+#define YYPACT_NINF -19
 static const yytype_int8 yypact[] =
 {
-     -18,     4,     0,   -18,    -1,     6,   -18,   -18,   -18,     3,
-     -18,   -18,    11,   -18,   -18,   -18,   -18,   -18,   -18,    13,
-     -18,   -18,    12,    10,    17,   -18,   -18,    13,   -18,    17
+     -19,    20,     0,   -19,    15,    16,   -19,   -19,   -19,     4,
+     -19,   -19,    13,   -19,   -19,   -19,   -19,   -19,   -19,   -19,
+      10,   -19,   -19,    -6,    18,   -19,   -19,    10,   -19,    18
 };
 
 /* YYPGOTO[NTERM-NUM].  */
 static const yytype_int8 yypgoto[] =
 {
-     -18,   -18,   -18,   -18,   -18,    15,   -18,   -17,   -18,   -18,
-     -14
+     -19,   -19,   -19,   -19,   -19,    17,   -18,   -19,   -19,   -19,
+     -15
 };
 
 /* YYTABLE[YYPACT[STATE-NUM]].  What to do in state STATE-NUM.  If
@@ -534,22 +534,22 @@ static const yytype_int8 yypgoto[] =
 #define YYTABLE_NINF -3
 static const yytype_int8 yytable[] =
 {
-      -2,     4,    21,     5,     3,    11,     6,    24,     7,     6,
-      28,     7,    27,    12,    29,    14,    15,    14,    15,    20,
-      16,    26,    25,    20,    13
+      -2,     4,    22,     5,    25,    26,     6,    24,     7,    28,
+       6,    27,     7,    29,    14,    15,    21,    14,    15,    16,
+       3,    11,    16,    12,    21,     0,    13
 };
 
 #define yypact_value_is_default(yystate) \
-  ((yystate) == (-18))
+  ((yystate) == (-19))
 
 #define yytable_value_is_error(yytable_value) \
   YYID (0)
 
-static const yytype_uint8 yycheck[] =
+static const yytype_int8 yycheck[] =
 {
-       0,     1,    19,     3,     0,     6,     6,    21,     8,     6,
-      27,     8,    26,     7,    28,     4,     5,     4,     5,     6,
-       9,    11,    10,     6,     9
+       0,     1,    20,     3,    10,    11,     6,    22,     8,    27,
+       6,    26,     8,    28,     4,     5,     6,     4,     5,     9,
+       0,     6,     9,     7,     6,    -1,     9
 };
 
 /* YYSTOS[STATE-NUM] -- The (internal number of the) accessing
@@ -557,8 +557,8 @@ static const yytype_uint8 yycheck[] =
 static const yytype_uint8 yystos[] =
 {
        0,    13,    14,     0,     1,     3,     6,     8,    15,    16,
-      17,     6,     7,    17,     4,     5,     9,    18,    19,    22,
-       6,    19,    20,    21,    22,    10,    11,    22,    19,    22
+      17,     6,     7,    17,     4,     5,     9,    18,    19,    20,
+      22,     6,    18,    21,    22,    10,    11,    22,    18,    22
 };
 
 #define yyerrok		(yyerrstatus = 0)
@@ -1148,7 +1148,7 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1155 "libxlu_cfg_y.c"
@@ -1162,11 +1162,11 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 /* Line 1391 of yacc.c  */
 #line 1164 "libxlu_cfg_y.c"
 	break;
-      case 20: /* "valuelist" */
+      case 20: /* "list" */
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1173 "libxlu_cfg_y.c"
@@ -1175,7 +1175,7 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1182 "libxlu_cfg_y.c"
@@ -1508,21 +1508,14 @@ yyreduce:
 
 /* Line 1806 of yacc.c  */
 #line 57 "libxlu_cfg_y.y"
-    { xlu__cfg_set_store(ctx,(yyvsp[(1) - (3)].string),(yyvsp[(3) - (3)].setting),(yylsp[(3) - (3)]).first_line); }
+    { xlu__cfg_set_store(ctx,(yyvsp[(1) - (3)].string),(yyvsp[(3) - (3)].value),(yylsp[(3) - (3)]).first_line); }
     break;
 
   case 12:
 
 /* Line 1806 of yacc.c  */
 #line 62 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,1,(yyvsp[(1) - (1)].string)); }
-    break;
-
-  case 13:
-
-/* Line 1806 of yacc.c  */
-#line 63 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(3) - (4)].setting); }
+    { (yyval.value)= xlu__cfg_make_string(ctx,(yyvsp[(1) - (1)].string)); }
     break;
 
   case 14:
@@ -1543,41 +1536,34 @@ yyreduce:
 
 /* Line 1806 of yacc.c  */
 #line 68 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,0,0); }
+    { (yyval.value)= (yyvsp[(3) - (4)].value); }
     break;
 
   case 17:
 
 /* Line 1806 of yacc.c  */
-#line 69 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(1) - (1)].setting); }
+#line 70 "libxlu_cfg_y.y"
+    { (yyval.value)= xlu__cfg_make_list(ctx,NULL); }
     break;
 
   case 18:
 
 /* Line 1806 of yacc.c  */
-#line 70 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(1) - (3)].setting); }
+#line 71 "libxlu_cfg_y.y"
+    { (yyval.value)= xlu__cfg_make_list(ctx,(yyvsp[(1) - (2)].value)); }
     break;
 
   case 19:
 
 /* Line 1806 of yacc.c  */
 #line 72 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,2,(yyvsp[(1) - (2)].string)); }
-    break;
-
-  case 20:
-
-/* Line 1806 of yacc.c  */
-#line 73 "libxlu_cfg_y.y"
-    { xlu__cfg_set_add(ctx,(yyvsp[(1) - (5)].setting),(yyvsp[(4) - (5)].string)); (yyval.setting)= (yyvsp[(1) - (5)].setting); }
+    { xlu__cfg_list_append(ctx,(yyvsp[(1) - (5)].value),(yyvsp[(4) - (5)].value)); (yyval.value)= (yyvsp[(1) - (5)].value);}
     break;
 
 
 
 /* Line 1806 of yacc.c  */
-#line 1581 "libxlu_cfg_y.c"
+#line 1567 "libxlu_cfg_y.c"
       default: break;
     }
   /* User semantic actions sometimes alter yychar, and that requires
diff --git a/tools/libxl/libxlu_cfg_y.h b/tools/libxl/libxlu_cfg_y.h
index d7dfaf2..37e8213 100644
--- a/tools/libxl/libxlu_cfg_y.h
+++ b/tools/libxl/libxlu_cfg_y.h
@@ -54,7 +54,7 @@ typedef union YYSTYPE
 #line 25 "libxlu_cfg_y.y"
 
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 
 
 
diff --git a/tools/libxl/libxlu_cfg_y.y b/tools/libxl/libxlu_cfg_y.y
index aa9f787..bd4156e 100644
--- a/tools/libxl/libxlu_cfg_y.y
+++ b/tools/libxl/libxlu_cfg_y.y
@@ -24,7 +24,7 @@
 
 %union {
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 }
 
 %locations
@@ -39,8 +39,8 @@
 %type <string>            atom
 %destructor { free($$); } atom IDENT STRING NUMBER
 
-%type <setting>                         value valuelist values
-%destructor { xlu__cfg_set_free($$); }  value valuelist values
+%type <value>                         value list values
+%destructor { xlu__cfg_value_free($$); }  value list values
 
 %%
 
@@ -59,18 +59,17 @@ assignment: IDENT '=' value { xlu__cfg_set_store(ctx,$1,$3,@3.first_line); }
 endstmt: NEWLINE
  |      ';'
 
-value:  atom                         { $$= xlu__cfg_set_mk(ctx,1,$1); }
- |      '[' nlok valuelist ']'       { $$= $3; }
+value:  atom                         { $$= xlu__cfg_make_string(ctx,$1); }
+ |      list
 
 atom:   STRING                   { $$= $1; }
  |      NUMBER                   { $$= $1; }
 
-valuelist: /* empty */           { $$= xlu__cfg_set_mk(ctx,0,0); }
- |      values                  { $$= $1; }
- |      values ',' nlok         { $$= $1; }
+list:   '[' nlok values ']'      { $$= $3; }
 
-values: atom nlok                  { $$= xlu__cfg_set_mk(ctx,2,$1); }
- |      values ',' nlok atom nlok  { xlu__cfg_set_add(ctx,$1,$4); $$= $1; }
+values: /* empty */                { $$= xlu__cfg_make_list(ctx,NULL); }
+ |      value nlok                 { $$= xlu__cfg_make_list(ctx,$1); }
+ |      values ',' nlok value nlok { xlu__cfg_list_append(ctx,$1,$4); $$= $1;}
 
 nlok:
         /* nothing */
diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 7579158..66eec28 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -23,17 +23,29 @@
 #include <assert.h>
 #include <regex.h>
 
-#define XLU_ConfigList XLU_ConfigSetting
-
 #include "libxlutil.h"
 
-struct XLU_ConfigSetting { /* transparent */
+typedef struct XLU_ConfigValue XLU_ConfigValue;
+
+typedef struct XLU_ConfigList {
+    int nvalues;
+    XLU_ConfigValue **values;
+} XLU_ConfigList;
+
+typedef struct XLU_ConfigValue {
+    enum XLU_ConfigValueType type;
+    union {
+        char *string;
+        XLU_ConfigList list;
+    } u;
+} XLU_ConfigValue;
+
+typedef struct XLU_ConfigSetting { /* transparent */
     struct XLU_ConfigSetting *next;
     char *name;
-    int nvalues, avalues; /* lists have avalues>1 */
-    char **values;
+    struct XLU_ConfigValue *value;
     int lineno;
-};
+} XLU_ConfigSetting;
 
 struct XLU_Config {
     XLU_ConfigSetting *settings;
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 0333e55..37d9549 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -23,6 +23,15 @@
 /* Unless otherwise stated, all functions return an errno value. */
 typedef struct XLU_Config XLU_Config;
 typedef struct XLU_ConfigList XLU_ConfigList;
+typedef struct XLU_ConfigValue XLU_ConfigValue;
+enum XLU_ConfigValueType {
+    XLU_STRING,
+    XLU_LIST,
+};
+
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value);
+const char *xlu_cfg_value_get_string(const XLU_ConfigValue *value);
+const XLU_ConfigList *xlu_cfg_value_get_list(const XLU_ConfigValue *value);
 
 XLU_Config *xlu_cfg_init(FILE *report, const char *report_filename);
   /* 0 means we got ENOMEM. */
@@ -65,7 +74,7 @@ int xlu_cfg_get_list_as_string_list(const XLU_Config *cfg, const char *n,
 const char *xlu_cfg_get_listitem(const XLU_ConfigList*, int entry);
   /* xlu_cfg_get_listitem cannot fail, except that if entry is
    * out of range it returns 0 (not setting errno) */
-
+const XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList*, int entry);
 
 /*
  * Disk specification parsing.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread
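
An aside for readers following the API above: the four accessors declared in
libxlutil.h are enough to walk arbitrarily nested lists. Below is a minimal
sketch, not part of the series; the dump_value() helper is hypothetical, and it
assumes xlu_cfg_get_listitem2() returns NULL for an out-of-range index,
mirroring the documented behaviour of xlu_cfg_get_listitem():

    #include <stdio.h>
    #include "libxlutil.h"

    /* Recursively print a parsed value, indenting by nesting depth. */
    static void dump_value(const XLU_ConfigValue *value, int depth)
    {
        if (xlu_cfg_value_type(value) == XLU_STRING) {
            printf("%*s%s\n", depth * 2, "", xlu_cfg_value_get_string(value));
        } else {
            const XLU_ConfigList *list = xlu_cfg_value_get_list(value);
            const XLU_ConfigValue *item;
            int i;

            for (i = 0; (item = xlu_cfg_get_listitem2(list, i)) != NULL; i++)
                dump_value(item, depth + 1);
        }
    }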

* [PATCH v2 19/19] xl: vNUMA support
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (17 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 18/19] libxlutil: nested list support Wei Liu
@ 2014-12-01 15:33 ` Wei Liu
  2014-12-08  9:58 ` [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
  19 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-01 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

This patch includes the configuration option parser and documentation.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v2:
1. Make vnuma_vdistances mandatory.
2. Use nested list to specify vdistances.
3. Update documentation.
---
 docs/man/xl.cfg.pod.5    |   39 ++++++++++++
 tools/libxl/xl_cmdimpl.c |  147 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 622ea53..e9af221 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -266,6 +266,45 @@ it will crash.
 
 =back
 
+=head3 Virtual NUMA Memory Allocation
+
+=over 4
+
+=item B<vnuma_memory=[ NUMBER, NUMBER, ... ]>
+
+Specify the size of memory covered by each virtual NUMA node. The number of
+elements in the list also implicitly defines the number of virtual NUMA nodes.
+
+The sum of all elements in this list should be equal to the memory size
+specified by B<maxmem=> in the guest configuration file, or B<memory=> if
+B<maxmem=> is not specified.
+
+=item B<vnuma_vcpu_map=[ NUMBER, NUMBER, ... ]>
+
+Specify which virtual NUMA node a specific vcpu belongs to. The number of
+elements in this list should be equal to B<maxvcpus=> in the guest
+configuration file, or B<vcpus=> if B<maxvcpus=> is not specified.
+
+=item B<vnuma_pnode_map=[ NUMBER, NUMBER, ... ]>
+
+Specify which physical NUMA node a specific virtual NUMA node maps to. The
+number of elements in this list should be equal to the number of virtual
+NUMA nodes defined in B<vnuma_memory=>.
+
+=item B<vnuma_vdistances=[ [NUMBER, ..., NUMBER], [NUMBER, ..., NUMBER], ... ]>
+
+A two-dimensional list specifying the distances among virtual nodes.
+
+The number of elements in the outer list equals the number of virtual nodes.
+The element at position B<i> is itself a list specifying the distances from
+node B<i> to all other nodes.
+
+For example, for a guest with 2 virtual nodes, the user can specify:
+
+  vnuma_vdistances = [ [10, 20], [20, 10] ]
+
+=back
+
 =head3 Event Actions
 
 =over 4
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 0e754e7..19996ed 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -904,6 +904,151 @@ static void replace_string(char **str, const char *val)
     *str = xstrdup(val);
 }
 
+static void parse_vnuma_config(const XLU_Config *config,
+                               libxl_domain_build_info *b_info)
+{
+    int i, j;
+    XLU_ConfigList *vnuma_memory, *vnuma_vcpu_map, *vnuma_pnode_map,
+        *vnuma_vdistances;
+    int num_vnuma_memory, num_vnuma_vcpu_map, num_vnuma_pnode_map,
+        num_vnuma_vdistances;
+    const char *buf;
+    libxl_physinfo physinfo;
+    uint32_t nr_nodes;
+    unsigned long val;
+    char *ep;
+
+    libxl_physinfo_init(&physinfo);
+    if (libxl_get_physinfo(ctx, &physinfo) != 0) {
+        libxl_physinfo_dispose(&physinfo);
+        fprintf(stderr, "libxl_physinfo failed.\n");
+        exit(1);
+    }
+    nr_nodes = physinfo.nr_nodes;
+    libxl_physinfo_dispose(&physinfo);
+
+    if (xlu_cfg_get_list(config, "vnuma_memory", &vnuma_memory,
+                         &num_vnuma_memory, 1))
+        return;              /* No vnuma config */
+
+    b_info->num_vnuma_nodes = num_vnuma_memory;
+    b_info->vnuma_nodes = xmalloc(num_vnuma_memory * sizeof(libxl_vnode_info));
+
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+        libxl_vnode_info_init(p);
+        libxl_cpu_bitmap_alloc(ctx, &p->vcpus, b_info->max_vcpus);
+        libxl_bitmap_set_none(&p->vcpus);
+        p->distances = xmalloc(b_info->num_vnuma_nodes * sizeof(uint32_t));
+        p->num_distances = b_info->num_vnuma_nodes;
+    }
+
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        buf = xlu_cfg_get_listitem(vnuma_memory, i);
+        val = strtoul(buf, &ep, 10);
+        if (ep == buf) {
+            fprintf(stderr, "xl: Invalid argument parsing vnuma memory: %s\n", buf);
+            exit(1);
+        }
+        b_info->vnuma_nodes[i].mem = val;
+    }
+
+    if (xlu_cfg_get_list(config, "vnuma_vcpu_map", &vnuma_vcpu_map,
+                         &num_vnuma_vcpu_map, 1)) {
+        fprintf(stderr, "No vcpu to vnode map specified\n");
+        exit(1);
+    }
+
+    i = 0;
+    while (i < b_info->max_vcpus &&
+           (buf = xlu_cfg_get_listitem(vnuma_vcpu_map, i)) != NULL) {
+        val = strtoul(buf, &ep, 10);
+        if (ep == buf) {
+            fprintf(stderr, "xl: Invalid argument parsing vcpu map: %s\n", buf);
+            exit(1);
+        }
+        if (val >= num_vnuma_memory) {
+            fprintf(stderr, "xl: Invalid vnode number specified: %lu\n", val);
+            exit(1);
+        }
+        libxl_bitmap_set(&b_info->vnuma_nodes[val].vcpus, i);
+        i++;
+    }
+
+    if (i != b_info->max_vcpus) {
+        fprintf(stderr, "xl: Not enough elements in vnuma_vcpu_map, provided %d, required %d\n",
+                i, b_info->max_vcpus);
+        exit(1);
+    }
+
+    if (xlu_cfg_get_list(config, "vnuma_pnode_map", &vnuma_pnode_map,
+                         &num_vnuma_pnode_map, 1)) {
+        fprintf(stderr, "No vnode to pnode map specified\n");
+        exit(1);
+    }
+
+    i = 0;
+    while (i < num_vnuma_pnode_map &&
+           (buf = xlu_cfg_get_listitem(vnuma_pnode_map, i)) != NULL) {
+        val = strtoul(buf, &ep, 10);
+        if (ep == buf) {
+            fprintf(stderr, "xl: Invalid argument parsing vnode to pnode map: %s\n", buf);
+            exit(1);
+        }
+        if (val >= nr_nodes) {
+            fprintf(stderr, "xl: Invalid pnode number specified: %lu\n", val);
+            exit(1);
+        }
+
+        b_info->vnuma_nodes[i].pnode = val;
+
+        i++;
+    }
+
+    if (i != b_info->num_vnuma_nodes) {
+        fprintf(stderr, "xl: Not enough elements in vnuma_vnode_map, provided %d, required %d\n",
+                i + 1, b_info->num_vnuma_nodes);
+        exit(1);
+    }
+
+    if (!xlu_cfg_get_list(config, "vnuma_vdistances", &vnuma_vdistances,
+                          &num_vnuma_vdistances, 1)) {
+        if (num_vnuma_vdistances != num_vnuma_memory) {
+            fprintf(stderr, "xl: Required %d sub-lists in vnuma_vdistances but provided %d\n",
+                    num_vnuma_memory, num_vnuma_vdistances);
+            exit(1);
+        }
+
+        for (i = 0; i < num_vnuma_vdistances; i++) {
+            const XLU_ConfigValue *tmp;
+            const XLU_ConfigList *sublist;
+
+            tmp = xlu_cfg_get_listitem2(vnuma_vdistances, i);
+            if (xlu_cfg_value_type(tmp) != XLU_LIST) {
+                fprintf(stderr, "xl: Expecting list in vnuma_vdistance\n");
+                exit(1);
+            }
+            sublist = xlu_cfg_value_get_list(tmp);
+            for (j = 0;
+                 (buf = xlu_cfg_get_listitem(sublist, j)) != NULL;
+                 j++) {
+                val = strtoul(buf, &ep, 10);
+                if (ep == buf) {
+                    fprintf(stderr, "xl: Invalid argument parsing vdistances map: %s\n", buf);
+                    exit(1);
+                }
+
+                b_info->vnuma_nodes[i].distances[j] = val;
+            }
+        }
+
+    } else {
+        fprintf(stderr, "xl: No vnuma_vdistances specified.\n");
+        exit(1);
+    }
+}
+
 static void parse_config_data(const char *config_source,
                               const char *config_data,
                               int config_len,
@@ -1093,6 +1238,8 @@ static void parse_config_data(const char *config_source,
         }
     }
 
+    parse_vnuma_config(config, b_info);
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread
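
A caveat worth flagging in parse_vnuma_config() above: the inner vdistances
loop copies as many entries as each sub-list provides without bounding j, so a
sub-list longer than the node count would write past the distances array. A
defensive variant of that inner loop, sketched with the patch's own variables,
could be:

    for (j = 0; (buf = xlu_cfg_get_listitem(sublist, j)) != NULL; j++) {
        /* Each sub-list must have exactly num_vnuma_nodes entries;
         * anything longer would overrun the distances array. */
        if (j >= b_info->num_vnuma_nodes) {
            fprintf(stderr, "xl: too many elements in vnuma_vdistances sub-list %d\n", i);
            exit(1);
        }
        val = strtoul(buf, &ep, 10);
        if (ep == buf) {
            fprintf(stderr, "xl: Invalid argument parsing vdistances map: %s\n", buf);
            exit(1);
        }
        b_info->vnuma_nodes[i].distances[j] = val;
    }
    if (j != b_info->num_vnuma_nodes) {
        fprintf(stderr, "xl: too few elements in vnuma_vdistances sub-list %d\n", i);
        exit(1);
    }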

* Re: [PATCH v2 00/19] Virtual NUMA for PV and HVM
  2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
                   ` (18 preceding siblings ...)
  2014-12-01 15:33 ` [PATCH v2 19/19] xl: vNUMA support Wei Liu
@ 2014-12-08  9:58 ` Wei Liu
  2014-12-08 10:19   ` Jan Beulich
  19 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-08  9:58 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, JBeulich,
	boris.ostrovsky, ufimtseva

Ping?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 00/19] Virtual NUMA for PV and HVM
  2014-12-08  9:58 ` [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
@ 2014-12-08 10:19   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2014-12-08 10:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 08.12.14 at 10:58, <wei.liu2@citrix.com> wrote:
> Ping?

Please be a little more patient - with this series not going into 4.5,
there's no need to review it urgently. The hypervisor side parts sit
in my to-be-reviewed queue.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/19] xen: dump vNUMA information with debug key "u"
  2014-12-01 15:33 ` [PATCH v2 01/19] xen: dump vNUMA information with debug key "u" Wei Liu
@ 2014-12-08 17:01   ` Jan Beulich
  2014-12-09 11:22     ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-08 17:01 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -363,10 +363,13 @@ EXPORT_SYMBOL(node_data);
>  static void dump_numa(unsigned char key)
>  {
>      s_time_t now = NOW();
> -    int i;
> +    unsigned int i, j, n;
> +    int err;
>      struct domain *d;
>      struct page_info *page;
>      unsigned int page_num_node[MAX_NUMNODES];
> +    uint64_t mem;
> +    struct vnuma_info *vnuma;

If this can be const, it should be in a pure dumping function.

> @@ -408,6 +411,48 @@ static void dump_numa(unsigned char key)
>  
>          for_each_online_node ( i )
>              printk("    Node %u: %u\n", i, page_num_node[i]);
> +
> +        if ( !d->vnuma )
> +                continue;
> +
> +        vnuma = d->vnuma;
> +        printk("     %u vnodes, %u vcpus:\n", vnuma->nr_vnodes, d->max_vcpus);
> +        for ( i = 0; i < vnuma->nr_vnodes; i++ )
> +        {
> +            err = snprintf(keyhandler_scratch, 12, "%3u",
> +                    vnuma->vnode_to_pnode[i]);
> +            if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
> +                strlcpy(keyhandler_scratch, "???", 3);
> +
> +            printk("       vnode  %3u - pnode %s\n", i, keyhandler_scratch);
> +            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
> +            {
> +                if ( vnuma->vmemrange[j].nid == i )
> +                {
> +                    mem = vnuma->vmemrange[j].end - vnuma->vmemrange[j].start;
> +                    printk("%16"PRIu64" MB: %#016"PRIx64" - %#016"PRIx64"\n",

Am I misremembering that these were just "0x%"PRIx64 originally?
I ask because converting to the 0-padded fixed width form makes
no sense together with the # modifier. For these ranges I think it's
quite obvious that the numbers are hex, so I'd suggest dropping
the #s without replacement. And to be honest I'm also against
printing duplicate information: The memory range already specifies
how much memory this is.

> +                           mem >> 20,
> +                           vnuma->vmemrange[j].start,
> +                           vnuma->vmemrange[j].end);
> +                }
> +            }
> +
> +            printk("       vcpus: ");
> +            for ( j = 0, n = 0; j < d->max_vcpus; j++ )
> +            {
> +                if ( vnuma->vcpu_to_vnode[j] == i )
> +                {
> +                    if ( (n + 1) % 8 == 0 )
> +                        printk("%3d\n", j);
> +                    else if ( !(n % 8) && n != 0 )
> +                        printk("%17d ", j);
> +                    else
> +                        printk("%3d ", j);
> +                    n++;
> +                }

Please consider very-many-vCPU guests here - see Andrew's commit
9cf71226 ("process softirqs while dumping domains").

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
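
The pattern Jan's last point refers to, from commit 9cf71226, is to call
process_pending_softirqs() at intervals inside long dump loops so softirqs are
not starved on very-many-vCPU guests. Applied to the vcpu loop above it might
look like the sketch below; the 64-iteration interval is an arbitrary choice:

            printk("       vcpus: ");
            for ( j = 0, n = 0; j < d->max_vcpus; j++ )
            {
                /* Keep softirqs serviced while dumping huge guests. */
                if ( !(j & 0x3f) )
                    process_pending_softirqs();

                if ( vnuma->vcpu_to_vnode[j] != i )
                    continue;

                if ( (n + 1) % 8 == 0 )
                    printk("%3d\n", j);
                else if ( !(n % 8) && n != 0 )
                    printk("%17d ", j);
                else
                    printk("%3d ", j);
                n++;
            }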

* Re: [PATCH v2 02/19] xen: make two memory hypercalls vNUMA-aware
  2014-12-01 15:33 ` [PATCH v2 02/19] xen: make two memory hypercalls vNUMA-aware Wei Liu
@ 2014-12-08 17:05   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2014-12-08 17:05 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -302,6 +302,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          {
>          case 0:
>              fi.submap = 0;
> +            fi.submap |= (1U << XENFEAT_memory_op_vnode_supported);

Why don't you just replace the assignment above?

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -692,6 +692,49 @@ out:
>      return rc;
>  }
>  
> +static int translate_vnode_to_pnode(struct domain *d,
> +                                    struct xen_memory_reservation *r,
> +                                    struct memop_args *a)
> +{
> +    int rc = 0;
> +    unsigned int vnode, pnode;
> +
> +    /* Note: we don't strictly require non-supported bits set to zero,

Coding style.

With these fixed,
Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread
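
In other words, the two lines in the hunk can collapse into a single
assignment (sketch):

        case 0:
            /* Assign the bit directly instead of zeroing then OR-ing. */
            fi.submap = 1U << XENFEAT_memory_op_vnode_supported;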

* Re: [PATCH v2 11/19] xen: handle XENMEMF_get_vnumainfo in compat_memory_op
  2014-12-01 15:33 ` [PATCH v2 11/19] xen: handle XENMEMF_get_vnumainfo in compat_memory_op Wei Liu
@ 2014-12-09  9:09   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2014-12-09  9:09 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> @@ -273,6 +276,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>              break;
>          }
>  
> +        case XENMEM_get_vnumainfo:
> +	{

Hard tab.

> +            enum XLAT_vnuma_topology_info_vdistance vdistance =
> +                XLAT_vnuma_topology_info_vdistance_h;
> +            enum XLAT_vnuma_topology_info_vcpu_to_vnode vcpu_to_vnode =
> +                XLAT_vnuma_topology_info_vcpu_to_vnode_h;
> +            enum XLAT_vnuma_topology_info_vmemrange vmemrange =
> +                XLAT_vnuma_topology_info_vmemrange_h;
> +
> +            if ( copy_from_guest(&cmp.vnuma, compat, 1) )
> +                return -EFAULT;
> +
> +#define XLAT_vnuma_topology_info_HNDL_vdistance_h(_d_, _s_)		\
> +            guest_from_compat_handle((_d_)->vdistance.h, (_s_)->vdistance.h)
> +#define XLAT_vnuma_topology_info_HNDL_vcpu_to_vnode_h(_d_, _s_)		\
> +            guest_from_compat_handle((_d_)->vcpu_to_vnode.h, (_s_)->vcpu_to_vnode.h)
> +#define XLAT_vnuma_topology_info_HNDL_vmemrange_h(_d_, _s_)		\
> +            guest_from_compat_handle((_d_)->vmemrange.h, (_s_)->vmemrange.h)
> +
> +            XLAT_vnuma_topology_info(nat.vnuma, &cmp.vnuma);
> +
> +#undef XLAT_vnuma_topology_info_HNDL_vdistance_h
> +#undef XLAT_vnuma_topology_info_HNDL_vcpu_to_vnode_h
> +#undef XLAT_vnuma_topology_info_HNDL_vmemrange_h
> +            break;
> +	}

Again.

With these two fixed and the subject adjusted to name the sub-op
correctly,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/19] xen: dump vNUMA information with debug key "u"
  2014-12-08 17:01   ` Jan Beulich
@ 2014-12-09 11:22     ` Wei Liu
  2014-12-09 11:38       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-09 11:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Mon, Dec 08, 2014 at 05:01:29PM +0000, Jan Beulich wrote:
[...]
> > +        for ( i = 0; i < vnuma->nr_vnodes; i++ )
> > +        {
> > +            err = snprintf(keyhandler_scratch, 12, "%3u",
> > +                    vnuma->vnode_to_pnode[i]);
> > +            if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
> > +                strlcpy(keyhandler_scratch, "???", 3);
> > +
> > +            printk("       vnode  %3u - pnode %s\n", i, keyhandler_scratch);
> > +            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
> > +            {
> > +                if ( vnuma->vmemrange[j].nid == i )
> > +                {
> > +                    mem = vnuma->vmemrange[j].end - vnuma->vmemrange[j].start;
> > +                    printk("%16"PRIu64" MB: %#016"PRIx64" - %#016"PRIx64"\n",
> 
> Am I misremembering that these were just "0x%"PRIx64 originally?

Yes.

> I ask because converting to the 0-padded fixed width form makes
> no sense together with the # modifier. For these ranges I think it's

OK.

> quite obvious that the numbers are hex, so I'd suggest dropping
> the #s without replacement. And to be honest I'm also against
> printing duplicate information: The memory range already specifies
> how much memory this is.
> 

Is this what you want?

+                if ( vnuma->vmemrange[j].nid == i )
+                {
+                    printk("         %016"PRIx64" - %016"PRIx64"\n",
+                           vnuma->vmemrange[j].start,
+                           vnuma->vmemrange[j].end);
+                }

And it prints out something like:

(XEN)      2 vnodes, 2 vcpus:
(XEN)        vnode    0 - pnode   0
(XEN)          0000000000000000 - 00000000bb800000
(XEN)        vcpus:   0

Wei.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/19] xen: dump vNUMA information with debug key "u"
  2014-12-09 11:22     ` Wei Liu
@ 2014-12-09 11:38       ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2014-12-09 11:38 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 09.12.14 at 12:22, <wei.liu2@citrix.com> wrote:
> Is this what you want?
> 
> +                if ( vnuma->vmemrange[j].nid == i )
> +                {
> +                    printk("         %016"PRIx64" - %016"PRIx64"\n",
> +                           vnuma->vmemrange[j].start,
> +                           vnuma->vmemrange[j].end);
> +                }
> 
> And it prints out something like:
> 
> (XEN)      2 vnodes, 2 vcpus:
> (XEN)        vnode    0 - pnode   0
> (XEN)          0000000000000000 - 00000000bb800000
> (XEN)        vcpus:   0

This looks fine, yes.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
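
For anyone puzzled by the '#' point: that flag makes the "0x" prefix count
against the field width, so "%#016" leaves only 14 zero-padded hex digits
after the prefix. A standalone illustration, plain C and not part of the
series:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t v = 0xbb800000;

        printf("%#016"PRIx64"\n", v);  /* 0x000000bb800000   ("0x" eats 2 of the 16) */
        printf("0x%016"PRIx64"\n", v); /* 0x00000000bb800000 (full 16 digits)        */
        printf("%016"PRIx64"\n", v);   /* 00000000bb800000   (what the fix prints)   */
        return 0;
    }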

* Re: [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor
  2014-12-01 15:33 ` [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
@ 2014-12-09 16:46   ` Jan Beulich
  2014-12-09 17:52     ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-09 16:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> @@ -261,6 +262,8 @@ int main(void)
>  
>      init_hypercalls();
>  
> +    init_vnuma_info();

This is very early, and I don't think it needs to be - I guess it could be
done right before doing ACPI stuff?

> --- /dev/null
> +++ b/tools/firmware/hvmloader/vnuma.c
> @@ -0,0 +1,84 @@
> +/*
> + * vnuma.c: obtain vNUMA information from hypervisor
> + *
> + * Copyright (c) 2014 Wei Liu, Citrix Systems (R&D) Ltd.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include "util.h"
> +#include "hypercall.h"
> +#include "vnuma.h"
> +#include <errno.h>

This is the system header, not the Xen one. See the discussion at
http://lists.xenproject.org/archives/html/xen-devel/2014-10/msg03206.html
and perhaps build on the resulting patch
http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg00013.html.

> +
> +unsigned int nr_vnodes, nr_vmemranges;
> +unsigned int *vcpu_to_vnode, *vdistance;
> +xen_vmemrange_t *vmemrange;
> +
> +void init_vnuma_info(void)
> +{
> +    int rc, retry = 0;
> +    struct xen_vnuma_topology_info vnuma_topo = {
> +        .domid = DOMID_SELF,
> +    };
> +
> +    vcpu_to_vnode = scratch_alloc(sizeof(uint32_t) * hvm_info->nr_vcpus, 0);
> +    vdistance = scratch_alloc(sizeof(uint32_t) * MAX_VDISTANCE, 0);
> +    vmemrange = scratch_alloc(sizeof(xen_vmemrange_t) * MAX_VMEMRANGES, 0);
> +
> +    vnuma_topo.nr_vnodes = MAX_VNODES;
> +    vnuma_topo.nr_vcpus = hvm_info->nr_vcpus;
> +    vnuma_topo.nr_vmemranges = MAX_VMEMRANGES;
> +
> +    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
> +    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
> +    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);
> +
> +    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
> +    while ( rc == -EAGAIN && retry < 10 )
> +    {
> +        rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
> +        retry++;
> +    }
> +    if ( rc < 0 )
> +    {
> +        printf("Failed to retrieve vNUMA information, rc = %d\n", rc);
> +        goto out;

return;

> +    }
> +
> +    nr_vnodes = vnuma_topo.nr_vnodes;
> +    nr_vmemranges = vnuma_topo.nr_vmemranges;
> +
> +out:
> +    return;

Drop these two (or really three) unnecessary lines please.

> --- /dev/null
> +++ b/tools/firmware/hvmloader/vnuma.h
> @@ -0,0 +1,56 @@
> +/******************************************************************************
> + * vnuma.h
> + *
> + * Copyright (c) 2014, Wei Liu
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef __HVMLOADER_VNUMA_H__
> +#define __HVMLOADER_VNUMA_H__
> +
> +#include <xen/memory.h>
> +
> +#define MAX_VNODES     64
> +#define MAX_VDISTANCE  (MAX_VNODES * MAX_VNODES)
> +#define MAX_VMEMRANGES (MAX_VNODES * 2)

These look arbitrary - please add a (brief) comment giving some
rationale so that people needing to change them know eventual
consequences. Would it perhaps make sense to derive
MAX_VNODES from HVM_MAX_VCPUS? Additionally I think the
code wouldn't become much more difficult if you didn't build in
static upper limits.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
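
Taking Jan's two small comments together (plain return instead of the goto,
and no duplicated hypercall before the retry loop), the tail of
init_vnuma_info() could be reshaped roughly as below; the loop restructuring
is editorial, not from the series:

    for ( retry = 0; ; retry++ )
    {
        rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
        if ( rc != -EAGAIN || retry == 10 )
            break;
    }

    if ( rc < 0 )
    {
        printf("Failed to retrieve vNUMA information, rc = %d\n", rc);
        return;
    }

    nr_vnodes = vnuma_topo.nr_vnodes;
    nr_vmemranges = vnuma_topo.nr_vmemranges;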

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-01 15:33 ` [PATCH v2 13/19] hvmloader: construct SRAT Wei Liu
@ 2014-12-09 16:53   ` Jan Beulich
  2014-12-09 18:06     ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-09 16:53 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> @@ -203,6 +204,65 @@ static struct acpi_20_waet *construct_waet(void)
>      return waet;
>  }
>  
> +static struct acpi_20_srat *construct_srat(void)
> +{
> +    struct acpi_20_srat *srat;
> +    struct acpi_20_srat_processor *processor;
> +    struct acpi_20_srat_memory *memory;
> +    unsigned int size;
> +    void *p;
> +    int i;
> +    uint64_t mem;
> +
> +    size = sizeof(*srat) + sizeof(*processor) * hvm_info->nr_vcpus +
> +        sizeof(*memory) * nr_vmemranges;
> +
> +    p = mem_alloc(size, 16);
> +    if (!p) return NULL;

Coding style (despite you likely copied it from elsewhere).

> +
> +    srat = p;
> +    memset(srat, 0, sizeof(*srat));
> +    srat->header.signature    = ACPI_2_0_SRAT_SIGNATURE;
> +    srat->header.revision     = ACPI_2_0_SRAT_REVISION;
> +    fixed_strcpy(srat->header.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(srat->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +    srat->header.oem_revision = ACPI_OEM_REVISION;
> +    srat->header.creator_id   = ACPI_CREATOR_ID;
> +    srat->header.creator_revision = ACPI_CREATOR_REVISION;
> +    srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
> +
> +    processor = (struct acpi_20_srat_processor *)(srat + 1);
> +    for ( i = 0; i < hvm_info->nr_vcpus; i++ )
> +    {
> +        memset(processor, 0, sizeof(*processor));
> +        processor->type     = ACPI_PROCESSOR_AFFINITY;
> +        processor->length   = sizeof(*processor);
> +        processor->domain   = vcpu_to_vnode[i];
> +        processor->apic_id  = LAPIC_ID(i);
> +        processor->flags    = ACPI_LOCAL_APIC_AFFIN_ENABLED;
> +        processor++;
> +    }
> +
> +    memory = (struct acpi_20_srat_memory *)processor;
> +    for ( i = 0; i < nr_vmemranges; i++ )
> +    {
> +        mem = vmemrange[i].end - vmemrange[i].start;

Do you really need this helper variable?

> +        memset(memory, 0, sizeof(*memory));
> +        memory->type          = ACPI_MEMORY_AFFINITY;
> +        memory->length        = sizeof(*memory);
> +        memory->domain        = vmemrange[i].nid;
> +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
> +        memory->base_address  = vmemrange[i].start;
> +        memory->mem_length    = mem;
> +        memory++;
> +    }
> +
> +    srat->header.length = size;

Mind checking size == memory - p here?

> @@ -346,6 +407,16 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>          }
>      }
>  
> +    /* SRAT */
> +    if ( nr_vnodes > 0 )
> +    {
> +        srat = construct_srat();
> +        if (srat)

Coding style again.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 14/19] hvmloader: construct SLIT
  2014-12-01 15:33 ` [PATCH v2 14/19] hvmloader: construct SLIT Wei Liu
@ 2014-12-09 16:57   ` Jan Beulich
  2014-12-09 18:09     ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-09 16:57 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> --- a/tools/firmware/hvmloader/acpi/build.c
> +++ b/tools/firmware/hvmloader/acpi/build.c
> @@ -263,6 +263,38 @@ static struct acpi_20_srat *construct_srat(void)
>      return srat;
>  }
>  
> +static struct acpi_20_slit *construct_slit(void)
> +{
> +    struct acpi_20_slit *slit;
> +    unsigned int num, size;
> +    int i;

unsigned int please. Plus similar coding style issues further down as in
the previous patch.

> @@ -415,6 +448,11 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>              table_ptrs[nr_tables++] = (unsigned long)srat;
>          else
>              printf("Failed to build SRAT, skipping...\n");
> +        slit = construct_slit();
> +        if (srat)

DYM slit?

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
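
The slip means a NULL pointer would be stored in table_ptrs whenever SRAT
succeeded but construct_slit() failed; the intended code, mirroring the SRAT
handling just above it, is presumably:

        slit = construct_slit();
        if ( slit )
            table_ptrs[nr_tables++] = (unsigned long)slit;
        else
            printf("Failed to build SLIT, skipping...\n");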

* Re: [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor
  2014-12-09 16:46   ` Jan Beulich
@ 2014-12-09 17:52     ` Wei Liu
  2014-12-10  8:19       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-09 17:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Tue, Dec 09, 2014 at 04:46:22PM +0000, Jan Beulich wrote:
> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> > @@ -261,6 +262,8 @@ int main(void)
> >  
> >      init_hypercalls();
> >  
> > +    init_vnuma_info();
> 
> This is very early, and I don't think it needs to be - I guess it could be
> done right before doing ACPI stuff?
> 

Yes, it can be moved right before ACPI stuff.

> > --- /dev/null
> > +++ b/tools/firmware/hvmloader/vnuma.c
> > @@ -0,0 +1,84 @@
> > +/*
> > + * vnuma.c: obtain vNUMA information from hypervisor
> > + *
> > + * Copyright (c) 2014 Wei Liu, Citrix Systems (R&D) Ltd.
> > + *
> > + * Redistribution and use in source and binary forms, with or without
> > + * modification, are permitted provided that the following conditions
> > + * are met:
> > + * 1. Redistributions of source code must retain the above copyright
> > + *    notice, this list of conditions and the following disclaimer.
> > + * 2. Redistributions in binary form must reproduce the above copyright
> > + *    notice, this list of conditions and the following disclaimer in the
> > + *    documentation and/or other materials provided with the distribution.
> > + *
> > + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> > + * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
> > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > + * SUCH DAMAGE.
> > + */
> > +
> > +#include "util.h"
> > +#include "hypercall.h"
> > +#include "vnuma.h"
> > +#include <errno.h>
> 
> This is the system header, not the Xen one. See the discussion at
> http://lists.xenproject.org/archives/html/xen-devel/2014-10/msg03206.html
> and perhaps build on the resulting patch
> http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg00013.html.
> 

I will cherry pick that patch and build on top of it.

> > +
> > +unsigned int nr_vnodes, nr_vmemranges;
> > +unsigned int *vcpu_to_vnode, *vdistance;
> > +xen_vmemrange_t *vmemrange;
> > +
[...]
> > + */
> > +
> > +#ifndef __HVMLOADER_VNUMA_H__
> > +#define __HVMLOADER_VNUMA_H__
> > +
> > +#include <xen/memory.h>
> > +
> > +#define MAX_VNODES     64
> > +#define MAX_VDISTANCE  (MAX_VNODES * MAX_VNODES)
> > +#define MAX_VMEMRANGES (MAX_VNODES * 2)
> 
> These look arbitrary - please add a (brief) comment giving some
> rationale so that people needing to change them know eventual
> consequences. Would it perhaps make sense to derive
> MAX_VNODES from HVM_MAX_VCPUS? Additionally I think the

I don't think these two have a very strong connection, though.

And I remember you saying HVM_MAX_VCPUS will be removed.

> code wouldn't become much more difficult if you didn't build in
> static upper limits.
> 

Yes, I can make two hypercalls: the first retrieves the number of nodes
/ vmemranges configured; then allocate memory and fill in those arrays with
the second hypercall.

Wei.

> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
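
For concreteness, the two-call pattern agreed here might look like the sketch
below. It assumes XENMEM_get_vnumainfo writes the real counts back and fails
with -ENOBUFS when the supplied counts are too small; that is how the
hypercall is expected to behave, but treat the details as an assumption:

    struct xen_vnuma_topology_info vnuma_topo = {
        .domid = DOMID_SELF,
        /* nr_vnodes/nr_vcpus/nr_vmemranges left zero: query the counts. */
    };
    int rc;

    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
    if ( rc != -ENOBUFS )
        return; /* no vNUMA information, or a real error */

    /* The nr_* fields now hold the real counts; size the buffers to match. */
    vcpu_to_vnode = scratch_alloc(sizeof(uint32_t) * vnuma_topo.nr_vcpus, 0);
    vdistance = scratch_alloc(sizeof(uint32_t) * vnuma_topo.nr_vnodes *
                              vnuma_topo.nr_vnodes, 0);
    vmemrange = scratch_alloc(sizeof(xen_vmemrange_t) *
                              vnuma_topo.nr_vmemranges, 0);

    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);

    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);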

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-09 16:53   ` Jan Beulich
@ 2014-12-09 18:06     ` Wei Liu
  2014-12-10  8:20       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-09 18:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Tue, Dec 09, 2014 at 04:53:40PM +0000, Jan Beulich wrote:
> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> > @@ -203,6 +204,65 @@ static struct acpi_20_waet *construct_waet(void)
> >      return waet;
> >  }
> >  
> > +static struct acpi_20_srat *construct_srat(void)
> > +{
> > +    struct acpi_20_srat *srat;
> > +    struct acpi_20_srat_processor *processor;
> > +    struct acpi_20_srat_memory *memory;
> > +    unsigned int size;
> > +    void *p;
> > +    int i;
> > +    uint64_t mem;
> > +
> > +    size = sizeof(*srat) + sizeof(*processor) * hvm_info->nr_vcpus +
> > +        sizeof(*memory) * nr_vmemranges;
> > +
> > +    p = mem_alloc(size, 16);
> > +    if (!p) return NULL;
> 
> Coding style (despite you likely copied it from elsewhere).
> 

Fixed.

> > +
> > +    srat = p;
> > +    memset(srat, 0, sizeof(*srat));
> > +    srat->header.signature    = ACPI_2_0_SRAT_SIGNATURE;
> > +    srat->header.revision     = ACPI_2_0_SRAT_REVISION;
> > +    fixed_strcpy(srat->header.oem_id, ACPI_OEM_ID);
> > +    fixed_strcpy(srat->header.oem_table_id, ACPI_OEM_TABLE_ID);
> > +    srat->header.oem_revision = ACPI_OEM_REVISION;
> > +    srat->header.creator_id   = ACPI_CREATOR_ID;
> > +    srat->header.creator_revision = ACPI_CREATOR_REVISION;
> > +    srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
> > +
> > +    processor = (struct acpi_20_srat_processor *)(srat + 1);
> > +    for ( i = 0; i < hvm_info->nr_vcpus; i++ )
> > +    {
> > +        memset(processor, 0, sizeof(*processor));
> > +        processor->type     = ACPI_PROCESSOR_AFFINITY;
> > +        processor->length   = sizeof(*processor);
> > +        processor->domain   = vcpu_to_vnode[i];
> > +        processor->apic_id  = LAPIC_ID(i);
> > +        processor->flags    = ACPI_LOCAL_APIC_AFFIN_ENABLED;
> > +        processor++;
> > +    }
> > +
> > +    memory = (struct acpi_20_srat_memory *)processor;
> > +    for ( i = 0; i < nr_vmemranges; i++ )
> > +    {
> > +        mem = vmemrange[i].end - vmemrange[i].start;
> 
> Do you really need this helper variable?
> 

Removed.

> > +        memset(memory, 0, sizeof(*memory));
> > +        memory->type          = ACPI_MEMORY_AFFINITY;
> > +        memory->length        = sizeof(*memory);
> > +        memory->domain        = vmemrange[i].nid;
> > +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
> > +        memory->base_address  = vmemrange[i].start;
> > +        memory->mem_length    = mem;
> > +        memory++;
> > +    }
> > +
> > +    srat->header.length = size;
> 
> Mind checking size == memory - p here?
> 

Why? There doesn't seem to be anything that would cause memory - p !=
size at runtime.

> > @@ -346,6 +407,16 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
> >          }
> >      }
> >  
> > +    /* SRAT */
> > +    if ( nr_vnodes > 0 )
> > +    {
> > +        srat = construct_srat();
> > +        if (srat)
> 
> Coding style again.
> 

Fixed.

Wei.

> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 14/19] hvmloader: construct SLIT
  2014-12-09 16:57   ` Jan Beulich
@ 2014-12-09 18:09     ` Wei Liu
  0 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-09 18:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Tue, Dec 09, 2014 at 04:57:14PM +0000, Jan Beulich wrote:
> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> > --- a/tools/firmware/hvmloader/acpi/build.c
> > +++ b/tools/firmware/hvmloader/acpi/build.c
> > @@ -263,6 +263,38 @@ static struct acpi_20_srat *construct_srat(void)
> >      return srat;
> >  }
> >  
> > +static struct acpi_20_slit *construct_slit(void)
> > +{
> > +    struct acpi_20_slit *slit;
> > +    unsigned int num, size;
> > +    int i;
> 
> unsigned int please. Plus similar coding style issues further down as in
> the previous patch.
> 

Fixed.

> > @@ -415,6 +448,11 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
> >              table_ptrs[nr_tables++] = (unsigned long)srat;
> >          else
> >              printf("Failed to build SRAT, skipping...\n");
> > +        slit = construct_slit();
> > +        if (srat)
> 
> DYM slit?
> 

Yes. :-/

Wei.

> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor
  2014-12-09 17:52     ` Wei Liu
@ 2014-12-10  8:19       ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2014-12-10  8:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 09.12.14 at 18:52, <wei.liu2@citrix.com> wrote:
> On Tue, Dec 09, 2014 at 04:46:22PM +0000, Jan Beulich wrote:
>> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
>> > + */
>> > +
>> > +#ifndef __HVMLOADER_VNUMA_H__
>> > +#define __HVMLOADER_VNUMA_H__
>> > +
>> > +#include <xen/memory.h>
>> > +
>> > +#define MAX_VNODES     64
>> > +#define MAX_VDISTANCE  (MAX_VNODES * MAX_VNODES)
>> > +#define MAX_VMEMRANGES (MAX_VNODES * 2)
>> 
>> These look arbitrary - please add a (brief) comment giving some
>> rationale so that people needing to change them know eventual
>> consequences. Would it perhaps make sense to derive
>> MAX_VNODES from HVM_MAX_VCPUS? Additionally I think the
> 
> I don't think these two have a very strong connection, though.
> 
> And I remember you saying HVM_MAX_VCPUS will be removed.

Removed? I recall myself saying increased...

>> code wouldn't become much more difficult if you didn't build in
>> static upper limits.
>> 
> 
> Yes, I can make two hypercalls: the first retrieves the number of nodes
> / vmemranges configured; then allocate memory and fill in those arrays with
> the second hypercall.

That'd be great, as it would eliminate the point above at once.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-09 18:06     ` Wei Liu
@ 2014-12-10  8:20       ` Jan Beulich
  2014-12-10 10:54         ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-10  8:20 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 09.12.14 at 19:06, <wei.liu2@citrix.com> wrote:
> On Tue, Dec 09, 2014 at 04:53:40PM +0000, Jan Beulich wrote:
>> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
>> > +        memset(memory, 0, sizeof(*memory));
>> > +        memory->type          = ACPI_MEMORY_AFFINITY;
>> > +        memory->length        = sizeof(*memory);
>> > +        memory->domain        = vmemrange[i].nid;
>> > +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
>> > +        memory->base_address  = vmemrange[i].start;
>> > +        memory->mem_length    = mem;
>> > +        memory++;
>> > +    }
>> > +
>> > +    srat->header.length = size;
>> 
>> Mind checking size == memory - p here?
>> 
> 
> Why? There doesn't seem to be anything that would cause memory -p !=
> size in between during runtime.

Except for that original calculation being wrong - that's what I'd want
such a check to verify.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-10  8:20       ` Jan Beulich
@ 2014-12-10 10:54         ` Wei Liu
  2014-12-10 11:06           ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2014-12-10 10:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Wed, Dec 10, 2014 at 08:20:38AM +0000, Jan Beulich wrote:
> >>> On 09.12.14 at 19:06, <wei.liu2@citrix.com> wrote:
> > On Tue, Dec 09, 2014 at 04:53:40PM +0000, Jan Beulich wrote:
> >> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> >> > +        memset(memory, 0, sizeof(*memory));
> >> > +        memory->type          = ACPI_MEMORY_AFFINITY;
> >> > +        memory->length        = sizeof(*memory);
> >> > +        memory->domain        = vmemrange[i].nid;
> >> > +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
> >> > +        memory->base_address  = vmemrange[i].start;
> >> > +        memory->mem_length    = mem;
> >> > +        memory++;
> >> > +    }
> >> > +
> >> > +    srat->header.length = size;
> >> 
> >> Mind checking size == memory - p here?
> >> 
> > 
> > Why? There doesn't seem to be anything that would cause memory - p !=
> > size at runtime.
> 
> Except for that original calculation being wrong - that's what I would
> mean such a check to verify.
> 

Do you mean the code is wrong? I don't think so.

Even if I have 

+    if ( ((unsigned long)memory) - ((unsigned long)p) != size )
+        return NULL;
+

Hvmloader doesn't complain, i.e. I don't see an error message printed out
in construct_secondary_tables.

Anyway, if the above snippet is what you want, I can add it.

Wei.

> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-10 10:54         ` Wei Liu
@ 2014-12-10 11:06           ` Jan Beulich
  2014-12-10 11:10             ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2014-12-10 11:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, ufimtseva,
	boris.ostrovsky

>>> On 10.12.14 at 11:54, <wei.liu2@citrix.com> wrote:
> On Wed, Dec 10, 2014 at 08:20:38AM +0000, Jan Beulich wrote:
>> >>> On 09.12.14 at 19:06, <wei.liu2@citrix.com> wrote:
>> > On Tue, Dec 09, 2014 at 04:53:40PM +0000, Jan Beulich wrote:
>> >> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
>> >> > +        memset(memory, 0, sizeof(*memory));
>> >> > +        memory->type          = ACPI_MEMORY_AFFINITY;
>> >> > +        memory->length        = sizeof(*memory);
>> >> > +        memory->domain        = vmemrange[i].nid;
>> >> > +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
>> >> > +        memory->base_address  = vmemrange[i].start;
>> >> > +        memory->mem_length    = mem;
>> >> > +        memory++;
>> >> > +    }
>> >> > +
>> >> > +    srat->header.length = size;
>> >> 
>> >> Mind checking size == memory - p here?
>> >> 
>> > 
>> > Why? There doesn't seem to be anything that would cause memory - p !=
>> > size at runtime.
>> 
>> Except for that original calculation being wrong - that's what I'd want
>> such a check to verify.
>> 
> 
> Do you mean the code is wrong? I don't think so.

No, the code looks right. It's just that two disconnected
calculations easily get out of sync when someone later changes
them.

> Even if I have 
> 
> +    if ( ((unsigned long)memory) - ((unsigned long)p) != size )
> +        return NULL;
> +
> 
> Hvmloader doesn't complain, i.e. I don't see an error message printed out
> in construct_secondary_tables.
> 
> Anyway, if the above snippet is what you want, I can add it.

I was actually more after a BUG_ON() / ASSERT() kind of thing.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/19] hvmloader: construct SRAT
  2014-12-10 11:06           ` Jan Beulich
@ 2014-12-10 11:10             ` Wei Liu
  0 siblings, 0 replies; 43+ messages in thread
From: Wei Liu @ 2014-12-10 11:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, boris.ostrovsky

On Wed, Dec 10, 2014 at 11:06:20AM +0000, Jan Beulich wrote:
> >>> On 10.12.14 at 11:54, <wei.liu2@citrix.com> wrote:
> > On Wed, Dec 10, 2014 at 08:20:38AM +0000, Jan Beulich wrote:
> >> >>> On 09.12.14 at 19:06, <wei.liu2@citrix.com> wrote:
> >> > On Tue, Dec 09, 2014 at 04:53:40PM +0000, Jan Beulich wrote:
> >> >> >>> On 01.12.14 at 16:33, <wei.liu2@citrix.com> wrote:
> >> >> > +        memset(memory, 0, sizeof(*memory));
> >> >> > +        memory->type          = ACPI_MEMORY_AFFINITY;
> >> >> > +        memory->length        = sizeof(*memory);
> >> >> > +        memory->domain        = vmemrange[i].nid;
> >> >> > +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
> >> >> > +        memory->base_address  = vmemrange[i].start;
> >> >> > +        memory->mem_length    = mem;
> >> >> > +        memory++;
> >> >> > +    }
> >> >> > +
> >> >> > +    srat->header.length = size;
> >> >> 
> >> >> Mind checking size == memory - p here?
> >> >> 
> >> > 
> >> > Why? There doesn't seem to be anything that would cause memory - p !=
> >> > size at runtime.
> >> 
> >> Except for that original calculation being wrong - that's what I'd want
> >> such a check to verify.
> >> 
> > 
> > Do you mean the code is wrong? I don't think so.
> 
> No, the code looks right. It's just that two disconnected
> calculations easily get out of sync when someone later changes
> them.
> 

I see.

> > Even if I have 
> > 
> > +    if ( ((unsigned long)memory) - ((unsigned long)p) != size )
> > +        return NULL;
> > +
> > 
> > Hvmloader doesn't complain, i.e. I don't see an error message printed out
> > in construct_secondary_tables.
> > 
> > Anyway, if the above snippet is what you want, I can add it.
> 
> I was actually more after a BUG_ON() / ASSERT() kind of thing.

Ack.

> 
> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread
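
So the net result for construct_srat() is a one-line consistency check just
before the length is stored, using hvmloader's BUG_ON() from util.h:

    /* Catch the up-front size calculation and the table writes
     * drifting apart in future changes. */
    BUG_ON((unsigned long)memory - (unsigned long)p != size);

    srat->header.length = size;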

* Re: [PATCH v2 03/19] libxc: allocate memory with vNUMA information for PV guest
  2014-12-01 15:33 ` [PATCH v2 03/19] libxc: allocate memory with vNUMA information for PV guest Wei Liu
@ 2015-01-12 17:11   ` Ian Campbell
  0 siblings, 0 replies; 43+ messages in thread
From: Ian Campbell @ 2015-01-12 17:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: ufimtseva, dario.faggioli, ian.jackson, xen-devel, JBeulich,
	boris.ostrovsky

On Mon, 2014-12-01 at 15:33 +0000, Wei Liu wrote:

No long description of what is going on? Or at least comments on what
the new fields mean? (I figure two of them form an array mapping v to p;
I'm not sure what the other one is...)

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
>  tools/libxc/include/xc_dom.h |    5 +++
>  tools/libxc/xc_dom_x86.c     |   72 +++++++++++++++++++++++++++++++++++-------
>  tools/libxc/xc_private.h     |    2 ++
>  3 files changed, 68 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index 07d7224..577a017 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -167,6 +167,11 @@ struct xc_dom_image {
>      struct xc_dom_loader *kernel_loader;
>      void *private_loader;
>  
> +    /* vNUMA information */
> +    unsigned int *vnode_to_pnode;
> +    uint64_t *vnode_size;
> +    unsigned int nr_vnodes;
> +
>      /* kernel loader */
>      struct xc_dom_arch *arch_hooks;
>      /* allocate up to virt_alloc_end */
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index bf06fe4..3286232 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -759,7 +759,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
>  int arch_setup_meminit(struct xc_dom_image *dom)
>  {
>      int rc;
> -    xen_pfn_t pfn, allocsz, i, j, mfn;
> +    xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
> +    int i, j;
>  
>      rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
>      if ( rc )
> @@ -811,18 +812,67 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>          /* setup initial p2m */
>          for ( pfn = 0; pfn < dom->total_pages; pfn++ )
>              dom->p2m_host[pfn] = pfn;
> -        
> +
> +        /* setup dummy vNUMA information if it's not provided */
> +        if ( dom->nr_vnodes == 0 )
> +        {
> +            dom->nr_vnodes = 1;
> +            dom->vnode_to_pnode = xc_dom_malloc(dom,
> +                                                sizeof(*dom->vnode_to_pnode));
> +            dom->vnode_to_pnode[0] = XC_VNUMA_NO_NODE;

Does this indicate there is no vnode 0, or that vnode 0 maps to an
arbitrary (but unknown) pnode?

> +            dom->vnode_size = xc_dom_malloc(dom, sizeof(*dom->vnode_size));

Might as well point to &dummy_foo, which is an on-stack thing of the
right dummy type.
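
Something along these lines, say (illustrative only - the dummy_vnode_*
names are made up, and this assumes the fields are only consumed while
arch_setup_meminit() runs, since the dummies live on its stack):

    unsigned int dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
    uint64_t dummy_vnode_size = (dom->total_pages << PAGE_SHIFT) >> 20;

    if ( dom->nr_vnodes == 0 )
    {
        dom->nr_vnodes = 1;
        dom->vnode_to_pnode = &dummy_vnode_to_pnode;
        dom->vnode_size = &dummy_vnode_size;
    }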

> +            dom->vnode_size[0] = ((dom->total_pages << PAGE_SHIFT) >> 20);

One unnecessary set of ()s
> +        }
> +
> +        total = 0;
> +        for ( i = 0; i < dom->nr_vnodes; i++ )
> +            total += ((dom->vnode_size[i] << 20) >> PAGE_SHIFT);
> +        if ( total != dom->total_pages )
> +        {
> +            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> +                         "%s: number of pages requested by vNUMA (0x%"PRIpfn") mismatches number of pages configured for domain (0x%"PRIpfn")\n",

Can you formulate a terser message and avoid the long line?

Perhaps "vNUMA page count mismatch 0x%PRIpfn != 0x%PRIpfn" conveys all
the useful bits?

Alternatively what if we were to mandate that dom->total_pages must be
set to zero by the caller iff they are passing nr_vnodes != 0? Then
calculate total_pages internally.

Might have too many knock-on effects on callers?
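
For concreteness, the terser form might read something like this
(a sketch only, keeping the patch's __FUNCTION__/PRIpfn style):

    xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                 "%s: vNUMA page count mismatch (0x%"PRIpfn" != 0x%"PRIpfn")",
                 __FUNCTION__, total, dom->total_pages);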

> +                         __FUNCTION__, total, dom->total_pages);
> +            return -EINVAL;
> +        }
> +
>          /* allocate guest memory */
> -        for ( i = rc = allocsz = 0;
> -              (i < dom->total_pages) && !rc;
> -              i += allocsz )
> +        pfn_base = 0;
> +        for ( i = 0; i < dom->nr_vnodes; i++ )
>          {
> -            allocsz = dom->total_pages - i;
> -            if ( allocsz > 1024*1024 )
> -                allocsz = 1024*1024;
> -            rc = xc_domain_populate_physmap_exact(
> -                dom->xch, dom->guest_domid, allocsz,
> -                0, 0, &dom->p2m_host[i]);
> +            unsigned int memflags;
> +            uint64_t pages;
> +
> +            memflags = 0;
> +            if ( dom->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
> +            {
> +                memflags |= XENMEMF_exact_node(dom->vnode_to_pnode[i]);
> +                memflags |= XENMEMF_exact_node_request;
> +            }
> +
> +            pages = (dom->vnode_size[i] << 20) >> PAGE_SHIFT;
> +
> +            for ( j = 0; j < pages; j += allocsz )
> +            {
> +                allocsz = pages - j;
> +                if ( allocsz > 1024*1024 )
> +                    allocsz = 1024*1024;
> +
> +                rc = xc_domain_populate_physmap_exact(dom->xch,
> +                         dom->guest_domid, allocsz, 0, memflags,
> +                         &dom->p2m_host[pfn_base+j]);
> +
> +                if ( rc )
> +                {
> +                    if ( dom->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
> +                        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> +                                     "%s: fail to allocate 0x%"PRIx64" pages for vnode %d on pnode %d out of 0x%"PRIpfn"\n",

"vNUMA: failed to allocate 0x%"PRIx64"/0x%"PRIx64" pages (v=%d, p=%d)."?

Ian.


* Re: [PATCH v2 04/19] libxl: add emacs local variables in libxl_{x86, arm}.c
  2014-12-01 15:33 ` [PATCH v2 04/19] libxl: add emacs local variables in libxl_{x86, arm}.c Wei Liu
@ 2015-01-12 17:11   ` Ian Campbell
  0 siblings, 0 replies; 43+ messages in thread
From: Ian Campbell @ 2015-01-12 17:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: ufimtseva, dario.faggioli, ian.jackson, xen-devel, JBeulich,
	boris.ostrovsky

On Mon, 2014-12-01 at 15:33 +0000, Wei Liu wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>

Acked + applied.


* Re: [PATCH v2 05/19] libxl: introduce vNUMA types
  2014-12-01 15:33 ` [PATCH v2 05/19] libxl: introduce vNUMA types Wei Liu
@ 2015-01-12 17:17   ` Ian Campbell
  2015-01-12 17:21     ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Ian Campbell @ 2015-01-12 17:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: ufimtseva, dario.faggioli, ian.jackson, xen-devel, JBeulich,
	boris.ostrovsky

On Mon, 2014-12-01 at 15:33 +0000, Wei Liu wrote:

Looking ahead a few patches I see a disturbing trend to include little
or no commit message in this series. I also see no overview of what the
design for vnuma at the libxl level is going to look like, just a bunch
of moving parts in isolation, which are rather hard to review.

Please can you flesh out the descriptions of these patches, and maybe
take a look at the series structure. e.g. I'm not sure why the next
patch "libxl: add vmemrange to libxl__domain_build_state" needs to be
alone, instead of adding the struct field in the patch which adds
support for the field.

Ian.


* Re: [PATCH v2 05/19] libxl: introduce vNUMA types
  2015-01-12 17:17   ` Ian Campbell
@ 2015-01-12 17:21     ` Wei Liu
  2015-01-12 17:24       ` Ian Campbell
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-01-12 17:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, ufimtseva, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, boris.ostrovsky

On Mon, Jan 12, 2015 at 05:17:24PM +0000, Ian Campbell wrote:
> On Mon, 2014-12-01 at 15:33 +0000, Wei Liu wrote:
> 
> Looking ahead a few patches I see a disturbing trend to include little
> or no commit message in this series. I also see no overview of what the
> design for vnuma at the libxl level is going to look like, just a bunch
> of moving parts in isolation, which are rather hard to review.
> 
> Please can you flesh out the descriptions of these patches, and maybe
> take a look at the series structure. e.g. I'm not sure why the next
> patch "libxl: add vmemrange to libxl__domain_build_state" needs to be
> alone, instead of adding the struct field in the patch which adds
> support for the field.
> 

I will provide more information and resend.

Wei.

> Ian.


* Re: [PATCH v2 05/19] libxl: introduce vNUMA types
  2015-01-12 17:21     ` Wei Liu
@ 2015-01-12 17:24       ` Ian Campbell
  0 siblings, 0 replies; 43+ messages in thread
From: Ian Campbell @ 2015-01-12 17:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: ufimtseva, dario.faggioli, ian.jackson, xen-devel, JBeulich,
	boris.ostrovsky

On Mon, 2015-01-12 at 17:21 +0000, Wei Liu wrote:
> On Mon, Jan 12, 2015 at 05:17:24PM +0000, Ian Campbell wrote:
> > On Mon, 2014-12-01 at 15:33 +0000, Wei Liu wrote:
> > 
> > Looking ahead a few patches I see a disturbing trend to include little
> > or no commit message in this series. I also see no overview of what the
> > design for vnuma at the libxl level is going to look like, just a bunch
> > of moving parts in isolation, which are rather hard to review.
> > 
> > Please can you flesh out the descriptions of these patches, and maybe
> > take a look at the series structure. e.g. I'm not sure why the next
> > patch "libxl: add vmemrange to libxl__domain_build_state" needs to be
> > alone, instead of adding the struct field in the patch which adds
> > support for the field.
> > 
> 
> I will provide more information and resend.

Thanks.


end of thread

Thread overview: 43+ messages
2014-12-01 15:33 [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
2014-12-01 15:33 ` [PATCH v2 01/19] xen: dump vNUMA information with debug key "u" Wei Liu
2014-12-08 17:01   ` Jan Beulich
2014-12-09 11:22     ` Wei Liu
2014-12-09 11:38       ` Jan Beulich
2014-12-01 15:33 ` [PATCH v2 02/19] xen: make two memory hypercalls vNUMA-aware Wei Liu
2014-12-08 17:05   ` Jan Beulich
2014-12-01 15:33 ` [PATCH v2 03/19] libxc: allocate memory with vNUMA information for PV guest Wei Liu
2015-01-12 17:11   ` Ian Campbell
2014-12-01 15:33 ` [PATCH v2 04/19] libxl: add emacs local variables in libxl_{x86, arm}.c Wei Liu
2015-01-12 17:11   ` Ian Campbell
2014-12-01 15:33 ` [PATCH v2 05/19] libxl: introduce vNUMA types Wei Liu
2015-01-12 17:17   ` Ian Campbell
2015-01-12 17:21     ` Wei Liu
2015-01-12 17:24       ` Ian Campbell
2014-12-01 15:33 ` [PATCH v2 06/19] libxl: add vmemrange to libxl__domain_build_state Wei Liu
2014-12-01 15:33 ` [PATCH v2 07/19] libxl: introduce libxl__vnuma_config_check Wei Liu
2014-12-01 15:33 ` [PATCH v2 08/19] libxl: x86: factor out e820_host_sanitize Wei Liu
2014-12-01 15:33 ` [PATCH v2 09/19] libxl: functions to build vmemranges for PV guest Wei Liu
2014-12-01 15:33 ` [PATCH v2 10/19] libxl: build, check and pass vNUMA info to Xen " Wei Liu
2014-12-01 15:33 ` [PATCH v2 11/19] xen: handle XENMEMF_get_vnumainfo in compat_memory_op Wei Liu
2014-12-09  9:09   ` Jan Beulich
2014-12-01 15:33 ` [PATCH v2 12/19] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
2014-12-09 16:46   ` Jan Beulich
2014-12-09 17:52     ` Wei Liu
2014-12-10  8:19       ` Jan Beulich
2014-12-01 15:33 ` [PATCH v2 13/19] hvmloader: construct SRAT Wei Liu
2014-12-09 16:53   ` Jan Beulich
2014-12-09 18:06     ` Wei Liu
2014-12-10  8:20       ` Jan Beulich
2014-12-10 10:54         ` Wei Liu
2014-12-10 11:06           ` Jan Beulich
2014-12-10 11:10             ` Wei Liu
2014-12-01 15:33 ` [PATCH v2 14/19] hvmloader: construct SLIT Wei Liu
2014-12-09 16:57   ` Jan Beulich
2014-12-09 18:09     ` Wei Liu
2014-12-01 15:33 ` [PATCH v2 15/19] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
2014-12-01 15:33 ` [PATCH v2 16/19] libxl: build, check and pass vNUMA info to Xen " Wei Liu
2014-12-01 15:33 ` [PATCH v2 17/19] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
2014-12-01 15:33 ` [PATCH v2 18/19] libxlutil: nested list support Wei Liu
2014-12-01 15:33 ` [PATCH v2 19/19] xl: vNUMA support Wei Liu
2014-12-08  9:58 ` [PATCH v2 00/19] Virtual NUMA for PV and HVM Wei Liu
2014-12-08 10:19   ` Jan Beulich
