* [PATCH v5 00/24] Virtual NUMA for PV and HVM
@ 2015-02-12 19:44 Wei Liu
  2015-02-12 19:44 ` [PATCH v5 01/24] xen: dump vNUMA information with debug key "u" Wei Liu
                   ` (23 more replies)
  0 siblings, 24 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Hi all

This is version 5 of this series rebased on top of master.

This patch series implements virtual NUMA support for both PV and HVM
guests. That is, the administrator can configure via libxl the virtual NUMA
topology that the guest sees.

This is stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-aware
ballooning, hypervisor side) described in my previous email to xen-devel [0].

This series is broken into several parts:

1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support.
2. libxc/libxl support for PV vNUMA.
3. libxc/libxl/hypervisor support for HVM vNUMA.
4. xl vNUMA configuration documentation and parser.

One significant difference from Elena's work is that this patch series makes
use of multiple vmemranges should there be a memory hole, instead of shrinking
the RAM. This matches the behaviour of real hardware.

The vNUMA auto placement algorithm is missing at the moment and Dario is
working on it.

This series can be found at:
 git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v5

With this series, the following configuration can be used to enable virtual
NUMA support, and it works for both PV and HVM guests.

vnuma = [ [ "pnode=0","size=3000","vcpus=0-3","vdistances=10,20"  ],
          [ "pnode=0","size=3000","vcpus=4-7","vdistances=20,10"  ],
        ]
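
(To read the example: each inner list describes one vnode -- "pnode" is the
physical node backing it, "size" is its memory size in MB, "vcpus" is the
set of vcpus belonging to it, and "vdistances" lists the distances from this
vnode to each vnode, so "10,20" means distance 10 to vnode 0 and 20 to
vnode 1. The exact syntax is documented by the xl.cfg patch at the end of
the series.)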

For example output of guest NUMA information, see [1].

In terms of libxl / libxc internal, things are broken into several
parts:

1. libxl interface

Users of libxl can only specify how many vnodes a guest has (and each
vnode's properties); they currently have no control over the actual memory
layout. Note that it would be fairly easy to export an interface to control
the memory layout in the future.

2. libxl internal

libxl generates some internal vNUMA configurations when building the
domain, then transforms them into libxc representations. It also validates
the vNUMA configuration along the way.

3. libxc internal

Libxc does what it's told to do. It doesn't do anything smart (in fact, I
deliberately didn't put any smart logic inside it). Libxc also reports some
information back to libxl in the HVM case, but that's it.

Wei.

[0] <20141111173606.GC21312@zion.uk.xensource.com>
[1] <1416582421-10789-1-git-send-email-wei.liu2@citrix.com>

Changes in v5:
1. Rewrite PV memory allocation functions, take vmemranges into account.
2. Address Ian J's comments with regard to libxlu.
3. Address Jan and Andrew's comments with regard to hypervisor patches.
4. New syntax for vNUMA xl configuration.

Changes in v4:
1. Address comments from many people.
2. Break down the libxlu patch into three.
3. Use dedicate patches for non-functional changes.

Changes in v3:
1. Address comments made by Jan.
2. More commit messages and comments.
3. Shorten some error messages.

Changes in v2:
1. Make vnuma_vdistances mandatory.
2. Use nested list to specify distances among nodes.
3. Hvmloader uses hypercall to retrieve vNUMA information.
4. Fix some problems spotted by Jan.


Wei Liu (24):
  xen: dump vNUMA information with debug key "u"
  xen: make two memory hypercalls vNUMA-aware
  libxc: duplicate snippet to allocate p2m_host array
  libxc: add p2m_size to xc_dom_image
  libxc: allocate memory with vNUMA information for PV guest
  libxl: introduce vNUMA types
  libxl: add vmemrange to libxl__domain_build_state
  libxl: introduce libxl__vnuma_config_check
  libxl: x86: factor out e820_host_sanitize
  libxl: functions to build vmemranges for PV guest
  libxl: build, check and pass vNUMA info to Xen for PV guest
  hvmloader: retrieve vNUMA information from hypervisor
  hvmloader: construct SRAT
  hvmloader: construct SLIT
  libxc: indentation change to xc_hvm_build_x86.c
  libxc: allocate memory with vNUMA information for HVM guest
  libxl: build, check and pass vNUMA info to Xen for HVM guest
  libxl: disallow memory relocation when vNUMA is enabled
  libxl: define LIBXL_HAVE_VNUMA
  libxlu: rework internal representation of setting
  libxlu: nested list support
  libxlu: introduce new APIs
  xl: introduce xcalloc
  xl: vNUMA support

 docs/man/xl.cfg.pod.5                   |  54 +++++++
 tools/firmware/hvmloader/Makefile       |   2 +-
 tools/firmware/hvmloader/acpi/acpi2_0.h |  61 ++++++++
 tools/firmware/hvmloader/acpi/build.c   | 110 +++++++++++++++
 tools/firmware/hvmloader/hvmloader.c    |   3 +
 tools/firmware/hvmloader/vnuma.c        |  84 +++++++++++
 tools/firmware/hvmloader/vnuma.h        |  52 +++++++
 tools/libxc/include/xc_dom.h            |   7 +
 tools/libxc/include/xenguest.h          |  11 ++
 tools/libxc/xc_dom_arm.c                |   1 +
 tools/libxc/xc_dom_core.c               |   8 +-
 tools/libxc/xc_dom_x86.c                | 132 ++++++++++++++----
 tools/libxc/xc_hvm_build_x86.c          | 240 +++++++++++++++++++++-----------
 tools/libxc/xc_private.h                |   2 +
 tools/libxl/Makefile                    |   2 +-
 tools/libxl/libxl.h                     |   6 +
 tools/libxl/libxl_arch.h                |   6 +
 tools/libxl/libxl_arm.c                 |   8 ++
 tools/libxl/libxl_create.c              |   9 ++
 tools/libxl/libxl_dm.c                  |   6 +-
 tools/libxl/libxl_dom.c                 | 114 +++++++++++++++
 tools/libxl/libxl_internal.h            |  23 +++
 tools/libxl/libxl_types.idl             |  10 ++
 tools/libxl/libxl_vnuma.c               | 228 ++++++++++++++++++++++++++++++
 tools/libxl/libxl_x86.c                 | 105 ++++++++++++--
 tools/libxl/libxlu_cfg.c                | 199 +++++++++++++++++++-------
 tools/libxl/libxlu_cfg_i.h              |  13 +-
 tools/libxl/libxlu_cfg_y.c              |  46 +++---
 tools/libxl/libxlu_cfg_y.h              |   2 +-
 tools/libxl/libxlu_cfg_y.y              |  14 +-
 tools/libxl/libxlu_internal.h           |  23 ++-
 tools/libxl/libxlutil.h                 |  13 ++
 tools/libxl/xl_cmdimpl.c                | 151 +++++++++++++++++++-
 xen/arch/x86/numa.c                     |  71 +++++++++-
 xen/common/kernel.c                     |   2 +-
 xen/common/memory.c                     |  51 ++++++-
 xen/include/public/features.h           |   3 +
 xen/include/public/memory.h             |   2 +
 38 files changed, 1655 insertions(+), 219 deletions(-)
 create mode 100644 tools/firmware/hvmloader/vnuma.c
 create mode 100644 tools/firmware/hvmloader/vnuma.h
 create mode 100644 tools/libxl/libxl_vnuma.c

-- 
1.9.1


* [PATCH v5 01/24] xen: dump vNUMA information with debug key "u"
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 11:50   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware Wei Liu
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v5:
1. Use read_trylock.
2. Use correct array size for strlcpy.
3. Coding style fix.

Changes in v4:
1. Acquire rwlock before accessing vnuma struct.
2. Improve output.

Changes in v3:
1. Constify struct vnuma_info.
2. Don't print amount of ram of a vmemrange.
3. Process softirqs when dumping information.
4. Fix format string.

Changes in v2:
1. Use unsigned int for loop vars.
2. Use strlcpy.
3. Properly align output.
---
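
To illustrate, here is a hypothetical example of the new output for a 1GB
guest with two vnodes and four vcpus, pieced together from the format
strings in the hunk below rather than captured from a real run:

    Domain 1 (total: 262144):
        Node 0: 262144
         2 vnodes, 4 vcpus, guest physical layout:
           0: pnode   0, vcpus 0-1
               0000000000000000 - 0000000020000000
           1: pnode   0, vcpus 2-3
               0000000020000000 - 0000000040000000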
 xen/arch/x86/numa.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 628a40a..e500f33 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -16,6 +16,7 @@
 #include <xen/pfn.h>
 #include <asm/acpi.h>
 #include <xen/sched.h>
+#include <xen/softirq.h>
 
 static int numa_setup(char *s);
 custom_param("numa", numa_setup);
@@ -363,10 +364,12 @@ EXPORT_SYMBOL(node_data);
 static void dump_numa(unsigned char key)
 {
     s_time_t now = NOW();
-    int i;
+    unsigned int i, j;
+    int err;
     struct domain *d;
     struct page_info *page;
     unsigned int page_num_node[MAX_NUMNODES];
+    const struct vnuma_info *vnuma;
 
     printk("'%c' pressed -> dumping numa info (now-0x%X:%08X)\n", key,
            (u32)(now>>32), (u32)now);
@@ -393,6 +396,8 @@ static void dump_numa(unsigned char key)
     printk("Memory location of each domain:\n");
     for_each_domain ( d )
     {
+        process_pending_softirqs();
+
         printk("Domain %u (total: %u):\n", d->domain_id, d->tot_pages);
 
         for_each_online_node ( i )
@@ -408,6 +413,70 @@ static void dump_numa(unsigned char key)
 
         for_each_online_node ( i )
             printk("    Node %u: %u\n", i, page_num_node[i]);
+
+        if ( !read_trylock(&d->vnuma_rwlock) )
+            continue;
+
+        if ( !d->vnuma )
+        {
+            read_unlock(&d->vnuma_rwlock);
+            continue;
+        }
+
+        vnuma = d->vnuma;
+        printk("     %u vnodes, %u vcpus, guest physical layout:\n",
+               vnuma->nr_vnodes, d->max_vcpus);
+        for ( i = 0; i < vnuma->nr_vnodes; i++ )
+        {
+            unsigned int start_cpu = ~0U;
+
+            err = snprintf(keyhandler_scratch, 12, "%3u",
+                    vnuma->vnode_to_pnode[i]);
+            if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
+                strlcpy(keyhandler_scratch, "???", sizeof(keyhandler_scratch));
+
+            printk("       %3u: pnode %s,", i, keyhandler_scratch);
+
+            printk(" vcpus ");
+
+            for ( j = 0; j < d->max_vcpus; j++ )
+            {
+                if ( !(j & 0x3f) )
+                    process_pending_softirqs();
+
+                if ( vnuma->vcpu_to_vnode[j] == i )
+                {
+                    if ( start_cpu == ~0U )
+                    {
+                        printk("%d", j);
+                        start_cpu = j;
+                    }
+                }
+                else if ( start_cpu != ~0U )
+                {
+                    if ( j - 1 != start_cpu )
+                        printk("-%d ", j - 1);
+                    else
+                        printk(" ");
+                    start_cpu = ~0U;
+                }
+            }
+
+            if ( start_cpu != ~0U  && start_cpu != j - 1 )
+                printk("-%d", j - 1);
+
+            printk("\n");
+
+            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
+            {
+                if ( vnuma->vmemrange[j].nid == i )
+                    printk("           %016"PRIx64" - %016"PRIx64"\n",
+                           vnuma->vmemrange[j].start,
+                           vnuma->vmemrange[j].end);
+            }
+        }
+
+        read_unlock(&d->vnuma_rwlock);
     }
 
     rcu_read_unlock(&domlist_read_lock);
-- 
1.9.1


* [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
  2015-02-12 19:44 ` [PATCH v5 01/24] xen: dump vNUMA information with debug key "u" Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 12:00   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 03/24] libxc: duplicate snippet to allocate p2m_host array Wei Liu
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Make XENMEM_increase_reservation and XENMEM_populate_physmap
vNUMA-aware.

That is, if a guest requests Xen to allocate memory for a specific vnode,
Xen can translate the vnode to a pnode using that guest's vNUMA
information.

XENMEMF_vnode is introduced for the guest to mark that the node number is
in fact a virtual node number which should be translated by Xen.

XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is
able to translate virtual nodes to physical nodes.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v5:
1. New logic in translation function.

Changes in v3:
1. Coding style fix.
2. Remove redundant assignment.

Changes in v2:
1. Return start_extent when vnode translation fails.
2. Expose new feature bit to guest.
3. Fix typo in comment.
---
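
For illustration (not part of this patch), a guest kernel might use the new
flag roughly as in the sketch below; HYPERVISOR_memory_op and the pfns
array are assumed to exist in the guest environment, and a guest should
first check XENFEAT_memory_op_vnode_supported:

    /* Hedged sketch: populate nr pages out of virtual node 1, letting
     * Xen translate vnode 1 to the underlying pnode. */
    struct xen_memory_reservation r = {
        .nr_extents   = nr,
        .extent_order = 0,
        .mem_flags    = XENMEMF_node(1) | XENMEMF_vnode,
        .domid        = DOMID_SELF,
    };
    int rc;

    set_xen_guest_handle(r.extent_start, pfns);
    rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &r);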
 xen/common/kernel.c           |  2 +-
 xen/common/memory.c           | 51 +++++++++++++++++++++++++++++++++++++++----
 xen/include/public/features.h |  3 +++
 xen/include/public/memory.h   |  2 ++
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0d9e519..e5e0050 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -301,7 +301,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         switch ( fi.submap_idx )
         {
         case 0:
-            fi.submap = 0;
+            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
             if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) )
                 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
             if ( paging_mode_translate(current->domain) )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index e84ace9..fa3729b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,6 +692,43 @@ out:
     return rc;
 }
 
+static int translate_vnode_to_pnode(struct domain *d,
+                                    struct xen_memory_reservation *r,
+                                    struct memop_args *a)
+{
+    int rc = 0;
+    unsigned int vnode, pnode;
+
+    if ( r->mem_flags & XENMEMF_vnode )
+    {
+        a->memflags &= ~MEMF_node(XENMEMF_get_node(r->mem_flags));
+        a->memflags &= ~MEMF_exact_node;
+
+        read_lock(&d->vnuma_rwlock);
+        if ( d->vnuma )
+        {
+            vnode = XENMEMF_get_node(r->mem_flags);
+
+            if ( vnode < d->vnuma->nr_vnodes )
+            {
+                pnode = d->vnuma->vnode_to_pnode[vnode];
+
+                if ( pnode != NUMA_NO_NODE )
+                {
+                    a->memflags |= MEMF_node(pnode);
+                    if ( r->mem_flags & XENMEMF_exact_node_request )
+                        a->memflags |= MEMF_exact_node;
+                }
+            }
+            else
+                rc = -EINVAL;
+        }
+        read_unlock(&d->vnuma_rwlock);
+    }
+
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -734,10 +771,6 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             args.memflags = MEMF_bits(address_bits);
         }
 
-        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
-        if ( reservation.mem_flags & XENMEMF_exact_node_request )
-            args.memflags |= MEMF_exact_node;
-
         if ( op == XENMEM_populate_physmap
              && (reservation.mem_flags & XENMEMF_populate_on_demand) )
             args.memflags |= MEMF_populate_on_demand;
@@ -747,6 +780,16 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return start_extent;
         args.domain = d;
 
+        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
+        if ( reservation.mem_flags & XENMEMF_exact_node_request )
+            args.memflags |= MEMF_exact_node;
+
+        if ( translate_vnode_to_pnode(d, &reservation, &args) )
+        {
+            rcu_unlock_domain(d);
+            return start_extent;
+        }
+
         if ( xsm_memory_adjust_reservation(XSM_TARGET, current->domain, d) )
         {
             rcu_unlock_domain(d);
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 16d92aa..2110b04 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -99,6 +99,9 @@
 #define XENFEAT_grant_map_identity        12
  */
 
+/* Guest can use XENMEMF_vnode to specify virtual node for memory op. */
+#define XENFEAT_memory_op_vnode_supported 13
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 595f953..2b5206b 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -55,6 +55,8 @@
 /* Flag to request allocation only from the node specified */
 #define XENMEMF_exact_node_request  (1<<17)
 #define XENMEMF_exact_node(n) (XENMEMF_node(n) | XENMEMF_exact_node_request)
+/* Flag to indicate the node specified is virtual node */
+#define XENMEMF_vnode  (1<<18)
 #endif
 
 struct xen_memory_reservation {
-- 
1.9.1


* [PATCH v5 03/24] libxc: duplicate snippet to allocate p2m_host array
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
  2015-02-12 19:44 ` [PATCH v5 01/24] xen: dump vNUMA information with debug key "u" Wei Liu
  2015-02-12 19:44 ` [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-12 19:44 ` [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image Wei Liu
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Currently no in-tree code sets the superpage flag, but Konrad wants it
retained for the moment.

As I'm going to change the p2m_host array allocation, duplicate the code
snippet that allocates the p2m_host array in this patch, so that we retain
the existing behaviour in the superpage case.

This patch introduces no functional change and will make future patches
easier to review. One stray tab is also removed while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Konrad Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/xc_dom_x86.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bf06fe4..9dbaedb 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -772,15 +772,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
             return rc;
     }
 
-    dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * dom->total_pages);
-    if ( dom->p2m_host == NULL )
-        return -EINVAL;
-
     if ( dom->superpages )
     {
         int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
         xen_pfn_t extents[count];
 
+        dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+                                      dom->total_pages);
+        if ( dom->p2m_host == NULL )
+            return -EINVAL;
+
         DOMPRINTF("Populating memory with %d superpages", count);
         for ( pfn = 0; pfn < count; pfn++ )
             extents[pfn] = pfn << SUPERPAGE_PFN_SHIFT;
@@ -809,9 +810,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
                 return rc;
         }
         /* setup initial p2m */
+        dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+                                      dom->total_pages);
+        if ( dom->p2m_host == NULL )
+            return -EINVAL;
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
             dom->p2m_host[pfn] = pfn;
-        
+
         /* allocate guest memory */
         for ( i = rc = allocsz = 0;
               (i < dom->total_pages) && !rc;
-- 
1.9.1


* [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (2 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 03/24] libxc: duplicate snippet to allocate p2m_host array Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-16 14:46   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest Wei Liu
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Add a new field p2m_size to keep track of the number of pages covered by
the p2m.  Change total_pages to p2m_size in functions which in fact need
the size of the p2m.

This is needed because we are going to ditch the assumption that PV x86
has only one contiguous RAM region. Originally the p2m size was always
equal to total_pages, but a later patch will change that.

This patch doesn't change the behaviour of libxc.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
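
As a concrete (hypothetical) illustration of the distinction: a guest with
1GB of RAM laid out as [0, 512M) plus [768M, 1280M) has total_pages =
0x40000, but needs a p2m covering pfns 0 - 0x4ffff, i.e. p2m_size =
0x50000.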
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_arm.c     |  1 +
 tools/libxc/xc_dom_core.c    |  8 ++++----
 tools/libxc/xc_dom_x86.c     | 19 +++++++++++--------
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 07d7224..f57da42 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -129,6 +129,7 @@ struct xc_dom_image {
      */
     xen_pfn_t rambase_pfn;
     xen_pfn_t total_pages;
+    xen_pfn_t p2m_size;         /* number of pfns covered by p2m */
     struct xc_dom_phys *phys_pages;
     int realmodearea_log;
 #if defined (__arm__) || defined(__aarch64__)
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 9b31b1f..f278927 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -449,6 +449,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
     assert(dom->rambank_size[0] != 0);
     assert(ramsize == 0); /* Too much RAM is rejected above */
 
+    dom->p2m_size = p2m_size;
     dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * p2m_size);
     if ( dom->p2m_host == NULL )
         return -EINVAL;
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index ecbf981..b100ce1 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -931,9 +931,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom)
     {
     case 4:
         DOMPRINTF("%s: dst 32bit, pages 0x%" PRIpfn "",
-                  __FUNCTION__, dom->total_pages);
+                  __FUNCTION__, dom->p2m_size);
         p2m_32 = dom->p2m_guest;
-        for ( i = 0; i < dom->total_pages; i++ )
+        for ( i = 0; i < dom->p2m_size; i++ )
             if ( dom->p2m_host[i] != INVALID_P2M_ENTRY )
                 p2m_32[i] = dom->p2m_host[i];
             else
@@ -941,9 +941,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom)
         break;
     case 8:
         DOMPRINTF("%s: dst 64bit, pages 0x%" PRIpfn "",
-                  __FUNCTION__, dom->total_pages);
+                  __FUNCTION__, dom->p2m_size);
         p2m_64 = dom->p2m_guest;
-        for ( i = 0; i < dom->total_pages; i++ )
+        for ( i = 0; i < dom->p2m_size; i++ )
             if ( dom->p2m_host[i] != INVALID_P2M_ENTRY )
                 p2m_64[i] = dom->p2m_host[i];
             else
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 9dbaedb..bea54f2 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -122,11 +122,11 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
 
         try_pfn_end = (try_virt_end - dom->parms.virt_base) >> PAGE_SHIFT_X86;
 
-        if ( try_pfn_end > dom->total_pages )
+        if ( try_pfn_end > dom->p2m_size )
         {
             xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
                          "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
-                         __FUNCTION__, try_pfn_end, dom->total_pages);
+                         __FUNCTION__, try_pfn_end, dom->p2m_size);
             return -ENOMEM;
         }
 
@@ -440,10 +440,11 @@ pfn_error:
 
 static int alloc_magic_pages(struct xc_dom_image *dom)
 {
-    size_t p2m_size = dom->total_pages * dom->arch_hooks->sizeof_pfn;
+    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
 
     /* allocate phys2mach table */
-    if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach", 0, p2m_size) )
+    if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",
+                              0, p2m_alloc_size) )
         return -1;
     dom->p2m_guest = xc_dom_seg_to_ptr(dom, &dom->p2m_seg);
     if ( dom->p2m_guest == NULL )
@@ -777,8 +778,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
         int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
         xen_pfn_t extents[count];
 
+        dom->p2m_size = dom->total_pages;
         dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
-                                      dom->total_pages);
+                                      dom->p2m_size);
         if ( dom->p2m_host == NULL )
             return -EINVAL;
 
@@ -810,8 +812,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
                 return rc;
         }
         /* setup initial p2m */
+        dom->p2m_size = dom->total_pages;
         dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
-                                      dom->total_pages);
+                                      dom->p2m_size);
         if ( dom->p2m_host == NULL )
             return -EINVAL;
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
@@ -860,7 +863,7 @@ static int map_grant_table_frames(struct xc_dom_image *dom)
     {
         rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
                                       XENMAPSPACE_grant_table,
-                                      i, dom->total_pages + i);
+                                      i, dom->p2m_size + i);
         if ( rc != 0 )
         {
             if ( (i > 0) && (errno == EINVAL) )
@@ -870,7 +873,7 @@ static int map_grant_table_frames(struct xc_dom_image *dom)
             }
             xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                          "%s: mapping grant tables failed " "(pfn=0x%" PRIpfn
-                         ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc);
+                         ", rc=%d)", __FUNCTION__, dom->p2m_size + i, rc);
             return rc;
         }
     }
-- 
1.9.1


* [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (3 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:30   ` Andrew Cooper
  2015-02-16 16:58   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 06/24] libxl: introduce vNUMA types Wei Liu
                   ` (18 subsequent siblings)
  23 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

From libxc's point of view, it only needs to know the vnode-to-pnode
mapping and the size of each vnode to allocate memory accordingly. Add
these fields to the xc_dom_image structure.

The caller might not pass in vNUMA information. In that case, a dummy
layout is generated for the convenience of libxc's allocation code. The
upper layer (libxl etc.) still sees the domain as having no vNUMA
configuration.

Note that with this patch a PV x86 guest can have multiple regions of
RAM allocated.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v5:
1. Ditch xc_vnuma_info.

Changes in v4:
1. Pack fields into a struct.
2. Use "page" as unit.
3. __FUNCTION__ -> __func__.
4. Don't print total_pages.
5. Improve comment.

Changes in v3:
1. Rewrite commit log.
2. Shorten some error messages.
---
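
For illustration only, a caller might fill in the new fields roughly as
below before building the image -- a sketch assuming a 1GB guest split
into two 512MB vnodes, both backed by pnode 0 (the names ranges and v2p
are this example's own):

    /* Hedged sketch: describe two 512MB vnodes to libxc. */
    static xen_vmemrange_t ranges[] = {
        { .start = 0,            .end = 512ULL << 20,  .flags = 0, .nid = 0 },
        { .start = 512ULL << 20, .end = 1024ULL << 20, .flags = 0, .nid = 1 },
    };
    static unsigned int v2p[] = { 0, 0 };     /* vnode -> pnode */

    dom->vmemranges = ranges;
    dom->nr_vmemranges = 2;
    dom->vnode_to_pnode = v2p;
    dom->nr_vnodes = 2;
    /* dom->total_pages must match the page count covered by the
     * vmemranges, or arch_setup_meminit() fails with -EINVAL. */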
 tools/libxc/include/xc_dom.h |   6 +++
 tools/libxc/xc_dom_x86.c     | 104 +++++++++++++++++++++++++++++++++++++------
 tools/libxc/xc_private.h     |   2 +
 3 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index f57da42..52d9832 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -168,6 +168,12 @@ struct xc_dom_image {
     struct xc_dom_loader *kernel_loader;
     void *private_loader;
 
+    /* vNUMA information */
+    xen_vmemrange_t *vmemranges;
+    unsigned int nr_vmemranges;
+    unsigned int *vnode_to_pnode;
+    unsigned int nr_vnodes;
+
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
     /* allocate up to virt_alloc_end */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bea54f2..3f8c5b8 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -760,7 +760,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
 int arch_setup_meminit(struct xc_dom_image *dom)
 {
     int rc;
-    xen_pfn_t pfn, allocsz, i, j, mfn;
+    xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
+    int i, j;
 
     rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
     if ( rc )
@@ -811,26 +812,101 @@ int arch_setup_meminit(struct xc_dom_image *dom)
             if ( rc )
                 return rc;
         }
-        /* setup initial p2m */
-        dom->p2m_size = dom->total_pages;
+
+        /* Setup dummy vNUMA information if it's not provided. Note
+         * that this is a valid state if libxl doesn't provide any
+         * vNUMA information.
+         *
+         * The dummy values make libxc allocate all pages from
+         * arbitrary physical nodes. This is the expected behaviour if no
+         * vNUMA configuration is provided to libxc.
+         *
+         * Note that the following hunk is just for the convenience of
+         * allocation code. No defaulting happens in libxc.
+         */
+        if ( dom->nr_vmemranges == 0 )
+        {
+            dom->nr_vmemranges = 1;
+            dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges));
+            dom->vmemranges[0].start = 0;
+            dom->vmemranges[0].end   = dom->total_pages << PAGE_SHIFT;
+            dom->vmemranges[0].flags = 0;
+            dom->vmemranges[0].nid   = 0;
+
+            dom->nr_vnodes = 1;
+            dom->vnode_to_pnode = xc_dom_malloc(dom,
+                                      sizeof(*dom->vnode_to_pnode));
+            dom->vnode_to_pnode[0] = XC_VNUMA_NO_NODE;
+        }
+
+        total = dom->p2m_size = 0;
+        for ( i = 0; i < dom->nr_vmemranges; i++ )
+        {
+            total += ((dom->vmemranges[i].end - dom->vmemranges[i].start)
+                      >> PAGE_SHIFT);
+            dom->p2m_size =
+                dom->p2m_size > (dom->vmemranges[i].end >> PAGE_SHIFT) ?
+                dom->p2m_size : (dom->vmemranges[i].end >> PAGE_SHIFT);
+        }
+        if ( total != dom->total_pages )
+        {
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                         "%s: vNUMA page count mismatch (0x%"PRIpfn" != 0x%"PRIpfn")\n",
+                         __func__, total, dom->total_pages);
+            return -EINVAL;
+        }
+
         dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
                                       dom->p2m_size);
         if ( dom->p2m_host == NULL )
             return -EINVAL;
-        for ( pfn = 0; pfn < dom->total_pages; pfn++ )
-            dom->p2m_host[pfn] = pfn;
+        for ( pfn = 0; pfn < dom->p2m_size; pfn++ )
+            dom->p2m_host[pfn] = INVALID_P2M_ENTRY;
 
         /* allocate guest memory */
-        for ( i = rc = allocsz = 0;
-              (i < dom->total_pages) && !rc;
-              i += allocsz )
+        for ( i = 0; i < dom->nr_vmemranges; i++ )
         {
-            allocsz = dom->total_pages - i;
-            if ( allocsz > 1024*1024 )
-                allocsz = 1024*1024;
-            rc = xc_domain_populate_physmap_exact(
-                dom->xch, dom->guest_domid, allocsz,
-                0, 0, &dom->p2m_host[i]);
+            unsigned int memflags;
+            uint64_t pages;
+            unsigned int pnode = dom->vnode_to_pnode[dom->vmemranges[i].nid];
+
+            memflags = 0;
+            if ( pnode != XC_VNUMA_NO_NODE )
+            {
+                memflags |= XENMEMF_exact_node(pnode);
+                memflags |= XENMEMF_exact_node_request;
+            }
+
+            pages = (dom->vmemranges[i].end - dom->vmemranges[i].start)
+                >> PAGE_SHIFT;
+            pfn_base = dom->vmemranges[i].start >> PAGE_SHIFT;
+
+            for ( pfn = pfn_base; pfn < pfn_base+pages; pfn++ )
+                dom->p2m_host[pfn] = pfn;
+
+            for ( j = 0; j < pages; j += allocsz )
+            {
+                allocsz = pages - j;
+                if ( allocsz > 1024*1024 )
+                    allocsz = 1024*1024;
+
+                rc = xc_domain_populate_physmap_exact(dom->xch,
+                         dom->guest_domid, allocsz, 0, memflags,
+                         &dom->p2m_host[pfn_base+j]);
+
+                if ( rc )
+                {
+                    if ( pnode != XC_VNUMA_NO_NODE )
+                        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                                     "%s: failed to allocate 0x%"PRIx64" pages (v=%d, p=%d)\n",
+                                     __func__, pages, i, pnode);
+                    else
+                        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                                     "%s: failed to allocate 0x%"PRIx64" pages\n",
+                                     __func__, pages);
+                    return rc;
+                }
+            }
         }
 
         /* Ensure no unclaimed pages are left unused.
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 45b8644..1809674 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -35,6 +35,8 @@
 
 #include <xen/sys/privcmd.h>
 
+#define XC_VNUMA_NO_NODE (~0U)
+
 #if defined(HAVE_VALGRIND_MEMCHECK_H) && !defined(NDEBUG) && !defined(__MINIOS__)
 /* Compile in Valgrind client requests? */
 #include <valgrind/memcheck.h>
-- 
1.9.1


* [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (4 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-16 14:58   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state Wei Liu
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

A domain can contain several virtual NUMA nodes, hence we introduce an
array in libxl_domain_build_info.

libxl_vnode_info contains the size of memory in that node, the distances
from that node to every node, the underlying pnode and a bitmap of
vcpus.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v4:
1. Use MemKB.

Changes in v3:
1. Add commit message.
---
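
To illustrate the C API generated from this IDL (a sketch, not from the
patch; assumes ctx and b_info are already initialised and error handling
is elided):

    /* Hedged sketch: one vnode with 1GB of memory on pnode 0,
     * containing vcpus 0 and 1. */
    libxl_vnode_info *v;

    b_info->num_vnuma_nodes = 1;
    b_info->vnuma_nodes = calloc(1, sizeof(*b_info->vnuma_nodes));
    v = &b_info->vnuma_nodes[0];
    libxl_vnode_info_init(v);

    v->memkb = 1024 * 1024;              /* 1GB expressed in KB */
    v->pnode = 0;
    v->num_distances = 1;
    v->distances = calloc(1, sizeof(*v->distances));
    v->distances[0] = 10;                /* distance to itself */

    libxl_cpu_bitmap_alloc(ctx, &v->vcpus, 2);
    libxl_bitmap_set(&v->vcpus, 0);
    libxl_bitmap_set(&v->vcpus, 1);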
 tools/libxl/libxl_types.idl | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 02be466..14c7e7c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -356,6 +356,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
     ])
 
+libxl_vnode_info = Struct("vnode_info", [
+    ("memkb", MemKB),
+    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
+    ("pnode", uint32), # physical node of this node
+    ("vcpus", libxl_bitmap), # vcpus in this node
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -376,6 +383,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("disable_migrate", libxl_defbool),
     ("cpuid",           libxl_cpuid_policy_list),
     ("blkdev_start",    string),
+
+    ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
     
     ("device_model_version", libxl_device_model_version),
     ("device_model_stubdomain", libxl_defbool),
-- 
1.9.1


* [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (5 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 06/24] libxl: introduce vNUMA types Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-16 16:00   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

A vnode consists of one or more vmemranges (virtual memory ranges). One
example of a vnode with multiple vmemranges is one that contains a memory
hole.

Currently we haven't exported a vmemrange interface to libxl users.
Vmemranges are generated during domain build, so we keep the relevant
structures in the domain build state.

Later, if we discover we need to export the interface, those structures
can be moved to libxl_domain_build_info as well.

These new fields (along with other fields in that struct) are set to 0
at start of day, so we don't need to explicitly initialise them. A
following patch which introduces an independent checking function will
need to access these fields. I don't feel comfortable squashing this
change into that one, so I didn't use a single commit.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v5:
1. Fix commit message.

Changes in v4:
1. Improve commit message.

Changes in v3:
1. Rewrite commit message.
---
 tools/libxl/libxl_internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 934465a..6d3ac58 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -973,6 +973,9 @@ typedef struct {
     libxl__file_reference pv_ramdisk;
     const char * pv_cmdline;
     bool pvh_enabled;
+
+    xen_vmemrange_t *vmemranges;
+    uint32_t num_vmemranges;
 } libxl__domain_build_state;
 
 _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid,
-- 
1.9.1


* [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (6 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:15   ` Ian Jackson
                     ` (2 more replies)
  2015-02-12 19:44 ` [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize Wei Liu
                   ` (15 subsequent siblings)
  23 siblings, 3 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

This function is used to check whether a vNUMA configuration (be it
auto-generated or supplied by the user) is valid.

Define a new error code, ERROR_VNUMA_CONFIG_INVALID.

The checks performed can be found in the comment above the function.

This vNUMA function (and future ones) is placed in a new file called
libxl_vnuma.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v5:
1. Define and use new error code.
2. Use LOG macro.
3. Fix hard tabs.

Changes in v4:
1. Adapt to new interface.

Changes in v3:
1. Rewrite commit log.
2. Shorten two error messages.
---
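
A hypothetical call site (the real caller is introduced later in the
series) would look something like:

    rc = libxl__vnuma_config_check(gc, &d_config->b_info, state);
    if (rc) {
        /* rc is ERROR_VNUMA_CONFIG_INVALID; details were logged. */
        goto out;
    }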
 tools/libxl/Makefile         |   2 +-
 tools/libxl/libxl_internal.h |   7 +++
 tools/libxl/libxl_types.idl  |   1 +
 tools/libxl/libxl_vnuma.c    | 131 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_vnuma.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7329521..1b16598 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl
 LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
-			libxl_json.o libxl_aoutils.o libxl_numa.o \
+			libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6d3ac58..258be0d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3394,6 +3394,13 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
     libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
 }
 
+/* Check if vNUMA config is valid. Returns 0 if valid,
+ * ERROR_VNUMA_CONFIG_INVALID otherwise.
+ */
+int libxl__vnuma_config_check(libxl__gc *gc,
+                              const libxl_domain_build_info *b_info,
+                              const libxl__domain_build_state *state);
+
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 14c7e7c..23951fc 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -63,6 +63,7 @@ libxl_error = Enumeration("error", [
     (-17, "DEVICE_EXISTS"),
     (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
     (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-20, "VNUMA_CONFIG_INVALID"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
new file mode 100644
index 0000000..fa5aa8d
--- /dev/null
+++ b/tools/libxl/libxl_vnuma.c
@@ -0,0 +1,131 @@
+/*
+ * Copyright (C) 2014      Citrix Ltd.
+ * Author Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+#include <stdlib.h>
+
+/* Sort vmemranges in ascending order with "start" */
+static int compare_vmemrange(const void *a, const void *b)
+{
+    const xen_vmemrange_t *x = a, *y = b;
+    if (x->start < y->start)
+        return -1;
+    if (x->start > y->start)
+        return 1;
+    return 0;
+}
+
+/* Check if vNUMA configuration is valid:
+ *  1. all pnodes inside vnode_to_pnode array are valid
+ *  2. one vcpu belongs to and only belongs to one vnode
+ *  3. each vmemrange is valid and doesn't overlap with each other
+ */
+int libxl__vnuma_config_check(libxl__gc *gc,
+                              const libxl_domain_build_info *b_info,
+                              const libxl__domain_build_state *state)
+{
+    int i, j, rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes;
+    libxl_numainfo *ninfo = NULL;
+    uint64_t total_memkb = 0;
+    libxl_bitmap cpumap;
+    libxl_vnode_info *p;
+
+    libxl_bitmap_init(&cpumap);
+
+    /* Check pnode specified is valid */
+    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
+    if (!ninfo) {
+        LOG(ERROR, "libxl_get_numainfo failed");
+        goto out;
+    }
+
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        uint32_t pnode;
+
+        p = &b_info->vnuma_nodes[i];
+        pnode = p->pnode;
+
+        /* The pnode specified is not valid? */
+        if (pnode >= nr_nodes) {
+            LOG(ERROR, "Invalid pnode %d specified", pnode);
+            goto out;
+        }
+
+        total_memkb += p->memkb;
+    }
+
+    if (total_memkb != b_info->max_memkb) {
+        LOG(ERROR, "Amount of memory mismatch (0x%"PRIx64" != 0x%"PRIx64")",
+            total_memkb, b_info->max_memkb);
+        goto out;
+    }
+
+    /* Check vcpu mapping */
+    libxl_cpu_bitmap_alloc(CTX, &cpumap, b_info->max_vcpus);
+    libxl_bitmap_set_none(&cpumap);
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        p = &b_info->vnuma_nodes[i];
+        libxl_for_each_set_bit(j, p->vcpus) {
+            if (!libxl_bitmap_test(&cpumap, j))
+                libxl_bitmap_set(&cpumap, j);
+            else {
+                LOG(ERROR, "Vcpu %d assigned more than once", j);
+                goto out;
+            }
+        }
+    }
+
+    for (i = 0; i < b_info->max_vcpus; i++) {
+        if (!libxl_bitmap_test(&cpumap, i)) {
+            LOG(ERROR, "Vcpu %d is not assigned to any vnode", i);
+            goto out;
+        }
+    }
+
+    /* Check vmemranges */
+    qsort(state->vmemranges, state->num_vmemranges, sizeof(xen_vmemrange_t),
+          compare_vmemrange);
+
+    for (i = 0; i < state->num_vmemranges; i++) {
+        if (state->vmemranges[i].end < state->vmemranges[i].start) {
+                LOG(ERROR, "Vmemrange end < start");
+                goto out;
+        }
+    }
+
+    for (i = 0; i < state->num_vmemranges - 1; i++) {
+        if (state->vmemranges[i].end > state->vmemranges[i+1].start) {
+            LOG(ERROR,
+                "Vmemranges overlapped, 0x%"PRIx64"-0x%"PRIx64", 0x%"PRIx64"-0x%"PRIx64,
+                state->vmemranges[i].start, state->vmemranges[i].end,
+                state->vmemranges[i+1].start, state->vmemranges[i+1].end);
+            goto out;
+        }
+    }
+
+    rc = 0;
+out:
+    if (ninfo) libxl_numainfo_dispose(ninfo);
+    libxl_bitmap_dispose(&cpumap);
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1


* [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (7 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 15:42   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest Wei Liu
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

This function gets the machine E820 map and sanitizes it according to
the PV guest configuration.

It will be used in a later patch. No functional change is introduced in
this patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v4:
1. Use actual size of the map instead of using E820MAX.
---
 tools/libxl/libxl_x86.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9ceb373..d012b4d 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -207,6 +207,27 @@ static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
     return 0;
 }
 
+static int e820_host_sanitize(libxl__gc *gc,
+                              libxl_domain_build_info *b_info,
+                              struct e820entry map[],
+                              uint32_t *nr)
+{
+    int rc;
+
+    rc = xc_get_machine_memory_map(CTX->xch, map, *nr);
+    if (rc < 0) {
+        errno = rc;
+        return ERROR_FAIL;
+    }
+
+    *nr = rc;
+
+    rc = e820_sanitize(CTX, map, nr, b_info->target_memkb,
+                       (b_info->max_memkb - b_info->target_memkb) +
+                       b_info->u.pv.slack_memkb);
+    return rc;
+}
+
 static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
         libxl_domain_config *d_config)
 {
@@ -223,15 +244,8 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
     if (!libxl_defbool_val(b_info->u.pv.e820_host))
         return ERROR_INVAL;
 
-    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
-    if (rc < 0) {
-        errno = rc;
-        return ERROR_FAIL;
-    }
-    nr = rc;
-    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-                       (b_info->max_memkb - b_info->target_memkb) +
-                       b_info->u.pv.slack_memkb);
+    nr = E820MAX;
+    rc = e820_host_sanitize(gc, b_info, map, &nr);
     if (rc)
         return ERROR_FAIL;
 
-- 
1.9.1


* [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (8 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 15:49   ` Andrew Cooper
  2015-02-17 15:28   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
                   ` (13 subsequent siblings)
  23 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Introduce an arch-independent routine to generate one vmemrange per
vnode. Also introduce arch-dependent routines for different
architectures, because part of the process is arch-specific -- ARM does
not yet have NUMA support and the E820 map is x86-only. A few stubs for
libxl_arm.c are created.

For those x86 guests that care about the machine E820 map (i.e. with
e820_host=1), a vnode is further split into several vmemranges to
accommodate memory holes.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v5:
1. Allocate array all in one go.
2. Reverse the logic of vmemranges generation.

Changes in v4:
1. Adapt to new interface.
2. Address Ian Jackson's comments.

Changes in v3:
1. Rewrite commit log.
---
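
A worked example of the e820_host=1 path, with hypothetical numbers traced
by hand through the loop below: given host RAM at [0, 3G) and [4G, 5G)
(i.e. a hole at [3G, 4G)) and a guest with two 2GB vnodes, the code emits
three vmemranges:

    vnode 0:  0x000000000 - 0x080000000   (2GB of the first RAM region)
    vnode 1:  0x080000000 - 0x0c0000000   (last 1GB of the first region)
    vnode 1:  0x100000000 - 0x140000000   (1GB from the second region)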
 tools/libxl/libxl_arch.h     |  6 ++++
 tools/libxl/libxl_arm.c      |  8 +++++
 tools/libxl/libxl_internal.h |  8 +++++
 tools/libxl/libxl_vnuma.c    | 41 +++++++++++++++++++++++++
 tools/libxl/libxl_x86.c      | 73 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d3bc136..e249048 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -27,4 +27,10 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
 int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
                                       libxl_domain_build_info *info,
                                       struct xc_dom_image *dom);
+
+/* build vNUMA vmemrange with arch specific information */
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *b_info,
+                                      libxl__domain_build_state *state);
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 65a762b..7da254f 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -707,6 +707,14 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
     return 0;
 }
 
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *info,
+                                      libxl__domain_build_state *state)
+{
+    return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 258be0d..7d1e1cf 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3400,6 +3400,14 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
 int libxl__vnuma_config_check(libxl__gc *gc,
                               const libxl_domain_build_info *b_info,
                               const libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
+                                            uint32_t domid,
+                                            libxl_domain_build_info *b_info,
+                                            libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+                                    uint32_t domid,
+                                    libxl_domain_build_info *b_info,
+                                    libxl__domain_build_state *state);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index fa5aa8d..3d46239 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -14,6 +14,7 @@
  */
 #include "libxl_osdeps.h" /* must come before any other headers */
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 #include <stdlib.h>
 
 /* Sort vmemranges in ascending order with "start" */
@@ -122,6 +123,46 @@ out:
     return rc;
 }
 
+
+int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
+                                            uint32_t domid,
+                                            libxl_domain_build_info *b_info,
+                                            libxl__domain_build_state *state)
+{
+    int i;
+    uint64_t next;
+    xen_vmemrange_t *v = NULL;
+
+    /* Generate one vmemrange for each virtual node. */
+    GCREALLOC_ARRAY(v, b_info->num_vnuma_nodes);
+    next = 0;
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+        v[i].start = next;
+        v[i].end = next + (p->memkb << 10);
+        v[i].flags = 0;
+        v[i].nid = i;
+
+        next = v[i].end;
+    }
+
+    state->vmemranges = v;
+    state->num_vmemranges = i;
+
+    return 0;
+}
+
+/* Build vmemranges for PV guest */
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+                                    uint32_t domid,
+                                    libxl_domain_build_info *b_info,
+                                    libxl__domain_build_state *state)
+{
+    assert(state->vmemranges == NULL);
+    return libxl__arch_vnuma_build_vmemrange(gc, domid, b_info, state);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index d012b4d..d37cca1 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -339,6 +339,79 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
     return 0;
 }
 
+/* Return 0 on success, ERROR_* on failure. */
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+                                      uint32_t domid,
+                                      libxl_domain_build_info *b_info,
+                                      libxl__domain_build_state *state)
+{
+    int nid, nr_vmemrange, rc;
+    uint32_t nr_e820, e820_count;
+    struct e820entry map[E820MAX];
+    xen_vmemrange_t *vmemranges;
+
+    /* If e820_host is not set, call the generic function */
+    if (!(b_info->type == LIBXL_DOMAIN_TYPE_PV &&
+          libxl_defbool_val(b_info->u.pv.e820_host)))
+        return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, b_info,
+                                                       state);
+
+    assert(state->vmemranges == NULL);
+
+    nr_e820 = E820MAX;
+    rc = e820_host_sanitize(gc, b_info, map, &nr_e820);
+    if (rc) goto out;
+
+    e820_count = 0;
+    nr_vmemrange = 0;
+    vmemranges = NULL;
+    for (nid = 0; nid < b_info->num_vnuma_nodes; nid++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[nid];
+        uint64_t remaining_bytes = (p->memkb << 10), bytes;
+
+        while (remaining_bytes > 0) {
+            if (e820_count >= nr_e820) {
+                rc = ERROR_NOMEM;
+                goto out;
+            }
+
+            /* Skip non-RAM regions */
+            if (map[e820_count].type != E820_RAM) {
+                e820_count++;
+                continue;
+            }
+
+            GCREALLOC_ARRAY(vmemranges, nr_vmemrange+1);
+
+            bytes = map[e820_count].size >= remaining_bytes ?
+                remaining_bytes : map[e820_count].size;
+
+            vmemranges[nr_vmemrange].start = map[e820_count].addr;
+            vmemranges[nr_vmemrange].end = map[e820_count].addr + bytes;
+
+            if (map[e820_count].size >= remaining_bytes) {
+                map[e820_count].addr += bytes;
+                map[e820_count].size -= bytes;
+            } else {
+                e820_count++;
+            }
+
+            remaining_bytes -= bytes;
+
+            vmemranges[nr_vmemrange].flags = 0;
+            vmemranges[nr_vmemrange].nid = nid;
+            nr_vmemrange++;
+        }
+    }
+
+    state->vmemranges = vmemranges;
+    state->num_vmemranges = nr_vmemrange;
+
+    rc = 0;
+out:
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen for PV guest
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (9 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 15:54   ` Andrew Cooper
  2015-02-17 14:49   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
                   ` (12 subsequent siblings)
  23 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Transform the user-supplied vNUMA configuration into libxl internal
representations, and finally libxc representations. Check validity of
the configuration along the way.

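As an illustration (a sketch only, not part of the patch; the values
are made up), a two-vnode, four-vcpu guest is flattened into the plain
arrays that xc_domain_setvnuma() consumes: the NxN distance matrix is
laid out row-major, and the per-node vcpu bitmaps become a per-vcpu
node index:

    /* Sketch: 2 vnodes, 4 vcpus, distances [[10,20],[20,10]]. */
    unsigned int vnode_to_pnode[2] = { 0, 1 };       /* vnode -> pnode */
    unsigned int vcpu_to_vnode[4]  = { 0, 0, 1, 1 }; /* vcpu -> vnode */
    unsigned int vdistance[4]      = { 10, 20,       /* row 0 */
                                       20, 10 };     /* row 1 */
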
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v5:
1. Adapt to change of interface (ditching xc_vnuma_info).

Changes in v4:
1. Adapt to new interfaces.

Changes in v3:
1. Add more commit log.
---
 tools/libxl/libxl_dom.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 48d661a..1ff0704 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -515,6 +515,51 @@ retry_transaction:
     return 0;
 }
 
+static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
+                          const libxl_domain_build_info *info,
+                          const libxl__domain_build_state *state)
+{
+    int rc = 0;
+    int i, nr_vdistance;
+    unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL;
+
+    vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus,
+                                  sizeof(unsigned int));
+    vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes,
+                                   sizeof(unsigned int));
+
+    nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes;
+    vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int));
+
+    for (i = 0; i < info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *v = &info->vnuma_nodes[i];
+        int bit;
+
+        /* vnode to pnode mapping */
+        vnode_to_pnode[i] = v->pnode;
+
+        /* vcpu to vnode mapping */
+        libxl_for_each_set_bit(bit, v->vcpus)
+            vcpu_to_vnode[bit] = i;
+
+        /* node distances */
+        assert(info->num_vnuma_nodes == v->num_distances);
+        memcpy(vdistance + (i * info->num_vnuma_nodes),
+               v->distances,
+               v->num_distances * sizeof(unsigned int));
+    }
+
+    if (xc_domain_setvnuma(CTX->xch, domid, info->num_vnuma_nodes,
+                           state->num_vmemranges, info->max_vcpus,
+                           state->vmemranges, vdistance,
+                           vcpu_to_vnode, vnode_to_pnode) < 0) {
+        LOGE(ERROR, "xc_domain_setvnuma failed");
+        rc = ERROR_FAIL;
+    }
+
+    return rc;
+}
+
 int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state)
 {
@@ -572,6 +617,38 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     dom->xenstore_domid = state->store_domid;
     dom->claim_enabled = libxl_defbool_val(info->claim_mode);
 
+    if (info->num_vnuma_nodes != 0) {
+        int i;
+
+        ret = libxl__vnuma_build_vmemrange_pv(gc, domid, info, state);
+        if (ret) {
+            LOGE(ERROR, "cannot build vmemranges");
+            goto out;
+        }
+        ret = libxl__vnuma_config_check(gc, info, state);
+        if (ret) goto out;
+
+        ret = set_vnuma_info(gc, domid, info, state);
+        if (ret) goto out;
+
+        dom->nr_vmemranges = state->num_vmemranges;
+        dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges) *
+                                        dom->nr_vmemranges);
+
+        for (i = 0; i < dom->nr_vmemranges; i++) {
+            dom->vmemranges[i].start = state->vmemranges[i].start;
+            dom->vmemranges[i].end   = state->vmemranges[i].end;
+            dom->vmemranges[i].flags = state->vmemranges[i].flags;
+            dom->vmemranges[i].nid   = state->vmemranges[i].nid;
+        }
+
+        dom->nr_vnodes = info->num_vnuma_nodes;
+        dom->vnode_to_pnode = xc_dom_malloc(dom, sizeof(*dom->vnode_to_pnode) *
+                                            dom->nr_vnodes);
+        for (i = 0; i < info->num_vnuma_nodes; i++)
+            dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
+    }
+
     if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
         LOGE(ERROR, "xc_dom_boot_xen_init failed");
         goto out;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (10 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 15:58   ` Andrew Cooper
  2015-02-17 11:36   ` Jan Beulich
  2015-02-12 19:44 ` [PATCH v5 13/24] hvmloader: construct SRAT Wei Liu
                   ` (11 subsequent siblings)
  23 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Hvmloader issues the XENMEM_get_vnumainfo hypercall and stores the
retrieved information in scratch space for later use.

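A minimal sketch of the two-call pattern used here (illustrative only,
mirroring the code in this patch): the first call attaches no buffers,
so Xen fails it with -XEN_ENOBUFS but fills in the nr_* counts, which
are then used to size the buffers for the second call.

    struct xen_vnuma_topology_info t;

    memset(&t, 0, sizeof(t));
    t.domid = DOMID_SELF;
    if ( hypercall_memory_op(XENMEM_get_vnumainfo, &t) != -XEN_ENOBUFS )
        return; /* no buffer-size feedback; nothing to retrieve */
    /* scratch_alloc() arrays sized from t.nr_vcpus / t.nr_vnodes /
     * t.nr_vmemranges, set the guest handles, then repeat the
     * hypercall to fetch the real data. */
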
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
Changes in v5:
1. Group scratch_alloc calls together.
2. Use memset.
3. Drop unnecessary "return";
4. Rebase onto Jan's errno ABI change.

Changes in v4:
1. Use *vnode_to_pnode to calculate size.
2. Remove loop.

Changes in v3:
1. Move init_vnuma_info before ACPI stuff.
2. Fix errno.h inclusion.
3. Remove upper limits and use loop.
---
 tools/firmware/hvmloader/Makefile    |  2 +-
 tools/firmware/hvmloader/hvmloader.c |  3 ++
 tools/firmware/hvmloader/vnuma.c     | 84 ++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/vnuma.h     | 52 ++++++++++++++++++++++
 4 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 tools/firmware/hvmloader/vnuma.c
 create mode 100644 tools/firmware/hvmloader/vnuma.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index b759e81..cf967fd 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -29,7 +29,7 @@ LOADADDR = 0x100000
 CFLAGS += $(CFLAGS_xeninclude)
 
 OBJS  = hvmloader.o mp_tables.o util.o smbios.o 
-OBJS += smp.o cacheattr.o xenbus.o
+OBJS += smp.o cacheattr.o xenbus.o vnuma.o
 OBJS += e820.o pci.o pir.o ctype.o
 OBJS += hvm_param.o
 ifeq ($(debug),y)
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 7b0da38..25b7f08 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -26,6 +26,7 @@
 #include "pci_regs.h"
 #include "apic_regs.h"
 #include "acpi/acpi2_0.h"
+#include "vnuma.h"
 #include <xen/version.h>
 #include <xen/hvm/params.h>
 
@@ -310,6 +311,8 @@ int main(void)
 
     if ( acpi_enabled )
     {
+        init_vnuma_info();
+
         if ( bios->acpi_build_tables )
         {
             printf("Loading ACPI ...\n");
diff --git a/tools/firmware/hvmloader/vnuma.c b/tools/firmware/hvmloader/vnuma.c
new file mode 100644
index 0000000..a71d31a
--- /dev/null
+++ b/tools/firmware/hvmloader/vnuma.c
@@ -0,0 +1,84 @@
+/*
+ * vnuma.c: obtain vNUMA information from hypervisor
+ *
+ * Copyright (c) 2014 Wei Liu, Citrix Systems (R&D) Ltd.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "util.h"
+#include "hypercall.h"
+#include "vnuma.h"
+#include <xen/errno.h>
+
+unsigned int nr_vnodes, nr_vmemranges;
+unsigned int *vcpu_to_vnode, *vdistance;
+xen_vmemrange_t *vmemrange;
+
+void init_vnuma_info(void)
+{
+    int rc;
+    struct xen_vnuma_topology_info vnuma_topo;
+
+    memset(&vnuma_topo, 0, sizeof(vnuma_topo));
+    vnuma_topo.domid = DOMID_SELF;
+
+    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
+
+    if ( rc != -XEN_ENOBUFS )
+        return;
+
+    ASSERT(vnuma_topo.nr_vcpus == hvm_info->nr_vcpus);
+
+    vcpu_to_vnode =
+        scratch_alloc(sizeof(*vcpu_to_vnode) * hvm_info->nr_vcpus, 0);
+    vdistance = scratch_alloc(sizeof(uint32_t) * vnuma_topo.nr_vnodes *
+                              vnuma_topo.nr_vnodes, 0);
+    vmemrange = scratch_alloc(sizeof(xen_vmemrange_t) *
+                              vnuma_topo.nr_vmemranges, 0);
+
+    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
+    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
+    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);
+
+    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
+
+    if ( rc < 0 )
+    {
+        printf("Failed to retrieve vNUMA information, rc = %d\n", rc);
+        return;
+    }
+
+    nr_vnodes = vnuma_topo.nr_vnodes;
+    nr_vmemranges = vnuma_topo.nr_vmemranges;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/firmware/hvmloader/vnuma.h b/tools/firmware/hvmloader/vnuma.h
new file mode 100644
index 0000000..63b648a
--- /dev/null
+++ b/tools/firmware/hvmloader/vnuma.h
@@ -0,0 +1,52 @@
+/******************************************************************************
+ * vnuma.h
+ *
+ * Copyright (c) 2014, Wei Liu
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __HVMLOADER_VNUMA_H__
+#define __HVMLOADER_VNUMA_H__
+
+#include <xen/memory.h>
+
+extern unsigned int nr_vnodes, nr_vmemranges;
+extern unsigned int *vcpu_to_vnode, *vdistance;
+extern xen_vmemrange_t *vmemrange;
+
+void init_vnuma_info(void);
+
+#endif /* __HVMLOADER_VNUMA_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 13/24] hvmloader: construct SRAT
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (11 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 16:07   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 14/24] hvmloader: construct SLIT Wei Liu
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
---
Changes in v3:
1. Remove redundant variable.
2. Coding style fix.
3. Add assertion.

Changes in v2:
1. Remove explicit zero initializers.
2. Adapt to new vNUMA retrieval routine.
3. Move SRAT very late in secondary table build.
---
 tools/firmware/hvmloader/acpi/acpi2_0.h | 53 ++++++++++++++++++++++++
 tools/firmware/hvmloader/acpi/build.c   | 72 +++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
index 7b22d80..6169213 100644
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
@@ -364,6 +364,57 @@ struct acpi_20_madt_intsrcovr {
 };
 
 /*
+ * System Resource Affinity Table header definition (SRAT)
+ */
+struct acpi_20_srat {
+    struct acpi_header header;
+    uint32_t table_revision;
+    uint32_t reserved2[2];
+};
+
+#define ACPI_SRAT_TABLE_REVISION 1
+
+/*
+ * System Resource Affinity Table structure types.
+ */
+#define ACPI_PROCESSOR_AFFINITY 0x0
+#define ACPI_MEMORY_AFFINITY    0x1
+struct acpi_20_srat_processor {
+    uint8_t type;
+    uint8_t length;
+    uint8_t domain;
+    uint8_t apic_id;
+    uint32_t flags;
+    uint8_t sapic_id;
+    uint8_t domain_hi[3];
+    uint32_t reserved;
+};
+
+/*
+ * Local APIC Affinity Flags.  All other bits are reserved and must be 0.
+ */
+#define ACPI_LOCAL_APIC_AFFIN_ENABLED (1 << 0)
+
+struct acpi_20_srat_memory {
+    uint8_t type;
+    uint8_t length;
+    uint32_t domain;
+    uint16_t reserved;
+    uint64_t base_address;
+    uint64_t mem_length;
+    uint32_t reserved2;
+    uint32_t flags;
+    uint64_t reserved3;
+};
+
+/*
+ * Memory Affinity Flags.  All other bits are reserved and must be 0.
+ */
+#define ACPI_MEM_AFFIN_ENABLED (1 << 0)
+#define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
+#define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
+
+/*
  * Table Signatures.
  */
 #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
@@ -375,6 +426,7 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_TCPA_SIGNATURE ASCII32('T','C','P','A')
 #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
+#define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 
 /*
  * Table revision numbers.
@@ -388,6 +440,7 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_HPET_REVISION 0x01
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
+#define ACPI_2_0_SRAT_REVISION 0x01
 
 #pragma pack ()
 
diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
index 1431296..3e96c23 100644
--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -23,6 +23,7 @@
 #include "ssdt_pm.h"
 #include "../config.h"
 #include "../util.h"
+#include "../vnuma.h"
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/params.h>
 
@@ -203,6 +204,66 @@ static struct acpi_20_waet *construct_waet(void)
     return waet;
 }
 
+static struct acpi_20_srat *construct_srat(void)
+{
+    struct acpi_20_srat *srat;
+    struct acpi_20_srat_processor *processor;
+    struct acpi_20_srat_memory *memory;
+    unsigned int size;
+    void *p;
+    int i;
+
+    size = sizeof(*srat) + sizeof(*processor) * hvm_info->nr_vcpus +
+        sizeof(*memory) * nr_vmemranges;
+
+    p = mem_alloc(size, 16);
+    if ( !p )
+        return NULL;
+
+    srat = p;
+    memset(srat, 0, sizeof(*srat));
+    srat->header.signature    = ACPI_2_0_SRAT_SIGNATURE;
+    srat->header.revision     = ACPI_2_0_SRAT_REVISION;
+    fixed_strcpy(srat->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(srat->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    srat->header.oem_revision = ACPI_OEM_REVISION;
+    srat->header.creator_id   = ACPI_CREATOR_ID;
+    srat->header.creator_revision = ACPI_CREATOR_REVISION;
+    srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
+
+    processor = (struct acpi_20_srat_processor *)(srat + 1);
+    for ( i = 0; i < hvm_info->nr_vcpus; i++ )
+    {
+        memset(processor, 0, sizeof(*processor));
+        processor->type     = ACPI_PROCESSOR_AFFINITY;
+        processor->length   = sizeof(*processor);
+        processor->domain   = vcpu_to_vnode[i];
+        processor->apic_id  = LAPIC_ID(i);
+        processor->flags    = ACPI_LOCAL_APIC_AFFIN_ENABLED;
+        processor++;
+    }
+
+    memory = (struct acpi_20_srat_memory *)processor;
+    for ( i = 0; i < nr_vmemranges; i++ )
+    {
+        memset(memory, 0, sizeof(*memory));
+        memory->type          = ACPI_MEMORY_AFFINITY;
+        memory->length        = sizeof(*memory);
+        memory->domain        = vmemrange[i].nid;
+        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
+        memory->base_address  = vmemrange[i].start;
+        memory->mem_length    = vmemrange[i].end - vmemrange[i].start;
+        memory++;
+    }
+
+    ASSERT(((unsigned long)memory) - ((unsigned long)p) == size);
+
+    srat->header.length = size;
+    set_checksum(srat, offsetof(struct acpi_header, checksum), size);
+
+    return srat;
+}
+
 static int construct_passthrough_tables(unsigned long *table_ptrs,
                                         int nr_tables)
 {
@@ -257,6 +318,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
     struct acpi_20_hpet *hpet;
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
+    struct acpi_20_srat *srat;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -346,6 +408,16 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
         }
     }
 
+    /* SRAT */
+    if ( nr_vnodes > 0 )
+    {
+        srat = construct_srat();
+        if ( srat )
+            table_ptrs[nr_tables++] = (unsigned long)srat;
+        else
+            printf("Failed to build SRAT, skipping...\n");
+    }
+
     /* Load any additional tables passed through. */
     nr_tables += construct_passthrough_tables(table_ptrs, nr_tables);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 14/24] hvmloader: construct SLIT
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (12 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 13/24] hvmloader: construct SRAT Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 16:10   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 15/24] libxc: indentation change to xc_hvm_build_x86.c Wei Liu
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
---
Changes in v3:
1. Coding style fix.
2. Fix an error code.
3. Use unsigned int for loop variable.

Changes in v2:
1. Adapt to new vNUMA retrieval routine.
2. Move SLIT very late in secondary table build.
---
 tools/firmware/hvmloader/acpi/acpi2_0.h |  8 +++++++
 tools/firmware/hvmloader/acpi/build.c   | 40 ++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
index 6169213..d698095 100644
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
@@ -414,6 +414,12 @@ struct acpi_20_srat_memory {
 #define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
 #define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
 
+struct acpi_20_slit {
+    struct acpi_header header;
+    uint64_t localities;
+    uint8_t entry[0];
+};
+
 /*
  * Table Signatures.
  */
@@ -427,6 +433,7 @@ struct acpi_20_srat_memory {
 #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
+#define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
 
 /*
  * Table revision numbers.
@@ -441,6 +448,7 @@ struct acpi_20_srat_memory {
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
+#define ACPI_2_0_SLIT_REVISION 0x01
 
 #pragma pack ()
 
diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
index 3e96c23..7dac6a8 100644
--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -264,6 +264,38 @@ static struct acpi_20_srat *construct_srat(void)
     return srat;
 }
 
+static struct acpi_20_slit *construct_slit(void)
+{
+    struct acpi_20_slit *slit;
+    unsigned int i, num, size;
+
+    num = nr_vnodes * nr_vnodes;
+    size = sizeof(*slit) + num * sizeof(uint8_t);
+
+    slit = mem_alloc(size, 16);
+    if ( !slit )
+        return NULL;
+
+    memset(slit, 0, size);
+    slit->header.signature    = ACPI_2_0_SLIT_SIGNATURE;
+    slit->header.revision     = ACPI_2_0_SLIT_REVISION;
+    fixed_strcpy(slit->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(slit->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    slit->header.oem_revision = ACPI_OEM_REVISION;
+    slit->header.creator_id   = ACPI_CREATOR_ID;
+    slit->header.creator_revision = ACPI_CREATOR_REVISION;
+
+    for ( i = 0; i < num; i++ )
+        slit->entry[i] = vdistance[i];
+
+    slit->localities = nr_vnodes;
+
+    slit->header.length = size;
+    set_checksum(slit, offsetof(struct acpi_header, checksum), size);
+
+    return slit;
+}
+
 static int construct_passthrough_tables(unsigned long *table_ptrs,
                                         int nr_tables)
 {
@@ -319,6 +351,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
     struct acpi_20_srat *srat;
+    struct acpi_20_slit *slit;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -408,7 +441,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
         }
     }
 
-    /* SRAT */
+    /* SRAT and SLIT */
     if ( nr_vnodes > 0 )
     {
         srat = construct_srat();
@@ -416,6 +449,11 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
             table_ptrs[nr_tables++] = (unsigned long)srat;
         else
             printf("Failed to build SRAT, skipping...\n");
+        slit = construct_slit();
+        if ( slit )
+            table_ptrs[nr_tables++] = (unsigned long)slit;
+        else
+            printf("Failed to build SLIT, skipping...\n");
     }
 
     /* Load any additional tables passed through. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 15/24] libxc: indentation change to xc_hvm_build_x86.c
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (13 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 14/24] hvmloader: construct SLIT Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-12 19:44 ` [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Move a while loop in xc_hvm_build_x86 into a new inner block,
indenting it one level to the right. No functional change is
introduced.

Functional changes will be introduced in the next patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c | 153 ++++++++++++++++++++++-------------------
 1 file changed, 81 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index c81a25b..ecc3224 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -353,98 +353,107 @@ static int setup_guest(xc_interface *xch,
     cur_pages = 0xc0;
     stat_normal_pages = 0xc0;
 
-    while ( (rc == 0) && (nr_pages > cur_pages) )
     {
-        /* Clip count to maximum 1GB extent. */
-        unsigned long count = nr_pages - cur_pages;
-        unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
-
-        if ( count > max_pages )
-            count = max_pages;
-
-        cur_pfn = page_array[cur_pages];
-
-        /* Take care the corner cases of super page tails */
-        if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-             (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
-            count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
-        else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-                  (count > SUPERPAGE_1GB_NR_PFNS) )
-            count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
-
-        /* Attemp to allocate 1GB super page. Because in each pass we only
-         * allocate at most 1GB, we don't have to clip super page boundaries.
-         */
-        if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
-             /* Check if there exists MMIO hole in the 1GB memory range */
-             !check_mmio_hole(cur_pfn << PAGE_SHIFT,
-                              SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
-                              mmio_start, mmio_size) )
+        while ( (rc == 0) && (nr_pages > cur_pages) )
         {
-            long done;
-            unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
-            xen_pfn_t sp_extents[nr_extents];
-
-            for ( i = 0; i < nr_extents; i++ )
-                sp_extents[i] = page_array[cur_pages+(i<<SUPERPAGE_1GB_SHIFT)];
-
-            done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_1GB_SHIFT,
-                                              pod_mode, sp_extents);
-
-            if ( done > 0 )
-            {
-                stat_1gb_pages += done;
-                done <<= SUPERPAGE_1GB_SHIFT;
-                cur_pages += done;
-                count -= done;
-            }
-        }
+            /* Clip count to maximum 1GB extent. */
+            unsigned long count = nr_pages - cur_pages;
+            unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
 
-        if ( count != 0 )
-        {
-            /* Clip count to maximum 8MB extent. */
-            max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
             if ( count > max_pages )
                 count = max_pages;
-            
-            /* Clip partial superpage extents to superpage boundaries. */
-            if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                 (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
-                count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
-            else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                      (count > SUPERPAGE_2MB_NR_PFNS) )
-                count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
-
-            /* Attempt to allocate superpage extents. */
-            if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+
+            cur_pfn = page_array[cur_pages];
+
+            /* Take care of the corner cases of super page tails */
+            if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                 (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
+                count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
+            else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                      (count > SUPERPAGE_1GB_NR_PFNS) )
+                count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
+
+            /* Attempt to allocate 1GB super page. Because in each pass
+             * we only allocate at most 1GB, we don't have to clip
+             * super page boundaries.
+             */
+            if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
+                 /* Check if there exists MMIO hole in the 1GB memory
+                  * range */
+                 !check_mmio_hole(cur_pfn << PAGE_SHIFT,
+                                  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
+                                  mmio_start, mmio_size) )
             {
                 long done;
-                unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+                unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
                 xen_pfn_t sp_extents[nr_extents];
 
                 for ( i = 0; i < nr_extents; i++ )
-                    sp_extents[i] = page_array[cur_pages+(i<<SUPERPAGE_2MB_SHIFT)];
+                    sp_extents[i] =
+                        page_array[cur_pages+(i<<SUPERPAGE_1GB_SHIFT)];
 
-                done = xc_domain_populate_physmap(xch, dom, nr_extents, SUPERPAGE_2MB_SHIFT,
+                done = xc_domain_populate_physmap(xch, dom, nr_extents,
+                                                  SUPERPAGE_1GB_SHIFT,
                                                   pod_mode, sp_extents);
 
                 if ( done > 0 )
                 {
-                    stat_2mb_pages += done;
-                    done <<= SUPERPAGE_2MB_SHIFT;
+                    stat_1gb_pages += done;
+                    done <<= SUPERPAGE_1GB_SHIFT;
                     cur_pages += done;
                     count -= done;
                 }
             }
-        }
 
-        /* Fall back to 4kB extents. */
-        if ( count != 0 )
-        {
-            rc = xc_domain_populate_physmap_exact(
-                xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
-            cur_pages += count;
-            stat_normal_pages += count;
+            if ( count != 0 )
+            {
+                /* Clip count to maximum 8MB extent. */
+                max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
+                if ( count > max_pages )
+                    count = max_pages;
+
+                /* Clip partial superpage extents to superpage
+                 * boundaries. */
+                if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                     (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
+                    count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
+                else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                          (count > SUPERPAGE_2MB_NR_PFNS) )
+                    count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
+
+                /* Attempt to allocate superpage extents. */
+                if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+                {
+                    long done;
+                    unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+                    xen_pfn_t sp_extents[nr_extents];
+
+                    for ( i = 0; i < nr_extents; i++ )
+                        sp_extents[i] =
+                            page_array[cur_pages+(i<<SUPERPAGE_2MB_SHIFT)];
+
+                    done = xc_domain_populate_physmap(xch, dom, nr_extents,
+                                                      SUPERPAGE_2MB_SHIFT,
+                                                      pod_mode, sp_extents);
+
+                    if ( done > 0 )
+                    {
+                        stat_2mb_pages += done;
+                        done <<= SUPERPAGE_2MB_SHIFT;
+                        cur_pages += done;
+                        count -= done;
+                    }
+                }
+            }
+
+            /* Fall back to 4kB extents. */
+            if ( count != 0 )
+            {
+                rc = xc_domain_populate_physmap_exact(
+                    xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
+                cur_pages += count;
+                stat_normal_pages += count;
+            }
         }
     }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (14 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 15/24] libxc: indentation change to xc_hvm_build_x86.c Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 16:22   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

The algorithm is more or less the same as the one used for PV guests.
Libxc gets hold of the mapping of vnode to pnode and the size of each
vnode, then allocates memory accordingly.

The function then returns the low memory end, the high memory end and
the MMIO start to the caller. Libxl needs those values to construct
vmemranges for that guest.

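Concretely, each vmemrange carries a virtual node id which is looked
up in the vnode-to-pnode map to derive per-range allocation flags (a
sketch, using the names from the patch below):

    unsigned int vnode = args->vmemranges[vmemid].nid;
    unsigned int pnode = args->vnode_to_pnode[vnode];
    unsigned int new_memflags = memflags;

    if ( pnode != XC_VNUMA_NO_NODE )
        new_memflags |= XENMEMF_exact_node(pnode) |
                        XENMEMF_exact_node_request;
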
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v5:
1. Use a better loop variable name vnid.

Changes in v4:
1. Adapt to new interface.
2. Shorten error message.
3. This patch includes only functional changes.

Changes in v3:
1. Rewrite commit log.
2. Add a few code comments.
---
 tools/libxc/include/xenguest.h |  11 +++++
 tools/libxc/xc_hvm_build_x86.c | 105 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 100 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 40bbac8..ff66cb1 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -230,6 +230,17 @@ struct xc_hvm_build_args {
     struct xc_hvm_firmware_module smbios_module;
     /* Whether to use claim hypercall (1 - enable, 0 - disable). */
     int claim_enabled;
+
+    /* vNUMA information*/
+    xen_vmemrange_t *vmemranges;
+    unsigned int nr_vmemranges;
+    unsigned int *vnode_to_pnode;
+    unsigned int nr_vnodes;
+
+    /* Out parameters  */
+    uint64_t lowmem_end;
+    uint64_t highmem_end;
+    uint64_t mmio_start;
 };
 
 /**
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index ecc3224..a2a3777 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           struct xc_hvm_build_args *args)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
     hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
+    args->lowmem_end = lowmem_end;
+    args->highmem_end = highmem_end;
+    args->mmio_start = mmio_start;
+
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
         sum += ((uint8_t *)hvm_info)[i];
@@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
                        char *image, unsigned long image_size)
 {
     xen_pfn_t *page_array = NULL;
-    unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
+    unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
     uint64_t mmio_start = (1ull << 32) - args->mmio_size;
     uint64_t mmio_size = args->mmio_size;
@@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
     xen_capabilities_info_t caps;
     unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
         stat_1gb_pages = 0;
-    int pod_mode = 0;
+    unsigned int memflags = 0;
     int claim_enabled = args->claim_enabled;
     xen_pfn_t special_array[NR_SPECIAL_PAGES];
     xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
-
-    if ( nr_pages > target_pages )
-        pod_mode = XENMEMF_populate_on_demand;
+    uint64_t total_pages;
+    xen_vmemrange_t dummy_vmemrange;
+    unsigned int dummy_vnode_to_pnode;
 
     memset(&elf, 0, sizeof(elf));
     if ( elf_init(&elf, image, image_size) != 0 )
@@ -276,6 +281,43 @@ static int setup_guest(xc_interface *xch,
     v_start = 0;
     v_end = args->mem_size;
 
+    if ( nr_pages > target_pages )
+        memflags |= XENMEMF_populate_on_demand;
+
+    if ( args->nr_vmemranges == 0 )
+    {
+        /* Build dummy vnode information */
+        dummy_vmemrange.start = 0;
+        dummy_vmemrange.end   = args->mem_size;
+        dummy_vmemrange.flags = 0;
+        dummy_vmemrange.nid   = 0;
+        args->nr_vmemranges = 1;
+        args->vmemranges = &dummy_vmemrange;
+
+        dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
+        args->nr_vnodes = 1;
+        args->vnode_to_pnode = &dummy_vnode_to_pnode;
+    }
+    else
+    {
+        if ( nr_pages > target_pages )
+        {
+            PERROR("Cannot enable vNUMA and PoD at the same time");
+            goto error_out;
+        }
+    }
+
+    total_pages = 0;
+    for ( i = 0; i < args->nr_vmemranges; i++ )
+        total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
+                        >> PAGE_SHIFT);
+    if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
+    {
+        PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
+               total_pages, args->mem_size >> PAGE_SHIFT);
+        goto error_out;
+    }
+
     if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
     {
         PERROR("Could not get Xen capabilities");
@@ -320,7 +362,7 @@ static int setup_guest(xc_interface *xch,
         }
     }
 
-    if ( pod_mode )
+    if ( memflags & XENMEMF_populate_on_demand )
     {
         /*
          * Subtract VGA_HOLE_SIZE from target_pages for the VGA
@@ -349,15 +391,40 @@ static int setup_guest(xc_interface *xch,
      * ensure that we can be preempted and hence dom0 remains responsive.
      */
     rc = xc_domain_populate_physmap_exact(
-        xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
-    cur_pages = 0xc0;
-    stat_normal_pages = 0xc0;
+        xch, dom, 0xa0, 0, memflags, &page_array[0x00]);
 
+    stat_normal_pages = 0;
+    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
     {
-        while ( (rc == 0) && (nr_pages > cur_pages) )
+        unsigned int new_memflags = memflags;
+        uint64_t end_pages;
+        unsigned int vnode = args->vmemranges[vmemid].nid;
+        unsigned int pnode = args->vnode_to_pnode[vnode];
+
+        if ( pnode != XC_VNUMA_NO_NODE )
+        {
+            new_memflags |= XENMEMF_exact_node(pnode);
+            new_memflags |= XENMEMF_exact_node_request;
+        }
+
+        end_pages = args->vmemranges[vmemid].end >> PAGE_SHIFT;
+        /*
+         * Consider the vga hole as belonging to the vmemrange that covers
+         * 0xA0000-0xC0000. Note that 0x00000-0xA0000 is populated just
+         * before this loop.
+         */
+        if ( args->vmemranges[vmemid].start == 0 )
+        {
+            cur_pages = 0xc0;
+            stat_normal_pages += 0xc0;
+        }
+        else
+            cur_pages = args->vmemranges[vmemid].start >> PAGE_SHIFT;
+
+        while ( (rc == 0) && (end_pages > cur_pages) )
         {
             /* Clip count to maximum 1GB extent. */
-            unsigned long count = nr_pages - cur_pages;
+            unsigned long count = end_pages - cur_pages;
             unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
 
             if ( count > max_pages )
@@ -394,7 +461,7 @@ static int setup_guest(xc_interface *xch,
 
                 done = xc_domain_populate_physmap(xch, dom, nr_extents,
                                                   SUPERPAGE_1GB_SHIFT,
-                                                  pod_mode, sp_extents);
+                                                  new_memflags, sp_extents);
 
                 if ( done > 0 )
                 {
@@ -434,7 +501,7 @@ static int setup_guest(xc_interface *xch,
 
                     done = xc_domain_populate_physmap(xch, dom, nr_extents,
                                                       SUPERPAGE_2MB_SHIFT,
-                                                      pod_mode, sp_extents);
+                                                      new_memflags, sp_extents);
 
                     if ( done > 0 )
                     {
@@ -450,11 +517,14 @@ static int setup_guest(xc_interface *xch,
             if ( count != 0 )
             {
                 rc = xc_domain_populate_physmap_exact(
-                    xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
+                    xch, dom, count, 0, new_memflags, &page_array[cur_pages]);
                 cur_pages += count;
                 stat_normal_pages += count;
             }
         }
+
+        if ( rc != 0 )
+            break;
     }
 
     if ( rc != 0 )
@@ -478,7 +548,7 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
@@ -617,6 +687,9 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
             args.acpi_module.guest_addr_out;
         hvm_args->smbios_module.guest_addr_out = 
             args.smbios_module.guest_addr_out;
+        hvm_args->lowmem_end = args.lowmem_end;
+        hvm_args->highmem_end = args.highmem_end;
+        hvm_args->mmio_start = args.mmio_start;
     }
 
     free(image);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (15 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:21   ` Ian Jackson
  2015-02-17 14:26   ` Dario Faggioli
  2015-02-12 19:44 ` [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
                   ` (6 subsequent siblings)
  23 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Transform the user-supplied vNUMA configuration into libxl internal
representations, then libxc representations. Check validity along the
way.

Libxc has more involvement in building vmemranges in the HVM case than
in the PV case. The building of vmemranges is placed after xc_hvm_build
returns, because it relies on memory hole information provided by
xc_hvm_build.

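As a worked example (numbers assumed purely for illustration): with
two 2GiB vnodes, lowmem_end = 3GiB and an MMIO hole covering
[3GiB, 4GiB), the builder below produces three vmemranges -- vnode 0
gets [0, 2GiB), while vnode 1 gets [2GiB, 3GiB) plus [4GiB, 5GiB),
i.e. the second vnode is split around the hole.
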
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
---
Changes in v5:
1. Check vnode 0 is large enough to accommodate video ram.

Changes in v4:
1. Adapt to new interface.
2. Rename some variables.
3. Use GCREALLOC_ARRAY.

Changes in v3:
1. Rewrite commit log.
---
 tools/libxl/libxl_create.c     |  9 +++++++
 tools/libxl/libxl_dom.c        | 37 ++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h   |  5 ++++
 tools/libxl/libxl_vnuma.c      | 56 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 98687bd..af04248 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -853,6 +853,15 @@ static void initiate_domain_create(libxl__egc *egc,
         goto error_out;
     }
 
+    /* Disallow PoD and vNUMA to be enabled at the same time because PoD
+     * pool is not vNUMA-aware yet.
+     */
+    if (pod_enabled && d_config->b_info.num_vnuma_nodes) {
+        ret = ERROR_INVAL;
+        LOG(ERROR, "Cannot enable PoD and vNUMA at the same time");
+        goto error_out;
+    }
+
     ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info);
     if (ret) goto error_out;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 1ff0704..b2c9daf 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -893,12 +893,49 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (info->num_vnuma_nodes != 0) {
+        int i;
+
+        args.nr_vmemranges = state->num_vmemranges;
+        args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
+                                        args.nr_vmemranges);
+
+        for (i = 0; i < args.nr_vmemranges; i++) {
+            args.vmemranges[i].start = state->vmemranges[i].start;
+            args.vmemranges[i].end   = state->vmemranges[i].end;
+            args.vmemranges[i].flags = state->vmemranges[i].flags;
+            args.vmemranges[i].nid   = state->vmemranges[i].nid;
+        }
+
+        /* Consider the video ram as belonging to vmemrange 0 -- just
+         * shrink it by the size of video ram.
+         */
+        if (((args.vmemranges[0].end - args.vmemranges[0].start) >> 10)
+            < info->video_memkb) {
+            LOG(ERROR, "vmemrange 0 too small to contain video ram");
+            goto out;
+        }
+
+        args.vmemranges[0].end -= (info->video_memkb << 10);
+    }
+
     ret = xc_hvm_build(ctx->xch, domid, &args);
     if (ret) {
         LOGEV(ERROR, ret, "hvm building failed");
         goto out;
     }
 
+    if (info->num_vnuma_nodes != 0) {
+        ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+        if (ret) {
+            LOGEV(ERROR, ret, "hvm build vmemranges failed");
+            goto out;
+        }
+        ret = libxl__vnuma_config_check(gc, info, state);
+        if (ret) goto out;
+        ret = set_vnuma_info(gc, domid, info, state);
+        if (ret) goto out;
+    }
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7d1e1cf..e93089a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3408,6 +3408,11 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
                                     uint32_t domid,
                                     libxl_domain_build_info *b_info,
                                     libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
+                                     uint32_t domid,
+                                     libxl_domain_build_info *b_info,
+                                     libxl__domain_build_state *state,
+                                     struct xc_hvm_build_args *args);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index 3d46239..eefca38 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -163,6 +163,62 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
     return libxl__arch_vnuma_build_vmemrange(gc, domid, b_info, state);
 }
 
+/* Build vmemranges for HVM guest */
+int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
+                                     uint32_t domid,
+                                     libxl_domain_build_info *b_info,
+                                     libxl__domain_build_state *state,
+                                     struct xc_hvm_build_args *args)
+{
+    uint64_t hole_start, hole_end, next;
+    int nid, nr_vmemrange;
+    xen_vmemrange_t *vmemranges;
+
+    /* Derive vmemranges from vnode size and memory hole.
+     *
+     * Guest physical address space layout:
+     * [0, hole_start) [hole_start, hole_end) [hole_end, highmem_end)
+     */
+    hole_start = args->lowmem_end < args->mmio_start ?
+        args->lowmem_end : args->mmio_start;
+    hole_end = (args->mmio_start + args->mmio_size) > (1ULL << 32) ?
+        (args->mmio_start + args->mmio_size) : (1ULL << 32);
+
+    assert(state->vmemranges == NULL);
+
+    next = 0;
+    nr_vmemrange = 0;
+    vmemranges = NULL;
+    for (nid = 0; nid < b_info->num_vnuma_nodes; nid++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[nid];
+        uint64_t remaining_bytes = p->memkb << 10;
+
+        while (remaining_bytes > 0) {
+            uint64_t count = remaining_bytes;
+
+            if (next >= hole_start && next < hole_end)
+                next = hole_end;
+            if ((next < hole_start) && (next + remaining_bytes >= hole_start))
+                count = hole_start - next;
+
+            GCREALLOC_ARRAY(vmemranges, nr_vmemrange+1);
+            vmemranges[nr_vmemrange].start = next;
+            vmemranges[nr_vmemrange].end = next + count;
+            vmemranges[nr_vmemrange].flags = 0;
+            vmemranges[nr_vmemrange].nid = nid;
+
+            nr_vmemrange++;
+            remaining_bytes -= count;
+            next += count;
+        }
+    }
+
+    state->vmemranges = vmemranges;
+    state->num_vmemranges = nr_vmemrange;
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (16 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:17   ` Ian Jackson
  2015-02-12 19:44 ` [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA Wei Liu
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Disallow memory relocation when vNUMA is enabled, because relocated
memory would end up off-node. Furthermore, even if we dynamically
expanded node coverage in hvmloader, low memory and high memory may
reside on different physical nodes, so blindly relocating low memory
to high memory would give us a sub-optimal configuration.

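For illustration, with vNUMA configured the toolstack now writes

    /local/domain/<domid>/hvmloader/allow-memory-relocate = "0"

(<domid> is a placeholder) so that hvmloader leaves the memory layout
chosen by the toolstack alone.
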
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
---
 tools/libxl/libxl_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..8edf276 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1365,13 +1365,15 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
         /* Disable relocating memory to make the MMIO hole larger
-         * unless we're running qemu-traditional */
+         * unless we're running qemu-traditional and vNUMA is not
+         * configured. */
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc,
                                        "%s/hvmloader/allow-memory-relocate",
                                        path),
                         "%d",
-                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
+                        !b_info->num_vnuma_nodes);
         free(path);
     }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (17 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:12   ` Ian Jackson
  2015-02-12 19:44 ` [PATCH v5 20/24] libxlu: rework internal representation of setting Wei Liu
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c219f59..f33178c 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -67,6 +67,12 @@
  * the same $(XEN_VERSION) (e.g. throughout a major release).
  */
 
+/* LIBXL_HAVE_VNUMA
+ *
+ * If it is defined, libxl supports vNUMA configuration
+ */
+#define LIBXL_HAVE_VNUMA 1
+
 /* LIBXL_HAVE_USERDATA_UNLINK
  *
  * If it is defined, libxl has a library function called
-- 
1.9.1
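
Applications would typically use this guard for compile-time feature
detection; a minimal hypothetical sketch (field names as introduced
earlier in this series):

    #ifdef LIBXL_HAVE_VNUMA
        /* This libxl knows about vNUMA: safe to fill in the new fields. */
        b_info->num_vnuma_nodes = nr_vnodes;
        b_info->vnuma_nodes = vnode_array;
    #else
        /* Older libxl: no vNUMA configuration possible. */
    #endif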

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 20/24] libxlu: rework internal representation of setting
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (18 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:24   ` Ian Jackson
  2015-02-12 19:44 ` [PATCH v5 21/24] libxlu: nested list support Wei Liu
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

This patch does the following things:

1. Properly define a XLU_ConfigList type. Originally it was defined to
   be XLU_ConfigSetting.
2. Define XLU_ConfigValue type, which can be either a string or a list
   of XLU_ConfigValue.
3. ConfigSetting now references XLU_ConfigValue. Originally it only
   worked with **string.
4. Properly construct list where necessary, see changes to .y file.

To achieve above changes:

1. xlu__cfg_set_mk and xlu__cfg_set_add are deleted, because they
   are no longer needed in the new code.
2. Introduce xlu__cfg_string_mk to make a XLU_ConfigSetting that points
   to a XLU_ConfigValue that wraps a string.
3. Introduce xlu__cfg_list_mk to make a XLU_ConfigSetting that points
   to XLU_ConfigValue that is a list.
4. The parser now generates XLU_ConfigValue instead of XLU_ConfigSetting
   when constructing values, which enables us to recursively generate
   lists of lists.
5. XLU_ConfigSetting is generated in xlu__cfg_set_store.
6. Adapt other functions to use new types.

No change to public API. Xl compiles without problem and 'xl create -n
guest.cfg' is valgrind clean.

This patch is needed because we're going to implement nested list
support, which requires support for lists of lists.
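
As an illustration (using the types introduced below), a setting such as

    vcpus = [ "0-3", "4-7" ]

is now stored as one XLU_ConfigSetting whose value is an XLU_ConfigValue
of type XLU_LIST with nvalues == 2, each element in turn being an
XLU_ConfigValue of type XLU_STRING wrapping the corresponding atom; a
plain setting is simply an XLU_ConfigValue of type XLU_STRING.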

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v5:
1. Use standard expanding-array pattern.
---
 tools/libxl/libxlu_cfg.c      | 170 ++++++++++++++++++++++++++++++------------
 tools/libxl/libxlu_cfg_i.h    |  12 ++-
 tools/libxl/libxlu_cfg_y.c    |  24 +++---
 tools/libxl/libxlu_cfg_y.h    |   2 +-
 tools/libxl/libxlu_cfg_y.y    |  14 ++--
 tools/libxl/libxlu_internal.h |  30 ++++++--
 6 files changed, 173 insertions(+), 79 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index 22adcb0..f000eed 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -131,14 +131,28 @@ int xlu_cfg_readdata(XLU_Config *cfg, const char *data, int length) {
     return ctx.err;
 }
 
-void xlu__cfg_set_free(XLU_ConfigSetting *set) {
+void xlu__cfg_value_free(XLU_ConfigValue *value)
+{
     int i;
 
+    if (!value) return;
+
+    switch (value->type) {
+    case XLU_STRING:
+        free(value->u.string);
+        break;
+    case XLU_LIST:
+        for (i = 0; i < value->u.list.nvalues; i++)
+            xlu__cfg_value_free(value->u.list.values[i]);
+        free(value->u.list.values);
+    }
+    free(value);
+}
+
+void xlu__cfg_set_free(XLU_ConfigSetting *set) {
     if (!set) return;
     free(set->name);
-    for (i=0; i<set->nvalues; i++)
-        free(set->values[i]);
-    free(set->values);
+    xlu__cfg_value_free(set->value);
     free(set);
 }
 
@@ -173,7 +187,7 @@ static int find_atom(const XLU_Config *cfg, const char *n,
     set= find(cfg,n);
     if (!set) return ESRCH;
 
-    if (set->avalues!=1) {
+    if (set->value->type!=XLU_STRING) {
         if (!dont_warn)
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is"
@@ -191,7 +205,7 @@ int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
     int e;
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-    *value_r= set->values[0];
+    *value_r= set->value->u.string;
     return 0;
 }
 
@@ -202,7 +216,7 @@ int xlu_cfg_replace_string(const XLU_Config *cfg, const char *n,
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
     free(*value_r);
-    *value_r= strdup(set->values[0]);
+    *value_r= strdup(set->value->u.string);
     return 0;
 }
 
@@ -214,7 +228,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
     char *ep;
 
     e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-    errno= 0; l= strtol(set->values[0], &ep, 0);
+    errno= 0; l= strtol(set->value->u.string, &ep, 0);
     e= errno;
     if (errno) {
         e= errno;
@@ -226,7 +240,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
                     cfg->config_source, set->lineno, n, strerror(e));
         return e;
     }
-    if (*ep || ep==set->values[0]) {
+    if (*ep || ep==set->value->u.string) {
         if (!dont_warn)
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is not a valid number\n",
@@ -253,7 +267,7 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
                      XLU_ConfigList **list_r, int *entries_r, int dont_warn) {
     XLU_ConfigSetting *set;
     set= find(cfg,n);  if (!set) return ESRCH;
-    if (set->avalues==1) {
+    if (set->value->type!=XLU_LIST) {
         if (!dont_warn) {
             fprintf(cfg->report,
                     "%s:%d: warning: parameter `%s' is a single value"
@@ -262,8 +276,8 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
         }
         return EINVAL;
     }
-    if (list_r) *list_r= set;
-    if (entries_r) *entries_r= set->nvalues;
+    if (list_r) *list_r= &set->value->u.list;
+    if (entries_r) *entries_r= set->value->u.list.nvalues;
     return 0;
 }
 
@@ -290,72 +304,130 @@ int xlu_cfg_get_list_as_string_list(const XLU_Config *cfg, const char *n,
     return 0;
 }
 
-const char *xlu_cfg_get_listitem(const XLU_ConfigList *set, int entry) {
-    if (entry < 0 || entry >= set->nvalues) return 0;
-    return set->values[entry];
+const char *xlu_cfg_get_listitem(const XLU_ConfigList *list, int entry) {
+    if (entry < 0 || entry >= list->nvalues) return 0;
+    if (list->values[entry]->type != XLU_STRING) return 0;
+    return list->values[entry]->u.string;
 }
 
 
-XLU_ConfigSetting *xlu__cfg_set_mk(CfgParseContext *ctx,
-                                   int alloc, char *atom) {
-    XLU_ConfigSetting *set= 0;
+XLU_ConfigValue *xlu__cfg_string_mk(CfgParseContext *ctx, char *atom)
+{
+    XLU_ConfigValue *value = NULL;
 
     if (ctx->err) goto x;
-    assert(!!alloc == !!atom);
 
-    set= malloc(sizeof(*set));
-    if (!set) goto xe;
+    value = malloc(sizeof(*value));
+    if (!value) goto xe;
+    value->type = XLU_STRING;
+    value->u.string = atom;
+
+    return value;
 
-    set->name= 0; /* tbd */
-    set->avalues= alloc;
+ xe:
+    ctx->err= errno;
+ x:
+    free(value);
+    free(atom);
+    return NULL;
+}
 
-    if (!alloc) {
-        set->nvalues= 0;
-        set->values= 0;
-    } else {
-        set->values= malloc(sizeof(*set->values) * alloc);
-        if (!set->values) goto xe;
+XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx, char *atom)
+{
+    XLU_ConfigValue *value = NULL;
+    XLU_ConfigValue **values = NULL;
+    XLU_ConfigValue *val = NULL;
 
-        set->nvalues= 1;
-        set->values[0]= atom;
-    }
-    return set;
+    if (ctx->err) goto x;
+
+    val = malloc(sizeof(*val));
+    if (!val) goto xe;
+    val->type = XLU_STRING;
+    val->u.string = atom;
+
+    values = malloc(sizeof(*values));
+    if (!values) goto xe;
+    values[0] = val;
+
+    value = malloc(sizeof(*value));
+    if (!value) goto xe;
+    value->type = XLU_LIST;
+    value->u.list.nvalues = 1;
+    value->u.list.avalues = 1;
+    value->u.list.values = values;
+
+    return value;
 
  xe:
     ctx->err= errno;
  x:
-    free(set);
+    free(value);
+    free(values);
+    free(val);
     free(atom);
-    return 0;
+    return NULL;
 }
 
-void xlu__cfg_set_add(CfgParseContext *ctx, XLU_ConfigSetting *set,
-                      char *atom) {
+void xlu__cfg_list_append(CfgParseContext *ctx,
+                          XLU_ConfigValue *list,
+                          char *atom)
+{
+    XLU_ConfigValue *val = NULL;
     if (ctx->err) return;
 
     assert(atom);
+    assert(list->type == XLU_LIST);
 
-    if (set->nvalues >= set->avalues) {
+    if (list->u.list.nvalues >= list->u.list.avalues) {
         int new_avalues;
-        char **new_values;
-
-        if (set->avalues > INT_MAX / 100) { ctx->err= ERANGE; return; }
-        new_avalues= set->avalues * 4;
-        new_values= realloc(set->values,
-                            sizeof(*new_values) * new_avalues);
-        if (!new_values) { ctx->err= errno; free(atom); return; }
-        set->values= new_values;
-        set->avalues= new_avalues;
+        XLU_ConfigValue **new_values = NULL;
+
+        if (list->u.list.avalues > INT_MAX / 100) {
+            ctx->err = ERANGE;
+            free(atom);
+            return;
+        }
+
+        new_avalues = list->u.list.avalues * 4;
+        new_values  = realloc(list->u.list.values,
+                              sizeof(*new_values) * new_avalues);
+        if (!new_values) {
+            ctx->err = errno;
+            free(atom);
+            return;
+        }
+
+        list->u.list.avalues = new_avalues;
+        list->u.list.values  = new_values;
+    }
+
+    val = malloc(sizeof(*val));
+    if (!val) {
+        ctx->err = errno;
+        free(atom);
+        return;
     }
-    set->values[set->nvalues++]= atom;
+
+    val->type = XLU_STRING;
+    val->u.string = atom;
+    list->u.list.values[list->u.list.nvalues] = val;
+    list->u.list.nvalues++;
 }
 
 void xlu__cfg_set_store(CfgParseContext *ctx, char *name,
-                        XLU_ConfigSetting *set, int lineno) {
+                        XLU_ConfigValue *val, int lineno) {
+    XLU_ConfigSetting *set;
+
     if (ctx->err) return;
 
     assert(name);
+    set = malloc(sizeof(*set));
+    if (!set) {
+        ctx->err = errno;
+        return;
+    }
     set->name= name;
+    set->value = val;
     set->lineno= lineno;
     set->next= ctx->cfg->settings;
     ctx->cfg->settings= set;
diff --git a/tools/libxl/libxlu_cfg_i.h b/tools/libxl/libxlu_cfg_i.h
index 54d033c..b71e9fd 100644
--- a/tools/libxl/libxlu_cfg_i.h
+++ b/tools/libxl/libxlu_cfg_i.h
@@ -23,11 +23,15 @@
 #include "libxlu_cfg_y.h"
 
 void xlu__cfg_set_free(XLU_ConfigSetting *set);
-XLU_ConfigSetting *xlu__cfg_set_mk(CfgParseContext*, int alloc, char *atom);
-void xlu__cfg_set_add(CfgParseContext*, XLU_ConfigSetting *set, char *atom);
 void xlu__cfg_set_store(CfgParseContext*, char *name,
-                        XLU_ConfigSetting *set, int lineno);
-
+                        XLU_ConfigValue *val, int lineno);
+XLU_ConfigValue *xlu__cfg_string_mk(CfgParseContext *ctx,
+                                    char *atom);
+XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx, char *atom);
+void xlu__cfg_list_append(CfgParseContext *ctx,
+                          XLU_ConfigValue *list,
+                          char *atom);
+void xlu__cfg_value_free(XLU_ConfigValue *value);
 char *xlu__cfgl_strdup(CfgParseContext*, const char *src);
 char *xlu__cfgl_dequote(CfgParseContext*, const char *src);
 
diff --git a/tools/libxl/libxlu_cfg_y.c b/tools/libxl/libxlu_cfg_y.c
index 07b5a1d..eb3884f 100644
--- a/tools/libxl/libxlu_cfg_y.c
+++ b/tools/libxl/libxlu_cfg_y.c
@@ -126,7 +126,7 @@ typedef union YYSTYPE
 #line 25 "libxlu_cfg_y.y"
 
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 
 
 
@@ -1148,7 +1148,7 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1155 "libxlu_cfg_y.c"
@@ -1166,7 +1166,7 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1173 "libxlu_cfg_y.c"
@@ -1175,7 +1175,7 @@ yydestruct (yymsg, yytype, yyvaluep, yylocationp, ctx)
 
 /* Line 1391 of yacc.c  */
 #line 43 "libxlu_cfg_y.y"
-	{ xlu__cfg_set_free((yyvaluep->setting)); };
+	{ xlu__cfg_value_free((yyvaluep->value)); };
 
 /* Line 1391 of yacc.c  */
 #line 1182 "libxlu_cfg_y.c"
@@ -1508,21 +1508,21 @@ yyreduce:
 
 /* Line 1806 of yacc.c  */
 #line 57 "libxlu_cfg_y.y"
-    { xlu__cfg_set_store(ctx,(yyvsp[(1) - (3)].string),(yyvsp[(3) - (3)].setting),(yylsp[(3) - (3)]).first_line); }
+    { xlu__cfg_set_store(ctx,(yyvsp[(1) - (3)].string),(yyvsp[(3) - (3)].value),(yylsp[(3) - (3)]).first_line); }
     break;
 
   case 12:
 
 /* Line 1806 of yacc.c  */
 #line 62 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,1,(yyvsp[(1) - (1)].string)); }
+    { (yyval.value)= xlu__cfg_string_mk(ctx,(yyvsp[(1) - (1)].string)); }
     break;
 
   case 13:
 
 /* Line 1806 of yacc.c  */
 #line 63 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(3) - (4)].setting); }
+    { (yyval.value)= (yyvsp[(3) - (4)].value); }
     break;
 
   case 14:
@@ -1543,35 +1543,35 @@ yyreduce:
 
 /* Line 1806 of yacc.c  */
 #line 68 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,0,0); }
+    { (yyval.value)= xlu__cfg_list_mk(ctx,NULL); }
     break;
 
   case 17:
 
 /* Line 1806 of yacc.c  */
 #line 69 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(1) - (1)].setting); }
+    { (yyval.value)= (yyvsp[(1) - (1)].value); }
     break;
 
   case 18:
 
 /* Line 1806 of yacc.c  */
 #line 70 "libxlu_cfg_y.y"
-    { (yyval.setting)= (yyvsp[(1) - (3)].setting); }
+    { (yyval.value)= (yyvsp[(1) - (3)].value); }
     break;
 
   case 19:
 
 /* Line 1806 of yacc.c  */
 #line 72 "libxlu_cfg_y.y"
-    { (yyval.setting)= xlu__cfg_set_mk(ctx,2,(yyvsp[(1) - (2)].string)); }
+    { (yyval.value)= xlu__cfg_list_mk(ctx,(yyvsp[(1) - (2)].string)); }
     break;
 
   case 20:
 
 /* Line 1806 of yacc.c  */
 #line 73 "libxlu_cfg_y.y"
-    { xlu__cfg_set_add(ctx,(yyvsp[(1) - (5)].setting),(yyvsp[(4) - (5)].string)); (yyval.setting)= (yyvsp[(1) - (5)].setting); }
+    { xlu__cfg_list_append(ctx,(yyvsp[(1) - (5)].value),(yyvsp[(4) - (5)].string)); (yyval.value)= (yyvsp[(1) - (5)].value); }
     break;
 
 
diff --git a/tools/libxl/libxlu_cfg_y.h b/tools/libxl/libxlu_cfg_y.h
index d7dfaf2..37e8213 100644
--- a/tools/libxl/libxlu_cfg_y.h
+++ b/tools/libxl/libxlu_cfg_y.h
@@ -54,7 +54,7 @@ typedef union YYSTYPE
 #line 25 "libxlu_cfg_y.y"
 
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 
 
 
diff --git a/tools/libxl/libxlu_cfg_y.y b/tools/libxl/libxlu_cfg_y.y
index 5acd438..6848686 100644
--- a/tools/libxl/libxlu_cfg_y.y
+++ b/tools/libxl/libxlu_cfg_y.y
@@ -24,7 +24,7 @@
 
 %union {
   char *string;
-  XLU_ConfigSetting *setting;
+  XLU_ConfigValue *value;
 }
 
 %locations
@@ -39,8 +39,8 @@
 %type <string>            atom
 %destructor { free($$); } atom IDENT STRING NUMBER
 
-%type <setting>                         value valuelist values
-%destructor { xlu__cfg_set_free($$); }  value valuelist values
+%type <value>                             value valuelist values
+%destructor { xlu__cfg_value_free($$); }  value valuelist values
 
 %%
 
@@ -59,18 +59,18 @@ assignment: IDENT '=' value { xlu__cfg_set_store(ctx,$1,$3,@3.first_line); }
 endstmt: NEWLINE
  |      ';'
 
-value:  atom                         { $$= xlu__cfg_set_mk(ctx,1,$1); }
+value:  atom                         { $$= xlu__cfg_string_mk(ctx,$1); }
  |      '[' nlok valuelist ']'       { $$= $3; }
 
 atom:   STRING                   { $$= $1; }
  |      NUMBER                   { $$= $1; }
 
-valuelist: /* empty */           { $$= xlu__cfg_set_mk(ctx,0,0); }
+valuelist: /* empty */           { $$= xlu__cfg_list_mk(ctx,NULL); }
  |      values                  { $$= $1; }
  |      values ',' nlok         { $$= $1; }
 
-values: atom nlok                  { $$= xlu__cfg_set_mk(ctx,2,$1); }
- |      values ',' nlok atom nlok  { xlu__cfg_set_add(ctx,$1,$4); $$= $1; }
+values: atom nlok                  { $$= xlu__cfg_list_mk(ctx,$1); }
+ |      values ',' nlok atom nlok  { xlu__cfg_list_append(ctx,$1,$4); $$= $1; }
 
 nlok:
         /* nothing */
diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 7579158..092a17a 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -23,17 +23,35 @@
 #include <assert.h>
 #include <regex.h>
 
-#define XLU_ConfigList XLU_ConfigSetting
-
 #include "libxlutil.h"
 
-struct XLU_ConfigSetting { /* transparent */
+enum XLU_ConfigValueType {
+    XLU_STRING,
+    XLU_LIST,
+};
+
+typedef struct XLU_ConfigValue XLU_ConfigValue;
+
+typedef struct XLU_ConfigList {
+    int avalues; /* available slots */
+    int nvalues; /* actual occupied slots */
+    XLU_ConfigValue **values;
+} XLU_ConfigList;
+
+struct XLU_ConfigValue {
+    enum XLU_ConfigValueType type;
+    union {
+        char *string;
+        XLU_ConfigList list;
+    } u;
+};
+
+typedef struct XLU_ConfigSetting { /* transparent */
     struct XLU_ConfigSetting *next;
     char *name;
-    int nvalues, avalues; /* lists have avalues>1 */
-    char **values;
+    XLU_ConfigValue *value;
     int lineno;
-};
+} XLU_ConfigSetting;
 
 struct XLU_Config {
     XLU_ConfigSetting *settings;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 21/24] libxlu: nested list support
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (19 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 20/24] libxlu: rework internal representation of setting Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-12 19:44 ` [PATCH v5 22/24] libxlu: introduce new APIs Wei Liu
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

1. Extend grammar of parser.
2. Adjust internal functions to accept XLU_ConfigValue instead of
   char *.
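
With this change a nested list such as the following hypothetical
fragment

    vnuma = [ [ "pnode=0", "size=3000" ], [ "pnode=1", "size=3000" ] ]

parses into an XLU_LIST whose elements are themselves XLU_LIST values,
because the "values" production now accepts any "value" (including a
bracketed list) rather than only atoms.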

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxlu_cfg.c   | 30 +++++++-----------------------
 tools/libxl/libxlu_cfg_i.h |  5 +++--
 tools/libxl/libxlu_cfg_y.c | 26 +++++++++++++-------------
 tools/libxl/libxlu_cfg_y.y |  4 ++--
 4 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index f000eed..611f5ec 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -332,19 +332,14 @@ XLU_ConfigValue *xlu__cfg_string_mk(CfgParseContext *ctx, char *atom)
     return NULL;
 }
 
-XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx, char *atom)
+XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx,
+                                  XLU_ConfigValue *val)
 {
     XLU_ConfigValue *value = NULL;
     XLU_ConfigValue **values = NULL;
-    XLU_ConfigValue *val = NULL;
 
     if (ctx->err) goto x;
 
-    val = malloc(sizeof(*val));
-    if (!val) goto xe;
-    val->type = XLU_STRING;
-    val->u.string = atom;
-
     values = malloc(sizeof(*values));
     if (!values) goto xe;
     values[0] = val;
@@ -363,19 +358,17 @@ XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx, char *atom)
  x:
     free(value);
     free(values);
-    free(val);
-    free(atom);
+    xlu__cfg_value_free(val);
     return NULL;
 }
 
 void xlu__cfg_list_append(CfgParseContext *ctx,
                           XLU_ConfigValue *list,
-                          char *atom)
+                          XLU_ConfigValue *val)
 {
-    XLU_ConfigValue *val = NULL;
     if (ctx->err) return;
 
-    assert(atom);
+    assert(val);
     assert(list->type == XLU_LIST);
 
     if (list->u.list.nvalues >= list->u.list.avalues) {
@@ -384,7 +377,7 @@ void xlu__cfg_list_append(CfgParseContext *ctx,
 
         if (list->u.list.avalues > INT_MAX / 100) {
             ctx->err = ERANGE;
-            free(atom);
+            xlu__cfg_value_free(val);
             return;
         }
 
@@ -393,7 +386,7 @@ void xlu__cfg_list_append(CfgParseContext *ctx,
                               sizeof(*new_values) * new_avalues);
         if (!new_values) {
             ctx->err = errno;
-            free(atom);
+            xlu__cfg_value_free(val);
             return;
         }
 
@@ -401,15 +394,6 @@ void xlu__cfg_list_append(CfgParseContext *ctx,
         list->u.list.values  = new_values;
     }
 
-    val = malloc(sizeof(*val));
-    if (!val) {
-        ctx->err = errno;
-        free(atom);
-        return;
-    }
-
-    val->type = XLU_STRING;
-    val->u.string = atom;
     list->u.list.values[list->u.list.nvalues] = val;
     list->u.list.nvalues++;
 }
diff --git a/tools/libxl/libxlu_cfg_i.h b/tools/libxl/libxlu_cfg_i.h
index b71e9fd..11dc33f 100644
--- a/tools/libxl/libxlu_cfg_i.h
+++ b/tools/libxl/libxlu_cfg_i.h
@@ -27,10 +27,11 @@ void xlu__cfg_set_store(CfgParseContext*, char *name,
                         XLU_ConfigValue *val, int lineno);
 XLU_ConfigValue *xlu__cfg_string_mk(CfgParseContext *ctx,
                                     char *atom);
-XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx, char *atom);
+XLU_ConfigValue *xlu__cfg_list_mk(CfgParseContext *ctx,
+                                  XLU_ConfigValue *val);
 void xlu__cfg_list_append(CfgParseContext *ctx,
                           XLU_ConfigValue *list,
-                          char *atom);
+                          XLU_ConfigValue *val);
 void xlu__cfg_value_free(XLU_ConfigValue *value);
 char *xlu__cfgl_strdup(CfgParseContext*, const char *src);
 char *xlu__cfgl_dequote(CfgParseContext*, const char *src);
diff --git a/tools/libxl/libxlu_cfg_y.c b/tools/libxl/libxlu_cfg_y.c
index eb3884f..b05e48b 100644
--- a/tools/libxl/libxlu_cfg_y.c
+++ b/tools/libxl/libxlu_cfg_y.c
@@ -377,7 +377,7 @@ union yyalloc
 /* YYFINAL -- State number of the termination state.  */
 #define YYFINAL  3
 /* YYLAST -- Last index in YYTABLE.  */
-#define YYLAST   24
+#define YYLAST   25
 
 /* YYNTOKENS -- Number of terminals.  */
 #define YYNTOKENS  12
@@ -444,8 +444,8 @@ static const yytype_int8 yyrhs[] =
       15,    -1,    16,    17,    -1,    17,    -1,     1,     6,    -1,
        3,     7,    18,    -1,     6,    -1,     8,    -1,    19,    -1,
        9,    22,    20,    10,    -1,     4,    -1,     5,    -1,    -1,
-      21,    -1,    21,    11,    22,    -1,    19,    22,    -1,    21,
-      11,    22,    19,    22,    -1,    -1,    22,     6,    -1
+      21,    -1,    21,    11,    22,    -1,    18,    22,    -1,    21,
+      11,    22,    18,    22,    -1,    -1,    22,     6,    -1
 };
 
 /* YYRLINE[YYN] -- source line where rule number YYN was defined.  */
@@ -517,14 +517,14 @@ static const yytype_int8 yydefgoto[] =
 static const yytype_int8 yypact[] =
 {
      -18,     4,     0,   -18,    -1,     6,   -18,   -18,   -18,     3,
-     -18,   -18,    11,   -18,   -18,   -18,   -18,   -18,   -18,    13,
-     -18,   -18,    12,    10,    17,   -18,   -18,    13,   -18,    17
+     -18,   -18,    14,   -18,   -18,   -18,   -18,   -18,   -18,    11,
+     -18,   -18,    12,    10,    18,   -18,   -18,    11,   -18,    18
 };
 
 /* YYPGOTO[NTERM-NUM].  */
 static const yytype_int8 yypgoto[] =
 {
-     -18,   -18,   -18,   -18,   -18,    15,   -18,   -17,   -18,   -18,
+     -18,   -18,   -18,   -18,   -18,    16,   -17,   -18,   -18,   -18,
      -14
 };
 
@@ -535,8 +535,8 @@ static const yytype_int8 yypgoto[] =
 static const yytype_int8 yytable[] =
 {
       -2,     4,    21,     5,     3,    11,     6,    24,     7,     6,
-      28,     7,    27,    12,    29,    14,    15,    14,    15,    20,
-      16,    26,    25,    20,    13
+      28,     7,    27,    12,    29,    14,    15,    20,    14,    15,
+      16,    26,    25,    16,    20,    13
 };
 
 #define yypact_value_is_default(yystate) \
@@ -548,8 +548,8 @@ static const yytype_int8 yytable[] =
 static const yytype_uint8 yycheck[] =
 {
        0,     1,    19,     3,     0,     6,     6,    21,     8,     6,
-      27,     8,    26,     7,    28,     4,     5,     4,     5,     6,
-       9,    11,    10,     6,     9
+      27,     8,    26,     7,    28,     4,     5,     6,     4,     5,
+       9,    11,    10,     9,     6,     9
 };
 
 /* YYSTOS[STATE-NUM] -- The (internal number of the) accessing
@@ -558,7 +558,7 @@ static const yytype_uint8 yystos[] =
 {
        0,    13,    14,     0,     1,     3,     6,     8,    15,    16,
       17,     6,     7,    17,     4,     5,     9,    18,    19,    22,
-       6,    19,    20,    21,    22,    10,    11,    22,    19,    22
+       6,    18,    20,    21,    22,    10,    11,    22,    18,    22
 };
 
 #define yyerrok		(yyerrstatus = 0)
@@ -1564,14 +1564,14 @@ yyreduce:
 
 /* Line 1806 of yacc.c  */
 #line 72 "libxlu_cfg_y.y"
-    { (yyval.value)= xlu__cfg_list_mk(ctx,(yyvsp[(1) - (2)].string)); }
+    { (yyval.value)= xlu__cfg_list_mk(ctx,(yyvsp[(1) - (2)].value)); }
     break;
 
   case 20:
 
 /* Line 1806 of yacc.c  */
 #line 73 "libxlu_cfg_y.y"
-    { xlu__cfg_list_append(ctx,(yyvsp[(1) - (5)].value),(yyvsp[(4) - (5)].string)); (yyval.value)= (yyvsp[(1) - (5)].value); }
+    { xlu__cfg_list_append(ctx,(yyvsp[(1) - (5)].value),(yyvsp[(4) - (5)].value)); (yyval.value)= (yyvsp[(1) - (5)].value); }
     break;
 
 
diff --git a/tools/libxl/libxlu_cfg_y.y b/tools/libxl/libxlu_cfg_y.y
index 6848686..4a5ca3a 100644
--- a/tools/libxl/libxlu_cfg_y.y
+++ b/tools/libxl/libxlu_cfg_y.y
@@ -69,8 +69,8 @@ valuelist: /* empty */           { $$= xlu__cfg_list_mk(ctx,NULL); }
  |      values                  { $$= $1; }
  |      values ',' nlok         { $$= $1; }
 
-values: atom nlok                  { $$= xlu__cfg_list_mk(ctx,$1); }
- |      values ',' nlok atom nlok  { xlu__cfg_list_append(ctx,$1,$4); $$= $1; }
+values: value nlok                  { $$= xlu__cfg_list_mk(ctx,$1); }
+ |      values ',' nlok value nlok  { xlu__cfg_list_append(ctx,$1,$4); $$= $1; }
 
 nlok:
         /* nothing */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 22/24] libxlu: introduce new APIs
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (20 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 21/24] libxlu: nested list support Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-13 14:12   ` Ian Jackson
  2015-02-12 19:44 ` [PATCH v5 23/24] xl: introduce xcalloc Wei Liu
  2015-02-12 19:44 ` [PATCH v5 24/24] xl: vNUMA support Wei Liu
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

These APIs can be used to manipulate XLU_ConfigValue and XLU_ConfigList.

APIs introduced:
1. xlu_cfg_value_type
2. xlu_cfg_value_get_string
3. xlu_cfg_value_get_list
4. xlu_cfg_get_listitem2

Move some definitions from private header to public header as needed.
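
A minimal usage sketch for the new APIs (error handling elided; the
"vnuma" key is only an example):

    XLU_ConfigList *top, *inner;
    XLU_ConfigValue *v;
    char *s;
    int n, i;

    if (!xlu_cfg_get_list(config, "vnuma", &top, &n, 0)) {
        for (i = 0; (v = xlu_cfg_get_listitem2(top, i)); i++) {
            if (xlu_cfg_value_type(v) == XLU_LIST)
                xlu_cfg_value_get_list(config, v, &inner, 0); /* nested list */
            else
                xlu_cfg_value_get_string(config, v, &s, 0);   /* plain string */
        }
    }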

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
Changes in v5:
1. Use calling convention like old APIs.
---
 tools/libxl/libxlu_cfg.c      | 41 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlu_internal.h |  7 -------
 tools/libxl/libxlutil.h       | 13 +++++++++++++
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index 611f5ec..46b1d4f 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -199,6 +199,47 @@ static int find_atom(const XLU_Config *cfg, const char *n,
     return 0;
 }
 
+
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value)
+{
+    return value->type;
+}
+
+int xlu_cfg_value_get_string(const XLU_Config *cfg, XLU_ConfigValue *value,
+                             char **value_r, int dont_warn)
+{
+    if (value->type != XLU_STRING) {
+        if (!dont_warn)
+            fprintf(cfg->report, "warning: value is not a string\n");
+        *value_r = NULL;
+        return EINVAL;
+    }
+
+    *value_r = value->u.string;
+    return 0;
+}
+
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+                           XLU_ConfigList **value_r, int dont_warn)
+{
+    if (value->type != XLU_LIST) {
+        if (!dont_warn)
+            fprintf(cfg->report, "warning: value is not a list\n");
+        *value_r = NULL;
+        return EINVAL;
+    }
+
+    *value_r = &value->u.list;
+    return 0;
+}
+
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+                                       int entry)
+{
+    if (entry < 0 || entry >= list->nvalues) return NULL;
+    return list->values[entry];
+}
+
 int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
                        const char **value_r, int dont_warn) {
     XLU_ConfigSetting *set;
diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 092a17a..24ed6d4 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -25,13 +25,6 @@
 
 #include "libxlutil.h"
 
-enum XLU_ConfigValueType {
-    XLU_STRING,
-    XLU_LIST,
-};
-
-typedef struct XLU_ConfigValue XLU_ConfigValue;
-
 typedef struct XLU_ConfigList {
     int avalues; /* available slots */
     int nvalues; /* actual occupied slots */
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 0333e55..989605a 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -20,9 +20,15 @@
 
 #include "libxl.h"
 
+enum XLU_ConfigValueType {
+    XLU_STRING,
+    XLU_LIST,
+};
+
 /* Unless otherwise stated, all functions return an errno value. */
 typedef struct XLU_Config XLU_Config;
 typedef struct XLU_ConfigList XLU_ConfigList;
+typedef struct XLU_ConfigValue XLU_ConfigValue;
 
 XLU_Config *xlu_cfg_init(FILE *report, const char *report_filename);
   /* 0 means we got ENOMEM. */
@@ -66,6 +72,13 @@ const char *xlu_cfg_get_listitem(const XLU_ConfigList*, int entry);
   /* xlu_cfg_get_listitem cannot fail, except that if entry is
    * out of range it returns 0 (not setting errno) */
 
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value);
+int xlu_cfg_value_get_string(const XLU_Config *cfg,  XLU_ConfigValue *value,
+                             char **value_r, int dont_warn);
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+                           XLU_ConfigList **value_r, int dont_warn);
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+                                       int entry);
 
 /*
  * Disk specification parsing.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 23/24] xl: introduce xcalloc
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (21 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 22/24] libxlu: introduce new APIs Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-12 20:17   ` Andrew Cooper
  2015-02-12 19:44 ` [PATCH v5 24/24] xl: vNUMA support Wei Liu
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/xl_cmdimpl.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 440db78..ec7fb2d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -289,6 +289,18 @@ static void *xmalloc(size_t sz) {
     return r;
 }
 
+static void *xcalloc(size_t n, size_t sz) __attribute__((unused));
+static void *xcalloc(size_t n, size_t sz) {
+    void *r;
+    r = calloc(n, sz);
+    if (!r) {
+        fprintf(stderr,"xl: Unable to calloc %lu bytes.\n",
+                (unsigned long)sz * (unsigned long)n);
+        exit(-ERROR_FAIL);
+    }
+    return r;
+}
+
 static void *xrealloc(void *ptr, size_t sz) {
     void *r;
     if (!sz) { free(ptr); return 0; }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v5 24/24] xl: vNUMA support
  2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
                   ` (22 preceding siblings ...)
  2015-02-12 19:44 ` [PATCH v5 23/24] xl: introduce xcalloc Wei Liu
@ 2015-02-12 19:44 ` Wei Liu
  2015-02-24 16:19   ` Dario Faggioli
  23 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-12 19:44 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, JBeulich, ufimtseva

This patch includes configuration options parser and documentation.

Please find the hunk to xl.cfg.pod.5 for more information.
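
A complete hypothetical guest configuration using the new syntax might
look like this (the vnode sizes must sum to memory=, as documented
below):

    memory = 6000
    vcpus = 8
    vnuma = [ [ "pnode=0", "size=3000", "vcpus=0-3", "vdistances=10,20" ],
              [ "pnode=1", "size=3000", "vcpus=4-7", "vdistances=20,10" ] ]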

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes in v5:
1. New syntax for vNUMA configuration.
---
 docs/man/xl.cfg.pod.5    |  54 ++++++++++++++++++
 tools/libxl/xl_cmdimpl.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 408653f..2a27b1c 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -266,6 +266,60 @@ it will crash.
 
 =back
 
+=head3 Guest Virtual NUMA Configuration
+
+=over 4
+
+=item B<vnuma=[ VNODE_SPEC, VNODE_SPEC, ... ]>
+
+Specify virtual NUMA configuration with positional arguments. The
+Nth B<VNODE_SPEC> in the list specifies the configuration of the
+Nth virtual node.
+
+Each B<VNODE_SPEC> is a list, which has the form
+"[VNODE_CONFIG_OPTION, VNODE_CONFIG_OPTION, ... ]" (without quotes).
+
+For example, vnuma = [ ["pnode=0","size=512","vcpus=0-4","vdistances=10,20"] ]
+means vnode 0 is mapped to pnode 0, has 512MB of RAM and vcpus 0 to 4,
+and its distance to itself is 10 while its distance to vnode 1 is 20.
+
+Each B<VNODE_CONFIG_OPTION> is a quoted string. Supported
+B<VNODE_CONFIG_OPTION>s are:
+
+=over 4
+
+=item B<pnode=NUMBER>
+
+Specify which physical node this virtual node maps to.
+
+=item B<size=MBYTES>
+
+Specify the size of this virtual node. The sum of memory size of all
+vnodes must match B<maxmem=> (or B<memory=> if B<maxmem=> is not
+specified).
+
+=item B<vcpus=CPU-STRING>
+
+Specify which vcpus belong to this node. B<CPU-STRING> is a string
+of comma-separated single vcpus and vcpu ranges. An example
+is "vcpus=0-5,8", which specifies vcpu 0 to vcpu 5, and also
+vcpu 8.
+
+=item B<vdistances=NUMBER, NUMBER, ... >
+
+Specify virtual distance from this node to all nodes (including
+itself) with positional arguments. For example, "vdistances=10,20"
+for vnode 0 means the distance from vnode 0 to vnode 0 is 10, and from
+vnode 0 to vnode 1 is 20. The number of arguments supplied must match
+the total number of vnodes.
+
+Normally you can use the values from "xl info -n" or "numactl
+--hardware" to fill in the vdistances list.
+
+=back
+
+=back
+
 =head3 Event Actions
 
 =over 4
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ec7fb2d..f52daf9 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -158,7 +158,6 @@ struct domain_create {
 };
 
 
-static uint32_t find_domain(const char *p) __attribute__((warn_unused_result));
 static uint32_t find_domain(const char *p)
 {
     uint32_t domid;
@@ -989,6 +988,142 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
     return 0;
 }
 
+static void parse_vnuma_config(const XLU_Config *config,
+                               libxl_domain_build_info *b_info)
+{
+    libxl_physinfo physinfo;
+    uint32_t nr_nodes;
+    XLU_ConfigList *vnuma;
+    int i, j, len, num_vnuma;
+
+
+    libxl_physinfo_init(&physinfo);
+    if (libxl_get_physinfo(ctx, &physinfo) != 0) {
+        libxl_physinfo_dispose(&physinfo);
+        fprintf(stderr, "libxl_get_physinfo failed\n");
+        exit(1);
+    }
+
+    nr_nodes = physinfo.nr_nodes;
+    libxl_physinfo_dispose(&physinfo);
+
+    if (xlu_cfg_get_list(config, "vnuma", &vnuma, &num_vnuma, 1))
+        return;
+
+    b_info->num_vnuma_nodes = num_vnuma;
+    b_info->vnuma_nodes = xcalloc(num_vnuma, sizeof(libxl_vnode_info));
+
+    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+        libxl_vnode_info_init(p);
+        libxl_cpu_bitmap_alloc(ctx, &p->vcpus, b_info->max_vcpus);
+        libxl_bitmap_set_none(&p->vcpus);
+        p->distances = xcalloc(b_info->num_vnuma_nodes,
+                               sizeof(*p->distances));
+        p->num_distances = b_info->num_vnuma_nodes;
+    }
+
+    for (i = 0; i < num_vnuma; i++) {
+        XLU_ConfigValue *vnode_spec, *conf_option;
+        XLU_ConfigList *vnode_config_list;
+        int conf_count;
+        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+        vnode_spec = xlu_cfg_get_listitem2(vnuma, i);
+        assert(vnode_spec);
+
+        xlu_cfg_value_get_list(config, vnode_spec, &vnode_config_list, 0);
+        if (!vnode_config_list) {
+            fprintf(stderr, "xl: cannot get vnode config option list\n");
+            exit(1);
+        }
+
+        for (conf_count = 0;
+             (conf_option =
+              xlu_cfg_get_listitem2(vnode_config_list, conf_count));
+             conf_count++) {
+
+            if (xlu_cfg_value_type(conf_option) == XLU_STRING) {
+                char *buf, *option_untrimmed, *value_untrimmed;
+                char *option, *value;
+                char *endptr;
+                unsigned long val;
+
+                xlu_cfg_value_get_string(config, conf_option, &buf, 0);
+
+                if (!buf) continue;
+
+                if (split_string_into_pair(buf, "=",
+                                           &option_untrimmed,
+                                           &value_untrimmed)) {
+                    fprintf(stderr, "xl: failed to split \"%s\" into pair\n",
+                            buf);
+                    exit(1);
+                }
+                trim(isspace, option_untrimmed, &option);
+                trim(isspace, value_untrimmed, &value);
+
+#define ABORT_IF_FAILED(str)                                            \
+                do {                                                    \
+                    if (endptr == value || val == ULONG_MAX) {          \
+                        fprintf(stderr,                                 \
+                                "xl: failed to convert \"%s\" to number\n", \
+                                (str));                                 \
+                        exit(1);                                        \
+                    }                                                   \
+                } while (0)
+
+                if (!strcmp("pnode", option)) {
+                    val = strtoul(value, &endptr, 10);
+                    ABORT_IF_FAILED(value);
+                    if (val >= nr_nodes) {
+                        fprintf(stderr,
+                                "xl: invalid pnode number: %lu\n", val);
+                        exit(1);
+                    }
+                    p->pnode = val;
+                } else if (!strcmp("size", option)) {
+                    val = strtoul(value, &endptr, 10);
+                    ABORT_IF_FAILED(value);
+                    p->memkb = val << 10;
+                } else if (!strcmp("vcpus", option)) {
+                    libxl_string_list cpu_spec_list;
+                    int cpu;
+                    unsigned long s, e;
+
+                    split_string_into_string_list(value, ",", &cpu_spec_list);
+                    len = libxl_string_list_length(&cpu_spec_list);
+
+                    for (j = 0; j < len; j++) {
+                        parse_range(cpu_spec_list[j], &s, &e);
+                        for (cpu = s; cpu <= e; cpu++)
+                            libxl_bitmap_set(&p->vcpus, cpu);
+                    }
+                    libxl_string_list_dispose(&cpu_spec_list);
+                } else if (!strcmp("vdistances", option)) {
+                    libxl_string_list vdist;
+
+                    split_string_into_string_list(value, ",", &vdist);
+                    len = libxl_string_list_length(&vdist);
+
+                    for (j = 0; j < len; j++) {
+                        val = strtoul(vdist[j], &endptr, 10);
+                        ABORT_IF_FAILED(vdist[j]);
+                        p->distances[j] = val;
+                    }
+                    libxl_string_list_dispose(&vdist);
+                }
+#undef ABORT_IF_FAILED
+                free(option);
+                free(value);
+                free(option_untrimmed);
+                free(value_untrimmed);
+            }
+        }
+    }
+}
+
 static void parse_config_data(const char *config_source,
                               const char *config_data,
                               int config_len,
@@ -1179,6 +1314,8 @@ static void parse_config_data(const char *config_source,
         }
     }
 
+    parse_vnuma_config(config, b_info);
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 23/24] xl: introduce xcalloc
  2015-02-12 19:44 ` [PATCH v5 23/24] xl: introduce xcalloc Wei Liu
@ 2015-02-12 20:17   ` Andrew Cooper
  2015-02-13 10:25     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-12 20:17 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
>  tools/libxl/xl_cmdimpl.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 440db78..ec7fb2d 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -289,6 +289,18 @@ static void *xmalloc(size_t sz) {
>      return r;
>  }
>  
> +static void *xcalloc(size_t n, size_t sz) __attribute__((unused));
> +static void *xcalloc(size_t n, size_t sz) {
> +    void *r;
> +    r = calloc(n, sz);

These two lines can be joined, especially in a small wrapper like this.

> +    if (!r) {
> +        fprintf(stderr,"xl: Unable to calloc %lu bytes.\n",
> +                (unsigned long)sz * (unsigned long)n);

%zu is the correct format specifier for a size_t, and it will allow you
to drop the casts.

~Andrew

> +        exit(-ERROR_FAIL);
> +    }
> +    return r;
> +}
> +
>  static void *xrealloc(void *ptr, size_t sz) {
>      void *r;
>      if (!sz) { free(ptr); return 0; }

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 23/24] xl: introduce xcalloc
  2015-02-12 20:17   ` Andrew Cooper
@ 2015-02-13 10:25     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 10:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, ufimtseva

On Thu, Feb 12, 2015 at 08:17:42PM +0000, Andrew Cooper wrote:
> On 12/02/15 19:44, Wei Liu wrote:
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > ---
> >  tools/libxl/xl_cmdimpl.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 440db78..ec7fb2d 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -289,6 +289,18 @@ static void *xmalloc(size_t sz) {
> >      return r;
> >  }
> >  
> > +static void *xcalloc(size_t n, size_t sz) __attribute__((unused));
> > +static void *xcalloc(size_t n, size_t sz) {
> > +    void *r;
> > +    r = calloc(n, sz);
> 
> These two lines can be joined, espcially in a small wrapper like this.
> 
> > +    if (!r) {
> > +        fprintf(stderr,"xl: Unable to calloc %lu bytes.\n",
> > +                (unsigned long)sz * (unsigned long)n);
> 
> %zu is the correct format identifier for a size_t, and it will allow you
> to drop the casts.
> 

Both issues fixed. Thanks for reviewing.

Wei.

> ~Andrew
> 
> > +        exit(-ERROR_FAIL);
> > +    }
> > +    return r;
> > +}
> > +
> >  static void *xrealloc(void *ptr, size_t sz) {
> >      void *r;
> >      if (!sz) { free(ptr); return 0; }
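
(With both comments applied the wrapper would presumably end up along
these lines; a sketch only:

    static void *xcalloc(size_t n, size_t sz) {
        void *r = calloc(n, sz);
        if (!r) {
            fprintf(stderr, "xl: Unable to calloc %zu bytes.\n", sz * n);
            exit(-ERROR_FAIL);
        }
        return r;
    }
)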

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 01/24] xen: dump vNUMA information with debug key "u"
  2015-02-12 19:44 ` [PATCH v5 01/24] xen: dump vNUMA information with debug key "u" Wei Liu
@ 2015-02-13 11:50   ` Andrew Cooper
  2015-02-16 14:35     ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 11:50 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Signed-off-by: Elena Ufimsteva <ufimtseva@gmail.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> Changes in v5:
> 1. Use read_trylock.
> 2. Use correct array size for strlcpy.
> 3. Coding style fix.
>
> Changes in v4:
> 1. Acquire rwlock before accessing vnuma struct.
> 2. Improve output.
>
> Changes in v3:
> 1. Constify struct vnuma_info.
> 2. Don't print amount of ram of a vmemrange.
> 3. Process softirqs when dumping information.
> 4. Fix format string.
>
> Changes in v2:
> 1. Use unsigned int for loop vars.
> 2. Use strlcpy.
> 3. Properly align output.
> ---
>  xen/arch/x86/numa.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 628a40a..e500f33 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -16,6 +16,7 @@
>  #include <xen/pfn.h>
>  #include <asm/acpi.h>
>  #include <xen/sched.h>
> +#include <xen/softirq.h>
>  
>  static int numa_setup(char *s);
>  custom_param("numa", numa_setup);
> @@ -363,10 +364,12 @@ EXPORT_SYMBOL(node_data);
>  static void dump_numa(unsigned char key)
>  {
>      s_time_t now = NOW();
> -    int i;
> +    unsigned int i, j;
> +    int err;
>      struct domain *d;
>      struct page_info *page;
>      unsigned int page_num_node[MAX_NUMNODES];
> +    const struct vnuma_info *vnuma;
>  
>      printk("'%c' pressed -> dumping numa info (now-0x%X:%08X)\n", key,
>             (u32)(now>>32), (u32)now);
> @@ -393,6 +396,8 @@ static void dump_numa(unsigned char key)
>      printk("Memory location of each domain:\n");
>      for_each_domain ( d )
>      {
> +        process_pending_softirqs();
> +
>          printk("Domain %u (total: %u):\n", d->domain_id, d->tot_pages);
>  
>          for_each_online_node ( i )
> @@ -408,6 +413,70 @@ static void dump_numa(unsigned char key)
>  
>          for_each_online_node ( i )
>              printk("    Node %u: %u\n", i, page_num_node[i]);
> +
> +        if ( !read_trylock(&d->vnuma_rwlock) )
> +            continue;
> +
> +        if ( !d->vnuma )
> +        {
> +            read_unlock(&d->vnuma_rwlock);
> +            continue;
> +        }
> +
> +        vnuma = d->vnuma;
> +        printk("     %u vnodes, %u vcpus, guest physical layout:\n",
> +               vnuma->nr_vnodes, d->max_vcpus);
> +        for ( i = 0; i < vnuma->nr_vnodes; i++ )
> +        {
> +            unsigned int start_cpu = ~0U;
> +
> +            err = snprintf(keyhandler_scratch, 12, "%3u",
> +                    vnuma->vnode_to_pnode[i]);
> +            if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
> +                strlcpy(keyhandler_scratch, "???", sizeof(keyhandler_scratch));
> +
> +            printk("       %3u: pnode %s,", i, keyhandler_scratch);
> +
> +            printk(" vcpus ");
> +
> +            for ( j = 0; j < d->max_vcpus; j++ )
> +            {
> +                if ( !(j & 0x3f) )
> +                    process_pending_softirqs();
> +
> +                if ( vnuma->vcpu_to_vnode[j] == i )
> +                {
> +                    if ( start_cpu == ~0U )
> +                    {
> +                        printk("%d", j);
> +                        start_cpu = j;
> +                    }
> +                }
> +                else if ( start_cpu != ~0U )
> +                {
> +                    if ( j - 1 != start_cpu )
> +                        printk("-%d ", j - 1);
> +                    else
> +                        printk(" ");
> +                    start_cpu = ~0U;
> +                }
> +            }
> +
> +            if ( start_cpu != ~0U  && start_cpu != j - 1 )
> +                printk("-%d", j - 1);
> +
> +            printk("\n");
> +
> +            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
> +            {
> +                if ( vnuma->vmemrange[j].nid == i )
> +                    printk("           %016"PRIx64" - %016"PRIx64"\n",
> +                           vnuma->vmemrange[j].start,
> +                           vnuma->vmemrange[j].end);
> +            }
> +        }
> +
> +        read_unlock(&d->vnuma_rwlock);
>      }
>  
>      rcu_read_unlock(&domlist_read_lock);
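
(For reference, with a two-vnode guest the handler above would produce
output of roughly this shape; values are hypothetical, derived from the
format strings in the patch:

    Domain 1 (total: 1536000):
        Node 0: 768000
        Node 1: 768000
         2 vnodes, 8 vcpus, guest physical layout:
             0: pnode   0, vcpus 0-3
               0000000000000000 - 00000000b0000000
               0000000100000000 - 000000010b800000
             1: pnode   1, vcpus 4-7
               000000010b800000 - 00000001c7000000
)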

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware
  2015-02-12 19:44 ` [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware Wei Liu
@ 2015-02-13 12:00   ` Andrew Cooper
  2015-02-13 13:24     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 12:00 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Make XENMEM_increase_reservation and XENMEM_populate_physmap
> vNUMA-aware.
>
> That is, if guest requests Xen to allocate memory for specific vnode,
> Xen can translate vnode to pnode using vNUMA information of that guest.
>
> XENMEMF_vnode is introduced for the guest to mark the node number is in
> fact virtual node number and should be translated by Xen.
>
> XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is
> able to translate virtual node to physical node.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> ---
> Changes in v5:
> 1. New logic in translation function.
>
> Changes in v3:
> 1. Coding style fix.
> 2. Remove redundant assignment.
>
> Changes in v2:
> 1. Return start_extent when vnode translation fails.
> 2. Expose new feature bit to guest.
> 3. Fix typo in comment.
> ---
>  xen/common/kernel.c           |  2 +-
>  xen/common/memory.c           | 51 +++++++++++++++++++++++++++++++++++++++----
>  xen/include/public/features.h |  3 +++
>  xen/include/public/memory.h   |  2 ++
>  4 files changed, 53 insertions(+), 5 deletions(-)
>
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index 0d9e519..e5e0050 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -301,7 +301,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          switch ( fi.submap_idx )
>          {
>          case 0:
> -            fi.submap = 0;
> +            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
>              if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) )
>                  fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
>              if ( paging_mode_translate(current->domain) )
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index e84ace9..fa3729b 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -692,6 +692,43 @@ out:
>      return rc;
>  }
>  
> +static int translate_vnode_to_pnode(struct domain *d,
> +                                    struct xen_memory_reservation *r,

const struct xen_memory_reservation *r

> +                                    struct memop_args *a)
> +{
> +    int rc = 0;
> +    unsigned int vnode, pnode;
> +
> +    if ( r->mem_flags & XENMEMF_vnode )
> +    {
> +        a->memflags &= ~MEMF_node(XENMEMF_get_node(r->mem_flags));
> +        a->memflags &= ~MEMF_exact_node;

This interface feels semantically wrong, especially as the caller sets
these fields first, just to have them dropped at this point.

A rather more appropriate function would be something along the lines of
"construct_memop_from_reservation()" (name subject to improvement) which
takes care completely filling 'a' from 'r', doing a v->p translation if
needed.

~Andrew

> +
> +        read_lock(&d->vnuma_rwlock);
> +        if ( d->vnuma )
> +        {
> +            vnode = XENMEMF_get_node(r->mem_flags);
> +
> +            if ( vnode < d->vnuma->nr_vnodes )
> +            {
> +                pnode = d->vnuma->vnode_to_pnode[vnode];
> +
> +                if ( pnode != NUMA_NO_NODE )
> +                {
> +                    a->memflags |= MEMF_node(pnode);
> +                    if ( r->mem_flags & XENMEMF_exact_node_request )
> +                        a->memflags |= MEMF_exact_node;
> +                }
> +            }
> +            else
> +                rc = -EINVAL;
> +        }
> +        read_unlock(&d->vnuma_rwlock);
> +    }
> +
> +    return rc;
> +}
> +
>  long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      struct domain *d;
> @@ -734,10 +771,6 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              args.memflags = MEMF_bits(address_bits);
>          }
>  
> -        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
> -        if ( reservation.mem_flags & XENMEMF_exact_node_request )
> -            args.memflags |= MEMF_exact_node;
> -
>          if ( op == XENMEM_populate_physmap
>               && (reservation.mem_flags & XENMEMF_populate_on_demand) )
>              args.memflags |= MEMF_populate_on_demand;
> @@ -747,6 +780,16 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return start_extent;
>          args.domain = d;
>  
> +        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
> +        if ( reservation.mem_flags & XENMEMF_exact_node_request )
> +            args.memflags |= MEMF_exact_node;
> +
> +        if ( translate_vnode_to_pnode(d, &reservation, &args) )
> +        {
> +            rcu_unlock_domain(d);
> +            return start_extent;
> +        }
> +
>          if ( xsm_memory_adjust_reservation(XSM_TARGET, current->domain, d) )
>          {
>              rcu_unlock_domain(d);
> diff --git a/xen/include/public/features.h b/xen/include/public/features.h
> index 16d92aa..2110b04 100644
> --- a/xen/include/public/features.h
> +++ b/xen/include/public/features.h
> @@ -99,6 +99,9 @@
>  #define XENFEAT_grant_map_identity        12
>   */
>  
> +/* Guest can use XENMEMF_vnode to specify virtual node for memory op. */
> +#define XENFEAT_memory_op_vnode_supported 13
> +
>  #define XENFEAT_NR_SUBMAPS 1
>  
>  #endif /* __XEN_PUBLIC_FEATURES_H__ */
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 595f953..2b5206b 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -55,6 +55,8 @@
>  /* Flag to request allocation only from the node specified */
>  #define XENMEMF_exact_node_request  (1<<17)
>  #define XENMEMF_exact_node(n) (XENMEMF_node(n) | XENMEMF_exact_node_request)
> +/* Flag to indicate the node specified is virtual node */
> +#define XENMEMF_vnode  (1<<18)
>  #endif
>  
>  struct xen_memory_reservation {

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware
  2015-02-13 12:00   ` Andrew Cooper
@ 2015-02-13 13:24     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 13:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 12:00:29PM +0000, Andrew Cooper wrote:
[...]
> > +static int translate_vnode_to_pnode(struct domain *d,
> > +                                    struct xen_memory_reservation *r,
> 
> const struct xen_memory_reservation *r
> 

Ack.

> > +                                    struct memop_args *a)
> > +{
> > +    int rc = 0;
> > +    unsigned int vnode, pnode;
> > +
> > +    if ( r->mem_flags & XENMEMF_vnode )
> > +    {
> > +        a->memflags &= ~MEMF_node(XENMEMF_get_node(r->mem_flags));
> > +        a->memflags &= ~MEMF_exact_node;
> 
> This interface feels semantically wrong, especially as the caller sets
> these fields first, just to have them dropped at this point.
> 
> A rather more appropriate function would be something along the lines of
> "construct_memop_from_reservation()" (name subject to improvement) which
> takes care of completely filling 'a' from 'r', doing a v->p translation if
> needed.
> 

I'm fine with this. I'm going to use construct_memop_from_reservation if
no better alternative emerges.
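
Roughly like this, I think (a sketch only -- the existing MEMF_bits /
populate-on-demand handling in the caller is left out for brevity, and
details may change):

    static int construct_memop_from_reservation(
        struct domain *d, const struct xen_memory_reservation *r,
        struct memop_args *a)
    {
        int rc = 0;
        unsigned int vnode, pnode;

        a->extent_list  = r->extent_start;
        a->nr_extents   = r->nr_extents;
        a->extent_order = r->extent_order;
        a->memflags     = 0;

        if ( !(r->mem_flags & XENMEMF_vnode) )
        {
            a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags));
            if ( r->mem_flags & XENMEMF_exact_node_request )
                a->memflags |= MEMF_exact_node;
            return 0;
        }

        read_lock(&d->vnuma_rwlock);
        if ( d->vnuma )
        {
            vnode = XENMEMF_get_node(r->mem_flags);
            if ( vnode < d->vnuma->nr_vnodes )
            {
                pnode = d->vnuma->vnode_to_pnode[vnode];
                if ( pnode != NUMA_NO_NODE )
                {
                    a->memflags |= MEMF_node(pnode);
                    if ( r->mem_flags & XENMEMF_exact_node_request )
                        a->memflags |= MEMF_exact_node;
                }
            }
            else
                rc = -EINVAL;
        }
        read_unlock(&d->vnuma_rwlock);

        return rc;
    }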

Wei.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 22/24] libxlu: introduce new APIs
  2015-02-12 19:44 ` [PATCH v5 22/24] libxlu: introduce new APIs Wei Liu
@ 2015-02-13 14:12   ` Ian Jackson
  2015-02-16 19:10     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:12 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, ian.jackson,
	xen-devel, JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 22/24] libxlu: introduce new APIs"):
> These APIs can be used to manipulate XLU_ConfigValue and XLU_ConfigList.
> 
> +    if (value->type != XLU_STRING) {
> +        if (!dont_warn)
> +            fprintf(cfg->report, "warning: value is not a string\n");
> +        *value_r = NULL;
> +        return EINVAL;

This message needs to include the file and line number, or it is very
hard for the user to use.  The other call sites (which are based on
`find') require the caller to provide a name, which means that the
setting name can be printed too.  Maybe you could do something
similar.

If you were feeling keen you could replace these formulaic things with
something like:
   return report_bad_cfg(dont_warn, cfg, set, n, "value is not a string");
or
   return REPORT_BAD_CFG("value is not a string");
(being a function or macro which always returns EINVAL), or some such.
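
For instance (a sketch only; exactly what context is to hand at the call
sites, and where the line number lives, would need checking):

    static int report_bad_cfg(const XLU_Config *cfg, int dont_warn,
                              int lineno, const char *what)
    {
        if (!dont_warn)
            fprintf(cfg->report, "%s:%d: warning: %s\n",
                    cfg->config_source, lineno, what);
        return EINVAL;
    }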

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-12 19:44 ` [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA Wei Liu
@ 2015-02-13 14:12   ` Ian Jackson
  2015-02-13 15:21     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:12 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA"):
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
...
> +/* LIBXL_HAVE_VNUMA
> + *
> + * If it is defined, libxl supports vNUMA configuration
> + */

I think you should be more specific about which calls are covered.

Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
@ 2015-02-13 14:15   ` Ian Jackson
  2015-02-13 15:12     ` Wei Liu
  2015-02-13 15:40   ` Andrew Cooper
  2015-02-17 16:38   ` Dario Faggioli
  2 siblings, 1 reply; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:15 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
> This function is used to check whether vNUMA configuration (be it
> auto-generated or supplied by user) is valid.

This looks plausible, but I think you should explain what the impact
of this patch is.  Presumably the intent is to replace various later
failures with ERROR_FAIL with something more useful and more
specific ?

Are there any cases which this new check forbids but which are
currently accepted by libxl ?  If so then we have to think about
compatibility.

Also I would like to see an ack from the authors of the vnuma support,
as I'm not familiar enough with vnuma to fully understand the
semantics of the new checks.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled
  2015-02-12 19:44 ` [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
@ 2015-02-13 14:17   ` Ian Jackson
  2015-02-13 15:18     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, ian.jackson,
	xen-devel, JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled"):
> Disallow memory relocation when vNUMA is enabled, because relocated
> memory ends up off node. Furthermore, even if we dynamically expand
> node coverage in hvmloader, low memory and high memory may reside
> in different physical nodes, blindly relocating low memory to high
> memory gives us a sub-optimal configuration.
...
>                          "%d",
> -                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
> +                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
> +                        !b_info->num_vnuma_nodes);

I think it would be useful to add a helper function
  libxl__vnuma_configured()
to replace all these open-coded !b_info calls etc.

That will make things easier if the vnuma specification arrangements
change, and it will make the code more readable too I think.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2015-02-12 19:44 ` [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
@ 2015-02-13 14:21   ` Ian Jackson
  2015-02-13 15:18     ` Wei Liu
  2015-02-17 14:26   ` Dario Faggioli
  1 sibling, 1 reply; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:21 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest"):
> Transform the user-supplied vNUMA configuration into libxl internal
> representations, then libxc representations. Check validity along the
> line.

This is going to be a totally stupid question: but why are there three
representations here, rather than two ?

Is the libxc representation too annoying for use internally in libxl ?

Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 20/24] libxlu: rework internal representation of setting
  2015-02-12 19:44 ` [PATCH v5 20/24] libxlu: rework internal representation of setting Wei Liu
@ 2015-02-13 14:24   ` Ian Jackson
  0 siblings, 0 replies; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 14:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

Wei Liu writes ("[PATCH v5 20/24] libxlu: rework internal representation of setting"):
> This patch does the following things:

Thanks,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-12 19:44 ` [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest Wei Liu
@ 2015-02-13 14:30   ` Andrew Cooper
  2015-02-13 15:05     ` Wei Liu
  2015-02-16 16:58   ` Dario Faggioli
  1 sibling, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 14:30 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> index 45b8644..1809674 100644
> --- a/tools/libxc/xc_private.h
> +++ b/tools/libxc/xc_private.h
> @@ -35,6 +35,8 @@
>  
>  #include <xen/sys/privcmd.h>
>  
> +#define XC_VNUMA_NO_NODE (~0U)

Shouldn't this sentinel value come from the Xen public API?

~Andrew

> +
>  #if defined(HAVE_VALGRIND_MEMCHECK_H) && !defined(NDEBUG) && !defined(__MINIOS__)
>  /* Compile in Valgrind client requests? */
>  #include <valgrind/memcheck.h>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-13 14:30   ` Andrew Cooper
@ 2015-02-13 15:05     ` Wei Liu
  2015-02-13 15:17       ` Andrew Cooper
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:30:30PM +0000, Andrew Cooper wrote:
> On 12/02/15 19:44, Wei Liu wrote:
> > diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> > index 45b8644..1809674 100644
> > --- a/tools/libxc/xc_private.h
> > +++ b/tools/libxc/xc_private.h
> > @@ -35,6 +35,8 @@
> >  
> >  #include <xen/sys/privcmd.h>
> >  
> > +#define XC_VNUMA_NO_NODE (~0U)
> 
> Shouldn't this sentinel value come from the Xen public API?
> 

Xen public headers don't have a NUMA_NO_NODE though. Do you want me
to add one to memory.h?

Wei.

> ~Andrew
> 
> > +
> >  #if defined(HAVE_VALGRIND_MEMCHECK_H) && !defined(NDEBUG) && !defined(__MINIOS__)
> >  /* Compile in Valgrind client requests? */
> >  #include <valgrind/memcheck.h>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 14:15   ` Ian Jackson
@ 2015-02-13 15:12     ` Wei Liu
  2015-02-13 15:39       ` Elena Ufimtseva
  2015-02-17 16:44       ` Dario Faggioli
  0 siblings, 2 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:12 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:15:47PM +0000, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
> > This function is used to check whether vNUMA configuration (be it
> > auto-generated or supplied by user) is valid.
> 
> This looks plausible, but I think you should explain what the impact
> of this patch is.  Presumably the intent is to replace various later
> failures with ERROR_FAIL with something more useful and more
> specific ?
> 

Yes, providing a more useful error message is one aspect. Another aspect
is just to do a sanity check -- passing an invalid layout to the guest
doesn't make much sense.

> Are there any cases which this new check forbids but which are
> currently accepted by libxl ?  If so then we have to think about
> compatibility.
> 

The first thing is that there is no previously supported vNUMA interface
in the toolstack, so there won't be a situation where a previously good
config doesn't pass this check.

The second thing is that if the user supplies a config without vNUMA
configuration this function will not get called, so it won't have any
effect.

> Also I would like to see an ack from the authors of the vnuma support,
> as I'm not familiar enough with vnuma to fully understand the
> semantics of the new checks.
> 

Elena and Dario, what do you think?

Wei.

> Thanks,
> Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-13 15:05     ` Wei Liu
@ 2015-02-13 15:17       ` Andrew Cooper
  0 siblings, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, dario.faggioli, ian.jackson, xen-devel, JBeulich,
	ufimtseva

On 13/02/15 15:05, Wei Liu wrote:
> On Fri, Feb 13, 2015 at 02:30:30PM +0000, Andrew Cooper wrote:
>> On 12/02/15 19:44, Wei Liu wrote:
>>> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
>>> index 45b8644..1809674 100644
>>> --- a/tools/libxc/xc_private.h
>>> +++ b/tools/libxc/xc_private.h
>>> @@ -35,6 +35,8 @@
>>>  
>>>  #include <xen/sys/privcmd.h>
>>>  
>>> +#define XC_VNUMA_NO_NODE (~0U)
>> Shouldn't this sentinel value come from the Xen public API?
>>
> Xen public headers don't have a NUMA_NO_NODE though. Do you want me
> to add one to memory.h?

Yes please.

There are far too many sentinel values missing in the Xen public API
which libxc has a different #define for (INVALID_MFN being the main
one).  Best not to make the situation any worse.
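
Something as simple as this would do (name and exact value to be agreed;
the libxc define then just aliases it):

    /* xen/include/public/memory.h */
    #define XEN_NUMA_NO_NODE    (~0u)

    /* tools/libxc/xc_private.h */
    #define XC_VNUMA_NO_NODE    XEN_NUMA_NO_NODE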

~Andrew

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2015-02-13 14:21   ` Ian Jackson
@ 2015-02-13 15:18     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:21:27PM +0000, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest"):
> > Transform user supplied vNUMA configuration into libxl internal
> > representations then libxc representations. Check validity along the
> > line.
> 
> This is going to be a totally stupid question: but why are there three
> representations here, rather than two ?
> 

There are two. One is libxl and the other is libxc.

> Is the libxc representation too annoying for use internally in libxl ?
> 

Yes, because libxl's representation is consolidated so that it can be
easily used by libxl's users, while libxc doesn't actually care about all
the fields in libxl's structure.

Wei.

> Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled
  2015-02-13 14:17   ` Ian Jackson
@ 2015-02-13 15:18     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:17:40PM +0000, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled"):
> > Disallow memory relocation when vNUMA is enabled, because relocated
> > memory ends up off node. Furthermore, even if we dynamically expand
> > node coverage in hvmloader, low memory and high memory may reside
> > in different physical nodes, blindly relocating low memory to high
> > memory gives us a sub-optimal configuration.
> ...
> >                          "%d",
> > -                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
> > +                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
> > +                        !b_info->num_vnuma_nodes);
> 
> I think it would be useful to add a helper function
>   libxl__vnuma_configured()
> to replace all these open-coded !b_info calls etc.
> 
> That will make things easier if the vnuma specification arrangements
> change, and it will make the code more readable too I think.
> 

Ack.
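
I.e. something like:

    /* Sketch of the helper: */
    bool libxl__vnuma_configured(const libxl_domain_build_info *b_info)
    {
        return b_info->num_vnuma_nodes != 0;
    }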

Wei.

> Thanks,
> Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-13 14:12   ` Ian Jackson
@ 2015-02-13 15:21     ` Wei Liu
  2015-02-13 15:26       ` Ian Jackson
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:21 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:12:58PM +0000, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA"):
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ...
> > +/* LIBXL_HAVE_VNUMA
> > + *
> > + * If it is defined, libxl supports vNUMA configuration
> > + */
> 
> I think you should be more specific about which calls are covered.
> 

There is no new API yet, just a new structure to specify vNUMA
configuration.

How about this:

/* LIBXL_HAVE_VNUMA
 *
 * If this is defined, libxl's IDL has libxl_vnode_info and there is an
 * array called vnuma_nodes inside libxl_domain_build_info to specify
 * vNUMA configuration.
 */

Wei.

> Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-13 15:21     ` Wei Liu
@ 2015-02-13 15:26       ` Ian Jackson
  2015-02-13 15:27         ` Ian Jackson
  2015-02-13 15:28         ` Wei Liu
  0 siblings, 2 replies; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 15:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

Wei Liu writes ("Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA"):
> There is no new API yet, just a new structure to specify vNUMA
> configuration.
> 
> How about this:
> 
> /* LIBXL_HAVE_VNUMA
>  *
>  * If this is defined, libxl's IDL has libxl_vnode_info and there is an
>  * array called vnuma_nodes inside libxl_domain_build_info to specify
>  * vNUMA configuration.
>  */

Yes, something like that.  It would be better if the precise wording
were more similar to that for the other HAVE macros.  How about:

   * If this is defined the type libxl_vnode_info exists, and a
   * field 'vnuma_nodes' is present in libxl_domain_build_info.

or similar ?
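
I.e., in full:

    /* LIBXL_HAVE_VNUMA
     *
     * If this is defined the type libxl_vnode_info exists, and a
     * field 'vnuma_nodes' is present in libxl_domain_build_info.
     */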

Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-13 15:26       ` Ian Jackson
@ 2015-02-13 15:27         ` Ian Jackson
  2015-02-13 15:28         ` Wei Liu
  1 sibling, 0 replies; 94+ messages in thread
From: Ian Jackson @ 2015-02-13 15:27 UTC (permalink / raw)
  To: Wei Liu, xen-devel, ian.campbell, dario.faggioli, ufimtseva,
	JBeulich, andrew.cooper3

Ian Jackson writes ("Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA"):
> Yes, something like that.  It would be better if the precise wording
> were more similar to that for the other HAVE macros.  How about:
> 
>    * If this is defined the type libxl_vnode_info exists, and a
>    * field 'vnuma_nodes' is present in libxl_domain_build_info.
> 
> or similar ?

BTW, this may seem picky, but it is much easier to quickly find the
relevant one out of a lot of textual descriptions if there aren't any
unnecessary differences.

Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA
  2015-02-13 15:26       ` Ian Jackson
  2015-02-13 15:27         ` Ian Jackson
@ 2015-02-13 15:28         ` Wei Liu
  1 sibling, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-13 15:28 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 03:26:49PM +0000, Ian Jackson wrote:
> Wei Liu writes ("Re: [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA"):
> > There is no new API yet, just a new structure to specify vNUMA
> > configuration.
> > 
> > How about this:
> > 
> > /* LIBXL_HAVE_VNUMA
> >  *
> >  * If this is defined, libxl's IDL has libxl_vnode_info and there is an
> >  * array called vnuma_nodes inside libxl_domain_build_info to specify
> >  * vNUMA configuration.
> >  */
> 
> Yes, something like that.  It would be better if the precise wording
> were more similar to that for the other HAVE macros.  How about:
> 
>    * If this is defined the type libxl_vnode_info exists, and a
>    * field 'vnuma_nodes' is present in libxl_domain_build_info.
> 
> or similar ?

That's good.

Wei.

> 
> Ian.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 15:12     ` Wei Liu
@ 2015-02-13 15:39       ` Elena Ufimtseva
  2015-02-13 16:06         ` Wei Liu
  2015-02-17 16:44       ` Dario Faggioli
  1 sibling, 1 reply; 94+ messages in thread
From: Elena Ufimtseva @ 2015-02-13 15:39 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Andrew Cooper, Dario Faggioli, Ian Jackson,
	xen-devel, Jan Beulich

On Fri, Feb 13, 2015 at 10:12 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Feb 13, 2015 at 02:15:47PM +0000, Ian Jackson wrote:
>> Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
>> > This function is used to check whether vNUMA configuration (be it
>> > auto-generated or supplied by user) is valid.
>>
>> This looks plausible, but I think you should explain what the impact
>> of this patch is.  Presumably the intent is to replace various later
>> failures with ERROR_FAIL with something more useful and more
>> specific ?
>>
>
> Yes, providing a more useful error message is one aspect. Another aspect
> is just to do a sanity check -- passing an invalid layout to the guest
> doesn't make much sense.
>
>> Are there any cases which this new check forbids but which are
>> currently accepted by libxl ?  If so then we have to think about
>> compatibility.
>>
>
> The first thing is that there is no previously supported vNUMA interface
> in the toolstack, so there won't be a situation where a previously good
> config doesn't pass this check.
>
> The second thing is that if the user supplies a config without vNUMA
> configuration this function will not get called, so it won't have any
> effect.
>
>> Also I would like to see an ack from the authors of the vnuma support,
>> as I'm not familiar enough with vnuma to fully understand the
>> semantics of the new checks.
>>
>
> Elena and Dario, what do you think?

The checks themselves look reasonable. And unforgiving :)
I think we had a discussion before, and some previous patches were
falling back to some default/basic vNUMA configuration (when possible)
in case of 'recoverable' errors in the config.

Any sanity checks for distances?
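
E.g. (purely to illustrate the kind of check I mean; names as in this
series) one could at least verify that a node is never further from
itself than from any other node:

    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
        libxl_vnode_info *v = &b_info->vnuma_nodes[i];
        for (j = 0; j < v->num_distances; j++) {
            if (v->distances[i] > v->distances[j]) {
                LOG(ERROR, "vnode %u: local distance %u > distance %u to vnode %u",
                    i, v->distances[i], v->distances[j], j);
                goto out;
            }
        }
    }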

>
> Wei.
>
>> Thanks,
>> Ian.



-- 
Elena

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
  2015-02-13 14:15   ` Ian Jackson
@ 2015-02-13 15:40   ` Andrew Cooper
  2015-02-17 12:56     ` Wei Liu
  2015-02-17 16:38   ` Dario Faggioli
  2 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:40 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> This function is used to check whether vNUMA configuration (be it
> auto-generated or supplied by user) is valid.
>
> Define a new error code ERROR_VNUMA_CONFIG_INVALID.
>
> The checks performed can be found in the comment of the function.
>
> This vNUMA function (and future ones) is placed in a new file called
> libxl_vnuma.c
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
> Changes in v5:
> 1. Define and use new error code.
> 2. Use LOG macro.
> 3. Fix hard tabs.
>
> Changes in v4:
> 1. Adapt to new interface.
>
> Changes in v3:
> 1. Rewrite commit log.
> 2. Shorten two error messages.
> ---
>  tools/libxl/Makefile         |   2 +-
>  tools/libxl/libxl_internal.h |   7 +++
>  tools/libxl/libxl_types.idl  |   1 +
>  tools/libxl/libxl_vnuma.c    | 131 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 tools/libxl/libxl_vnuma.c
>
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 7329521..1b16598 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl
>  LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>  			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
>  			libxl_internal.o libxl_utils.o libxl_uuid.o \
> -			libxl_json.o libxl_aoutils.o libxl_numa.o \
> +			libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
>  			libxl_save_callout.o _libxl_save_msgs_callout.o \
>  			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
>  LIBXL_OBJS += libxl_genid.o
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 6d3ac58..258be0d 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3394,6 +3394,13 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
>      libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
>  }
>  
> +/* Check if vNUMA config is valid. Returns 0 if valid,
> + * ERROR_VNUMA_CONFIG_INVALID otherwise.
> + */
> +int libxl__vnuma_config_check(libxl__gc *gc,
> +                              const libxl_domain_build_info *b_info,
> +                              const libxl__domain_build_state *state);
> +
>  _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
>                                     const libxl_ms_vm_genid *id);
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 14c7e7c..23951fc 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -63,6 +63,7 @@ libxl_error = Enumeration("error", [
>      (-17, "DEVICE_EXISTS"),
>      (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>      (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> +    (-20, "VNUMA_CONFIG_INVALID"),
>      ], value_namespace = "")
>  
>  libxl_domain_type = Enumeration("domain_type", [
> diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
> new file mode 100644
> index 0000000..fa5aa8d
> --- /dev/null
> +++ b/tools/libxl/libxl_vnuma.c
> @@ -0,0 +1,131 @@
> +/*
> + * Copyright (C) 2014      Citrix Ltd.
> + * Author Wei Liu <wei.liu2@citrix.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +#include "libxl_internal.h"
> +#include <stdlib.h>
> +
> +/* Sort vmemranges in ascending order with "start" */
> +static int compare_vmemrange(const void *a, const void *b)
> +{
> +    const xen_vmemrange_t *x = a, *y = b;
> +    if (x->start < y->start)
> +        return -1;
> +    if (x->start > y->start)
> +        return 1;
> +    return 0;
> +}
> +
> +/* Check if vNUMA configuration is valid:
> + *  1. all pnodes inside vnode_to_pnode array are valid
> + *  2. one vcpu belongs to and only belongs to one vnode
> + *  3. each vmemrange is valid and doesn't overlap with each other
> + */
> +int libxl__vnuma_config_check(libxl__gc *gc,
> +                              const libxl_domain_build_info *b_info,
> +                              const libxl__domain_build_state *state)
> +{
> +    int i, j, rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes;

i, j and nr_nodes are all semantically unsigned.

> +    libxl_numainfo *ninfo = NULL;
> +    uint64_t total_memkb = 0;
> +    libxl_bitmap cpumap;
> +    libxl_vnode_info *p;
> +
> +    libxl_bitmap_init(&cpumap);
> +
> +    /* Check pnode specified is valid */
> +    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
> +    if (!ninfo) {
> +        LOG(ERROR, "libxl_get_numainfo failed");
> +        goto out;
> +    }
> +
> +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> +        uint32_t pnode;
> +
> +        p = &b_info->vnuma_nodes[i];
> +        pnode = p->pnode;
> +
> +        /* The pnode specified is not valid? */
> +        if (pnode >= nr_nodes) {
> +            LOG(ERROR, "Invalid pnode %d specified", pnode);

pnode is uint32_t, so should be %u

> +            goto out;
> +        }
> +
> +        total_memkb += p->memkb;
> +    }
> +
> +    if (total_memkb != b_info->max_memkb) {
> +        LOG(ERROR, "Amount of memory mismatch (0x%"PRIx64" != 0x%"PRIx64")",
> +            total_memkb, b_info->max_memkb);
> +        goto out;
> +    }
> +
> +    /* Check vcpu mapping */
> +    libxl_cpu_bitmap_alloc(CTX, &cpumap, b_info->max_vcpus);
> +    libxl_bitmap_set_none(&cpumap);

Worth using/making libxl_cpu_bitmap_zalloc(), or perhaps making this a
defined semantic of the alloc() function?  This would seem to be a very
common pair of operations to perform.
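
Such a wrapper would only need to be something like:

    int libxl_cpu_bitmap_zalloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
                                int max_cpus)
    {
        int rc = libxl_cpu_bitmap_alloc(ctx, cpumap, max_cpus);
        if (!rc)
            libxl_bitmap_set_none(cpumap);
        return rc;
    }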

> +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> +        p = &b_info->vnuma_nodes[i];
> +        libxl_for_each_set_bit(j, p->vcpus) {
> +            if (!libxl_bitmap_test(&cpumap, j))
> +                libxl_bitmap_set(&cpumap, j);
> +            else {
> +                LOG(ERROR, "Vcpu %d assigned more than once", j);
> +                goto out;
> +            }
> +        }

This libxl_for_each_set_bit() loop can be optimised to a
bitmap_intersects() for the error condition, and bitmap_or() for the
success case.
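
I.e. something like (both helpers are assumed here -- libxl would need
them adding if it doesn't already have suitable ones):

    /* Assumed helpers: intersects() tests for any common set bit,
     * or() accumulates this vnode's vcpus into the running map. */
    if (libxl_bitmap_intersects(&cpumap, &p->vcpus)) {
        LOG(ERROR, "Some vcpu is assigned to more than one vnode");
        goto out;
    }
    libxl_bitmap_or(&cpumap, &cpumap, &p->vcpus);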

> +    }
> +
> +    for (i = 0; i < b_info->max_vcpus; i++) {
> +        if (!libxl_bitmap_test(&cpumap, i)) {
> +            LOG(ERROR, "Vcpu %d is not assigned to any vnode", i);
> +            goto out;
> +        }

This loop can be optimised to !bitmap_all_set().

> +    }
> +
> +    /* Check vmemranges */
> +    qsort(state->vmemranges, state->num_vmemranges, sizeof(xen_vmemrange_t),
> +          compare_vmemrange);
> +
> +    for (i = 0; i < state->num_vmemranges; i++) {
> +        if (state->vmemranges[i].end < state->vmemranges[i].start) {
> +                LOG(ERROR, "Vmemrange end < start");
> +                goto out;
> +        }
> +    }
> +
> +    for (i = 0; i < state->num_vmemranges - 1; i++) {
> +        if (state->vmemranges[i].end > state->vmemranges[i+1].start) {
> +            LOG(ERROR,
> +                "Vmemranges overlapped, 0x%"PRIx64"-0x%"PRIx64", 0x%"PRIx64"-0x%"PRIx64,
> +                state->vmemranges[i].start, state->vmemranges[i].end,
> +                state->vmemranges[i+1].start, state->vmemranges[i+1].end);
> +            goto out;
> +        }
> +    }
> +
> +    rc = 0;
> +out:
> +    if (ninfo) libxl_numainfo_dispose(ninfo);

Completely unrelated to the content of this patch, I suggest that
$FOO_dispose() functions be required to act as free() does with NULL
pointers.  It would simplify caller error handling.

~Andrew

> +    libxl_bitmap_dispose(&cpumap);
> +    return rc;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize
  2015-02-12 19:44 ` [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize Wei Liu
@ 2015-02-13 15:42   ` Andrew Cooper
  2015-02-16 17:00     ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:42 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> This function gets the machine E820 map and sanitizes it according to
> the PV guest configuration.
>
> This will be used in a later patch. No functional change is introduced
> in this patch.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Looks to have addressed my previous concerns.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> Changes in v4:
> 1. Use actual size of the map instead of using E820MAX.
> ---
>  tools/libxl/libxl_x86.c | 32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 9ceb373..d012b4d 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -207,6 +207,27 @@ static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
>      return 0;
>  }
>  
> +static int e820_host_sanitize(libxl__gc *gc,
> +                              libxl_domain_build_info *b_info,
> +                              struct e820entry map[],
> +                              uint32_t *nr)
> +{
> +    int rc;
> +
> +    rc = xc_get_machine_memory_map(CTX->xch, map, *nr);
> +    if (rc < 0) {
> +        errno = rc;
> +        return ERROR_FAIL;
> +    }
> +
> +    *nr = rc;
> +
> +    rc = e820_sanitize(CTX, map, nr, b_info->target_memkb,
> +                       (b_info->max_memkb - b_info->target_memkb) +
> +                       b_info->u.pv.slack_memkb);
> +    return rc;
> +}
> +
>  static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
>          libxl_domain_config *d_config)
>  {
> @@ -223,15 +244,8 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
>      if (!libxl_defbool_val(b_info->u.pv.e820_host))
>          return ERROR_INVAL;
>  
> -    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
> -    if (rc < 0) {
> -        errno = rc;
> -        return ERROR_FAIL;
> -    }
> -    nr = rc;
> -    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
> -                       (b_info->max_memkb - b_info->target_memkb) +
> -                       b_info->u.pv.slack_memkb);
> +    nr = E820MAX;
> +    rc = e820_host_sanitize(gc, b_info, map, &nr);
>      if (rc)
>          return ERROR_FAIL;
>  

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest
  2015-02-12 19:44 ` [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest Wei Liu
@ 2015-02-13 15:49   ` Andrew Cooper
  2015-02-17 14:08     ` Wei Liu
  2015-02-17 15:28   ` Dario Faggioli
  1 sibling, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:49 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Introduce an arch-independent routine to generate one vmemrange per
> vnode. Also introduce arch-dependent routines for different
> architectures, because part of the process is arch-specific -- ARM has
> no NUMA support yet and E820 is x86 only.
>
> For those x86 guests that care about the machine E820 map (i.e. with
> e820_host=1), each vnode is further split into several vmemranges to
> accommodate memory holes.  A few stubs for libxl_arm.c are created.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
> Changes in v5:
> 1. Allocate array all in one go.
> 2. Reverse the logic of vmemranges generation.
>
> Changes in v4:
> 1. Adapt to new interface.
> 2. Address Ian Jackson's comments.
>
> Changes in v3:
> 1. Rewrite commit log.
> ---
>  tools/libxl/libxl_arch.h     |  6 ++++
>  tools/libxl/libxl_arm.c      |  8 +++++
>  tools/libxl/libxl_internal.h |  8 +++++
>  tools/libxl/libxl_vnuma.c    | 41 +++++++++++++++++++++++++
>  tools/libxl/libxl_x86.c      | 73 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 136 insertions(+)
>
> diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
> index d3bc136..e249048 100644
> --- a/tools/libxl/libxl_arch.h
> +++ b/tools/libxl/libxl_arch.h
> @@ -27,4 +27,10 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
>  int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>                                        libxl_domain_build_info *info,
>                                        struct xc_dom_image *dom);
> +
> +/* build vNUMA vmemrange with arch specific information */
> +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
> +                                      uint32_t domid,
> +                                      libxl_domain_build_info *b_info,
> +                                      libxl__domain_build_state *state);
>  #endif
> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
> index 65a762b..7da254f 100644
> --- a/tools/libxl/libxl_arm.c
> +++ b/tools/libxl/libxl_arm.c
> @@ -707,6 +707,14 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>      return 0;
>  }
>  
> +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
> +                                      uint32_t domid,
> +                                      libxl_domain_build_info *info,
> +                                      libxl__domain_build_state *state)
> +{
> +    return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 258be0d..7d1e1cf 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3400,6 +3400,14 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
>  int libxl__vnuma_config_check(libxl__gc *gc,
>                                const libxl_domain_build_info *b_info,
>                                const libxl__domain_build_state *state);
> +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
> +                                            uint32_t domid,
> +                                            libxl_domain_build_info *b_info,
> +                                            libxl__domain_build_state *state);
> +int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
> +                                    uint32_t domid,
> +                                    libxl_domain_build_info *b_info,
> +                                    libxl__domain_build_state *state);
>  
>  _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
>                                     const libxl_ms_vm_genid *id);
> diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
> index fa5aa8d..3d46239 100644
> --- a/tools/libxl/libxl_vnuma.c
> +++ b/tools/libxl/libxl_vnuma.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  #include "libxl_internal.h"
> +#include "libxl_arch.h"
>  #include <stdlib.h>
>  
>  /* Sort vmemranges in ascending order with "start" */
> @@ -122,6 +123,46 @@ out:
>      return rc;
>  }
>  
> +
> +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
> +                                            uint32_t domid,
> +                                            libxl_domain_build_info *b_info,
> +                                            libxl__domain_build_state *state)
> +{
> +    int i;
> +    uint64_t next;
> +    xen_vmemrange_t *v = NULL;
> +
> +    /* Generate one vmemrange for each virtual node. */
> +    GCREALLOC_ARRAY(v, b_info->num_vnuma_nodes);
> +    next = 0;
> +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> +        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
> +
> +        v[i].start = next;
> +        v[i].end = next + (p->memkb << 10);
> +        v[i].flags = 0;
> +        v[i].nid = i;
> +
> +        next = v[i].end;

Using "start" and "end", this would appear to have a fencepost error
which a start/size pair wouldn't have.

~Andrew

> +    }
> +
> +    state->vmemranges = v;
> +    state->num_vmemranges = i;
> +
> +    return 0;
> +}
> +
> +/* Build vmemranges for PV guest */
> +int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
> +                                    uint32_t domid,
> +                                    libxl_domain_build_info *b_info,
> +                                    libxl__domain_build_state *state)
> +{
> +    assert(state->vmemranges == NULL);
> +    return libxl__arch_vnuma_build_vmemrange(gc, domid, b_info, state);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index d012b4d..d37cca1 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -339,6 +339,79 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>      return 0;
>  }
>  
> +/* Return 0 on success, ERROR_* on failure. */
> +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
> +                                      uint32_t domid,
> +                                      libxl_domain_build_info *b_info,
> +                                      libxl__domain_build_state *state)
> +{
> +    int nid, nr_vmemrange, rc;
> +    uint32_t nr_e820, e820_count;
> +    struct e820entry map[E820MAX];
> +    xen_vmemrange_t *vmemranges;
> +
> +    /* If e820_host is not set, call the generic function */
> +    if (!(b_info->type == LIBXL_DOMAIN_TYPE_PV &&
> +          libxl_defbool_val(b_info->u.pv.e820_host)))
> +        return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, b_info,
> +                                                       state);
> +
> +    assert(state->vmemranges == NULL);
> +
> +    nr_e820 = E820MAX;
> +    rc = e820_host_sanitize(gc, b_info, map, &nr_e820);
> +    if (rc) goto out;
> +
> +    e820_count = 0;
> +    nr_vmemrange = 0;
> +    vmemranges = NULL;
> +    for (nid = 0; nid < b_info->num_vnuma_nodes; nid++) {
> +        libxl_vnode_info *p = &b_info->vnuma_nodes[nid];
> +        uint64_t remaining_bytes = (p->memkb << 10), bytes;
> +
> +        while (remaining_bytes > 0) {
> +            if (e820_count >= nr_e820) {
> +                rc = ERROR_NOMEM;
> +                goto out;
> +            }
> +
> +            /* Skip non RAM region */
> +            if (map[e820_count].type != E820_RAM) {
> +                e820_count++;
> +                continue;
> +            }
> +
> +            GCREALLOC_ARRAY(vmemranges, nr_vmemrange+1);
> +
> +            bytes = map[e820_count].size >= remaining_bytes ?
> +                remaining_bytes : map[e820_count].size;
> +
> +            vmemranges[nr_vmemrange].start = map[e820_count].addr;
> +            vmemranges[nr_vmemrange].end = map[e820_count].addr + bytes;
> +
> +            if (map[e820_count].size >= remaining_bytes) {
> +                map[e820_count].addr += bytes;
> +                map[e820_count].size -= bytes;
> +            } else {
> +                e820_count++;
> +            }
> +
> +            remaining_bytes -= bytes;
> +
> +            vmemranges[nr_vmemrange].flags = 0;
> +            vmemranges[nr_vmemrange].nid = nid;
> +            nr_vmemrange++;
> +        }
> +    }
> +
> +    state->vmemranges = vmemranges;
> +    state->num_vmemranges = nr_vmemrange;
> +
> +    rc = 0;
> +out:
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen for PV guest
  2015-02-12 19:44 ` [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
@ 2015-02-13 15:54   ` Andrew Cooper
  2015-02-17 14:49   ` Dario Faggioli
  1 sibling, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:54 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Transform the user supplied vNUMA configuration into libxl internal
> representations, and finally libxc representations. Check validity of
> the configuration along the line.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> Changes in v5:
> 1. Adapt to change of interface (ditching xc_vnuma_info).
>
> Changes in v4:
> 1. Adapt to new interfaces.
>
> Changes in v3:
> 1. Add more commit log.
> ---
>  tools/libxl/libxl_dom.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 48d661a..1ff0704 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -515,6 +515,51 @@ retry_transaction:
>      return 0;
>  }
>  
> +static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
> +                          const libxl_domain_build_info *info,
> +                          const libxl__domain_build_state *state)
> +{
> +    int rc = 0;
> +    int i, nr_vdistance;

unsigned

> +    unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL;
> +
> +    vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus,
> +                                  sizeof(unsigned int));
> +    vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes,
> +                                   sizeof(unsigned int));
> +
> +    nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes;
> +    vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int));
> +
> +    for (i = 0; i < info->num_vnuma_nodes; i++) {
> +        libxl_vnode_info *v = &info->vnuma_nodes[i];
> +        int bit;
> +
> +        /* vnode to pnode mapping */
> +        vnode_to_pnode[i] = v->pnode;
> +
> +        /* vcpu to vnode mapping */
> +        libxl_for_each_set_bit(bit, v->vcpus)
> +            vcpu_to_vnode[bit] = i;
> +
> +        /* node distances */
> +        assert(info->num_vnuma_nodes == v->num_distances);
> +        memcpy(vdistance + (i * info->num_vnuma_nodes),
> +               v->distances,
> +               v->num_distances * sizeof(unsigned int));
> +    }
> +
> +    if (xc_domain_setvnuma(CTX->xch, domid, info->num_vnuma_nodes,
> +                           state->num_vmemranges, info->max_vcpus,
> +                           state->vmemranges, vdistance,
> +                           vcpu_to_vnode, vnode_to_pnode) < 0) {
> +        LOGE(ERROR, "xc_domain_setvnuma failed");
> +        rc = ERROR_FAIL;
> +    }
> +
> +    return rc;
> +}
> +
>  int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>               libxl_domain_build_info *info, libxl__domain_build_state *state)
>  {
> @@ -572,6 +617,38 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>      dom->xenstore_domid = state->store_domid;
>      dom->claim_enabled = libxl_defbool_val(info->claim_mode);
>  
> +    if (info->num_vnuma_nodes != 0) {
> +        int i;

unsigned

~Andrew

> +
> +        ret = libxl__vnuma_build_vmemrange_pv(gc, domid, info, state);
> +        if (ret) {
> +            LOGE(ERROR, "cannot build vmemranges");
> +            goto out;
> +        }
> +        ret = libxl__vnuma_config_check(gc, info, state);
> +        if (ret) goto out;
> +
> +        ret = set_vnuma_info(gc, domid, info, state);
> +        if (ret) goto out;
> +
> +        dom->nr_vmemranges = state->num_vmemranges;
> +        dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges) *
> +                                        dom->nr_vmemranges);
> +
> +        for (i = 0; i < dom->nr_vmemranges; i++) {
> +            dom->vmemranges[i].start = state->vmemranges[i].start;
> +            dom->vmemranges[i].end   = state->vmemranges[i].end;
> +            dom->vmemranges[i].flags = state->vmemranges[i].flags;
> +            dom->vmemranges[i].nid   = state->vmemranges[i].nid;
> +        }
> +
> +        dom->nr_vnodes = info->num_vnuma_nodes;
> +        dom->vnode_to_pnode = xc_dom_malloc(dom, sizeof(*dom->vnode_to_pnode) *
> +                                            dom->nr_vnodes);
> +        for (i = 0; i < info->num_vnuma_nodes; i++)
> +            dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
> +    }
> +
>      if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
>          LOGE(ERROR, "xc_dom_boot_xen_init failed");
>          goto out;

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor
  2015-02-12 19:44 ` [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
@ 2015-02-13 15:58   ` Andrew Cooper
  2015-02-17 11:36   ` Jan Beulich
  1 sibling, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 15:58 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Hvmloader issues the XENMEM_get_vnumainfo hypercall and stores the
> retrieved information in scratch space for later use.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> ---
> Changes in v5:
> 1. Group scratch_alloc calls together.
> 2. Use memset.
> 3. Drop unnecessary "return";
> 4. Rebase onto Jan's errno ABI change.
>
> Changes in v4:
> 1. Use *vnode_to_pnode to calculate size.
> 2. Remove loop.
>
> Changes in v3:
> 1. Move init_vnuma_info before ACPI stuff.
> 2. Fix errno.h inclusion.
> 3. Remove upper limits and use loop.
> ---
>  tools/firmware/hvmloader/Makefile    |  2 +-
>  tools/firmware/hvmloader/hvmloader.c |  3 ++
>  tools/firmware/hvmloader/vnuma.c     | 84 ++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/vnuma.h     | 52 ++++++++++++++++++++++
>  4 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 tools/firmware/hvmloader/vnuma.c
>  create mode 100644 tools/firmware/hvmloader/vnuma.h
>
> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
> index b759e81..cf967fd 100644
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -29,7 +29,7 @@ LOADADDR = 0x100000
>  CFLAGS += $(CFLAGS_xeninclude)
>  
>  OBJS  = hvmloader.o mp_tables.o util.o smbios.o 
> -OBJS += smp.o cacheattr.o xenbus.o
> +OBJS += smp.o cacheattr.o xenbus.o vnuma.o
>  OBJS += e820.o pci.o pir.o ctype.o
>  OBJS += hvm_param.o
>  ifeq ($(debug),y)
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index 7b0da38..25b7f08 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -26,6 +26,7 @@
>  #include "pci_regs.h"
>  #include "apic_regs.h"
>  #include "acpi/acpi2_0.h"
> +#include "vnuma.h"
>  #include <xen/version.h>
>  #include <xen/hvm/params.h>
>  
> @@ -310,6 +311,8 @@ int main(void)
>  
>      if ( acpi_enabled )
>      {
> +        init_vnuma_info();
> +
>          if ( bios->acpi_build_tables )
>          {
>              printf("Loading ACPI ...\n");
> diff --git a/tools/firmware/hvmloader/vnuma.c b/tools/firmware/hvmloader/vnuma.c
> new file mode 100644
> index 0000000..a71d31a
> --- /dev/null
> +++ b/tools/firmware/hvmloader/vnuma.c
> @@ -0,0 +1,84 @@
> +/*
> + * vnuma.c: obtain vNUMA information from hypervisor
> + *
> + * Copyright (c) 2014 Wei Liu, Citrix Systems (R&D) Ltd.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include "util.h"
> +#include "hypercall.h"
> +#include "vnuma.h"
> +#include <xen/errno.h>
> +
> +unsigned int nr_vnodes, nr_vmemranges;
> +unsigned int *vcpu_to_vnode, *vdistance;
> +xen_vmemrange_t *vmemrange;
> +
> +void init_vnuma_info(void)
> +{
> +    int rc;
> +    struct xen_vnuma_topology_info vnuma_topo;
> +
> +    memset(&vnuma_topo, 0, sizeof(vnuma_topo));
> +    vnuma_topo.domid = DOMID_SELF;

struct xen_vnuma_topology_info vnuma_topo = { .domid = DOMID_SELF };

Might as well use C99 features to your advantage.

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> +
> +    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
> +
> +    if ( rc != -XEN_ENOBUFS )
> +        return;
> +
> +    ASSERT(vnuma_topo.nr_vcpus == hvm_info->nr_vcpus);
> +
> +    vcpu_to_vnode =
> +        scratch_alloc(sizeof(*vcpu_to_vnode) * hvm_info->nr_vcpus, 0);
> +    vdistance = scratch_alloc(sizeof(uint32_t) * vnuma_topo.nr_vnodes *
> +                              vnuma_topo.nr_vnodes, 0);
> +    vmemrange = scratch_alloc(sizeof(xen_vmemrange_t) *
> +                              vnuma_topo.nr_vmemranges, 0);
> +
> +    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
> +    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
> +    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);
> +
> +    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
> +
> +    if ( rc < 0 )
> +    {
> +        printf("Failed to retrieve vNUMA information, rc = %d\n", rc);
> +        return;
> +    }
> +
> +    nr_vnodes = vnuma_topo.nr_vnodes;
> +    nr_vmemranges = vnuma_topo.nr_vmemranges;
> +}
> +
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/firmware/hvmloader/vnuma.h b/tools/firmware/hvmloader/vnuma.h
> new file mode 100644
> index 0000000..63b648a
> --- /dev/null
> +++ b/tools/firmware/hvmloader/vnuma.h
> @@ -0,0 +1,52 @@
> +/******************************************************************************
> + * vnuma.h
> + *
> + * Copyright (c) 2014, Wei Liu
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef __HVMLOADER_VNUMA_H__
> +#define __HVMLOADER_VNUMA_H__
> +
> +#include <xen/memory.h>
> +
> +extern unsigned int nr_vnodes, nr_vmemranges;
> +extern unsigned int *vcpu_to_vnode, *vdistance;
> +extern xen_vmemrange_t *vmemrange;
> +
> +void init_vnuma_info(void);
> +
> +#endif /* __HVMLOADER_VNUMA_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 15:39       ` Elena Ufimtseva
@ 2015-02-13 16:06         ` Wei Liu
  2015-02-13 16:11           ` Elena Ufimtseva
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-13 16:06 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Dario Faggioli,
	Ian Jackson, xen-devel, Jan Beulich

On Fri, Feb 13, 2015 at 10:39:25AM -0500, Elena Ufimtseva wrote:
> On Fri, Feb 13, 2015 at 10:12 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Fri, Feb 13, 2015 at 02:15:47PM +0000, Ian Jackson wrote:
> >> Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
> >> > This function is used to check whether vNUMA configuration (be it
> >> > auto-generated or supplied by user) is valid.
> >>
> >> This looks plausible, but I think you should explain what the impact
> >> of this patch is.  Presumably the intent is to replace various later
> >> failures with ERROR_FAIL with something more useful and more
> >> specific ?
> >>
> >
> > Yes, providing more useful error messages is one aspect. Another aspect
> > is just to do a sanity check -- passing an invalid layout to the guest
> > doesn't make much sense.
> >
> >> Are there any cases which this new check forbids but which are
> >> currently accepted by libxl ?  If so then we have to think about
> >> compatibility.
> >>
> >
> > First thing is there is no previously supported vNUMA interface in
> > the toolstack, so there won't be a situation where a previously good
> > config doesn't pass this check.
> >
> > Second thing is if the user supplies a config without vNUMA
> > configuration, this function will not get called, so it won't have any
> > effect.
> >
> >> Also I would like to see an ack from the authors of the vnuma support,
> >> as I'm not familiar enough with vnuma to fully understand the
> >> semantics of the new checks.
> >>
> >
> > Elena and Dario, what do you think?
> 
> The checks themselves look reasonable. And unforgiving :)
> I think we had a discussion before, and some previous patches were
> bailing out to some default/basic vnuma
> configuration (when possible) in case of 'recoverable' errors in config.
> 

Since this is new I would start with strict checking, then consider
recoverable configs later. It's hard to code for something that's not yet
well defined.

> Any sanity checks for distances?
> 

The same applies: what is a valid distance and what is not? I guess zero
is not valid? Or do we enforce that the distance to the local node must
be smaller than or equal to the distance to a remote node?
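
For instance, strict checking could look something along these lines
(untested sketch; the function name is made up, and the flat
nr_vnodes * nr_vnodes distance matrix, as in the hypercall interface,
is assumed):

    /* Untested sketch: validate a flat nr x nr virtual distance matrix. */
    static int vdistance_is_sane(const uint32_t *vdistance, unsigned int nr)
    {
        unsigned int i, j;

        for ( i = 0; i < nr; i++ )
            for ( j = 0; j < nr; j++ )
            {
                if ( vdistance[i * nr + j] == 0 )
                    return 0;   /* zero distance makes no sense */
                if ( vdistance[i * nr + i] > vdistance[i * nr + j] )
                    return 0;   /* local node further away than remote */
            }

        return 1;
    }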

Wei.

> >
> > Wei.
> >
> >> Thanks,
> >> Ian.
> 
> 
> 
> -- 
> Elena

* Re: [PATCH v5 13/24] hvmloader: construct SRAT
  2015-02-12 19:44 ` [PATCH v5 13/24] hvmloader: construct SRAT Wei Liu
@ 2015-02-13 16:07   ` Andrew Cooper
  0 siblings, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 16:07 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Jan Beulich <JBeulich@suse.com>
> ---
> Changes in v3:
> 1. Remove redundant variable.
> 2. Coding style fix.
> 3. Add assertion.
>
> Changes in v2:
> 1. Remove explicit zero initializers.
> 2. Adapt to new vNUMA retrieval routine.
> 3. Move SRAT very late in secondary table build.
> ---
>  tools/firmware/hvmloader/acpi/acpi2_0.h | 53 ++++++++++++++++++++++++
>  tools/firmware/hvmloader/acpi/build.c   | 72 +++++++++++++++++++++++++++++++++
>  2 files changed, 125 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
> index 7b22d80..6169213 100644
> --- a/tools/firmware/hvmloader/acpi/acpi2_0.h
> +++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
> @@ -364,6 +364,57 @@ struct acpi_20_madt_intsrcovr {
>  };
>  
>  /*
> + * System Resource Affinity Table header definition (SRAT)
> + */
> +struct acpi_20_srat {
> +    struct acpi_header header;
> +    uint32_t table_revision;
> +    uint32_t reserved2[2];
> +};
> +
> +#define ACPI_SRAT_TABLE_REVISION 1
> +
> +/*
> + * System Resource Affinity Table structure types.
> + */
> +#define ACPI_PROCESSOR_AFFINITY 0x0
> +#define ACPI_MEMORY_AFFINITY    0x1
> +struct acpi_20_srat_processor {
> +    uint8_t type;
> +    uint8_t length;
> +    uint8_t domain;
> +    uint8_t apic_id;
> +    uint32_t flags;
> +    uint8_t sapic_id;
> +    uint8_t domain_hi[3];
> +    uint32_t reserved;
> +};
> +
> +/*
> + * Local APIC Affinity Flags.  All other bits are reserved and must be 0.
> + */
> +#define ACPI_LOCAL_APIC_AFFIN_ENABLED (1 << 0)
> +
> +struct acpi_20_srat_memory {
> +    uint8_t type;
> +    uint8_t length;
> +    uint32_t domain;
> +    uint16_t reserved;
> +    uint64_t base_address;
> +    uint64_t mem_length;
> +    uint32_t reserved2;
> +    uint32_t flags;
> +    uint64_t reserved3;
> +};
> +
> +/*
> + * Memory Affinity Flags.  All other bits are reserved and must be 0.
> + */
> +#define ACPI_MEM_AFFIN_ENABLED (1 << 0)
> +#define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
> +#define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
> +
> +/*
>   * Table Signatures.
>   */
>  #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
> @@ -375,6 +426,7 @@ struct acpi_20_madt_intsrcovr {
>  #define ACPI_2_0_TCPA_SIGNATURE ASCII32('T','C','P','A')
>  #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
> +#define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
>  
>  /*
>   * Table revision numbers.
> @@ -388,6 +440,7 @@ struct acpi_20_madt_intsrcovr {
>  #define ACPI_2_0_HPET_REVISION 0x01
>  #define ACPI_2_0_WAET_REVISION 0x01
>  #define ACPI_1_0_FADT_REVISION 0x01
> +#define ACPI_2_0_SRAT_REVISION 0x01
>  
>  #pragma pack ()
>  
> diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
> index 1431296..3e96c23 100644
> --- a/tools/firmware/hvmloader/acpi/build.c
> +++ b/tools/firmware/hvmloader/acpi/build.c
> @@ -23,6 +23,7 @@
>  #include "ssdt_pm.h"
>  #include "../config.h"
>  #include "../util.h"
> +#include "../vnuma.h"
>  #include <xen/hvm/hvm_xs_strings.h>
>  #include <xen/hvm/params.h>
>  
> @@ -203,6 +204,66 @@ static struct acpi_20_waet *construct_waet(void)
>      return waet;
>  }
>  
> +static struct acpi_20_srat *construct_srat(void)
> +{
> +    struct acpi_20_srat *srat;
> +    struct acpi_20_srat_processor *processor;
> +    struct acpi_20_srat_memory *memory;
> +    unsigned int size;
> +    void *p;
> +    int i;

unsigned

> +
> +    size = sizeof(*srat) + sizeof(*processor) * hvm_info->nr_vcpus +
> +        sizeof(*memory) * nr_vmemranges;
> +
> +    p = mem_alloc(size, 16);
> +    if ( !p )
> +        return NULL;
> +
> +    srat = p;
> +    memset(srat, 0, sizeof(*srat));

Probably best to memset() all of size in one go, rather than memset()ing
each bit individually.
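
I.e. something along these lines (sketch only, reusing the variables
already in construct_srat()):

    p = mem_alloc(size, 16);
    if ( !p )
        return NULL;
    memset(p, 0, size);    /* zero the whole table in one go */
    srat = p;

after which the per-entry memset() calls in the two loops below can be
dropped.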

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> +    srat->header.signature    = ACPI_2_0_SRAT_SIGNATURE;
> +    srat->header.revision     = ACPI_2_0_SRAT_REVISION;
> +    fixed_strcpy(srat->header.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(srat->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +    srat->header.oem_revision = ACPI_OEM_REVISION;
> +    srat->header.creator_id   = ACPI_CREATOR_ID;
> +    srat->header.creator_revision = ACPI_CREATOR_REVISION;
> +    srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
> +
> +    processor = (struct acpi_20_srat_processor *)(srat + 1);
> +    for ( i = 0; i < hvm_info->nr_vcpus; i++ )
> +    {
> +        memset(processor, 0, sizeof(*processor));
> +        processor->type     = ACPI_PROCESSOR_AFFINITY;
> +        processor->length   = sizeof(*processor);
> +        processor->domain   = vcpu_to_vnode[i];
> +        processor->apic_id  = LAPIC_ID(i);
> +        processor->flags    = ACPI_LOCAL_APIC_AFFIN_ENABLED;
> +        processor++;
> +    }
> +
> +    memory = (struct acpi_20_srat_memory *)processor;
> +    for ( i = 0; i < nr_vmemranges; i++ )
> +    {
> +        memset(memory, 0, sizeof(*memory));
> +        memory->type          = ACPI_MEMORY_AFFINITY;
> +        memory->length        = sizeof(*memory);
> +        memory->domain        = vmemrange[i].nid;
> +        memory->flags         = ACPI_MEM_AFFIN_ENABLED;
> +        memory->base_address  = vmemrange[i].start;
> +        memory->mem_length    = vmemrange[i].end - vmemrange[i].start;
> +        memory++;
> +    }
> +
> +    ASSERT(((unsigned long)memory) - ((unsigned long)p) == size);
> +
> +    srat->header.length = size;
> +    set_checksum(srat, offsetof(struct acpi_header, checksum), size);
> +
> +    return srat;
> +}
> +
>  static int construct_passthrough_tables(unsigned long *table_ptrs,
>                                          int nr_tables)
>  {
> @@ -257,6 +318,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>      struct acpi_20_hpet *hpet;
>      struct acpi_20_waet *waet;
>      struct acpi_20_tcpa *tcpa;
> +    struct acpi_20_srat *srat;
>      unsigned char *ssdt;
>      static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
>      uint16_t *tis_hdr;
> @@ -346,6 +408,16 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>          }
>      }
>  
> +    /* SRAT */
> +    if ( nr_vnodes > 0 )
> +    {
> +        srat = construct_srat();
> +        if ( srat )
> +            table_ptrs[nr_tables++] = (unsigned long)srat;
> +        else
> +            printf("Failed to build SRAT, skipping...\n");
> +    }
> +
>      /* Load any additional tables passed through. */
>      nr_tables += construct_passthrough_tables(table_ptrs, nr_tables);
>  

* Re: [PATCH v5 14/24] hvmloader: construct SLIT
  2015-02-12 19:44 ` [PATCH v5 14/24] hvmloader: construct SLIT Wei Liu
@ 2015-02-13 16:10   ` Andrew Cooper
  0 siblings, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 16:10 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> Changes in v3:
> 1. Coding style fix.
> 2. Fix an error code.
> 3. Use unsigned int for loop variable.
>
> Changes in v2:
> 1. Adapt to new vNUMA retrieval routine.
> 2. Move SLIT very late in secondary table build.
> ---
>  tools/firmware/hvmloader/acpi/acpi2_0.h |  8 +++++++
>  tools/firmware/hvmloader/acpi/build.c   | 40 ++++++++++++++++++++++++++++++++-
>  2 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/tools/firmware/hvmloader/acpi/acpi2_0.h b/tools/firmware/hvmloader/acpi/acpi2_0.h
> index 6169213..d698095 100644
> --- a/tools/firmware/hvmloader/acpi/acpi2_0.h
> +++ b/tools/firmware/hvmloader/acpi/acpi2_0.h
> @@ -414,6 +414,12 @@ struct acpi_20_srat_memory {
>  #define ACPI_MEM_AFFIN_HOTPLUGGABLE (1 << 1)
>  #define ACPI_MEM_AFFIN_NONVOLATILE (1 << 2)
>  
> +struct acpi_20_slit {
> +    struct acpi_header header;
> +    uint64_t localities;
> +    uint8_t entry[0];
> +};
> +
>  /*
>   * Table Signatures.
>   */
> @@ -427,6 +433,7 @@ struct acpi_20_srat_memory {
>  #define ACPI_2_0_HPET_SIGNATURE ASCII32('H','P','E','T')
>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
>  #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
> +#define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
>  
>  /*
>   * Table revision numbers.
> @@ -441,6 +448,7 @@ struct acpi_20_srat_memory {
>  #define ACPI_2_0_WAET_REVISION 0x01
>  #define ACPI_1_0_FADT_REVISION 0x01
>  #define ACPI_2_0_SRAT_REVISION 0x01
> +#define ACPI_2_0_SLIT_REVISION 0x01
>  
>  #pragma pack ()
>  
> diff --git a/tools/firmware/hvmloader/acpi/build.c b/tools/firmware/hvmloader/acpi/build.c
> index 3e96c23..7dac6a8 100644
> --- a/tools/firmware/hvmloader/acpi/build.c
> +++ b/tools/firmware/hvmloader/acpi/build.c
> @@ -264,6 +264,38 @@ static struct acpi_20_srat *construct_srat(void)
>      return srat;
>  }
>  
> +static struct acpi_20_slit *construct_slit(void)
> +{
> +    struct acpi_20_slit *slit;
> +    unsigned int i, num, size;
> +
> +    num = nr_vnodes * nr_vnodes;
> +    size = sizeof(*slit) + num * sizeof(uint8_t);
> +
> +    slit = mem_alloc(size, 16);
> +    if ( !slit )
> +        return NULL;
> +
> +    memset(slit, 0, size);
> +    slit->header.signature    = ACPI_2_0_SLIT_SIGNATURE;
> +    slit->header.revision     = ACPI_2_0_SLIT_REVISION;
> +    fixed_strcpy(slit->header.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(slit->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +    slit->header.oem_revision = ACPI_OEM_REVISION;
> +    slit->header.creator_id   = ACPI_CREATOR_ID;
> +    slit->header.creator_revision = ACPI_CREATOR_REVISION;
> +
> +    for ( i = 0; i < num; i++ )
> +        slit->entry[i] = vdistance[i];
> +
> +    slit->localities = nr_vnodes;
> +
> +    slit->header.length = size;
> +    set_checksum(slit, offsetof(struct acpi_header, checksum), size);
> +
> +    return slit;
> +}
> +
>  static int construct_passthrough_tables(unsigned long *table_ptrs,
>                                          int nr_tables)
>  {
> @@ -319,6 +351,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>      struct acpi_20_waet *waet;
>      struct acpi_20_tcpa *tcpa;
>      struct acpi_20_srat *srat;
> +    struct acpi_20_slit *slit;
>      unsigned char *ssdt;
>      static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
>      uint16_t *tis_hdr;
> @@ -408,7 +441,7 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>          }
>      }
>  
> -    /* SRAT */
> +    /* SRAT and SLIT */
>      if ( nr_vnodes > 0 )
>      {
>          srat = construct_srat();
> @@ -416,6 +449,11 @@ static int construct_secondary_tables(unsigned long *table_ptrs,
>              table_ptrs[nr_tables++] = (unsigned long)srat;
>          else
>              printf("Failed to build SRAT, skipping...\n");
> +        slit = construct_slit();
> +        if ( slit )
> +            table_ptrs[nr_tables++] = (unsigned long)slit;
> +        else
> +            printf("Failed to build SLIT, skipping...\n");
>      }
>  
>      /* Load any additional tables passed through. */

* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 16:06         ` Wei Liu
@ 2015-02-13 16:11           ` Elena Ufimtseva
  2015-02-17 16:51             ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Elena Ufimtseva @ 2015-02-13 16:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Andrew Cooper, Dario Faggioli, Ian Jackson,
	xen-devel, Jan Beulich

On Fri, Feb 13, 2015 at 11:06 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Feb 13, 2015 at 10:39:25AM -0500, Elena Ufimtseva wrote:
>> On Fri, Feb 13, 2015 at 10:12 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>> > On Fri, Feb 13, 2015 at 02:15:47PM +0000, Ian Jackson wrote:
>> >> Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
>> >> > This function is used to check whether vNUMA configuration (be it
>> >> > auto-generated or supplied by user) is valid.
>> >>
>> >> This looks plausible, but I think you should explain what the impact
>> >> of this patch is.  Presumably the intent is to replace various later
>> >> failures with ERROR_FAIL with something more useful and more
>> >> specific ?
>> >>
>> >
>> > Yes, providing more useful error messages is one aspect. Another aspect
>> > is just to do a sanity check -- passing an invalid layout to the guest
>> > doesn't make much sense.
>> >
>> >> Are there any cases which this new check forbids but which are
>> >> currently accepted by libxl ?  If so then we have to think about
>> >> compatibility.
>> >>
>> >
>> > First thing is there is no previously supported vNUMA interface in
>> > the toolstack, so there won't be a situation where a previously good
>> > config doesn't pass this check.
>> >
>> > Second thing is if the user supplies a config without vNUMA
>> > configuration, this function will not get called, so it won't have any
>> > effect.
>> >
>> >> Also I would like to see an ack from the authors of the vnuma support,
>> >> as I'm not familiar enough with vnuma to fully understand the
>> >> semantics of the new checks.
>> >>
>> >
>> > Elena and Dario, what do you think?
>>
>> The checks themselves look reasonable. And unforgiving :)
>> I think we had a discussion before, and some previous patches were
>> bailing out to some default/basic vnuma
>> configuration (when possible) in case of 'recoverable' errors in config.
>>
>
> Since this is new I would start with strict checking, then consider
> recoverable configs later. It's hard to code for something that's not yet
> well defined.

Understood.
>
>> Any sanity checks for distances?
>>
>
> The same applies: what is a valid distance and what is not? I guess zero
> is not valid? Or do we enforce that the distance to the local node must
> be smaller than or equal to the distance to a remote node?

Yes, I think the second condition is enough for strict checking.

>
> Wei.
>
>> >
>> > Wei.
>> >
>> >> Thanks,
>> >> Ian.
>>
>>
>>
>> --
>> Elena



-- 
Elena

* Re: [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest
  2015-02-12 19:44 ` [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
@ 2015-02-13 16:22   ` Andrew Cooper
  0 siblings, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2015-02-13 16:22 UTC (permalink / raw)
  To: Wei Liu, xen-devel
  Cc: dario.faggioli, JBeulich, ian.jackson, ian.campbell, ufimtseva

On 12/02/15 19:44, Wei Liu wrote:
> The algorithm is more or less the same as the one used for PV guests.
> Libxc gets hold of the mapping of vnode to pnode and the size of each
> vnode, then allocates memory accordingly.
>
> And then the function returns low memory end, high memory end and mmio
> start to the caller. Libxl needs those values to construct vmemranges
> for that guest.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
> Changes in v5:
> 1. Use a better loop variable name vnid.
>
> Changes in v4:
> 1. Adapt to new interface.
> 2. Shorten error message.
> 3. This patch includes only functional changes.
>
> Changes in v3:
> 1. Rewrite commit log.
> 2. Add a few code comments.
> ---
>  tools/libxc/include/xenguest.h |  11 +++++
>  tools/libxc/xc_hvm_build_x86.c | 105 ++++++++++++++++++++++++++++++++++-------
>  2 files changed, 100 insertions(+), 16 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 40bbac8..ff66cb1 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -230,6 +230,17 @@ struct xc_hvm_build_args {
>      struct xc_hvm_firmware_module smbios_module;
>      /* Whether to use claim hypercall (1 - enable, 0 - disable). */
>      int claim_enabled;
> +
> +    /* vNUMA information*/
> +    xen_vmemrange_t *vmemranges;
> +    unsigned int nr_vmemranges;
> +    unsigned int *vnode_to_pnode;
> +    unsigned int nr_vnodes;
> +
> +    /* Out parameters  */
> +    uint64_t lowmem_end;
> +    uint64_t highmem_end;
> +    uint64_t mmio_start;
>  };
>  
>  /**
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index ecc3224..a2a3777 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
>  }
>  
>  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> +                           uint64_t mmio_start, uint64_t mmio_size,
> +                           struct xc_hvm_build_args *args)
>  {
>      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
>          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> @@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
>      hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
>  
> +    args->lowmem_end = lowmem_end;
> +    args->highmem_end = highmem_end;
> +    args->mmio_start = mmio_start;
> +
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
>          sum += ((uint8_t *)hvm_info)[i];
> @@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
>                         char *image, unsigned long image_size)
>  {
>      xen_pfn_t *page_array = NULL;
> -    unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
> +    unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
>      unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
>      uint64_t mmio_start = (1ull << 32) - args->mmio_size;
>      uint64_t mmio_size = args->mmio_size;
> @@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
>      xen_capabilities_info_t caps;
>      unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
>          stat_1gb_pages = 0;
> -    int pod_mode = 0;
> +    unsigned int memflags = 0;
>      int claim_enabled = args->claim_enabled;
>      xen_pfn_t special_array[NR_SPECIAL_PAGES];
>      xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
> -
> -    if ( nr_pages > target_pages )
> -        pod_mode = XENMEMF_populate_on_demand;
> +    uint64_t total_pages;
> +    xen_vmemrange_t dummy_vmemrange;
> +    unsigned int dummy_vnode_to_pnode;
>  
>      memset(&elf, 0, sizeof(elf));
>      if ( elf_init(&elf, image, image_size) != 0 )
> @@ -276,6 +281,43 @@ static int setup_guest(xc_interface *xch,
>      v_start = 0;
>      v_end = args->mem_size;
>  
> +    if ( nr_pages > target_pages )
> +        memflags |= XENMEMF_populate_on_demand;
> +
> +    if ( args->nr_vmemranges == 0 )
> +    {
> +        /* Build dummy vnode information */
> +        dummy_vmemrange.start = 0;
> +        dummy_vmemrange.end   = args->mem_size;
> +        dummy_vmemrange.flags = 0;
> +        dummy_vmemrange.nid   = 0;
> +        args->nr_vmemranges = 1;
> +        args->vmemranges = &dummy_vmemrange;
> +
> +        dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
> +        args->nr_vnodes = 1;
> +        args->vnode_to_pnode = &dummy_vnode_to_pnode;
> +    }
> +    else
> +    {
> +        if ( nr_pages > target_pages )
> +        {
> +            PERROR("Cannot enable vNUMA and PoD at the same time");

We would solve a large number of interaction issues like this if someone
had the time to reimplement PoD using the paging system to page in a
page of zeroes.

It would be functionally identical from the guest's point of view,
wouldn't need any toolstack interaction, and would reduce the number of
moving parts involved in setting up memory for a domain.

(I don't suggest this being a prerequisite to this patch series.)

~Andrew

> +            goto error_out;
> +        }
> +    }
> +
> +    total_pages = 0;
> +    for ( i = 0; i < args->nr_vmemranges; i++ )
> +        total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
> +                        >> PAGE_SHIFT);
> +    if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
> +    {
> +        PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
> +               total_pages, args->mem_size >> PAGE_SHIFT);
> +        goto error_out;
> +    }
> +
>      if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
>      {
>          PERROR("Could not get Xen capabilities");
> @@ -320,7 +362,7 @@ static int setup_guest(xc_interface *xch,
>          }
>      }
>  
> -    if ( pod_mode )
> +    if ( memflags & XENMEMF_populate_on_demand )
>      {
>          /*
>           * Subtract VGA_HOLE_SIZE from target_pages for the VGA
> @@ -349,15 +391,40 @@ static int setup_guest(xc_interface *xch,
>       * ensure that we can be preempted and hence dom0 remains responsive.
>       */
>      rc = xc_domain_populate_physmap_exact(
> -        xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
> -    cur_pages = 0xc0;
> -    stat_normal_pages = 0xc0;
> +        xch, dom, 0xa0, 0, memflags, &page_array[0x00]);
>  
> +    stat_normal_pages = 0;
> +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
>      {
> -        while ( (rc == 0) && (nr_pages > cur_pages) )
> +        unsigned int new_memflags = memflags;
> +        uint64_t end_pages;
> +        unsigned int vnode = args->vmemranges[vmemid].nid;
> +        unsigned int pnode = args->vnode_to_pnode[vnode];
> +
> +        if ( pnode != XC_VNUMA_NO_NODE )
> +        {
> +            new_memflags |= XENMEMF_exact_node(pnode);
> +            new_memflags |= XENMEMF_exact_node_request;
> +        }
> +
> +        end_pages = args->vmemranges[vmemid].end >> PAGE_SHIFT;
> +        /*
> +         * Consider the vga hole as belonging to the vmemrange that covers
> +         * 0xA0000-0xC0000. Note that 0x00000-0xA0000 is populated just
> +         * before this loop.
> +         */
> +        if ( args->vmemranges[vmemid].start == 0 )
> +        {
> +            cur_pages = 0xc0;
> +            stat_normal_pages += 0xc0;
> +        }
> +        else
> +            cur_pages = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> +
> +        while ( (rc == 0) && (end_pages > cur_pages) )
>          {
>              /* Clip count to maximum 1GB extent. */
> -            unsigned long count = nr_pages - cur_pages;
> +            unsigned long count = end_pages - cur_pages;
>              unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
>  
>              if ( count > max_pages )
> @@ -394,7 +461,7 @@ static int setup_guest(xc_interface *xch,
>  
>                  done = xc_domain_populate_physmap(xch, dom, nr_extents,
>                                                    SUPERPAGE_1GB_SHIFT,
> -                                                  pod_mode, sp_extents);
> +                                                  memflags, sp_extents);
>  
>                  if ( done > 0 )
>                  {
> @@ -434,7 +501,7 @@ static int setup_guest(xc_interface *xch,
>  
>                      done = xc_domain_populate_physmap(xch, dom, nr_extents,
>                                                        SUPERPAGE_2MB_SHIFT,
> -                                                      pod_mode, sp_extents);
> +                                                      memflags, sp_extents);
>  
>                      if ( done > 0 )
>                      {
> @@ -450,11 +517,14 @@ static int setup_guest(xc_interface *xch,
>              if ( count != 0 )
>              {
>                  rc = xc_domain_populate_physmap_exact(
> -                    xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
> +                    xch, dom, count, 0, new_memflags, &page_array[cur_pages]);
>                  cur_pages += count;
>                  stat_normal_pages += count;
>              }
>          }
> +
> +        if ( rc != 0 )
> +            break;
>      }
>  
>      if ( rc != 0 )
> @@ -478,7 +548,7 @@ static int setup_guest(xc_interface *xch,
>                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                HVM_INFO_PFN)) == NULL )
>          goto error_out;
> -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
> @@ -617,6 +687,9 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
>              args.acpi_module.guest_addr_out;
>          hvm_args->smbios_module.guest_addr_out = 
>              args.smbios_module.guest_addr_out;
> +        hvm_args->lowmem_end = args.lowmem_end;
> +        hvm_args->highmem_end = args.highmem_end;
> +        hvm_args->mmio_start = args.mmio_start;
>      }
>  
>      free(image);

* Re: [PATCH v5 01/24] xen: dump vNUMA information with debug key "u"
  2015-02-13 11:50   ` Andrew Cooper
@ 2015-02-16 14:35     ` Dario Faggioli
  0 siblings, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 14:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, JBeulich, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Fri, 2015-02-13 at 11:50 +0000, Andrew Cooper wrote:
> On 12/02/15 19:44, Wei Liu wrote:
> > Signed-off-by: Elena Ufimsteva <ufimtseva@gmail.com>
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Jan Beulich <JBeulich@suse.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario

* Re: [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image
  2015-02-12 19:44 ` [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image Wei Liu
@ 2015-02-16 14:46   ` Dario Faggioli
  2015-02-16 14:49     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 14:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> Add a new field p2m_size to keep track of the number of pages covert by
>
Here and in the code (comment): is it 'covert' or 'covered' (or maybe
even something else)?

In any case...

> p2m.  Change total_pages to p2m_size in functions which in fact need
> the size of p2m.
> 
> This is needed because we are going to ditch the assumption that PV x86
> has only one contiguous ram region. Originally the p2m size was always
> equal to total_pages, but we will soon change that in later patch.
> 
> This patch doesn't change the behaviour of libxc.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>
... Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario

* Re: [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image
  2015-02-16 14:46   ` Dario Faggioli
@ 2015-02-16 14:49     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-16 14:49 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 02:46:54PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> > Add a new field p2m_size to keep track of the number of pages covert by
> >
> Here and in the code (comment): is it 'covert' or 'covered' (or maybe
> even something else)?

Should be "covered".

> 
> In any case...
> 
> > p2m.  Change total_pages to p2m_size in functions which in fact need
> > the size of p2m.
> > 
> > This is needed because we are going to ditch the assumption that PV x86
> > has only one contiguous ram region. Originally the p2m size was always
> > equal to total_pages, but we will soon change that in later patch.
> > 
> > This patch doesn't change the behaviour of libxc.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >
> ... Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 

Thanks.

Wei.

> Regards,
> Dario

* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-12 19:44 ` [PATCH v5 06/24] libxl: introduce vNUMA types Wei Liu
@ 2015-02-16 14:58   ` Dario Faggioli
  2015-02-16 15:17     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 14:58 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> A domain can contain several virtual NUMA nodes, hence we introduce an
> array in libxl_domain_build_info.
> 
> libxl_vnode_info contains the size of memory in that node, the distance
> from that node to every node, the underlying pnode and a bitmap of
> vcpus.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
This looks fine, and matches what we agreed upon for this interface, a
few months back, so:

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Just one comment.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 02be466..14c7e7c 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -356,6 +356,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>      ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>      ])
>  
> +libxl_vnode_info = Struct("vnode_info", [
> +    ("memkb", MemKB),
> +    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
> +    ("pnode", uint32), # physical node of this node
>
I am unsure whether we ever discussed this already or not (and sorry for
not recalling) but, in principle, one vnode can be mapped to more than
just one pnode.

The semantics would be that the memory of the vnode is somehow split
(evenly, by default, I would say) between the specified pnodes. So, pnode
could be a bitmap too (and be called "pnodes" :-) ), although we can put
checks in place that --for now-- it always has only one bit set.
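
Enforcing the --for now-- single bit could be a couple of lines in the
checking function, e.g. (untested sketch; it assumes the field becomes a
libxl_bitmap called "pnodes", and that the usual libxl bitmap helpers and
error conventions apply):

    /* For now, insist that each vnode is backed by exactly one pnode. */
    if ( libxl_bitmap_count_set(&b_info->vnuma_nodes[i].pnodes) != 1 )
    {
        LOG(ERROR, "vnode %u must map to exactly one pnode", i);
        return ERROR_INVAL;
    }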

Reasons might be that the user just wants it, or that there is not
enough (free) memory on just one pnode, but we still want to achieve
some locality.

This is probably something we can change/add later, but since it affects
the interface, I felt like saying it as soon as it came to my mind.

Regards,
Dario

* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 14:58   ` Dario Faggioli
@ 2015-02-16 15:17     ` Wei Liu
  2015-02-16 15:56       ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-16 15:17 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 02:58:32PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> > A domain can contain several virtual NUMA nodes, hence we introduce an
> > array in libxl_domain_build_info.
> > 
> > libxl_vnode_info contains the size of memory in that node, the distance
> > from that node to every node, the underlying pnode and a bitmap of
> > vcpus.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Dario Faggioli <dario.faggioli@citrix.com>
> > Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> >
> This looks fine, and matches what we agreed upon for this interface, a
> few months back, so:
> 
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Just one comment.
> 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 02be466..14c7e7c 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -356,6 +356,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
> >      ("budget",       integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
> >      ])
> >  
> > +libxl_vnode_info = Struct("vnode_info", [
> > +    ("memkb", MemKB),
> > +    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
> > +    ("pnode", uint32), # physical node of this node
> >
> I am unsure whether we ever discussed this already or not (and sorry for
> not recalling) but, in principle, one vnode can be mapped to more than
> just one pnode.
> 

I don't recall either.

> The semantics would be that the memory of the vnode is somehow split
> (evenly, by default, I would say) between the specified pnodes. So, pnode
> could be a bitmap too (and be called "pnodes" :-) ), although we can put
> checks in place that --for now-- it always has only one bit set.
> 
> Reasons might be that the user just wants it, or that there is not
> enough (free) memory on just one pnode, but we still want to achieve
> some locality.
> 

Wouldn't this cause unpredictable performance? And there is no way to
specify priority among the group of nodes you specify with a single
bitmap.

I can't say I fully understand the implication of the scenario you
described.

Wei.

> This is probably something we can change/add later, but since it affects
> the interface, I felt like saying it as soon as it came to my mind.
> 
> Regards,
> Dario

* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 15:17     ` Wei Liu
@ 2015-02-16 15:56       ` Dario Faggioli
  2015-02-16 16:11         ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 15:56 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> On Mon, Feb 16, 2015 at 02:58:32PM +0000, Dario Faggioli wrote:

> > > +libxl_vnode_info = Struct("vnode_info", [
> > > +    ("memkb", MemKB),
> > > +    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
> > > +    ("pnode", uint32), # physical node of this node
> > >
> > I am unsure whether we ever discussed this already or not (and sorry for
> > not recalling) but, in principle, one vnode can be mapped to more than
> > just one pnode.
> > 
> 
> I don't recall either.
> 
> > The semantics would be that the memory of the vnode is somehow split
> > (evenly, by default, I would say) between the specified pnodes. So, pnode
> > could be a bitmap too (and be called "pnodes" :-) ), although we can put
> > checks in place that --for now-- it always has only one bit set.
> > 
> > Reasons might be that the user just wants it, or that there is not
> > enough (free) memory on just one pnode, but we still want to achieve
> > some locality.
> > 
> 
> Wouldn't this cause unpredictable performance? 
>
A certain amount of it, yes, for sure, but always less than having the
memory striped on all nodes, I would say.

Well, of course it depends on how it will be used, as usual with these
things...

> And there is no way to
> specify priority among the group of nodes you specify with a single
> bitmap.
> 
Why do we need such a thing as a 'priority'? What I'm talking about is
making it possible, for each vnode, to specify the vnode-to-pnode mapping
as a bitmap of pnodes. What we'd do, in the presence of a bitmap, would be
allocating the memory by striping it across _all_ the pnodes present in
the bitmap.
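
To make the intended semantic concrete, the per-pnode share could be
computed like this (illustrative only, names made up, not actual
libxl/libxc code):

    /* Stripe a vnode's pages evenly over the pnodes set in the mask;
     * the highest-numbered pnode also takes the remainder. */
    static uint64_t pages_for_pnode(uint64_t vnode_pages, uint32_t mask,
                                    unsigned int pnode)
    {
        unsigned int nr_set = __builtin_popcount(mask);
        unsigned int highest = 31 - __builtin_clz(mask);

        return vnode_pages / nr_set +
               (pnode == highest ? vnode_pages % nr_set : 0);
    }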

If there's only one bit set, you have the same behavior as in this
patch.

> I can't say I fully understand the implication of the scenario you
> described.
> 
Ok. Imagine you want to create a guest with 2 vnodes, 4GB RAM total, so
2GB on each vnode. On the host, you have 8 pnodes, but only 1GB free on
each of them.

If you can only associate a vnode with a single pnode, there is no node
that can accommodate a full vnode, and we would have to give up trying
to place the domain and map the vnodes, and we'll end up with 0.5GB on
each pnode, unpredictable perf, and, basically, no vnuma at all (or at
least no vnode-to-pnode mapping)... Does this make sense?

If we allow the user (or the automatic placement algorithm) to specify a
bitmap of pnodes for each vnode, he could put, say, vnode #1 on pnodes #0
and #2, which maybe are really close (in terms of NUMA distances) to
each other, and vnode #2 on pnodes #5 and #6 (close to each other too).
This would give worse performance than having each vnode on just one
pnode, but, most likely, better performance than the scenario described
right above.

Hope I made myself clear enough :-)

Regards,
Dario

* Re: [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state
  2015-02-12 19:44 ` [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state Wei Liu
@ 2015-02-16 16:00   ` Dario Faggioli
  2015-02-16 16:15     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 16:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> A vnode consists of one or more vmemranges (virtual memory range).  One
> example of multiple vmemranges is that there is a hole in one vnode.
> 
> Currently we haven't exported vmemrange interface to libxl user.
> Vmemranges are generated during domain build, so we have relevant
> structures in domain build state.
> 
> Later if we discover we need to export the interface, those structures
> can be moved to libxl_domain_build_info as well.
> 
> These new fields (along with other fields in that struct) are set to 0
> at start of day so we don't need to explicitly initialise them. A
> following patch which introduces an independent checking function will
> need to access these fields. I don't feel very comfortable squashing
> this change into that one so I didn't use a single commit.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

> ---
> Changes in v5:
> 1. Fix commit message.
> 
> Changes in v4:
> 1. Improve commit message.
> 
> Changes in v3:
> 1. Rewrite commit message.
>
This is a rather amusing change history, allow me to say! :-D :-D

Regards,
Dario

PS. The patch (as well as others) has Ian's ack already, so I know my
rev-by tag is rather pointless. However, as I've been involved in the
process, I'd still like to point out I've looked at and am ok with this.
Feel free not to incorporate the tag if you don't want to, or if you're
not resending some of the patches.

* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 15:56       ` Dario Faggioli
@ 2015-02-16 16:11         ` Wei Liu
  2015-02-16 16:51           ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-16 16:11 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> > On Mon, Feb 16, 2015 at 02:58:32PM +0000, Dario Faggioli wrote:
> 
> > > > +libxl_vnode_info = Struct("vnode_info", [
> > > > +    ("memkb", MemKB),
> > > > +    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
> > > > +    ("pnode", uint32), # physical node of this node
> > > >
> > > I am unsure whether we ever discussed this already or not (and sorry for
> > > not recalling) but, in principle, one vnode can be mapped to more than
> > > just one pnode.
> > > 
> > 
> > I don't recall either.
> > 
> > > The semantics would be that the memory of the vnode is somehow split
> > > (evenly, by default, I would say) between the specified pnodes. So, pnode
> > > could be a bitmap too (and be called "pnodes" :-) ), although we can put
> > > checks in place that --for now-- it always has only one bit set.
> > > 
> > > Reasons might be that the user just wants it, or that there is not
> > > enough (free) memory on just one pnode, but we still want to achieve
> > > some locality.
> > > 
> > 
> > Wouldn't this cause unpredictable performance? 
> >
> A certain amount of it, yes, for sure, but always less than having the
> memory striped on all nodes, I would say.
> 
> Well, of course it depends on how it will be used, as usual with these
> things...
> 
> > And there is no way to
> > specify priority among the group of nodes you specify with a single
> > bitmap.
> > 
> Why do we need such a thing as a 'priority'? What I'm talking about is
> making it possible, for each vnode, to specify the vnode-to-pnode mapping
> as a bitmap of pnodes. What we'd do, in the presence of a bitmap, would be
> allocating the memory by striping it across _all_ the pnodes present in
> the bitmap.
> 

Should we enforce memory being equally striped across all nodes? If so,
this should be stated explicitly in the comment of the interface. I can't
see that in your original description. I asked about "priority" because I
interpreted it as something else (which is one of many possible
interpretations, I think).

If it's up to libxl to make a dynamic choice, we should also say that. But
this is not very useful to the user because libxl's algorithm can change,
can't it? How are users expected to know that across versions of Xen?

> If there's only one bit set, you have the same behavior as in this
> patch.
> 
> > I can't say I fully understand the implication of the scenario you
> > described.
> > 
> Ok. Imagine you want to create a guest with 2 vnodes, 4GB RAM total, so
> 2GB on each vnode. On the host, you have 8 pnodes, but only 1GB free on
> each of them.
> 
> If you can only associate a vnode with a single pnode, there is no node
> that can accommodate a full vnode, and we would have to give up trying
> to place the domain and map the vnodes, and we'll end up with 0.5GB on
> each pnode, unpredictable perf, and, basically, no vnuma at all (or at
> least no vnode-to-pnode mapping)... Does this make sense?
> 
> If we allow the user (or the automatic placement algorithm) to specify a
> bitmap of pnodes for each vnode, he could put, say, vnode #1 on pnodes #0
> and #2, which maybe are really close (in terms of NUMA distances) to
> each other, and vnode #2 on pnodes #5 and #6 (close to each other too).
> This would give worse performance than having each vnode on just one
> pnode, but, most likely, better performance than the scenario described
> right above.
> 

I get what you mean. So by writing the above paragraphs, you sort of
confirm that there are still too many implications in the algorithms,
right? A user cannot just tell from the interface what the behaviour is
going to be. You can of course say the algorithm is fixed, but I don't
think we want to do that?

Wei.

> Hope I made myself clear enough :-)
> 
> Regards,
> Dario

* Re: [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state
  2015-02-16 16:00   ` Dario Faggioli
@ 2015-02-16 16:15     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-16 16:15 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 04:00:19PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> > A vnode consists of one or more vmemranges (virtual memory range).  One
> > example of multiple vmemranges is that there is a hole in one vnode.
> > 
> > Currently we haven't exported vmemrange interface to libxl user.
> > Vmemranges are generated during domain build, so we have relevant
> > structures in domain build state.
> > 
> > Later if we discover we need to export the interface, those structures
> > can be moved to libxl_domain_build_info as well.
> > 
> > These new fields (along with other fields in that struct) are set to 0
> > at start of day so we don't need to explicitly initialise them. A
> > following patch which introduces an independent checking function will
> > need to access these fields. I don't feel very comfortable squashing
> > this change into that one so I didn't use a single commit.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Dario Faggioli <dario.faggioli@citrix.com>
> > Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> >
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> > ---
> > Changes in v5:
> > 1. Fix commit message.
> > 
> > Changes in v4:
> > 1. Improve commit message.
> > 
> > Changes in v3:
> > 1. Rewrite commit message.
> >
> This is a rather amusing changes history, allow me to say! :-D :-D
> 

I find myself writing confusing things all the time.  :-)

> Regards,
> Dario
> 
> PS. The patch (as well as others) has Ian's ack already, so I know my
> rev-by tag is rather pointless. However, as I've been involved in the
> process, I'd still like to point out I've looked at and am ok with this.
> Feel free not to incorporate the tag if you don't want to, or if you're
> not resending some of the patches.

More eyes are always good.

Wei.

* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 16:11         ` Wei Liu
@ 2015-02-16 16:51           ` Dario Faggioli
  2015-02-16 17:38             ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 16:51 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:

> > > And there is no way to
> > > specify priority among the group of nodes you specify with a single
> > > bitmap.
> > > 
> > Why do we need such a thing as a 'priority'? What I'm talking about is
> > making it possible, for each vnode, to specify the vnode-to-pnode mapping
> > as a bitmap of pnodes. What we'd do, in the presence of a bitmap, would be
> > allocating the memory by striping it across _all_ the pnodes present in
> > the bitmap.
> > 
> 
> Should we enforce memory being equally striped across all nodes? If so,
> this should be stated explicitly in the comment of the interface.
>
I don't think we should enforce anything... I was rather describing what
happens *right* *now* in that scenario, whether it is documented or not.

> I can't see
> that in your original description. I asked about "priority" because I
> interpreted it as something else (which is one of many possible
> interpretations, I think).
> 
So, if you're saying that, if we use a bitmap, we should write somewhere
how libxl would use it, I certainly agree. To what level of detail we
should do that, at that point, I'm not sure. I think I'd be fine, as a
user, finding it written that "the memory of the vnode will be allocated
out of the pnodes specified in the bitmap", without much further detail,
especially considering the use case for the feature.

> If it's up to libxl to make a dynamic choice, we should also say that. But
> this is not very useful to the user because libxl's algorithm can change,
> can't it? How are users expected to know that across versions of Xen?
> 
Why does he need to? This would be something enabling a bit more
flexibility, if one wants it, or somewhat less bad performance in some
specific situations, and all this pretty much independently of the
algorithm used inside libxl, I think.

As I said, if there is only 1GB free on each pnode, the user will be
allowed to specify a set of pnodes for each vnode, instead of not being
able to use vnuma at all, no matter how libxl (or whoever else) will
actually split the memory, in this, previous or future versions of Xen...
This is the scenario I'm talking about, and in such a scenario, knowing
how the split happens does not really help much; it is just the
_possibility_ of splitting that helps...

> > If we allow the user (or the automatic placement algorithm) to specify a
> > bitmap of pnode for each vnode, he could put, say, vnode #1 on pnode #0
> > and #2, which maybe are really close (in terms of NUMA distances) to
> > each other, and vnode #2 to pnode #5 and #6 (close to each others too).
> > This would give worst performance than having each vnode on just one
> > pnode, but, most likely, better performance than the scenario described
> > right above.
> > 
> 
> I get what you mean. So by writing the above paragraphs, you sort of
> confirm that there still are too many implications in the algorithms,
> right? A user cannot just tell from the interface what the behaviour is
> going to be.  
>
A user can tell that, if he wants a vnode 2GB wide, and there is no
pnode with 2GB free, but the sum of free memory in pnodes #4 and #6 is >=
2GB, he can still use vNUMA, by paying the (small or high, depending
on more factors) price of having that vnode split in two (or more!).

I think there would be room for some increased user satisfaction in
this, even without knowing much and/or being in control on how exactly
the split happens, as there are chances for performance to be (if the
thing is used properly) better than in the no-vNUMA case, which is what
we're after.

> You can of course say the algorithm is fixed but I don't
> think we want to do that?
> 
I don't want to, but I don't think it's needed.

Anyway, I'm more than ok if we want to defer the discussion to after
this series is in. It will require a further change in the interface,
but I don't think it would be a terrible price to pay, if we decide the
feature is worth it.

Or, and that was the other thing I was suggesting, we can have the
bitmap in vnode_info since now, but then only accept ints in xl config
parsing, and enforce the weight of the bitmap to be 1 (and perhaps print
a warning) for now. This would not require changing the API in future,
it'd just be a matter of changing the xl config file parsing. The
"problem" would still stand for libxl callers different than xl, though,
I know.
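
As a concrete illustration of that last suggestion, the xl-side check
could be as small as the sketch below; libxl_bitmap_count_set() is an
existing libxl helper, while the 'pnodes' field name is hypothetical:

    /* sketch: accept a bitmap-typed field in the libxl type, but
     * insist, for now, on exactly one pnode per vnode */
    if (libxl_bitmap_count_set(&vnode->pnodes) != 1) {
        fprintf(stderr,
                "xl: vnuma: exactly one pnode per vnode is supported for now\n");
        exit(1);
    }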

Regards,
Dario

> Wei.
> 
> > Hope I made myself clear enough :-)
> > 
> > Regards,
> > Dario
> 
> 



* Re: [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-12 19:44 ` [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest Wei Liu
  2015-02-13 14:30   ` Andrew Cooper
@ 2015-02-16 16:58   ` Dario Faggioli
  2015-02-16 17:44     ` Wei Liu
  1 sibling, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 16:58 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:

> @@ -760,7 +760,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
>  int arch_setup_meminit(struct xc_dom_image *dom)
>  {
>      int rc;
> -    xen_pfn_t pfn, allocsz, i, j, mfn;
> +    xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
> +    int i, j;
>  
>      rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
>      if ( rc )
> @@ -811,26 +812,101 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>              if ( rc )
>                  return rc;
>          }
> -        /* setup initial p2m */
> -        dom->p2m_size = dom->total_pages;
> +
> +        /* Setup dummy vNUMA information if it's not provided. Note
> +         * that this is a valid state if libxl doesn't provide any
> +         * vNUMA information.
> +         *
> +         * The dummy values make libxc allocate all pages from
> +         * arbitrary physical. This is the expected behaviour if no
> +         * vNUMA configuration is provided to libxc.
> +         *
"from arbitrary phisical nodes" maybe (or something like that?)

Also, it feels like these two paragraphs can be merged into one. Not a
big deal, though, and the rest of the patch also looks fine.
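
FWIW, the dummy setup being described amounts to roughly the sketch
below; this paraphrases the patch rather than quoting it, and the
xc_dom_image field names (and XC_NUMA_NO_NODE) are assumptions here,
with array allocation omitted:

    /* sketch: one vnode covering all of guest memory, mapped to no
     * particular pnode, so pages come from arbitrary physical nodes */
    if ( dom->nr_vmemranges == 0 )
    {
        dom->nr_vmemranges = 1;
        dom->vmemranges[0].start = 0;
        dom->vmemranges[0].end   = (uint64_t)dom->total_pages << PAGE_SHIFT;
        dom->vmemranges[0].flags = 0;
        dom->vmemranges[0].nid   = 0;
        dom->nr_vnodes = 1;
        dom->vnode_to_pnode[0] = XC_NUMA_NO_NODE; /* "any node" (name assumed) */
    }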

Regards,
Dario


* Re: [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize
  2015-02-13 15:42   ` Andrew Cooper
@ 2015-02-16 17:00     ` Dario Faggioli
  0 siblings, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-16 17:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, JBeulich, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Fri, 2015-02-13 at 15:42 +0000, Andrew Cooper wrote:
> On 12/02/15 19:44, Wei Liu wrote:
> > This function gets the machine E820 map and sanitize it according to PV
> > guest configuration.
> >
> > This will be used in later patch. No functional change introduced in
> > this patch.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Dario Faggioli <dario.faggioli@citrix.com>
> > Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> Looks to have addressed my previous concerns.
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Dario


* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 16:51           ` Dario Faggioli
@ 2015-02-16 17:38             ` Wei Liu
  2015-02-17 10:42               ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-16 17:38 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 04:51:43PM +0000, Dario Faggioli wrote:
> On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> > On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> 
> > > > And there is no way to
> > > > specify priority among the group of nodes you specify with a single
> > > > bitmap.
> > > > 
> > > Why do we need such a thing as a 'priority'? What I'm talking about is
> > > making it possible, for each vnode, to specify vnode-to-pnode mapping as
> > > a bitmap of pnode. What we'd do, in presence of a bitmap, would be
> > > allocating the memory by striping it across _all_ the pnodes present in
> > > the bitmap.
> > > 
> > 
> > Should we enforce memory being equally striped across all nodes? If so
> > this should be stated explicitly in the comment of the interface.
> >
> I don't think we should enforce anything... I was rather describing
> what happens *right* *now* in that scenario, it being documented or not.
> 
> > I can't see
> > that in your original description. I asked about "priority" because I
> > interpreted it as something else (which is one of many possible
> > interpretations, I think).
> > 
> So, if you're saying that, if we use a bitmap, we should write somewhere
> how libxl would use it, I certainly agree. Up to what level of detail
> we, at that point, should do that, I'm not sure. I think I'd be fine, as
> a user, with finding it written that "the memory of the vnode will be
> allocated out of the pnodes specified in the bitmap", without much
> further detail, especially considering the use case for the feature.
> 

This is of course OK. And the simplest implementation of this
strategy is to pass the node information on to Xen and let Xen decide
which of the several nodes specified to allocate from. This would
be trivial.

I think having a vnode able to map to several pnodes would be
good. I'm just trying to figure out if a single bitmap is enough to
cover all the sensible use cases, or what we should say about that
interface.

> > If it's up to libxl to make a dynamic choice, we should also say that. But
> > this is not very useful to the user because libxl's algorithm can change,
> > can't it? How do users expect to know that across versions of Xen?
> > 
> Why does he need to? This would be something enabling a bit more
> flexibility, if one wants it, or slightly less bad performance, in some
> specific situations, and all this pretty much independently of the
> algorithm used inside libxl, I think.
> 
> As I said, if there is only 1GB free on all pnodes, the user will be
> allowed to specify a set of pnodes for the vnodes, instead of not being
> able to use vnuma at all, no matter how libxl (or whoever else) will
> actually split the memory, in this, previous or future version of Xen...
> This is the scenario I'm talking about, and in such a scenario, knowing
> how the split happens does not really help much; it is just the
> _possibility_ of splitting that helps...
> 

I don't see this problem that way though.

Basically you're saying, a user wants to use vNUMA, then at some point
he / she finds out there is not enough memory in each specified pnode to
accommodate his / her requirement, then he / she changes the
configuration on the fly.

In reality, if you have mass deployment you probably won't do that.  You
might just want to use the same configuration all the time. Now this
configuration has different performance on different versions of Xen
because the algorithm is not a fixed algorithm (which is not necessarily a
bad thing though, because you can have a more sensible algorithm to
improve performance).

> > > If we allow the user (or the automatic placement algorithm) to specify a
> > > bitmap of pnode for each vnode, he could put, say, vnode #1 on pnode #0
> > > and #2, which maybe are really close (in terms of NUMA distances) to
> > > each other, and vnode #2 to pnode #5 and #6 (close to each others too).
> > > This would give worse performance than having each vnode on just one
> > > pnode, but, most likely, better performance than the scenario described
> > > right above.
> > > 
> > 
> > I get what you mean. So by writing the above paragraphs, you sort of
> > confirm that there still are too many implications in the algorithms,
> > right? A user cannot just tell from the interface what the behaviour is
> > going to be.  
> >
> A user can tell that, if he wants a vnode 2GB wide, and there is no
> pnode with 2GB free, but the sum of free memory in pnodes #4 and #6 is >=
> 2GB, he can still use vNUMA, by paying the (small or high, depending
> on more factors) price of having that vnode split in two (or more!).
> 

What if #4 and #6 do have > 2GB RAM each?  What will the behaviour be?
Does it imply better or worse performance?  Again, I'm thinking about
migrating the same configuration to another version of Xen, or even just
another host that has enough memory.

I guess the best we can say (at this point, if we're to use a bitmap)
is that memory will be allocated from the nodes specified and the user
should not expect any specific behaviour -- which is basically telling
the user not to specify multiple nodes...

> I think there would be room for some increased user satisfaction in
> this, even without knowing much and/or being in control on how exactly
> the split happens, as there are chances for performance to be (if the
> thing is used properly) better than in the no-vNUMA case, which is what
> we're after.
> 
> > You can of course say the algorithm is fixed but I don't
> > think we want to do that?
> > 
> I don't want to, but I don't think it's needed.
> 
> Anyway, I'm more than ok if we want to defer the discussion to after
> this series is in. It will require a further change in the interface,
> but I don't think it would be a terrible price to pay, if we decide the
> > feature is worth it.
> 
> Or, and that was the other thing I was suggesting, we can have the
> bitmap in vnode_info since now, but then only accept ints in xl config
> parsing, and enforce the weight of the bitmap to be 1 (and perhaps print
> a warning) for now. This would not require changing the API in future,
> it'd just be a matter of changing the xl config file parsing. The
> "problem" would still stand for libxl callers different than xl, though,
> I know.
> 

Note that the uint32_t mapping has very rigid semantics.  As long as
you give me well-defined semantics for that bitmap I'm fine with this.
Otherwise I feel more comfortable with the interface as it is.

Wei.

> Regards,
> Dario
> 
> > Wei.
> > 
> > > Hope I made myself clear enough :-)
> > > 
> > > Regards,
> > > Dario
> > 
> > 
> 


* Re: [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest
  2015-02-16 16:58   ` Dario Faggioli
@ 2015-02-16 17:44     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-16 17:44 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Mon, Feb 16, 2015 at 04:58:11PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> 
> > @@ -760,7 +760,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
> >  int arch_setup_meminit(struct xc_dom_image *dom)
> >  {
> >      int rc;
> > -    xen_pfn_t pfn, allocsz, i, j, mfn;
> > +    xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
> > +    int i, j;
> >  
> >      rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
> >      if ( rc )
> > @@ -811,26 +812,101 @@ int arch_setup_meminit(struct xc_dom_image *dom)
> >              if ( rc )
> >                  return rc;
> >          }
> > -        /* setup initial p2m */
> > -        dom->p2m_size = dom->total_pages;
> > +
> > +        /* Setup dummy vNUMA information if it's not provided. Note
> > +         * that this is a valid state if libxl doesn't provide any
> > +         * vNUMA information.
> > +         *
> > +         * The dummy values make libxc allocate all pages from
> > +         * arbitrary physical. This is the expected behaviour if no
> > +         * vNUMA configuration is provided to libxc.
> > +         *
> > "from arbitrary physical nodes" maybe (or something like that?)
> 

Fixed. Thanks.

Wei.

> Also, it feels like these two paragraphs can be merged into one. Not a
> big deal, though, and the rest of the patch also looks fine.
> 
> Regards,
> Dario


* Re: [PATCH v5 22/24] libxlu: introduce new APIs
  2015-02-13 14:12   ` Ian Jackson
@ 2015-02-16 19:10     ` Wei Liu
  2015-02-16 19:47       ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-16 19:10 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 02:12:29PM +0000, Ian Jackson wrote:
> Wei Liu writes ("[PATCH v5 22/24] libxlu: introduce new APIs"):
> > These APIs can be used to manipulate XLU_ConfigValue and XLU_ConfigList.
> > 
> > +    if (value->type != XLU_STRING) {
> > +        if (!dont_warn)
> > +            fprintf(cfg->report, "warning: value is not a string\n");
> > +        *value_r = NULL;
> > +        return EINVAL;
> 
> This message needs to include the file and line number, or it is very
> hard for the user to use.  The other call sites (which are based on
> `find') require the caller to provide a name, which means that the
> setting name can be printed too.  Maybe you could do something
> similar.
> 

It is a bit different from a setting because a value doesn't have a
name.

I've added another patch to record line and column number for a value so
that we can use them here.
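
A sketch of what that could look like (the position members and the
config_source name are guesses, not the actual patch):

    /* sketch: carry the source position in the value, so type errors
     * can point at file:line:column */
    if (value->type != XLU_STRING) {
        if (!dont_warn)
            fprintf(cfg->report,
                    "%s:%d:%d: warning: value is not a string\n",
                    cfg->config_source, value->line, value->column);
        *value_r = NULL;
        return EINVAL;
    }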

> If you were feeling keen you could replace these formulaic things with
> something like:
>    return report_bad_cfg(dont_warn, cfg, set, n, "value is not a string");
> or
>    return REPORT_BAD_CFG("value is not a string");
> (being a function or macro which always returns EINVAL), or some such.
> 

Do feel very keen about this since the format string differs from
functions. And it's only one printf anyway.

Wei.

> Thanks,
> Ian.


* Re: [PATCH v5 22/24] libxlu: introduce new APIs
  2015-02-16 19:10     ` Wei Liu
@ 2015-02-16 19:47       ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-16 19:47 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli, xen-devel,
	JBeulich, ufimtseva

On Mon, Feb 16, 2015 at 07:10:46PM +0000, Wei Liu wrote:
[...]
> > (being a function or macro which always returns EINVAL), or some such.
> > 
> 
> Do feel very keen about this since the format string differs from

Don't...

> functions. And it's only one printf anyway.
> 
> Wei.
> 
> > Thanks,
> > Ian.


* Re: [PATCH v5 06/24] libxl: introduce vNUMA types
  2015-02-16 17:38             ` Wei Liu
@ 2015-02-17 10:42               ` Dario Faggioli
  0 siblings, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 10:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Mon, 2015-02-16 at 17:38 +0000, Wei Liu wrote:
> On Mon, Feb 16, 2015 at 04:51:43PM +0000, Dario Faggioli wrote:

> > So, if you're saying that, if we use a bitmap, we should write somewhere
> > how libxl would use it, I certainly agree. Up to what level of detail
> > we, at that point, should do that, I'm not sure. I think I'd be fine, as
> > a user, with finding it written that "the memory of the vnode will be
> > allocated out of the pnodes specified in the bitmap", without much
> > further detail, especially considering the use case for the feature.
> > 
> 
> This is of course OK. And the simplest implementation of this
> strategy is to pass the node information on to Xen and let Xen decide
> which of the several nodes specified to allocate from. This would
> be trivial.
> 
Exactly. This is, by the way, what is happening right now if the user
sets explicitly a hard or a soft affinity for the vcpus that spans
multiple nodes. The domain's node affinity is dynamically calculated to
be, say, 0110, and memory is striped on nodes #1 and #2.

The end result is, most of the time (nearly) even distribution, but, for
instance, if we ran out of free RAM on #1, the allocation will continue
on #2, making things uneven, but still in line with what the user asked.

That behavior would be hence consistent with the already existing,
non-vNUMA case.
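
For illustration, that existing striping is what you get today with
something like the following sketch (these libxl calls exist; error
handling omitted):

    /* sketch: set node affinity to pnodes 1 and 2; subsequent memory
     * allocations for the domain stripe across the set bits */
    libxl_bitmap nodemap;

    libxl_bitmap_init(&nodemap);
    libxl_node_bitmap_alloc(ctx, &nodemap, 0);
    libxl_bitmap_set(&nodemap, 1);
    libxl_bitmap_set(&nodemap, 2);
    libxl_domain_set_nodeaffinity(ctx, domid, &nodemap);
    libxl_bitmap_dispose(&nodemap);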

> I think having a vnode able to map to several pnodes would be
> good. I'm just trying to figure out if a single bitmap is enough to
> cover all the sensible use cases, or what we should say about that
> interface.
> 
Well, ideally, you may turn the pnode map into another 'nested list',
making it possible to specify, for vnode #2, not only that you want the
memory from pnodes #0 and #4, but that you want 0.5G from the former
and 1.5G from the latter. However, this:
 - makes the interface very complicated to both specify, understand and
   parse
 - requires non-trivial changes inside Xen, which is of course possible,
   but is it worth it?
 - very few benefits, as you won't have a fine-grained enough control of
   what memory that 0.5G is comprised of (it's all within the same
   vnode!!), neither of what the guest OS will put there, e.g., across
   reboots. So, really, rather useless, IMO

So, yes, I think having a bitmap would be enough for now (for a while,
actually! :-D)

> > As I said, if there is only 1GB free on all pnodes, the user will be
> > allowed to specify a set of pnodes for the vnodes, instead of not being
> > able to use vnuma at all, no matter how libxl (or whoever else) will
> > actually split the memory, in this, previous or future version of Xen...
> > This is the scenario I'm talking about, and in such a scenario, knowing
> > how the split happens does not really help much; it is just the
> > _possibility_ of splitting that helps...
> > 
> 
> I don't see this problem that way though.
> 
> Basically you're saying, a user wants to use vNUMA, then at some point
> he / she finds out there is not enough memory in each specified pnode to
> accommodate his / her requirement, then he / she changes the
> configuration on the fly.
> 
Mmm... no, I wasn't thinking about 'on the fly' changes. But perhaps I
don't get what you mean with 'on the fly'.

> In reality, if you have mass deployment you probably won't do that.  You
> might just want to use the same configuration all the time. 
>
Yes, and given you have the same VMs all the time, and especially if
deployment also happens in the same or very similar order every time,
you'll *always* end up with only 1G left free on all pnodes, and hence
you will *never* be able to turn on vnode-to-pnode mapping for that last
VM that you are deploying with 2G vnodes.

> > > > If we allow the user (or the automatic placement algorithm) to specify a
> > > > bitmap of pnode for each vnode, he could put, say, vnode #1 on pnode #0
> > > > and #2, which maybe are really close (in terms of NUMA distances) to
> > > > each other, and vnode #2 to pnode #5 and #6 (close to each others too).
> > > > This would give worse performance than having each vnode on just one
> > > > pnode, but, most likely, better performance than the scenario described
> > > > right above.
> > > > 
> > > 
> > > I get what you mean. So by writing the above paragraphs, you sort of
> > > confirm that there still are too many implications in the algorithms,
> > > right? A user cannot just tell from the interface what the behaviour is
> > > going to be.  
> > >
> > A user can tell that, if he wants a vnode 2GB wide, and there is no
> > pnode with 2GB free, but the sum of free memory in pnodes #4 and #6 is >=
> > 2GB, he can still use vNUMA, by paying the (small or high, depending
> > on more factors) price of having that vnode split in two (or more!).
> > 
> 
> What if #4 and #6 do have > 2GB RAM each?  What will the behaviour be?
>
Xen decides, while allocating, as it does right now.

> Does it imply better or worse performance?  Again, I'm thinking about
> migrating the same configuration to another version of Xen, or even just
> another host that has enough memory.
> 
> I guess the best we can say (at this point, if we're to use a bitmap)
> is that memory will be allocated from the nodes specified and the user
> should not expect any specific behaviour -- which is basically telling
> the user not to specify multiple nodes...
> 
So, vNUMA is a performance optimization. Actually, no, vNUMA is a
feature someone may want independently of performance, but vNUMA-node
to pNUMA-node mapping is **definitely** a performance optimization.

In my experience, performance depends on a lot of things and factors.
There are very few features that just "give better or worse
performance". There are some, sure, but I don't think vNUMA falls in
this category. For instance, in this case, performance will depend on
the workload, on the host load conditions, on the order in which you
boot the VMs, on other constraints (e.g., whether or not you use devices
attached to different IO controllers in IONUMA boxes), and that's like
this already.

We should work hard not to cause performance regressions, i.e., making
things worse for people just upgrading Xen and not changing anything
else, but, in this case, not changing *anything* *else* means taking
into account all the factors above (or more!). Then yes, if one really
manages to keep all the involved variables and factors fixed, then just
upgrading Xen should not degrade performance. Actually, it ideally would
make things better... isn't this what we're supposed to do here all
day? :-P :-P

IOW, in this case, if one wants top determinism, he can just set _only_
one bit in the bitmap and forget about it. OTOH, if one can trade a
slight (potential) degradation in perf (which of course also means less
determinism) with increased flexibility, the bitmap would help.

Allow me another example, migration, since you're mentioning it
yourself. Assume we have vnode #0 mapped on pnode #2 and vnode #1 mapped
on pnode #4 on host X, with distance between these pnodes being 20. We
migrate the VM to host Y, and there is not enough free memory on any
couple of pnodes with distance 20, so we pick two with distance 40: this
will alter the performance. What if there is free memory in such two
pnodes, because they're bigger on Y than on X, but there happens to be
more contention on the pcpus of those nodes on Y than on X (as a
consequence of them being bigger, or because on Y there are more small
domains, wrt memory, but with more vcpus, or ...): this will alter the
performance (making things worse). What if there is less contention:
this will alter the performance (making things better). What if there
are not even just two pnodes with enough free memory, no matter the
distance, and we need to turn vnode-to-pnode mapping off: this will
alter the performance. What if the new host is non-NUMA: this will alter
the performance.

So, in summary: vnode-to-pnode is helping performance for your workload:
jolly good. You migrate the VM(s): all bets are off! You upgrade Xen:
well, if you don't change anything else, things should be equal or
better; if you change something else, there is the chance that you need
to re-evaluate the performance and adapt the workload... but that's the
case already, no matter whether a bitmap or an integer is used.

> > I think there would be room for some increased user satisfaction in
> > this, even without knowing much and/or being in control on how exactly
> > the split happens, as there are chances for performance to be (if the
> > thing is used properly) better than in the no-vNUMA case, which is what
> > we're after.
> > 
> > > You can of course say the algorithm is fixed but I don't
> > > think we want to do that?
> > > 
> > I don't want to, but I don't think it's needed.
> > 
> > Anyway, I'm more than ok if we want to defer the discussion to after
> > this series is in. It will require a further change in the interface,
> > but I don't think it would be a terrible price to pay, if we decide the
> > feature is worth it.
> > 
> > Or, and that was the other thing I was suggesting, we can have the
> > bitmap in vnode_info since now, but then only accept ints in xl config
> > parsing, and enforce the weight of the bitmap to be 1 (and perhaps print
> > a warning) for now. This would not require changing the API in future,
> > it'd just be a matter of changing the xl config file parsing. The
> > "problem" would still stand for libxl callers different than xl, though,
> > I know.
> > 
> 
> Note that the uint32_t mapping has very rigid semantics.  
>
It has. Then, whether or not this rigidity buys you more consistent
performance, for example across migrations, is questionable, and my
opinion would be that no, it does not.

> As long as
> you give me well-defined semantics for that bitmap I'm fine with this.
> Otherwise I feel more comfortable with the interface as it is.
> 
The well defined semantic is:

<<The memory will be allocated out of the NUMA nodes specified in the
bitmap. If the bitmap has more than 1 bit set, how the memory is
actually split between the nodes, is determined by libxl and Xen
internals, taking into account the amount of free memory, system load,
and other factors. Having a bitmap provides flexibility, and increases
the chances of being able to exploit the vnode-to-pnode mapping feature
(e.g., even on highly packed hosts), at the cost of (potentially)
diminished determinism. For top determinism, always set just one bit>>

BTW, sorry for being so long... I at least hope I've expressed my view
clearly enough. Let me also repeat that I'm fine leaving this alone and
(perhaps) coming back to it later, when the series is merged.
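
To make the <<semantics>> above concrete: per allocation chunk, the
fallback could look roughly like the sketch below. XENMEMF_exact_node()
and xc_domain_populate_physmap_exact() exist today; the surrounding
structure and variable names are illustrative only.

    /* sketch: prefer an exact pnode, fall back to "any node" if that
     * pnode is out of memory */
    rc = xc_domain_populate_physmap_exact(xch, domid, count, 0,
                                          XENMEMF_exact_node(pnode),
                                          extents);
    if ( rc != 0 )
        rc = xc_domain_populate_physmap_exact(xch, domid, count, 0,
                                              0 /* any node */, extents);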

Thanks and Regards,
Dario


* Re: [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor
  2015-02-12 19:44 ` [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
  2015-02-13 15:58   ` Andrew Cooper
@ 2015-02-17 11:36   ` Jan Beulich
  2015-02-17 11:42     ` Wei Liu
  1 sibling, 1 reply; 94+ messages in thread
From: Jan Beulich @ 2015-02-17 11:36 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, andrew.cooper3, dario.faggioli, ian.jackson,
	xen-devel, ufimtseva

>>> On 12.02.15 at 20:44, <wei.liu2@citrix.com> wrote:
> Hvmloader issues XENMEM_get_vnumainfo hypercall and stores the
> information retrieved in scratch space for later use.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

I take it that these don't depend on the previous patches in this
series, i.e. can be applied right away (perhaps while adjusting
for Andrew's comments, some of which I would have given too).
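
For the record, the retrieval in question boils down to something like
the sketch below; the struct and hypercall names are the ones from the
patch, while the buffer names and capacity bounds are assumptions:

    struct xen_vnuma_topology_info vnuma_topo = { .domid = DOMID_SELF };

    /* sketch: buffers live in hvmloader scratch space; the nr_* fields
     * tell Xen how much it may write back */
    vnuma_topo.nr_vnodes = MAX_VNODES;          /* bound assumed */
    vnuma_topo.nr_vcpus = hvm_info->nr_vcpus;
    vnuma_topo.nr_vmemranges = MAX_VMEMRANGES;  /* bound assumed */
    set_xen_guest_handle(vnuma_topo.vdistance.h, vdistance);
    set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, vcpu_to_vnode);
    set_xen_guest_handle(vnuma_topo.vmemrange.h, vmemrange);

    rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);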

Another remark though: You Cc-ed me on all of the tool-stack-only
patches too, which - together with their replies - adds up to
over 50 mails (up to now). I'd highly appreciate being saved from
wading through such amounts of mail, by Cc-ing me only on
those that I actually _need_ to look at. I'll see (and have a
chance to comment on) the others too anyway via xen-devel.

Jan


* Re: [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor
  2015-02-17 11:36   ` Jan Beulich
@ 2015-02-17 11:42     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-17 11:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, ian.campbell, andrew.cooper3, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva

On Tue, Feb 17, 2015 at 11:36:06AM +0000, Jan Beulich wrote:
> >>> On 12.02.15 at 20:44, <wei.liu2@citrix.com> wrote:
> > Hvmloader issues XENMEM_get_vnumainfo hypercall and stores the
> > information retrieved in scratch space for later use.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 
> I take it that these don't depend on the previous patches in this
> series, i.e. can be applied right away (perhaps while adjusting
> for Andrew's comments, some of which I would have given too).
> 

Yes. These hvmloader patches can be applied independently.

> Another remark though: You Cc-ed me on all of the tool stack only
> patches too, which - together with their replies - makes up for
> over 50 mails (up to now). I'd highly appreciate saving me from
> wading through such amounts of mails, by Cc-ing me only on
> those that I actually _need_ to look at. I'll see (and have a
> chance to comment on) the others too anyway via xen-devel.
> 

Sorry for the noise. I will only Cc you on hypervisor patches (and
perhaps the cover letter) in the future.

Wei.

> Jan


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 15:40   ` Andrew Cooper
@ 2015-02-17 12:56     ` Wei Liu
  2015-03-02 15:13       ` Ian Campbell
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-17 12:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 03:40:32PM +0000, Andrew Cooper wrote:
[...]
> > +    return 0;
> > +}
> > +
> > +/* Check if vNUMA configuration is valid:
> > + *  1. all pnodes inside vnode_to_pnode array are valid
> > + *  2. one vcpu belongs to and only belongs to one vnode
> > + *  3. each vmemrange is valid and doesn't overlap with each other
> > + */
> > +int libxl__vnuma_config_check(libxl__gc *gc,
> > +                              const libxl_domain_build_info *b_info,
> > +                              const libxl__domain_build_state *state)
> > +{
> > +    int i, j, rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes;
> 
> i, j and nr_nodes are all semantically unsigned.
> 

Fixed.

> > +    libxl_numainfo *ninfo = NULL;
> > +    uint64_t total_memkb = 0;
> > +    libxl_bitmap cpumap;
> > +    libxl_vnode_info *p;
> > +
> > +    libxl_bitmap_init(&cpumap);
> > +
> > +    /* Check pnode specified is valid */
> > +    ninfo = libxl_get_numainfo(CTX, &nr_nodes);
> > +    if (!ninfo) {
> > +        LOG(ERROR, "libxl_get_numainfo failed");
> > +        goto out;
> > +    }
> > +
> > +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> > +        uint32_t pnode;
> > +
> > +        p = &b_info->vnuma_nodes[i];
> > +        pnode = p->pnode;
> > +
> > +        /* The pnode specified is not valid? */
> > +        if (pnode >= nr_nodes) {
> > +            LOG(ERROR, "Invalid pnode %d specified", pnode);
> 
> pnode is uint32_t, so should be %u
> 

Fixed.

> > +            goto out;
> > +        }
> > +
> > +        total_memkb += p->memkb;
> > +    }
> > +
> > +    if (total_memkb != b_info->max_memkb) {
> > +        LOG(ERROR, "Amount of memory mismatch (0x%"PRIx64" != 0x%"PRIx64")",
> > +            total_memkb, b_info->max_memkb);
> > +        goto out;
> > +    }
> > +
> > +    /* Check vcpu mapping */
> > +    libxl_cpu_bitmap_alloc(CTX, &cpumap, b_info->max_vcpus);
> > +    libxl_bitmap_set_none(&cpumap);
> 
> Worth using/making libxl_cpu_bitmap_zalloc(), or perhaps making this a
> defined semantic of the alloc() function?  This would seem to be a very
> common pair of operations to perform.
> 

Actually libxl_bitmap_alloc already uses calloc, so the bitmap is always
set to all zeros.

> > +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> > +        p = &b_info->vnuma_nodes[i];
> > +        libxl_for_each_set_bit(j, p->vcpus) {
> > +            if (!libxl_bitmap_test(&cpumap, j))
> > +                libxl_bitmap_set(&cpumap, j);
> > +            else {
> > +                LOG(ERROR, "Vcpu %d assigned more than once", j);
> > +                goto out;
> > +            }
> > +        }
> 
> This libxl_for_each_set_bit() loop can be optimised to a
> bitmap_intersects() for the error condition, and bitmap_or() for the
> success case.
> 
> > +    }
> > +
> > +    for (i = 0; i < b_info->max_vcpus; i++) {
> > +        if (!libxl_bitmap_test(&cpumap, i)) {
> > +            LOG(ERROR, "Vcpu %d is not assigned to any vnode", i);
> > +            goto out;
> > +        }
> 
> This loop can be optimised to !bitmap_all_set().
> 

I can introduce a new set of bitmap operations in a separate patch and
switch to that later.  This series has grown to almost 30 patches and I
would like to keep it focused mostly on vNUMA-related stuff.
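
For reference, the shape Andrew's suggestion implies would be roughly
the sketch below; treat libxl_bitmap_intersects(), libxl_bitmap_or()
and libxl_bitmap_is_full() as the helper set such a patch would add
(some of these may not exist in libxl yet):

    /* sketch, with helpers yet to be written */
    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
        p = &b_info->vnuma_nodes[i];
        if (libxl_bitmap_intersects(&cpumap, &p->vcpus)) {
            LOG(ERROR, "a vcpu is assigned more than once");
            goto out;
        }
        libxl_bitmap_or(&cpumap, &cpumap, &p->vcpus);
    }
    if (!libxl_bitmap_is_full(&cpumap)) {
        LOG(ERROR, "some vcpu is not assigned to any vnode");
        goto out;
    }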

Wei.


* Re: [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest
  2015-02-13 15:49   ` Andrew Cooper
@ 2015-02-17 14:08     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-17 14:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ian.campbell, dario.faggioli, ian.jackson, xen-devel,
	JBeulich, ufimtseva

On Fri, Feb 13, 2015 at 03:49:44PM +0000, Andrew Cooper wrote:
[...]
> >  
> > +
> > +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
> > +                                            uint32_t domid,
> > +                                            libxl_domain_build_info *b_info,
> > +                                            libxl__domain_build_state *state)
> > +{
> > +    int i;
> > +    uint64_t next;
> > +    xen_vmemrange_t *v = NULL;
> > +
> > +    /* Generate one vmemrange for each virtual node. */
> > +    GCREALLOC_ARRAY(v, b_info->num_vnuma_nodes);
> > +    next = 0;
> > +    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
> > +        libxl_vnode_info *p = &b_info->vnuma_nodes[i];
> > +
> > +        v[i].start = next;
> > +        v[i].end = next + (p->memkb << 10);
> > +        v[i].flags = 0;
> > +        v[i].nid = i;
> > +
> > +        next = v[i].end;
> 
> Using "start" and "end", this would appear to have a fencepost error
> which a start/size pair wouldn't have.
> 

Are you suggesting I change to use "start" and "size"? If so I don't
think that's possible. xen_vmemrange_t is part of the hypervisor
interface.

Wei.


* Re: [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2015-02-12 19:44 ` [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
  2015-02-13 14:21   ` Ian Jackson
@ 2015-02-17 14:26   ` Dario Faggioli
  2015-02-17 14:41     ` Wei Liu
  1 sibling, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 14:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:

> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -407,7 +407,7 @@ static int setup_guest(xc_interface *xch,
>              new_memflags |= XENMEMF_exact_node_request;
>          }
>  
> -        end_pages = args->vmemranges[i].end >> PAGE_SHIFT;
> +        end_pages = args->vmemranges[vmemid].end >> PAGE_SHIFT;
>
What's this? I suspect this should be using vmemid already in patch 16,
shouldn't it?

Other than this,

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario


* Re: [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen for HVM guest
  2015-02-17 14:26   ` Dario Faggioli
@ 2015-02-17 14:41     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-17 14:41 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Tue, Feb 17, 2015 at 02:26:23PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> 
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -407,7 +407,7 @@ static int setup_guest(xc_interface *xch,
> >              new_memflags |= XENMEMF_exact_node_request;
> >          }
> >  
> > -        end_pages = args->vmemranges[i].end >> PAGE_SHIFT;
> > +        end_pages = args->vmemranges[vmemid].end >> PAGE_SHIFT;
> >
> What's this? I suspect this should be using vmemid already in patch 16,
> shouldn't it?
> 

This hunk belongs to the previous patch. It should be squashed into
that one.

> Other than this,
> 
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 

Thanks.

Wei.

> Regards,
> Dario


* Re: [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen for PV guest
  2015-02-12 19:44 ` [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
  2015-02-13 15:54   ` Andrew Cooper
@ 2015-02-17 14:49   ` Dario Faggioli
  1 sibling, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 14:49 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> Transform the user supplied vNUMA configuration into libxl internal
> representations, and finally libxc representations. Check validity of
> the configuration along the line.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
 
> +static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
> +                          const libxl_domain_build_info *info,
> +                          const libxl__domain_build_state *state)
> +{
> +    int rc = 0;
> +    int i, nr_vdistance;
> +    unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL;
> +
> +    vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus,
> +                                  sizeof(unsigned int));
> +    vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes,
> +                                   sizeof(unsigned int));
> +
> +    nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes;
> +    vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int));
> +
> +    for (i = 0; i < info->num_vnuma_nodes; i++) {
> +        libxl_vnode_info *v = &info->vnuma_nodes[i];
> +        int bit;
> +
> +        /* vnode to pnode mapping */
> +        vnode_to_pnode[i] = v->pnode;
> +
> +        /* vcpu to vnode mapping */
> +        libxl_for_each_set_bit(bit, v->vcpus)
> +            vcpu_to_vnode[bit] = i;
> +
'bit' made the reader (or at least it made me) think that we're dealing
with something that has to do with actual bits, while it's just an
index, as usual.

For that reason, I'd have gone for 'j' (as 'i' is being used already).

However, that's certainly a minor thing, so with or without this change:

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
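
For context, the arrays assembled in set_vnuma_info() are ultimately
handed to the hypervisor via xc_domain_setvnuma(); roughly, as a sketch
only (the vmemrange bookkeeping names are assumptions):

    /* sketch: push the assembled vNUMA layout into Xen */
    rc = xc_domain_setvnuma(CTX->xch, domid,
                            info->num_vnuma_nodes,   /* nr vnodes */
                            state->num_vmemranges,   /* nr vmemranges */
                            info->max_vcpus,         /* nr vcpus */
                            state->vmemranges,
                            vdistance, vcpu_to_vnode, vnode_to_pnode);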


* Re: [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest
  2015-02-12 19:44 ` [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest Wei Liu
  2015-02-13 15:49   ` Andrew Cooper
@ 2015-02-17 15:28   ` Dario Faggioli
  2015-02-17 15:32     ` Wei Liu
  1 sibling, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 15:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:

> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -339,6 +339,79 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>      return 0;
>  }
>  
> +/* Return 0 on success, ERROR_* on failure. */
> +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
> +                                      uint32_t domid,
> +                                      libxl_domain_build_info *b_info,
> +                                      libxl__domain_build_state *state)
>
The doc comment does not look super useful as it is... ERROR_* is
ERROR_NOMEM, right?

Perhaps say that, along with a few words on when it happens (or just get
rid of the comment entirely)

In any case,

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Dario


* Re: [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest
  2015-02-17 15:28   ` Dario Faggioli
@ 2015-02-17 15:32     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-17 15:32 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Tue, Feb 17, 2015 at 03:28:15PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> 
> > --- a/tools/libxl/libxl_x86.c
> > +++ b/tools/libxl/libxl_x86.c
> > @@ -339,6 +339,79 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
> >      return 0;
> >  }
> >  
> > +/* Return 0 on success, ERROR_* on failure. */
> > +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
> > +                                      uint32_t domid,
> > +                                      libxl_domain_build_info *b_info,
> > +                                      libxl__domain_build_state *state)
> >
> The doc comment does not look super useful as it is... ERROR_* is
> ERROR_NOMEM, right?
> 

It could also be ERROR_FAIL, if e820_host_sanitize fails.

> Perhaps say that, along with a few words on when it happens (or just get
> rid of the comment entirely)
> 
> In any case,
> 
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 

Thanks.

Wei.

> Dario


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
  2015-02-13 14:15   ` Ian Jackson
  2015-02-13 15:40   ` Andrew Cooper
@ 2015-02-17 16:38   ` Dario Faggioli
  2015-02-22 15:47     ` Wei Liu
  2 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 16:38 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> This function is used to check whether vNUMA configuration (be it
> auto-generated or supplied by user) is valid.
> 
> Define a new error code ERROR_VNUMA_CONFIG_INVALID.
> 
> The checks performed can be found in the comment of the function.
> 
> This vNUMA function (and future ones) is placed in a new file called
> libxl_vnuma.c
> 
I'm not sure whether having more files is a good or a bad thing. I would
say that libxl_numa.c is rather small, and can certainly accept more
code, if this kind of consolidation is desirable.

If it were me doing this, I'd put things there, but I don't have a super
strong opinion, and I'm also aware that I'm saying this pretty late, so
I'll just state this as a preference, and leave it to you and the
(other) maintainers.

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
> Changes in v5:
> 1. Define and use new error code.
> 2. Use LOG macro.
> 3. Fix hard tabs.
> 
> Changes in v4:
> 1. Adapt to new interface.
> 
> Changes in v3:
> 1. Rewrite commit log.
> 2. Shorten two error messages.
> ---
>  tools/libxl/Makefile         |   2 +-
>  tools/libxl/libxl_internal.h |   7 +++
>  tools/libxl/libxl_types.idl  |   1 +
>  tools/libxl/libxl_vnuma.c    | 131 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 tools/libxl/libxl_vnuma.c
> 
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 7329521..1b16598 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl
>  LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>  			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
>  			libxl_internal.o libxl_utils.o libxl_uuid.o \
> -			libxl_json.o libxl_aoutils.o libxl_numa.o \
> +			libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
>  			libxl_save_callout.o _libxl_save_msgs_callout.o \
>  			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
>  LIBXL_OBJS += libxl_genid.o
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 6d3ac58..258be0d 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3394,6 +3394,13 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
>      libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
>  }
>  
> +/* Check if vNUMA config is valid. Returns 0 if valid,
> + * ERROR_VNUMA_CONFIG_INVALID otherwise.
> + */
> +int libxl__vnuma_config_check(libxl__gc *gc,
> +                              const libxl_domain_build_info *b_info,
> +                              const libxl__domain_build_state *state);
> +
>  _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
>                                     const libxl_ms_vm_genid *id);
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 14c7e7c..23951fc 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -63,6 +63,7 @@ libxl_error = Enumeration("error", [
>      (-17, "DEVICE_EXISTS"),
>      (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>      (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
> +    (-20, "VNUMA_CONFIG_INVALID"),
>      ], value_namespace = "")
>  
>  libxl_domain_type = Enumeration("domain_type", [
> diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
> new file mode 100644
> index 0000000..fa5aa8d
> --- /dev/null
> +++ b/tools/libxl/libxl_vnuma.c
> @@ -0,0 +1,131 @@
> +/*
> + * Copyright (C) 2014      Citrix Ltd.
> + * Author Wei Liu <wei.liu2@citrix.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +#include "libxl_osdeps.h" /* must come before any other headers */
> +#include "libxl_internal.h"
> +#include <stdlib.h>
> +
> +/* Sort vmemranges in ascending order with "start" */
> +static int compare_vmemrange(const void *a, const void *b)
> +{
> +    const xen_vmemrange_t *x = a, *y = b;
> +    if (x->start < y->start)
> +        return -1;
> +    if (x->start > y->start)
> +        return 1;
> +    return 0;
> +}
> +
> +/* Check if vNUMA configuration is valid:
> + *  1. all pnodes inside vnode_to_pnode array are valid
> + *  2. one vcpu belongs to and only belongs to one vnode
>
something like "each vcpu belongs to one and only one vnode" sounds
better here...

> + *  3. each vmemrange is valid and doesn't overlap with each other
>
"doesn't overlap with any other one" perhaps (of course, I'm not a
native speaker, so I can be very wrong! :-D)

> + */
> +int libxl__vnuma_config_check(libxl__gc *gc,
> +                              const libxl_domain_build_info *b_info,
> +                              const libxl__domain_build_state *state)
> +{
> +    int i, j, rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes;
>
You should init nr_nodes to = 0 (see below).

> +    libxl_numainfo *ninfo = NULL;
> +    uint64_t total_memkb = 0;
> +    libxl_bitmap cpumap;
> +    libxl_vnode_info *p;
> +
*v, *vnode, or *vinfo all sound better than *p.

> +    rc = 0;
> +out:
> +    if (ninfo) libxl_numainfo_dispose(ninfo);
>
What you want here, to free a libxl_numainfo, is
libxl_numainfo_list_free(ninfo,nr_nodes), which, BTW, can be called
without checking whether ninfo is NULL, _provided_ you initialize
nr_nodes to 0.

Regards,
Dario
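
Putting the comparator and check 3 together, the validation presumably
reduces to something like this sketch (array and count names assumed):

    /* sketch: sort by start, then require non-empty, non-overlapping
     * ranges */
    qsort(vmemranges, nr_vmemranges, sizeof(*vmemranges),
          compare_vmemrange);
    for (i = 0; i < nr_vmemranges; i++) {
        if (vmemranges[i].end <= vmemranges[i].start)
            goto out;                            /* empty or inverted */
        if (i > 0 && vmemranges[i - 1].end > vmemranges[i].start)
            goto out;                            /* overlap */
    }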


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 15:12     ` Wei Liu
  2015-02-13 15:39       ` Elena Ufimtseva
@ 2015-02-17 16:44       ` Dario Faggioli
  1 sibling, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 16:44 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell



On Fri, 2015-02-13 at 15:12 +0000, Wei Liu wrote:
> On Fri, Feb 13, 2015 at 02:15:47PM +0000, Ian Jackson wrote:
> > Wei Liu writes ("[PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check"):
> > > This function is used to check whether vNUMA configuration (be it
> > > auto-generated or supplied by user) is valid.
> > 
> > This looks plausible, but I think you should explain what the impact
> > of this patch is.  Presumably the intent is to replace various later
> > failures with ERROR_FAIL with something more useful and more
> > specific ?
> > 
> 
Yes, providing more useful error messages is one aspect. Another aspect is
just to do a sanity check -- passing an invalid layout to the guest doesn't
make much sense.
> 
I agree with Wei. There are a lot of possible variants and combinations
of all these parameters, and the earlier we assess whether the entire
set makes sense, the better.

> > Are there any cases which this new check forbids but which are
> > currently accepted by libxl ?  If so then we have to think about
> > compatibility.
> > 
> 
The first thing is that there is no previously supported vNUMA interface
in the toolstack, so there won't be a situation where a previously good
config doesn't pass this check.
> 
The second thing is that if the user supplies a config without vNUMA
configuration, this function will not get called, so it won't have any
effect.
> 
Indeed.

> > Also I would like to see an ack from the authors of the vnuma support,
> > as I'm not familiar enough with vnuma to fully understand the
> > semantics of the new checks.
> > 
> 
> Elena and Dario, what do you think?
> 
I made some comments on the code, but, those aside, the checks Wei
performs are the correct ones for me.

Regards,
Dario


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-13 16:11           ` Elena Ufimtseva
@ 2015-02-17 16:51             ` Dario Faggioli
  2015-02-22 15:50               ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-17 16:51 UTC (permalink / raw)
  To: ufimtseva
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, xen-devel, JBeulich, Ian Jackson



On Fri, 2015-02-13 at 11:11 -0500, Elena Ufimtseva wrote:
> On Fri, Feb 13, 2015 at 11:06 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Fri, Feb 13, 2015 at 10:39:25AM -0500, Elena Ufimtseva wrote:

> >> Any sanity checks for distances?
> >>
> >
> > The same applies: what is a valid distance and what is not? I guess zero
> > is not valid? Or do we enforce that the distance to the local node must
> > be smaller than or equal to the distance to a remote node?
> 
> Yes, I think the second condition is enough for strict checking.
> 
That would not harm, probably, but I honestly would not put much
enforcement on distance values. We can enforce non-zero values, we can
enforce local < remote, we can enforce the symmetry of the distance
matrix, but, really, I wouldn't go that far.

What matters most wrt specification of the distances is to provide a
sane default, in case one does not want to bother writing it down (or
does not want to write it down completely, as it could be tedious).

So, if one does not say anything, we should come up with something that
makes sense (and I'll say more about this while reviewing patch 24). If
the user does say something, I would just go with that... perhaps after
printing a warning, but no more than that.
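
For concreteness, the full set of checks above might look roughly like
this (just a sketch, using the vnuma_nodes/distances fields from this
series and assuming a square, fully-populated distance matrix -- again,
not something I am asking to enforce):

    static bool vdistances_strictly_sane(const libxl_domain_build_info *b_info)
    {
        unsigned int i, j;

        for (i = 0; i < b_info->num_vnuma_nodes; i++) {
            const libxl_vnode_info *v = &b_info->vnuma_nodes[i];

            for (j = 0; j < v->num_distances; j++) {
                /* Non-zero values only. */
                if (v->distances[j] == 0)
                    return false;
                /* Local distance not larger than any remote one. */
                if (v->distances[j] < v->distances[i])
                    return false;
                /* Symmetric matrix: d(i,j) == d(j,i). */
                if (v->distances[j] != b_info->vnuma_nodes[j].distances[i])
                    return false;
            }
        }
        return true;
    }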

Regards,
Dario


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-17 16:38   ` Dario Faggioli
@ 2015-02-22 15:47     ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-22 15:47 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Tue, Feb 17, 2015 at 04:38:02PM +0000, Dario Faggioli wrote:
[...]
> > +
> > +/* Check if vNUMA configuration is valid:
> > + *  1. all pnodes inside vnode_to_pnode array are valid
> > + *  2. one vcpu belongs to and only belongs to one vnode
> >
> something like "each vcpu belongs to one and only one vnode" sounds
> better here...
> 
> > + *  3. each vmemrange is valid and doesn't overlap with each other
> >
> "doesn't overlap with any other one" perhaps (of course, I'm not a
> native speaker, so I can be very wrong! :-D)
> 
> > + */
> > +int libxl__vnuma_config_check(libxl__gc *gc,
> > +                              const libxl_domain_build_info *b_info,
> > +                              const libxl__domain_build_state *state)
> > +{
> > +    int i, j, rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes;
> >
> You should init nr_nodes to = 0 (see below).
> 
> > +    libxl_numainfo *ninfo = NULL;
> > +    uint64_t total_memkb = 0;
> > +    libxl_bitmap cpumap;
> > +    libxl_vnode_info *p;
> > +
> *v, or *vnode, or *vinfo, all sound better than *p.
> 
> > +    rc = 0;
> > +out:
> > +    if (ninfo) libxl_numainfo_dispose(ninfo);
> >
> What you want here, to free a libxl_numainfo, is
> libxl_numainfo_list_free(ninfo, nr_nodes), which, BTW, can be called
> without checking whether ninfo is NULL, _provided_ you initialize
> nr_nodes to 0.
> 

All fixed. Thanks for reviewing.
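
For the record, the fixed exit path now looks roughly like this (just a
sketch, with the actual checks elided):

    int libxl__vnuma_config_check(libxl__gc *gc,
                                  const libxl_domain_build_info *b_info,
                                  const libxl__domain_build_state *state)
    {
        int rc = ERROR_VNUMA_CONFIG_INVALID, nr_nodes = 0;
        libxl_numainfo *ninfo = NULL;

        ninfo = libxl_get_numainfo(CTX, &nr_nodes);
        if (ninfo == NULL) goto out;

        /* ... all the pnode / vcpu / vmemrange checks go here ... */

        rc = 0;
    out:
        /* Safe even when ninfo is NULL, because nr_nodes starts at 0. */
        libxl_numainfo_list_free(ninfo, nr_nodes);
        return rc;
    }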

Wei.

> Regards,
> Dario


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-17 16:51             ` Dario Faggioli
@ 2015-02-22 15:50               ` Wei Liu
  0 siblings, 0 replies; 94+ messages in thread
From: Wei Liu @ 2015-02-22 15:50 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, xen-devel, JBeulich,
	Ian Jackson, ufimtseva

On Tue, Feb 17, 2015 at 04:51:03PM +0000, Dario Faggioli wrote:
> On Fri, 2015-02-13 at 11:11 -0500, Elena Ufimtseva wrote:
> > On Fri, Feb 13, 2015 at 11:06 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> > > On Fri, Feb 13, 2015 at 10:39:25AM -0500, Elena Ufimtseva wrote:
> 
> > >> Any sanity checks for distances?
> > >>
> > >
> > > The same applies: what is a valid distance and what is not? I guess
> > > zero is not valid? Or do we enforce that the distance to the local node
> > > must be smaller than or equal to the distance to a remote node?
> > 
> > Yes, I think the second condition is enough for strict checking.
> > 
> That would probably not harm, but I honestly would not put much
> enforcement on distance values. We can enforce non-zero values, we can
> enforce local < remote, we can enforce the symmetry of the distance
> matrix, but, really, I wouldn't go that far.
> 
> What matters most wrt the specification of the distances is to provide
> a sane default, in case one does not want to bother writing it down (or
> does not want to write it down completely, as that could be tedious).
> 
> So, if one does not say anything, we should come up with something that
> makes sense (and I'll say more about this while reviewing patch 24). If
> the user does say something, I would just go with that... perhaps after
> printing a warning, but no more than that.
> 

The checking here and a sensible default are two separate things, and
they are parallel to each other, I think.

I've come up with something like this to enforce that the local
distance is not larger than any remote distance.

    /* Check vdistances */
    for (i = 0; i < b_info->num_vnuma_nodes; i++) {
        uint32_t local_distance, remote_distance;

        v = &b_info->vnuma_nodes[i];
        local_distance = v->distances[i];

        for (j = 0; j < v->num_distances; j++) {
            if (i == j) continue;
            remote_distance = v->distances[j];
            if (local_distance > remote_distance) {
                LOG(ERROR,
                    "Distance from %u to %u smaller than %u's local distance",
                    i, j, i);
                goto out;
            }
        }
    }
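
Concretely, with this check a node whose distances array is, say,
{30, 20} (local distance 30 greater than remote distance 20) would be
rejected, while {10, 20} passes.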

Wei.

> Regards,
> Dario


* Re: [PATCH v5 24/24] xl: vNUMA support
  2015-02-12 19:44 ` [PATCH v5 24/24] xl: vNUMA support Wei Liu
@ 2015-02-24 16:19   ` Dario Faggioli
  2015-02-24 16:31     ` Wei Liu
  0 siblings, 1 reply; 94+ messages in thread
From: Dario Faggioli @ 2015-02-24 16:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> This patch includes configuration options parser and documentation.
> 
> Please find the hunk to xl.cfg.pod.5 for more information.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>
This all looks pretty good to me. I only have one comment and a
question.


> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index ec7fb2d..f52daf9 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c

[...]

> +                if (!strcmp("pnode", option)) {
> +                    val = strtoul(value, &endptr, 10);
> +                    ABORT_IF_FAILED(value);
> +                    if (val >= nr_nodes) {
> +                        fprintf(stderr,
> +                                "xl: invalid pnode number: %lu\n", val);
> +                        exit(1);
> +                    }
> +                    p->pnode = val;
>
This is, for all intents and purposes, a form of placement, so if this
part of the vNUMA specification is present, you should disable the
automatic placement happening in libxl.

This is all it takes to do so (look inside parse_vcpu_affinity() if you
need more insights):

libxl_defbool_set(&b_info->numa_placement, false);

> +                } else if (!strcmp("size", option)) {
> +                    val = strtoul(value, &endptr, 10);
> +                    ABORT_IF_FAILED(value);
> +                    p->memkb = val << 10;
> +                } else if (!strcmp("vcpus", option)) {
> +                    libxl_string_list cpu_spec_list;
> +                    int cpu;
> +                    unsigned long s, e;
> +
> +                    split_string_into_string_list(value, ",", &cpu_spec_list);
> +                    len = libxl_string_list_length(&cpu_spec_list);
> +
> +                    for (j = 0; j < len; j++) {
> +                        parse_range(cpu_spec_list[j], &s, &e);
> +                        for (cpu = s; cpu <= e; cpu++)
> +                            libxl_bitmap_set(&p->vcpus, cpu);
> +                    }
> +                    libxl_string_list_dispose(&cpu_spec_list);
>
I think that using vcpupin_parse() for "vcpus=" would allow for more
flexible syntax (e.g., things like "3-8,^5") and save some code. The
only downside is that it also accepts things like "nodes:1", which we
clearly don't want in here... is that why you are not going for it?

If you decide to use it, BTW, you may want to change its name (again!)

Regards,
Dario


* Re: [PATCH v5 24/24] xl: vNUMA support
  2015-02-24 16:19   ` Dario Faggioli
@ 2015-02-24 16:31     ` Wei Liu
  2015-02-24 16:44       ` Dario Faggioli
  0 siblings, 1 reply; 94+ messages in thread
From: Wei Liu @ 2015-02-24 16:31 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, JBeulich, Andrew Cooper, xen-devel, ufimtseva,
	Ian Jackson, Ian Campbell

On Tue, Feb 24, 2015 at 04:19:02PM +0000, Dario Faggioli wrote:
> On Thu, 2015-02-12 at 19:44 +0000, Wei Liu wrote:
> > This patch includes configuration options parser and documentation.
> > 
> > Please find the hunk to xl.cfg.pod.5 for more information.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >
> This all looks pretty good to me. I only have one comment and a
> question.
> 
> 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index ec7fb2d..f52daf9 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> 
> [...]
> 
> > +                if (!strcmp("pnode", option)) {
> > +                    val = strtoul(value, &endptr, 10);
> > +                    ABORT_IF_FAILED(value);
> > +                    if (val >= nr_nodes) {
> > +                        fprintf(stderr,
> > +                                "xl: invalid pnode number: %lu\n", val);
> > +                        exit(1);
> > +                    }
> > +                    p->pnode = val;
> >
> This is, for all intents and purposes, a form of placement, so if this
> part of the vNUMA specification is present, you should disable the
> automatic placement happening in libxl.
> 
> This is all it takes to do so (look inside parse_vcpu_affinity() if you
> need more insights):
> 
> libxl_defbool_set(&b_info->numa_placement, false);
> 

Will fix this.

> > +                } else if (!strcmp("size", option)) {
> > +                    val = strtoul(value, &endptr, 10);
> > +                    ABORT_IF_FAILED(value);
> > +                    p->memkb = val << 10;
> > +                } else if (!strcmp("vcpus", option)) {
> > +                    libxl_string_list cpu_spec_list;
> > +                    int cpu;
> > +                    unsigned long s, e;
> > +
> > +                    split_string_into_string_list(value, ",", &cpu_spec_list);
> > +                    len = libxl_string_list_length(&cpu_spec_list);
> > +
> > +                    for (j = 0; j < len; j++) {
> > +                        parse_range(cpu_spec_list[j], &s, &e);
> > +                        for (cpu = s; cpu <= e; cpu++)
> > +                            libxl_bitmap_set(&p->vcpus, cpu);
> > +                    }
> > +                    libxl_string_list_dispose(&cpu_spec_list);
> >
> I think that using vcpupin_parse() for "vcpus=" would allow for more
> flexible syntax (e.g., things like "3-8,^5") and save some code. The
> only downside is that it also accepts things like "nodes:1", which we
> clearly don't want in here... is that why you are not going for it?
> 

Yes. I don't want "nodes", so I didn't reuse that function, and at that
point I didn't think it was critical to support "^X".

If you think this "^X" syntax is important, I can check for "nodes"
before calling vcpupin_parse.
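
Something like this, I mean (hypothetical, and assuming vcpupin_parse()
keeps its current shape of returning non-zero on error):

    /* Hypothetical guard: reject the "nodes:" syntax up front, then
     * reuse the existing pinning parser for the flexible range syntax. */
    if (strstr(value, "node")) {
        fprintf(stderr, "xl: \"nodes:\" is not valid in vnuma \"vcpus=\"\n");
        exit(1);
    }
    if (vcpupin_parse(value, &p->vcpus))
        exit(1);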

> If you decide to use it, BTW, you may want to change its name (again!)
> 

vcpus_parse? It's not restricted to vcpu pinning in any way, I think.

Wei.


> Regards,
> Dario


* Re: [PATCH v5 24/24] xl: vNUMA support
  2015-02-24 16:31     ` Wei Liu
@ 2015-02-24 16:44       ` Dario Faggioli
  0 siblings, 0 replies; 94+ messages in thread
From: Dario Faggioli @ 2015-02-24 16:44 UTC (permalink / raw)
  To: Wei Liu
  Cc: JBeulich, Andrew Cooper, xen-devel, ufimtseva, Ian Jackson, Ian Campbell


On Tue, 2015-02-24 at 16:31 +0000, Wei Liu wrote:
> On Tue, Feb 24, 2015 at 04:19:02PM +0000, Dario Faggioli wrote:

> > > +                } else if (!strcmp("size", option)) {
> > > +                    val = strtoul(value, &endptr, 10);
> > > +                    ABORT_IF_FAILED(value);
> > > +                    p->memkb = val << 10;
> > > +                } else if (!strcmp("vcpus", option)) {
> > > +                    libxl_string_list cpu_spec_list;
> > > +                    int cpu;
> > > +                    unsigned long s, e;
> > > +
> > > +                    split_string_into_string_list(value, ",", &cpu_spec_list);
> > > +                    len = libxl_string_list_length(&cpu_spec_list);
> > > +
> > > +                    for (j = 0; j < len; j++) {
> > > +                        parse_range(cpu_spec_list[j], &s, &e);
> > > +                        for (cpu = s; cpu <= e; cpu++)
> > > +                            libxl_bitmap_set(&p->vcpus, cpu);
> > > +                    }
> > > +                    libxl_string_list_dispose(&cpu_spec_list);
> > >
> > I think that using vcpupin_parse() for "vcpus=" would allow for more
> > flexible syntax (e.g., things like "3-8,^5") and save some code. The
> > only downside is that it also accepts things like "nodes:1", which we
> > clearly don't want in here... is that why you are not going for it?
> > 
> 
> Yes. I don't want "nodes" so I didn't reuse that function, and at that
> point I didn't think it's critical to support "^X".
> 
Ok, I just wanted to be sure you were aware of the possibility. I
actually agree that supporting "^x" is not that critical here.

> If you think this "^X" syntax is important, I can check for "nodes"
> before calling vcpupin_parse.
> 
I don't think it is... TBH, I'm more attracted by the code being
potentially simpler, and by there being less duplicated parsing logic
around, but I appreciate that having to check up front that "node[s]"
is not present would make things look clumsy... so I'm leaving this to
you, I'm happy either way.

> vcpus_parse? It's not restricted to vcpu pinning in any way, I think.
> 
If you go for it, yes, I like this as a name.

Regards,
Dario


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-02-17 12:56     ` Wei Liu
@ 2015-03-02 15:13       ` Ian Campbell
  2015-03-02 15:25         ` Andrew Cooper
  0 siblings, 1 reply; 94+ messages in thread
From: Ian Campbell @ 2015-03-02 15:13 UTC (permalink / raw)
  To: Wei Liu
  Cc: ufimtseva, Andrew Cooper, dario.faggioli, ian.jackson, xen-devel,
	JBeulich

On Tue, 2015-02-17 at 12:56 +0000, Wei Liu wrote:

> > > +            LOG(ERROR, "Invalid pnode %d specified", pnode);
> > 
> > pnode is uint32_t, so should be %u

Actually PRId32 is correct for a uint32_t, although I guess %u will work
on all platforms we support.


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-03-02 15:13       ` Ian Campbell
@ 2015-03-02 15:25         ` Andrew Cooper
  2015-03-02 16:05           ` Ian Campbell
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2015-03-02 15:25 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu
  Cc: dario.faggioli, ian.jackson, JBeulich, ufimtseva, xen-devel

On 02/03/15 15:13, Ian Campbell wrote:
> On Tue, 2015-02-17 at 12:56 +0000, Wei Liu wrote:
>
>>>> +            LOG(ERROR, "Invalid pnode %d specified", pnode);
>>> pnode is uint32_t, so should be %u
> Actually PRId32 is correct for a uint32_t, although I guess %u will work
> on all platforms we support.
>
>

PRId32 and PRIu32 differ in their string representation if the top bit
of pnode is set.

pnode is unsigned, so it should never be formatted with a signed
conversion specifier.
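
To make the difference concrete, a standalone demo (not from the
patch):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t pnode = UINT32_C(0x80000001); /* top bit set */

        /* Signed conversion: the same bits print as a negative number. */
        printf("PRId32: %" PRId32 "\n", (int32_t)pnode); /* -2147483647 */
        /* Unsigned conversion: the actual value. */
        printf("PRIu32: %" PRIu32 "\n", pnode);          /* 2147483649 */
        return 0;
    }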

~Andrew


* Re: [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check
  2015-03-02 15:25         ` Andrew Cooper
@ 2015-03-02 16:05           ` Ian Campbell
  0 siblings, 0 replies; 94+ messages in thread
From: Ian Campbell @ 2015-03-02 16:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ufimtseva, dario.faggioli, ian.jackson, xen-devel, JBeulich

On Mon, 2015-03-02 at 15:25 +0000, Andrew Cooper wrote:
> On 02/03/15 15:13, Ian Campbell wrote:
> > On Tue, 2015-02-17 at 12:56 +0000, Wei Liu wrote:
> >
> >>>> +            LOG(ERROR, "Invalid pnode %d specified", pnode);
> >>> pnode is uint32_t, so should be %u
> > Actually PRId32 is correct for a uint32_t, although I guess %u will work
> > on all platforms we support.
> >
> >
> 
> PRId32 and PRIu32 differ in string representation if the top bit of
> pnode is set.
> 
> pnode is unsigned, so should never be formatted with a signed identifier.

Oops, I did indeed mean PRIu32.

My main point was that the PRI*32 macros should be used, not %u as you
were suggesting.

Ian.


end of thread

Thread overview: 94+ messages
2015-02-12 19:44 [PATCH v5 00/24] Virtual NUMA for PV and HVM Wei Liu
2015-02-12 19:44 ` [PATCH v5 01/24] xen: dump vNUMA information with debug key "u" Wei Liu
2015-02-13 11:50   ` Andrew Cooper
2015-02-16 14:35     ` Dario Faggioli
2015-02-12 19:44 ` [PATCH v5 02/24] xen: make two memory hypercalls vNUMA-aware Wei Liu
2015-02-13 12:00   ` Andrew Cooper
2015-02-13 13:24     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 03/24] libxc: duplicate snippet to allocate p2m_host array Wei Liu
2015-02-12 19:44 ` [PATCH v5 04/24] libxc: add p2m_size to xc_dom_image Wei Liu
2015-02-16 14:46   ` Dario Faggioli
2015-02-16 14:49     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 05/24] libxc: allocate memory with vNUMA information for PV guest Wei Liu
2015-02-13 14:30   ` Andrew Cooper
2015-02-13 15:05     ` Wei Liu
2015-02-13 15:17       ` Andrew Cooper
2015-02-16 16:58   ` Dario Faggioli
2015-02-16 17:44     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 06/24] libxl: introduce vNUMA types Wei Liu
2015-02-16 14:58   ` Dario Faggioli
2015-02-16 15:17     ` Wei Liu
2015-02-16 15:56       ` Dario Faggioli
2015-02-16 16:11         ` Wei Liu
2015-02-16 16:51           ` Dario Faggioli
2015-02-16 17:38             ` Wei Liu
2015-02-17 10:42               ` Dario Faggioli
2015-02-12 19:44 ` [PATCH v5 07/24] libxl: add vmemrange to libxl__domain_build_state Wei Liu
2015-02-16 16:00   ` Dario Faggioli
2015-02-16 16:15     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 08/24] libxl: introduce libxl__vnuma_config_check Wei Liu
2015-02-13 14:15   ` Ian Jackson
2015-02-13 15:12     ` Wei Liu
2015-02-13 15:39       ` Elena Ufimtseva
2015-02-13 16:06         ` Wei Liu
2015-02-13 16:11           ` Elena Ufimtseva
2015-02-17 16:51             ` Dario Faggioli
2015-02-22 15:50               ` Wei Liu
2015-02-17 16:44       ` Dario Faggioli
2015-02-13 15:40   ` Andrew Cooper
2015-02-17 12:56     ` Wei Liu
2015-03-02 15:13       ` Ian Campbell
2015-03-02 15:25         ` Andrew Cooper
2015-03-02 16:05           ` Ian Campbell
2015-02-17 16:38   ` Dario Faggioli
2015-02-22 15:47     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 09/24] libxl: x86: factor out e820_host_sanitize Wei Liu
2015-02-13 15:42   ` Andrew Cooper
2015-02-16 17:00     ` Dario Faggioli
2015-02-12 19:44 ` [PATCH v5 10/24] libxl: functions to build vmemranges for PV guest Wei Liu
2015-02-13 15:49   ` Andrew Cooper
2015-02-17 14:08     ` Wei Liu
2015-02-17 15:28   ` Dario Faggioli
2015-02-17 15:32     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 11/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
2015-02-13 15:54   ` Andrew Cooper
2015-02-17 14:49   ` Dario Faggioli
2015-02-12 19:44 ` [PATCH v5 12/24] hvmloader: retrieve vNUMA information from hypervisor Wei Liu
2015-02-13 15:58   ` Andrew Cooper
2015-02-17 11:36   ` Jan Beulich
2015-02-17 11:42     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 13/24] hvmloader: construct SRAT Wei Liu
2015-02-13 16:07   ` Andrew Cooper
2015-02-12 19:44 ` [PATCH v5 14/24] hvmloader: construct SLIT Wei Liu
2015-02-13 16:10   ` Andrew Cooper
2015-02-12 19:44 ` [PATCH v5 15/24] libxc: indentation change to xc_hvm_build_x86.c Wei Liu
2015-02-12 19:44 ` [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest Wei Liu
2015-02-13 16:22   ` Andrew Cooper
2015-02-12 19:44 ` [PATCH v5 17/24] libxl: build, check and pass vNUMA info to Xen " Wei Liu
2015-02-13 14:21   ` Ian Jackson
2015-02-13 15:18     ` Wei Liu
2015-02-17 14:26   ` Dario Faggioli
2015-02-17 14:41     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 18/24] libxl: disallow memory relocation when vNUMA is enabled Wei Liu
2015-02-13 14:17   ` Ian Jackson
2015-02-13 15:18     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 19/24] libxl: define LIBXL_HAVE_VNUMA Wei Liu
2015-02-13 14:12   ` Ian Jackson
2015-02-13 15:21     ` Wei Liu
2015-02-13 15:26       ` Ian Jackson
2015-02-13 15:27         ` Ian Jackson
2015-02-13 15:28         ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 20/24] libxlu: rework internal representation of setting Wei Liu
2015-02-13 14:24   ` Ian Jackson
2015-02-12 19:44 ` [PATCH v5 21/24] libxlu: nested list support Wei Liu
2015-02-12 19:44 ` [PATCH v5 22/24] libxlu: introduce new APIs Wei Liu
2015-02-13 14:12   ` Ian Jackson
2015-02-16 19:10     ` Wei Liu
2015-02-16 19:47       ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 23/24] xl: introduce xcalloc Wei Liu
2015-02-12 20:17   ` Andrew Cooper
2015-02-13 10:25     ` Wei Liu
2015-02-12 19:44 ` [PATCH v5 24/24] xl: vNUMA support Wei Liu
2015-02-24 16:19   ` Dario Faggioli
2015-02-24 16:31     ` Wei Liu
2015-02-24 16:44       ` Dario Faggioli
