* [PATCH v3 0/4] Fix HVM vNUMA
@ 2015-06-01 10:19 Wei Liu
  2015-06-01 10:19 ` [PATCH v3 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Wei Liu @ 2015-06-01 10:19 UTC
  To: Xen-devel
  Cc: Wei Liu, Boris Ostrovsky, Ian Jackson, Dario Faggioli, Ian Campbell

Boris discovered that HVM vNUMA didn't actually work. This patch series fixes
that.

The first patch is a prerequisite for the actual fixes. The second patch
helps with debugging. The fixes are in the last two patches, which can be
squashed into one if necessary.

Patches 1 and 3 are updated in this version; the other patches remain the
same as the last version.

Wei.

Wei Liu (4):
  libxc/libxl: fill xc_hvm_build_args in libxl
  libxc: print more error messages when failed
  libxc: rework vnuma bits in setup_guest
  libxl: fix HVM vNUMA

 tools/libxc/xc_hvm_build_x86.c | 133 +++++++++++++++++++++++++++--------------
 tools/libxl/libxl_dom.c        |  48 ++++++++-------
 tools/libxl/libxl_vnuma.c      |  15 ++++-
 3 files changed, 127 insertions(+), 69 deletions(-)

-- 
1.9.1

* [PATCH v3 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
  2015-06-01 10:19 [PATCH v3 0/4] Fix HVM vNUMA Wei Liu
@ 2015-06-01 10:19 ` Wei Liu
  2015-06-01 10:19 ` [PATCH v3 2/4] libxc: print more error messages when failed Wei Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-06-01 10:19 UTC
  To: Xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Chen, Tiejun, Boris Ostrovsky

When building HVM guests, some fields of xc_hvm_build_args were
originally filled in by xc_hvm_build (and buried in the wrong function),
while others were set in libxl__build_hvm before xc_hvm_build_args was
passed to xc_hvm_build. This is fragile.

After examining the code in xc_hvm_build that sets those fields, we can
in fact move the setting of mmio_start and related fields into libxl.
This way we consolidate memory layout setup in libxl.

The setting of the firmware data related fields is left in xc_hvm_build
because it depends on parsing the ELF image. Those fields only point to
scratch data that doesn't affect the memory layout.

There should be no change in the generated guest memory layout, but the
semantics of xc_hvm_build have changed: toolstacks built directly on top
of libxc need to adjust to this change.
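
For example, with a hypothetical mem_size of 6 GiB and mmio_size of
512 MiB, the computation moved into libxl yields mmio_start = 3.5 GiB,
lowmem_end = 3.5 GiB and highmem_end = 6.5 GiB, i.e. the layout
[0, 3.5G) RAM, [3.5G, 4G) MMIO hole, [4G, 6.5G) RAM. A minimal sketch
of that computation (mirroring the hunk added to libxl__build_hvm
below):

    /* Place the MMIO hole directly below 4GiB; RAM that would
     * overlap it is relocated above 4GiB. */
    uint64_t mmio_start  = (1ull << 32) - args.mmio_size;
    uint64_t lowmem_end  = args.mem_size;
    uint64_t highmem_end = 0;

    if (lowmem_end > mmio_start) {
        highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
        lowmem_end = mmio_start;
    }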

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: "Chen, Tiejun" <tiejun.chen@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: mention semantic change in commit message
---
 tools/libxc/xc_hvm_build_x86.c | 37 +++++++------------------------------
 tools/libxl/libxl_dom.c        | 16 ++++++++++++++++
 2 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index e45ae4a..92422bf 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -88,22 +88,14 @@ static int modules_init(struct xc_hvm_build_args *args,
     return 0;
 }
 
-static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size,
+static void build_hvm_info(void *hvm_info_page,
                            struct xc_hvm_build_args *args)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
-    uint64_t lowmem_end = mem_size, highmem_end = 0;
     uint8_t sum;
     int i;
 
-    if ( lowmem_end > mmio_start )
-    {
-        highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
-        lowmem_end = mmio_start;
-    }
-
     memset(hvm_info_page, 0, PAGE_SIZE);
 
     /* Fill in the header. */
@@ -116,14 +108,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     memset(hvm_info->vcpu_online, 0xff, sizeof(hvm_info->vcpu_online));
 
     /* Memory parameters. */
-    hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
-    hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
+    hvm_info->low_mem_pgend = args->lowmem_end >> PAGE_SHIFT;
+    hvm_info->high_mem_pgend = args->highmem_end >> PAGE_SHIFT;
     hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
-    args->lowmem_end = lowmem_end;
-    args->highmem_end = highmem_end;
-    args->mmio_start = mmio_start;
-
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
         sum += ((uint8_t *)hvm_info)[i];
@@ -251,8 +239,6 @@ static int setup_guest(xc_interface *xch,
     xen_pfn_t *page_array = NULL;
     unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
-    uint64_t mmio_start = (1ull << 32) - args->mmio_size;
-    uint64_t mmio_size = args->mmio_size;
     unsigned long entry_eip, cur_pages, cur_pfn;
     void *hvm_info_page;
     uint32_t *ident_pt;
@@ -344,8 +330,8 @@ static int setup_guest(xc_interface *xch,
 
     for ( i = 0; i < nr_pages; i++ )
         page_array[i] = i;
-    for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
-        page_array[i] += mmio_size >> PAGE_SHIFT;
+    for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
+        page_array[i] += args->mmio_size >> PAGE_SHIFT;
 
     /*
      * Try to claim pages for early warning of insufficient memory available.
@@ -446,7 +432,7 @@ static int setup_guest(xc_interface *xch,
                   * range */
                  !check_mmio_hole(cur_pfn << PAGE_SHIFT,
                                   SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
-                                  mmio_start, mmio_size) )
+                                  args->mmio_start, args->mmio_size) )
             {
                 long done;
                 unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
@@ -545,7 +531,7 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
+    build_hvm_info(hvm_info_page, args);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
@@ -661,12 +647,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
     if ( args.image_file_name == NULL )
         return -1;
 
-    if ( args.mem_target == 0 )
-        args.mem_target = args.mem_size;
-
-    if ( args.mmio_size == 0 )
-        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
-
     /* An HVM guest must be initialised with at least 2MB memory. */
     if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
         return -1;
@@ -684,9 +664,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
             args.acpi_module.guest_addr_out;
         hvm_args->smbios_module.guest_addr_out = 
             args.smbios_module.guest_addr_out;
-        hvm_args->lowmem_end = args.lowmem_end;
-        hvm_args->highmem_end = args.highmem_end;
-        hvm_args->mmio_start = args.mmio_start;
     }
 
     free(image);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a0c9850..dccc9ac 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -920,6 +920,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
+    uint64_t mmio_start, lowmem_end, highmem_end;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -941,6 +942,21 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
     }
+    if (args.mem_target == 0)
+        args.mem_target = args.mem_size;
+    if (args.mmio_size == 0)
+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
+    lowmem_end = args.mem_size;
+    highmem_end = 0;
+    mmio_start = (1ull << 32) - args.mmio_size;
+    if (lowmem_end > mmio_start)
+    {
+        highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
+        lowmem_end = mmio_start;
+    }
+    args.lowmem_end = lowmem_end;
+    args.highmem_end = highmem_end;
+    args.mmio_start = mmio_start;
 
     if (info->num_vnuma_nodes != 0) {
         int i;
-- 
1.9.1

* [PATCH v3 2/4] libxc: print more error messages when failed
  2015-06-01 10:19 [PATCH v3 0/4] Fix HVM vNUMA Wei Liu
  2015-06-01 10:19 ` [PATCH v3 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
@ 2015-06-01 10:19 ` Wei Liu
  2015-06-01 10:19 ` [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
  2015-06-01 10:19 ` [PATCH v3 4/4] libxl: fix HVM vNUMA Wei Liu
  3 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-06-01 10:19 UTC
  To: Xen-devel
  Cc: Wei Liu, Boris Ostrovsky, Ian Jackson, Dario Faggioli, Ian Campbell

No functional changes introduced.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 92422bf..df4b7ed 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -259,7 +259,10 @@ static int setup_guest(xc_interface *xch,
 
     memset(&elf, 0, sizeof(elf));
     if ( elf_init(&elf, image, image_size) != 0 )
+    {
+        PERROR("Could not initialise ELF image");
         goto error_out;
+    }
 
     xc_elf_set_logfile(xch, &elf, 1);
 
@@ -522,15 +525,24 @@ static int setup_guest(xc_interface *xch,
     DPRINTF("  1GB PAGES: 0x%016lx\n", stat_1gb_pages);
     
     if ( loadelfimage(xch, &elf, dom, page_array) != 0 )
+    {
+        PERROR("Could not load ELF image");
         goto error_out;
+    }
 
     if ( loadmodules(xch, args, m_start, m_end, dom, page_array) != 0 )
-        goto error_out;    
+    {
+        PERROR("Could not load ACPI modules");
+        goto error_out;
+    }
 
     if ( (hvm_info_page = xc_map_foreign_range(
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
+    {
+        PERROR("Could not map hvm info page");
         goto error_out;
+    }
     build_hvm_info(hvm_info_page, args);
     munmap(hvm_info_page, PAGE_SIZE);
 
@@ -547,7 +559,10 @@ static int setup_guest(xc_interface *xch,
     }
 
     if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) )
-            goto error_out;
+    {
+        PERROR("Could not clear special pages");
+        goto error_out;
+    }
 
     xc_hvm_param_set(xch, dom, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
@@ -580,7 +595,10 @@ static int setup_guest(xc_interface *xch,
     }
 
     if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) )
-            goto error_out;
+    {
+        PERROR("Could not clear ioreq page");
+        goto error_out;
+    }
 
     /* Tell the domain where the pages are and how many there are */
     xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
@@ -595,7 +613,10 @@ static int setup_guest(xc_interface *xch,
     if ( (ident_pt = xc_map_foreign_range(
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
+    {
+        PERROR("Could not map special page ident_pt");
         goto error_out;
+    }
     for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
         ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
                        _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
@@ -610,7 +631,10 @@ static int setup_guest(xc_interface *xch,
         char *page0 = xc_map_foreign_range(
             xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0);
         if ( page0 == NULL )
+        {
+            PERROR("Could not map page0");
             goto error_out;
+        }
         page0[0] = 0xe9;
         *(uint32_t *)&page0[1] = entry_eip - 5;
         munmap(page0, PAGE_SIZE);
-- 
1.9.1

* [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-01 10:19 [PATCH v3 0/4] Fix HVM vNUMA Wei Liu
  2015-06-01 10:19 ` [PATCH v3 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
  2015-06-01 10:19 ` [PATCH v3 2/4] libxc: print more error messages when failed Wei Liu
@ 2015-06-01 10:19 ` Wei Liu
  2015-06-01 20:18   ` Boris Ostrovsky
  2015-06-01 10:19 ` [PATCH v3 4/4] libxl: fix HVM vNUMA Wei Liu
  3 siblings, 1 reply; 10+ messages in thread
From: Wei Liu @ 2015-06-01 10:19 UTC
  To: Xen-devel
  Cc: Wei Liu, Boris Ostrovsky, Ian Jackson, Dario Faggioli, Ian Campbell

Make the setup process similar to the PV counterpart: allocate a P2M
array that covers the whole memory range and start from there. This is
clearer than using an array with no holes in it.

Also, the dummy layout should take the MMIO hole into consideration. We
might end up with two vmemranges in the dummy layout.
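
For instance, for a guest whose RAM is split as [0, 3.5G) and
[4G, 6.5G) around the MMIO hole, the dummy layout becomes two
vmemranges on a single vnode (an illustrative sketch of the values the
new code below produces):

    /* [0, lowmem_end) plus, since there is high memory,
     * [4G, highmem_end), both on vnode 0. */
    dummy_vmemrange[0] = (xen_vmemrange_t){ .start = 0,
                                            .end = args->lowmem_end,
                                            .nid = 0 };
    dummy_vmemrange[1] = (xen_vmemrange_t){ .start = 1ULL << 32,
                                            .end = args->highmem_end,
                                            .nid = 0 };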

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: only generate 1 vnode in dummy layout
v3: reset fields that point to dummy layout in exit path
---
 tools/libxc/xc_hvm_build_x86.c | 70 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index df4b7ed..0e98c84 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
 {
     xen_pfn_t *page_array = NULL;
     unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
+    unsigned long p2m_size;
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
     unsigned long entry_eip, cur_pages, cur_pfn;
     void *hvm_info_page;
@@ -254,8 +255,9 @@ static int setup_guest(xc_interface *xch,
     xen_pfn_t special_array[NR_SPECIAL_PAGES];
     xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
     uint64_t total_pages;
-    xen_vmemrange_t dummy_vmemrange;
-    unsigned int dummy_vnode_to_pnode;
+    xen_vmemrange_t dummy_vmemrange[2];
+    unsigned int dummy_vnode_to_pnode[1];
+    bool use_dummy = false;
 
     memset(&elf, 0, sizeof(elf));
     if ( elf_init(&elf, image, image_size) != 0 )
@@ -275,17 +277,36 @@ static int setup_guest(xc_interface *xch,
 
     if ( args->nr_vmemranges == 0 )
     {
-        /* Build dummy vnode information */
-        dummy_vmemrange.start = 0;
-        dummy_vmemrange.end   = args->mem_size;
-        dummy_vmemrange.flags = 0;
-        dummy_vmemrange.nid   = 0;
+        /* Build dummy vnode information
+         *
+         * Guest physical address space layout:
+         * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
+         *
+         * Of course if there is no high memory, the second vmemrange
+         * has no effect on the actual result.
+         */
+
+        dummy_vmemrange[0].start = 0;
+        dummy_vmemrange[0].end   = args->lowmem_end;
+        dummy_vmemrange[0].flags = 0;
+        dummy_vmemrange[0].nid   = 0;
         args->nr_vmemranges = 1;
-        args->vmemranges = &dummy_vmemrange;
 
-        dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
+        if ( args->highmem_end > (1ULL << 32) )
+        {
+            dummy_vmemrange[1].start = 1ULL << 32;
+            dummy_vmemrange[1].end   = args->highmem_end;
+            dummy_vmemrange[1].flags = 0;
+            dummy_vmemrange[1].nid   = 0;
+
+            args->nr_vmemranges++;
+        }
+
+        dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
         args->nr_vnodes = 1;
-        args->vnode_to_pnode = &dummy_vnode_to_pnode;
+        args->vmemranges = dummy_vmemrange;
+        args->vnode_to_pnode = dummy_vnode_to_pnode;
+        use_dummy = true;
     }
     else
     {
@@ -297,9 +318,15 @@ static int setup_guest(xc_interface *xch,
     }
 
     total_pages = 0;
+    p2m_size = 0;
     for ( i = 0; i < args->nr_vmemranges; i++ )
+    {
         total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
                         >> PAGE_SHIFT);
+        p2m_size = p2m_size > (args->vmemranges[i].end >> PAGE_SHIFT) ?
+            p2m_size : (args->vmemranges[i].end >> PAGE_SHIFT);
+    }
+
     if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
     {
         PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
@@ -325,16 +352,23 @@ static int setup_guest(xc_interface *xch,
     DPRINTF("  TOTAL:    %016"PRIx64"->%016"PRIx64"\n", v_start, v_end);
     DPRINTF("  ENTRY:    %016"PRIx64"\n", elf_uval(&elf, elf.ehdr, e_entry));
 
-    if ( (page_array = malloc(nr_pages * sizeof(xen_pfn_t))) == NULL )
+    if ( (page_array = malloc(p2m_size * sizeof(xen_pfn_t))) == NULL )
     {
         PERROR("Could not allocate memory.");
         goto error_out;
     }
 
-    for ( i = 0; i < nr_pages; i++ )
-        page_array[i] = i;
-    for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
-        page_array[i] += args->mmio_size >> PAGE_SHIFT;
+    for ( i = 0; i < p2m_size; i++ )
+        page_array[i] = ((xen_pfn_t)-1);
+    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
+    {
+        uint64_t pfn;
+
+        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
+              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
+              pfn++ )
+            page_array[pfn] = pfn;
+    }
 
     /*
      * Try to claim pages for early warning of insufficient memory available.
@@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
  error_out:
     rc = -1;
  out:
+    if ( use_dummy )
+    {
+        args->nr_vnodes = 0;
+        args->vmemranges = NULL;
+        args->vnode_to_pnode = NULL;
+    }
     if ( elf_check_broken(&elf) )
         ERROR("HVM ELF broken: %s", elf_check_broken(&elf));
 
-- 
1.9.1

* [PATCH v3 4/4] libxl: fix HVM vNUMA
  2015-06-01 10:19 [PATCH v3 0/4] Fix HVM vNUMA Wei Liu
                   ` (2 preceding siblings ...)
  2015-06-01 10:19 ` [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
@ 2015-06-01 10:19 ` Wei Liu
  3 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-06-01 10:19 UTC
  To: Xen-devel
  Cc: Wei Liu, Boris Ostrovsky, Ian Jackson, Dario Faggioli, Ian Campbell

This patch does two things:

The original code erroneously fills in xc_hvm_build_args before
generating the vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to the right place to fix this.

Also move the subtraction of video ram into
libxl__vnuma_build_vmemrange_hvm, because that is the central place for
generating vmemranges.
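
The resulting flow in libxl__build_hvm is then (a rough sketch of the
ordering after this patch, with error handling elided):

    if (info->num_vnuma_nodes != 0) {
        /* Generate and validate vmemranges first... */
        libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
        libxl__vnuma_config_check(gc, info, state);
        set_vnuma_info(gc, domid, info, state);
        /* ...then copy state->vmemranges into args. */
    }
    /* Guest memory is now populated with vNUMA information in place. */
    xc_hvm_build(ctx->xch, domid, &args);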

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxl/libxl_dom.c   | 32 ++++++++++----------------------
 tools/libxl/libxl_vnuma.c | 15 ++++++++++++++-
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index dccc9ac..867172a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -961,6 +961,16 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     if (info->num_vnuma_nodes != 0) {
         int i;
 
+        ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+        if (ret) {
+            LOGEV(ERROR, ret, "hvm build vmemranges failed");
+            goto out;
+        }
+        ret = libxl__vnuma_config_check(gc, info, state);
+        if (ret) goto out;
+        ret = set_vnuma_info(gc, domid, info, state);
+        if (ret) goto out;
+
         args.nr_vmemranges = state->num_vmemranges;
         args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
                                         args.nr_vmemranges);
@@ -972,17 +982,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
             args.vmemranges[i].nid   = state->vmemranges[i].nid;
         }
 
-        /* Consider video ram belongs to vmemrange 0 -- just shrink it
-         * by the size of video ram.
-         */
-        if (((args.vmemranges[0].end - args.vmemranges[0].start) >> 10)
-            < info->video_memkb) {
-            LOG(ERROR, "vmemrange 0 too small to contain video ram");
-            goto out;
-        }
-
-        args.vmemranges[0].end -= (info->video_memkb << 10);
-
         args.nr_vnodes = info->num_vnuma_nodes;
         args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) *
                                             args.nr_vnodes);
@@ -996,17 +995,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
-    if (info->num_vnuma_nodes != 0) {
-        ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
-        if (ret) {
-            LOGEV(ERROR, ret, "hvm build vmemranges failed");
-            goto out;
-        }
-        ret = libxl__vnuma_config_check(gc, info, state);
-        if (ret) goto out;
-        ret = set_vnuma_info(gc, domid, info, state);
-        if (ret) goto out;
-    }
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index cac78d7..56856d2 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -257,6 +257,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
     uint64_t hole_start, hole_end, next;
     int nid, nr_vmemrange;
     xen_vmemrange_t *vmemranges;
+    int rc;
 
     /* Derive vmemranges from vnode size and memory hole.
      *
@@ -277,6 +278,16 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
         libxl_vnode_info *p = &b_info->vnuma_nodes[nid];
         uint64_t remaining_bytes = p->memkb << 10;
 
+        /* Consider video ram belongs to vnode 0 */
+        if (nid == 0) {
+            if (p->memkb < b_info->video_memkb) {
+                LOG(ERROR, "vnode 0 too small to contain video ram");
+                rc = ERROR_INVAL;
+                goto out;
+            }
+            remaining_bytes -= (b_info->video_memkb << 10);
+        }
+
         while (remaining_bytes > 0) {
             uint64_t count = remaining_bytes;
 
@@ -300,7 +311,9 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
     state->vmemranges = vmemranges;
     state->num_vmemranges = nr_vmemrange;
 
-    return 0;
+    rc = 0;
+out:
+    return rc;
 }
 
 /*
-- 
1.9.1

* Re: [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-01 10:19 ` [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
@ 2015-06-01 20:18   ` Boris Ostrovsky
  2015-06-03 10:34     ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Boris Ostrovsky @ 2015-06-01 20:18 UTC
  To: Wei Liu, Xen-devel; +Cc: Dario Faggioli, Ian Jackson, Ian Campbell

On 06/01/2015 06:19 AM, Wei Liu wrote:
> Make the setup process similar to PV counterpart. That is, to allocate a
> P2M array that covers the whole memory range and start from there. This
> is clearer than using an array with no holes in it.
>
> Also the dummy layout should take MMIO hole into consideration. We might
> end up having two vmemranges in the dummy layout.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

with a couple of nits below that you might consider.


> @@ -325,16 +352,23 @@ static int setup_guest(xc_interface *xch,
>       DPRINTF("  TOTAL:    %016"PRIx64"->%016"PRIx64"\n", v_start, v_end);
>       DPRINTF("  ENTRY:    %016"PRIx64"\n", elf_uval(&elf, elf.ehdr, e_entry));
>   
> -    if ( (page_array = malloc(nr_pages * sizeof(xen_pfn_t))) == NULL )
> +    if ( (page_array = malloc(p2m_size * sizeof(xen_pfn_t))) == NULL )
>       {
>           PERROR("Could not allocate memory.");
>           goto error_out;
>       }
>   
> -    for ( i = 0; i < nr_pages; i++ )
> -        page_array[i] = i;
> -    for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
> -        page_array[i] += args->mmio_size >> PAGE_SHIFT;
> +    for ( i = 0; i < p2m_size; i++ )
> +        page_array[i] = ((xen_pfn_t)-1);

For large guests, memset-ting page_array to 0xff should be much faster 
(but obviously not type-safe).
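
That is, something like:

    /* Assumes xen_pfn_t is an unsigned integer type: filling every
     * byte with 0xff makes each entry equal to (xen_pfn_t)-1. */
    memset(page_array, 0xff, p2m_size * sizeof(xen_pfn_t));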


> +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
> +    {
> +        uint64_t pfn;
> +
> +        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> +              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
> +              pfn++ )
> +            page_array[pfn] = pfn;
> +    }
>   
>       /*
>        * Try to claim pages for early warning of insufficient memory available.
> @@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
>    error_out:
>       rc = -1;
>    out:
> +    if ( use_dummy )

Or 'if (args->vmemranges == dummy_vmemrange)' and drop use_dummy variable.

-boris

> +    {
> +        args->nr_vnodes = 0;
> +        args->vmemranges = NULL;
> +        args->vnode_to_pnode = NULL;
> +    }
>       if ( elf_check_broken(&elf) )
>           ERROR("HVM ELF broken: %s", elf_check_broken(&elf));
>   

* Re: [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-01 20:18   ` Boris Ostrovsky
@ 2015-06-03 10:34     ` Ian Campbell
  2015-06-03 10:39       ` Wei Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2015-06-03 10:34 UTC
  To: Boris Ostrovsky; +Cc: Xen-devel, Dario Faggioli, Wei Liu, Ian Jackson

On Mon, 2015-06-01 at 16:18 -0400, Boris Ostrovsky wrote:
> On 06/01/2015 06:19 AM, Wei Liu wrote:
> > Make the setup process similar to PV counterpart. That is, to allocate a
> > P2M array that covers the whole memory range and start from there. This
> > is clearer than using an array with no holes in it.
> >
> > Also the dummy layout should take MMIO hole into consideration. We might
> > end up having two vmemranges in the dummy layout.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

+ my ack on all 4 -> applied.

> with a couple of nits below that you might consider.

These can be done in followups, so I applied.

I have a passing comment on one of the nits:

> > +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
> > +    {
> > +        uint64_t pfn;
> > +
> > +        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> > +              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
> > +              pfn++ )
> > +            page_array[pfn] = pfn;
> > +    }
> >   
> >       /*
> >        * Try to claim pages for early warning of insufficient memory available.
> > @@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
> >    error_out:
> >       rc = -1;
> >    out:
> > +    if ( use_dummy )
> 
> Or 'if (args->vmemranges == dummy_vmemrange)' and drop use_dummy variable.

FWIW a more normal approach would be to do all the calculations on local
variables and propagate them to the caller in the !use_dummy case. I
don't know if that implies some huge restructure of this function
though.

* Re: [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-03 10:34     ` Ian Campbell
@ 2015-06-03 10:39       ` Wei Liu
  2015-06-03 10:57         ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Liu @ 2015-06-03 10:39 UTC
  To: Ian Campbell
  Cc: Ian Jackson, Xen-devel, Boris Ostrovsky, Wei Liu, Dario Faggioli

On Wed, Jun 03, 2015 at 11:34:49AM +0100, Ian Campbell wrote:
> On Mon, 2015-06-01 at 16:18 -0400, Boris Ostrovsky wrote:
> > On 06/01/2015 06:19 AM, Wei Liu wrote:
> > > Make the setup process similar to PV counterpart. That is, to allocate a
> > > P2M array that covers the whole memory range and start from there. This
> > > is clearer than using an array with no holes in it.
> > >
> > > Also the dummy layout should take MMIO hole into consideration. We might
> > > end up having two vmemranges in the dummy layout.
> > >
> > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > Cc: Ian Campbell <ian.campbell@citrix.com>
> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > 
> > Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> + my ack on all 4 -> applied.
> 
> > with a couple of nits below that you might consider.
> 
> These can be done in followups, so I applied.
> 
> I have a passing comment on one of the nits:
> 
> > > +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
> > > +    {
> > > +        uint64_t pfn;
> > > +
> > > +        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> > > +              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
> > > +              pfn++ )
> > > +            page_array[pfn] = pfn;
> > > +    }
> > >   
> > >       /*
> > >        * Try to claim pages for early warning of insufficient memory available.
> > > @@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
> > >    error_out:
> > >       rc = -1;
> > >    out:
> > > +    if ( use_dummy )
> > 
> > Or 'if (args->vmemranges == dummy_vmemrange)' and drop use_dummy variable.
> 
> FWIW a more normal approach would be to do all the calculations on local
> variables and propagate them to the caller in the !use_dummy case. I
> don't know if that implies some huge restructure of this function
> though.

There is no calculation with regard to vNUMA that needs to be propagated
to the caller in the !use_dummy case, so this point is moot.

Wei.

* Re: [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-03 10:39       ` Wei Liu
@ 2015-06-03 10:57         ` Ian Campbell
  2015-06-03 11:04           ` Wei Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2015-06-03 10:57 UTC
  To: Wei Liu; +Cc: Xen-devel, Boris Ostrovsky, Ian Jackson, Dario Faggioli

On Wed, 2015-06-03 at 11:39 +0100, Wei Liu wrote:
> On Wed, Jun 03, 2015 at 11:34:49AM +0100, Ian Campbell wrote:
> > On Mon, 2015-06-01 at 16:18 -0400, Boris Ostrovsky wrote:
> > > On 06/01/2015 06:19 AM, Wei Liu wrote:
> > > > Make the setup process similar to PV counterpart. That is, to allocate a
> > > > P2M array that covers the whole memory range and start from there. This
> > > > is clearer than using an array with no holes in it.
> > > >
> > > > Also the dummy layout should take MMIO hole into consideration. We might
> > > > end up having two vmemranges in the dummy layout.
> > > >
> > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > Cc: Ian Campbell <ian.campbell@citrix.com>
> > > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > 
> > > Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > 
> > + my ack on all 4 -> applied.
> > 
> > > with a couple of nits below that you might consider.
> > 
> > These can be done in followups, so I applied.
> > 
> > I have a passing comment on one of the nits:
> > 
> > > > +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
> > > > +    {
> > > > +        uint64_t pfn;
> > > > +
> > > > +        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> > > > +              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
> > > > +              pfn++ )
> > > > +            page_array[pfn] = pfn;
> > > > +    }
> > > >   
> > > >       /*
> > > >        * Try to claim pages for early warning of insufficient memory available.
> > > > @@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
> > > >    error_out:
> > > >       rc = -1;
> > > >    out:
> > > > +    if ( use_dummy )
> > > 
> > > Or 'if (args->vmemranges == dummy_vmemrange)' and drop use_dummy variable.
> > 
> > FWIW a more normal approach would be to do all the calculations on local
> > variables and propagate them to the caller in the !use_dummy case. I
> > don't know if that implies some huge restructure of this function
> > though.
> 
> There is no calculation with regard to vNUMA that needs to be propagated
> to the caller in !use_dummy case. So this point is moot.

The code returns with args->{nr_vnodes,vmemranges,vnode_to_pnode} having
been updated. If that information should not be propagated to the
caller, then surely it should be blown away irrespective of whether the
dummy layout was used (and in that case, why is it being kept in that
struct at all rather than in some local variables freed on exit)?

* Re: [PATCH v3 3/4] libxc: rework vnuma bits in setup_guest
  2015-06-03 10:57         ` Ian Campbell
@ 2015-06-03 11:04           ` Wei Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2015-06-03 11:04 UTC
  To: Ian Campbell
  Cc: Ian Jackson, Xen-devel, Boris Ostrovsky, Wei Liu, Dario Faggioli

On Wed, Jun 03, 2015 at 11:57:33AM +0100, Ian Campbell wrote:
> On Wed, 2015-06-03 at 11:39 +0100, Wei Liu wrote:
> > On Wed, Jun 03, 2015 at 11:34:49AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-06-01 at 16:18 -0400, Boris Ostrovsky wrote:
> > > > On 06/01/2015 06:19 AM, Wei Liu wrote:
> > > > > Make the setup process similar to PV counterpart. That is, to allocate a
> > > > > P2M array that covers the whole memory range and start from there. This
> > > > > is clearer than using an array with no holes in it.
> > > > >
> > > > > Also the dummy layout should take MMIO hole into consideration. We might
> > > > > end up having two vmemranges in the dummy layout.
> > > > >
> > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > Cc: Ian Campbell <ian.campbell@citrix.com>
> > > > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > > 
> > > > Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > > 
> > > + my ack on all 4 -> applied.
> > > 
> > > > with a couple of nits below that you might consider.
> > > 
> > > These can be done in followups, so I applied.
> > > 
> > > I have a passing comment on one of the nits:
> > > 
> > > > > +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
> > > > > +    {
> > > > > +        uint64_t pfn;
> > > > > +
> > > > > +        for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> > > > > +              pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
> > > > > +              pfn++ )
> > > > > +            page_array[pfn] = pfn;
> > > > > +    }
> > > > >   
> > > > >       /*
> > > > >        * Try to claim pages for early warning of insufficient memory available.
> > > > > @@ -645,6 +679,12 @@ static int setup_guest(xc_interface *xch,
> > > > >    error_out:
> > > > >       rc = -1;
> > > > >    out:
> > > > > +    if ( use_dummy )
> > > > 
> > > > Or 'if (args->vmemranges == dummy_vmemrange)' and drop use_dummy variable.
> > > 
> > > FWIW a more normal approach would be to do all the calculations on local
> > > variables and propagate them to the caller in the !use_dummy case. I
> > > don't know if that implies some huge restructure of this function
> > > though.
> > 
> > There is no calculation with regard to vNUMA that needs to be propagated
> > to the caller in !use_dummy case. So this point is moot.
> 
> The code returns with args->{nr_vnodes,vmemranges,vnode_to_pnode} having
> been updated, if that information should not be propoagated to the
> caller then surely it should be blown away irrespective of the dummy or
> not (and in that case why is it being done in that struct at all and not
> in some local variables freed on exit)
> 

That's because the code below references those fields in args and
nothing more. I can refactor this function to have local pointers that
point to either the dummy layout or args->*.
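
Something along these lines (a hypothetical sketch of that follow-up,
not code in this series):

    xen_vmemrange_t *vmemranges;
    unsigned int *vnode_to_pnode, nr_vmemranges, nr_vnodes;

    if ( args->nr_vmemranges == 0 )
    {
        /* Point at the dummy layout; args is left untouched. */
        vmemranges = dummy_vmemrange;
        vnode_to_pnode = dummy_vnode_to_pnode;
        nr_vnodes = 1;
        nr_vmemranges = args->highmem_end > (1ULL << 32) ? 2 : 1;
    }
    else
    {
        vmemranges = args->vmemranges;
        vnode_to_pnode = args->vnode_to_pnode;
        nr_vmemranges = args->nr_vmemranges;
        nr_vnodes = args->nr_vnodes;
    }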

Wei.
