From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>, xen-devel@lists.xen.org
Cc: dario.faggioli@citrix.com, JBeulich@suse.com,
	ian.jackson@eu.citrix.com, ian.campbell@citrix.com,
	ufimtseva@gmail.com
Subject: Re: [PATCH v5 16/24] libxc: allocate memory with vNUMA information for HVM guest
Date: Fri, 13 Feb 2015 16:22:44 +0000	[thread overview]
Message-ID: <54DE24D4.5010708@citrix.com> (raw)
In-Reply-To: <1423770294-9779-17-git-send-email-wei.liu2@citrix.com>

On 12/02/15 19:44, Wei Liu wrote:
> The algorithm is more or less the same as the one used for PV guests.
> Libxc gets hold of the mapping of vnode to pnode and the size of each
> vnode, then allocates memory accordingly.
>
> The function then returns the low memory end, high memory end and MMIO
> start to the caller. Libxl needs those values to construct vmemranges
> for the guest.
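
(Purely as an illustration of the new interface, not part of the patch: a
minimal sketch of how a caller such as libxl might fill in the vNUMA fields
and consume the out parameters.  The two-node layout, the 2GB mem_size and
the error handling are made up for the example; the xc_hvm_build() prototype
and the struct fields are taken from the hunks quoted below.)

#include <stdio.h>
#include <inttypes.h>
#include <xenctrl.h>
#include <xenguest.h>

static int build_with_vnuma(xc_interface *xch, uint32_t domid,
                            struct xc_hvm_build_args *args)
{
    /*
     * Two 1GB vnodes covering guest memory contiguously.  args->mem_size
     * is assumed to have been set to 2GB by the caller, so the sum of the
     * vmemranges matches it (as checked in setup_guest()).
     */
    xen_vmemrange_t vmemranges[] = {
        { .start = 0,          .end = 1ULL << 30, .flags = 0, .nid = 0 },
        { .start = 1ULL << 30, .end = 2ULL << 30, .flags = 0, .nid = 1 },
    };
    unsigned int vnode_to_pnode[] = { 0, 1 }; /* vnode n placed on pnode n */
    int rc;

    args->vmemranges     = vmemranges;
    args->nr_vmemranges  = 2;
    args->vnode_to_pnode = vnode_to_pnode;
    args->nr_vnodes      = 2;

    rc = xc_hvm_build(xch, domid, args);
    if ( rc != 0 )
        return rc;

    /*
     * Out parameters filled in by build_hvm_info(); libxl uses them to
     * construct the guest-visible vmemranges around the MMIO hole.
     */
    printf("lowmem end 0x%"PRIx64", highmem end 0x%"PRIx64
           ", mmio start 0x%"PRIx64"\n",
           args->lowmem_end, args->highmem_end, args->mmio_start);

    return 0;
}
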
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
> Changes in v5:
> 1. Use a better loop variable name vnid.
>
> Changes in v4:
> 1. Adapt to new interface.
> 2. Shorten error message.
> 3. This patch includes only functional changes.
>
> Changes in v3:
> 1. Rewrite commit log.
> 2. Add a few code comments.
> ---
>  tools/libxc/include/xenguest.h |  11 +++++
>  tools/libxc/xc_hvm_build_x86.c | 105 ++++++++++++++++++++++++++++++++++-------
>  2 files changed, 100 insertions(+), 16 deletions(-)
>
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 40bbac8..ff66cb1 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -230,6 +230,17 @@ struct xc_hvm_build_args {
>      struct xc_hvm_firmware_module smbios_module;
>      /* Whether to use claim hypercall (1 - enable, 0 - disable). */
>      int claim_enabled;
> +
> +    /* vNUMA information */
> +    xen_vmemrange_t *vmemranges;
> +    unsigned int nr_vmemranges;
> +    unsigned int *vnode_to_pnode;
> +    unsigned int nr_vnodes;
> +
> +    /* Out parameters  */
> +    uint64_t lowmem_end;
> +    uint64_t highmem_end;
> +    uint64_t mmio_start;
>  };
>  
>  /**
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index ecc3224..a2a3777 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
>  }
>  
>  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> +                           uint64_t mmio_start, uint64_t mmio_size,
> +                           struct xc_hvm_build_args *args)
>  {
>      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
>          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> @@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
>      hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
>  
> +    args->lowmem_end = lowmem_end;
> +    args->highmem_end = highmem_end;
> +    args->mmio_start = mmio_start;
> +
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
>          sum += ((uint8_t *)hvm_info)[i];
> @@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
>                         char *image, unsigned long image_size)
>  {
>      xen_pfn_t *page_array = NULL;
> -    unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
> +    unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
>      unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
>      uint64_t mmio_start = (1ull << 32) - args->mmio_size;
>      uint64_t mmio_size = args->mmio_size;
> @@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
>      xen_capabilities_info_t caps;
>      unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
>          stat_1gb_pages = 0;
> -    int pod_mode = 0;
> +    unsigned int memflags = 0;
>      int claim_enabled = args->claim_enabled;
>      xen_pfn_t special_array[NR_SPECIAL_PAGES];
>      xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
> -
> -    if ( nr_pages > target_pages )
> -        pod_mode = XENMEMF_populate_on_demand;
> +    uint64_t total_pages;
> +    xen_vmemrange_t dummy_vmemrange;
> +    unsigned int dummy_vnode_to_pnode;
>  
>      memset(&elf, 0, sizeof(elf));
>      if ( elf_init(&elf, image, image_size) != 0 )
> @@ -276,6 +281,43 @@ static int setup_guest(xc_interface *xch,
>      v_start = 0;
>      v_end = args->mem_size;
>  
> +    if ( nr_pages > target_pages )
> +        memflags |= XENMEMF_populate_on_demand;
> +
> +    if ( args->nr_vmemranges == 0 )
> +    {
> +        /* Build dummy vnode information */
> +        dummy_vmemrange.start = 0;
> +        dummy_vmemrange.end   = args->mem_size;
> +        dummy_vmemrange.flags = 0;
> +        dummy_vmemrange.nid   = 0;
> +        args->nr_vmemranges = 1;
> +        args->vmemranges = &dummy_vmemrange;
> +
> +        dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
> +        args->nr_vnodes = 1;
> +        args->vnode_to_pnode = &dummy_vnode_to_pnode;
> +    }
> +    else
> +    {
> +        if ( nr_pages > target_pages )
> +        {
> +            PERROR("Cannot enable vNUMA and PoD at the same time");

We would solve a large number of interaction issues like this if someone
had the time to reimplement PoD using the paging system to page in a
page of zeroes.

It would be functionally identical from the guest's point of view,
wouldn't need any toolstack interaction, and would reduce the number of
moving parts involved in setting up memory for a domain.

(I don't suggest this being a prerequisite to this patch series.)

~Andrew

> +            goto error_out;
> +        }
> +    }
> +
> +    total_pages = 0;
> +    for ( i = 0; i < args->nr_vmemranges; i++ )
> +        total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
> +                        >> PAGE_SHIFT);
> +    if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
> +    {
> +        PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
> +               total_pages, args->mem_size >> PAGE_SHIFT);
> +        goto error_out;
> +    }
> +
>      if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
>      {
>          PERROR("Could not get Xen capabilities");
> @@ -320,7 +362,7 @@ static int setup_guest(xc_interface *xch,
>          }
>      }
>  
> -    if ( pod_mode )
> +    if ( memflags & XENMEMF_populate_on_demand )
>      {
>          /*
>           * Subtract VGA_HOLE_SIZE from target_pages for the VGA
> @@ -349,15 +391,40 @@ static int setup_guest(xc_interface *xch,
>       * ensure that we can be preempted and hence dom0 remains responsive.
>       */
>      rc = xc_domain_populate_physmap_exact(
> -        xch, dom, 0xa0, 0, pod_mode, &page_array[0x00]);
> -    cur_pages = 0xc0;
> -    stat_normal_pages = 0xc0;
> +        xch, dom, 0xa0, 0, memflags, &page_array[0x00]);
>  
> +    stat_normal_pages = 0;
> +    for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
>      {
> -        while ( (rc == 0) && (nr_pages > cur_pages) )
> +        unsigned int new_memflags = memflags;
> +        uint64_t end_pages;
> +        unsigned int vnode = args->vmemranges[vmemid].nid;
> +        unsigned int pnode = args->vnode_to_pnode[vnode];
> +
> +        if ( pnode != XC_VNUMA_NO_NODE )
> +        {
> +            new_memflags |= XENMEMF_exact_node(pnode);
> +            new_memflags |= XENMEMF_exact_node_request;
> +        }
> +
> +        end_pages = args->vmemranges[vmemid].end >> PAGE_SHIFT;
> +        /*
> +         * Consider the VGA hole to belong to the vmemrange that
> +         * covers 0xA0000-0xC0000. Note that 0x00000-0xA0000 is
> +         * populated just before this loop.
> +         */
> +        if ( args->vmemranges[vmemid].start == 0 )
> +        {
> +            cur_pages = 0xc0;
> +            stat_normal_pages += 0xc0;
> +        }
> +        else
> +            cur_pages = args->vmemranges[vmemid].start >> PAGE_SHIFT;
> +
> +        while ( (rc == 0) && (end_pages > cur_pages) )
>          {
>              /* Clip count to maximum 1GB extent. */
> -            unsigned long count = nr_pages - cur_pages;
> +            unsigned long count = end_pages - cur_pages;
>              unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
>  
>              if ( count > max_pages )
> @@ -394,7 +461,7 @@ static int setup_guest(xc_interface *xch,
>  
>                  done = xc_domain_populate_physmap(xch, dom, nr_extents,
>                                                    SUPERPAGE_1GB_SHIFT,
> -                                                  pod_mode, sp_extents);
> +                                                  memflags, sp_extents);
>  
>                  if ( done > 0 )
>                  {
> @@ -434,7 +501,7 @@ static int setup_guest(xc_interface *xch,
>  
>                      done = xc_domain_populate_physmap(xch, dom, nr_extents,
>                                                        SUPERPAGE_2MB_SHIFT,
> -                                                      pod_mode, sp_extents);
> +                                                      memflags, sp_extents);
>  
>                      if ( done > 0 )
>                      {
> @@ -450,11 +517,14 @@ static int setup_guest(xc_interface *xch,
>              if ( count != 0 )
>              {
>                  rc = xc_domain_populate_physmap_exact(
> -                    xch, dom, count, 0, pod_mode, &page_array[cur_pages]);
> +                    xch, dom, count, 0, new_memflags, &page_array[cur_pages]);
>                  cur_pages += count;
>                  stat_normal_pages += count;
>              }
>          }
> +
> +        if ( rc != 0 )
> +            break;
>      }
>  
>      if ( rc != 0 )
> @@ -478,7 +548,7 @@ static int setup_guest(xc_interface *xch,
>                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                HVM_INFO_PFN)) == NULL )
>          goto error_out;
> -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
> @@ -617,6 +687,9 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
>              args.acpi_module.guest_addr_out;
>          hvm_args->smbios_module.guest_addr_out = 
>              args.smbios_module.guest_addr_out;
> +        hvm_args->lowmem_end = args.lowmem_end;
> +        hvm_args->highmem_end = args.highmem_end;
> +        hvm_args->mmio_start = args.mmio_start;
>      }
>  
>      free(image);
