* [PATCH v6 00/29] Introduce HVM without dm and new boot ABI
@ 2015-09-04 12:08 Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 01/29] libxc: split x86 HVM setup_guest into smaller logical functions Roger Pau Monne
                   ` (29 more replies)
  0 siblings, 30 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel

This series is split in the following order:

 - Patches from 1 to 10 switch HVM domain construction to use the xc_dom_*
   family of functions, as is already done when building PV domains. This
   batch of patches can go in regardless of the status of the rest of the
   series IMHO, and in fact would help me quite a lot with the rebasing.
 - Patches from 11 to 21 allow disabling the devices emulated inside of Xen.
 - Patches from 22 to 29 introduce the creation of HVM guests without a
   device model and without the devices emulated inside of Xen.

This series has been successfully tested on the following hardware:

 - Intel Xeon W3550.
 - AMD Opteron 4184.

Both were tested with hap=0 and hap=1 in the configuration file. I've been
able to boot an SMP guest in this mode with a virtual hard drive and a
virtual network card, all working fine AFAICT.

For this round, only the maintainers of the specific code being modified
have been Cc'ed on the patches.

The series can also be found in the following git repo:

git://xenbits.xen.org/people/royger/xen.git branch hvm_without_dm_v6

And for the FreeBSD part:

git://xenbits.xen.org/people/royger/freebsd.git branch new_entry_point_v4

In case someone wants to give it a try, I've uploaded a FreeBSD kernel that
should work when booted into this mode:

https://people.freebsd.org/~royger/kernel_no_dm

This FreeBSD kernel starts the APs in long mode. There are examples for 
starting the APs in other modes in the sys/x86/xen/pv.c file.

The config file that I've used is:

<config>
kernel="/path/to/kernel_no_dm"

builder="hvm"
device_model_version="none"

memory=128
vcpus=2
name = "freebsd"
</config>

Of course, if you already have a FreeBSD disk set up, it can also be added to
the configuration file, and the following line can be used to point FreeBSD
to the disk:

extra="vfs.root.mountfrom=ufs:/dev/ufsid/<disk_id>"

AW    01/29 libxc: split x86 HVM setup_guest into smaller
AW    02/29 libxc: unify xc_dom_p2m_{host/guest}
AW    03/29 libxc: introduce the notion of a container type
AW    04/29 libxc: introduce a domain loader for HVM guest
AW    05/29 libxc: make arch_setup_meminit a xc_dom_arch hook
AW    06/29 libxc: make arch_setup_boot{init/late} xc_dom_arch
AW    07/29 libxc: rework BSP initialization
AW  M 08/29 libxc: introduce a xc_dom_arch for hvm-3.0-x86_32
 W    09/29 libxl: switch HVM domain building to use xc_dom_*
AW    10/29 libxc: remove dead HVM building code
A     11/29 xen/x86: add bitmap of enabled emulated devices
A B   12/29 xen/x86: allow disabling the emulated local apic
A     13/29 xen/x86: allow disabling the emulated HPET
A     14/29 xen/x86: allow disabling the pmtimer
A     15/29 xen/x86: allow disabling the emulated RTC
A     16/29 xen/x86: allow disabling the emulated IO APIC
A     17/29 xen/x86: allow disabling the emulated PIC
A     18/29 xen/x86: allow disabling the emulated pmu
A     19/29 xen/x86: allow disabling the emulated VGA
A     20/29 xen/x86: allow disabling the emulated IOMMU
A     21/29 xen/x86: allow disabling all emulated devices inside
AW    22/29 elfnotes: intorduce a new PHYS_ENTRY elfnote
AW    23/29 libxc: allow creating domains without emulated
    M 24/29 xen/x86: allow HVM guests to use hypercalls to bring
 W    25/29 xenconsole: try to attach to PV console if HVM fails
    M 26/29 libxc/xen: introduce a start info structure for
AW    27/29 libxc: switch xc_dom_elfloader to be used with
 W    28/29 libxl: allow the creation of HVM domains without a
   N  29/29 libxl: add support for migrating HVM guests without

A = Acked/Reviewed by Andrew Cooper.
W = Acked/Reviewed by Wei Liu.
B = Acked/Reviewed by Boris Ostrovsky.
N = New in this version.
M = Modified in this version.

Thanks, Roger.


* [PATCH v6 01/29] libxc: split x86 HVM setup_guest into smaller logical functions
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 02/29] libxc: unify xc_dom_p2m_{host/guest} Roger Pau Monne
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

This is just a preparatory change to clean up the code in setup_guest.
Should not introduce any functional changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Add Andrew Cooper Reviewed-by.
 - Add Wei Acked-by.
---
 tools/libxc/xc_hvm_build_x86.c | 198 ++++++++++++++++++++++++-----------------
 1 file changed, 117 insertions(+), 81 deletions(-)
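
Note for readers of the diff below: xc_hvm_load_image keeps the existing
trick of placing a JMP <rel32> at guest address 0x0 so that execution reaches
the ELF entry point. The displacement is relative to the end of the 5-byte
instruction, hence the "entry_eip - 5". What follows is only a standalone
sketch of that arithmetic; write_entry_jmp and jmp_target are hypothetical
helper names, not libxc functions, and the buffer stands in for the mapped
guest page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A JMP rel32 is one opcode byte (0xe9) plus a 32-bit displacement. */
#define JMP_REL32_LEN 5

/*
 * Hypothetical helper: encode "JMP entry_eip" at the start of page0, which
 * is assumed to be mapped at guest address 0x0. The displacement is counted
 * from the address of the next instruction, i.e. from address 5.
 */
static void write_entry_jmp(uint8_t *page0, uint32_t entry_eip)
{
    uint32_t rel32 = entry_eip - JMP_REL32_LEN;

    assert(page0 != NULL);
    page0[0] = 0xe9;
    /* Host byte order; this matches x86 when the toolstack runs on x86. */
    memcpy(&page0[1], &rel32, sizeof(rel32));
}

/* Hypothetical helper: recover the jump target from the encoded stub. */
static uint32_t jmp_target(const uint8_t *page0)
{
    uint32_t rel32;

    assert(page0 != NULL);
    memcpy(&rel32, &page0[1], sizeof(rel32));
    return JMP_REL32_LEN + rel32;
}
```

The sketch uses memcpy instead of the cast-and-store in the patch purely to
avoid an unaligned access in a standalone example; the encoding is the same.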

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index ea250dd..4d3736b 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -231,28 +231,20 @@ static int check_mmio_hole(uint64_t start, uint64_t memsize,
         return 1;
 }
 
-static int setup_guest(xc_interface *xch,
-                       uint32_t dom, struct xc_hvm_build_args *args,
-                       char *image, unsigned long image_size)
+static int xc_hvm_populate_memory(xc_interface *xch, uint32_t dom,
+                                  struct xc_hvm_build_args *args,
+                                  xen_pfn_t *page_array)
 {
-    xen_pfn_t *page_array = NULL;
     unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
     unsigned long p2m_size;
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
-    unsigned long entry_eip, cur_pages, cur_pfn;
-    void *hvm_info_page;
-    uint32_t *ident_pt;
-    struct elf_binary elf;
-    uint64_t v_start, v_end;
-    uint64_t m_start = 0, m_end = 0;
+    unsigned long cur_pages, cur_pfn;
     int rc;
     xen_capabilities_info_t caps;
     unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
         stat_1gb_pages = 0;
     unsigned int memflags = 0;
     int claim_enabled = args->claim_enabled;
-    xen_pfn_t special_array[NR_SPECIAL_PAGES];
-    xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
     uint64_t total_pages;
     xen_vmemrange_t dummy_vmemrange[2];
     unsigned int dummy_vnode_to_pnode[1];
@@ -260,19 +252,6 @@ static int setup_guest(xc_interface *xch,
     unsigned int *vnode_to_pnode;
     unsigned int nr_vmemranges, nr_vnodes;
 
-    memset(&elf, 0, sizeof(elf));
-    if ( elf_init(&elf, image, image_size) != 0 )
-    {
-        PERROR("Could not initialise ELF image");
-        goto error_out;
-    }
-
-    xc_elf_set_logfile(xch, &elf, 1);
-
-    elf_parse_binary(&elf);
-    v_start = 0;
-    v_end = args->mem_size;
-
     if ( nr_pages > target_pages )
         memflags |= XENMEMF_populate_on_demand;
 
@@ -345,24 +324,6 @@ static int setup_guest(xc_interface *xch,
         goto error_out;
     }
 
-    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
-    {
-        ERROR("Insufficient space to load modules.");
-        goto error_out;
-    }
-
-    DPRINTF("VIRTUAL MEMORY ARRANGEMENT:\n");
-    DPRINTF("  Loader:   %016"PRIx64"->%016"PRIx64"\n", elf.pstart, elf.pend);
-    DPRINTF("  Modules:  %016"PRIx64"->%016"PRIx64"\n", m_start, m_end);
-    DPRINTF("  TOTAL:    %016"PRIx64"->%016"PRIx64"\n", v_start, v_end);
-    DPRINTF("  ENTRY:    %016"PRIx64"\n", elf_uval(&elf, elf.ehdr, e_entry));
-
-    if ( (page_array = malloc(p2m_size * sizeof(xen_pfn_t))) == NULL )
-    {
-        PERROR("Could not allocate memory.");
-        goto error_out;
-    }
-
     for ( i = 0; i < p2m_size; i++ )
         page_array[i] = ((xen_pfn_t)-1);
     for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
@@ -564,7 +525,54 @@ static int setup_guest(xc_interface *xch,
     DPRINTF("  4KB PAGES: 0x%016lx\n", stat_normal_pages);
     DPRINTF("  2MB PAGES: 0x%016lx\n", stat_2mb_pages);
     DPRINTF("  1GB PAGES: 0x%016lx\n", stat_1gb_pages);
-    
+
+    rc = 0;
+    goto out;
+ error_out:
+    rc = -1;
+ out:
+
+    /* ensure no unclaimed pages are left unused */
+    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
+
+    return rc;
+}
+
+static int xc_hvm_load_image(xc_interface *xch,
+                       uint32_t dom, struct xc_hvm_build_args *args,
+                       xen_pfn_t *page_array)
+{
+    unsigned long entry_eip, image_size;
+    struct elf_binary elf;
+    uint64_t v_start, v_end;
+    uint64_t m_start = 0, m_end = 0;
+    char *image;
+    int rc;
+
+    image = xc_read_image(xch, args->image_file_name, &image_size);
+    if ( image == NULL )
+        return -1;
+
+    memset(&elf, 0, sizeof(elf));
+    if ( elf_init(&elf, image, image_size) != 0 )
+        goto error_out;
+
+    xc_elf_set_logfile(xch, &elf, 1);
+
+    elf_parse_binary(&elf);
+    v_start = 0;
+    v_end = args->mem_size;
+
+    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
+    {
+        ERROR("Insufficient space to load modules.");
+        goto error_out;
+    }
+
+    DPRINTF("VIRTUAL MEMORY ARRANGEMENT:\n");
+    DPRINTF("  Loader:   %016"PRIx64"->%016"PRIx64"\n", elf.pstart, elf.pend);
+    DPRINTF("  Modules:  %016"PRIx64"->%016"PRIx64"\n", m_start, m_end);
+
     if ( loadelfimage(xch, &elf, dom, page_array) != 0 )
     {
         PERROR("Could not load ELF image");
@@ -577,6 +585,44 @@ static int setup_guest(xc_interface *xch,
         goto error_out;
     }
 
+    /* Insert JMP <rel32> instruction at address 0x0 to reach entry point. */
+    entry_eip = elf_uval(&elf, elf.ehdr, e_entry);
+    if ( entry_eip != 0 )
+    {
+        char *page0 = xc_map_foreign_range(
+            xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0);
+        if ( page0 == NULL )
+            goto error_out;
+        page0[0] = 0xe9;
+        *(uint32_t *)&page0[1] = entry_eip - 5;
+        munmap(page0, PAGE_SIZE);
+    }
+
+    rc = 0;
+    goto out;
+ error_out:
+    rc = -1;
+ out:
+    if ( elf_check_broken(&elf) )
+        ERROR("HVM ELF broken: %s", elf_check_broken(&elf));
+    free(image);
+
+    return rc;
+}
+
+static int xc_hvm_populate_params(xc_interface *xch, uint32_t dom,
+                                  struct xc_hvm_build_args *args)
+{
+    unsigned long i;
+    void *hvm_info_page;
+    uint32_t *ident_pt;
+    uint64_t v_end;
+    int rc;
+    xen_pfn_t special_array[NR_SPECIAL_PAGES];
+    xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
+
+    v_end = args->mem_size;
+
     if ( (hvm_info_page = xc_map_foreign_range(
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
@@ -665,34 +711,12 @@ static int setup_guest(xc_interface *xch,
     xc_hvm_param_set(xch, dom, HVM_PARAM_IDENT_PT,
                      special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
 
-    /* Insert JMP <rel32> instruction at address 0x0 to reach entry point. */
-    entry_eip = elf_uval(&elf, elf.ehdr, e_entry);
-    if ( entry_eip != 0 )
-    {
-        char *page0 = xc_map_foreign_range(
-            xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0);
-        if ( page0 == NULL )
-        {
-            PERROR("Could not map page0");
-            goto error_out;
-        }
-        page0[0] = 0xe9;
-        *(uint32_t *)&page0[1] = entry_eip - 5;
-        munmap(page0, PAGE_SIZE);
-    }
-
     rc = 0;
     goto out;
  error_out:
     rc = -1;
  out:
-    if ( elf_check_broken(&elf) )
-        ERROR("HVM ELF broken: %s", elf_check_broken(&elf));
-
-    /* ensure no unclaimed pages are left unused */
-    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
 
-    free(page_array);
     return rc;
 }
 
@@ -703,9 +727,8 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
                  struct xc_hvm_build_args *hvm_args)
 {
     struct xc_hvm_build_args args = *hvm_args;
-    void *image;
-    unsigned long image_size;
-    int sts;
+    xen_pfn_t *parray = NULL;
+    int rc;
 
     if ( domid == 0 )
         return -1;
@@ -716,24 +739,37 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
     if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
         return -1;
 
-    image = xc_read_image(xch, args.image_file_name, &image_size);
-    if ( image == NULL )
+    parray = malloc((args.mem_size >> PAGE_SHIFT) * sizeof(xen_pfn_t));
+    if ( parray == NULL )
         return -1;
 
-    sts = setup_guest(xch, domid, &args, image, image_size);
-
-    if (!sts)
+    rc = xc_hvm_populate_memory(xch, domid, &args, parray);
+    if ( rc != 0 )
     {
-        /* Return module load addresses to caller */
-        hvm_args->acpi_module.guest_addr_out = 
-            args.acpi_module.guest_addr_out;
-        hvm_args->smbios_module.guest_addr_out = 
-            args.smbios_module.guest_addr_out;
+        PERROR("xc_hvm_populate_memory failed");
+        goto out;
+    }
+    rc = xc_hvm_load_image(xch, domid, &args, parray);
+    if ( rc != 0 )
+    {
+        PERROR("xc_hvm_load_image failed");
+        goto out;
+    }
+    rc = xc_hvm_populate_params(xch, domid, &args);
+    if ( rc != 0 )
+    {
+        PERROR("xc_hvm_populate_params failed");
+        goto out;
     }
 
-    free(image);
+    /* Return module load addresses to caller */
+    hvm_args->acpi_module.guest_addr_out = args.acpi_module.guest_addr_out;
+    hvm_args->smbios_module.guest_addr_out = args.smbios_module.guest_addr_out;
 
-    return sts;
+out:
+    free(parray);
+
+    return rc;
 }
 
 /* xc_hvm_build_target_mem: 
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v6 02/29] libxc: unify xc_dom_p2m_{host/guest}
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 01/29] libxc: split x86 HVM setup_guest into smaller logical functions Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 03/29] libxc: introduce the notion of a container type Roger Pau Monne
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson,
	Samuel Thibault, Roger Pau Monne

Unify both functions into xc_dom_p2m. Should not introduce any functional
change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
---
Changes since v3:
 - Add Andrew Cooper Reviewed-by.
 - Add Wei Acked-by.
---
 stubdom/grub/kexec.c              |  4 ++--
 tools/libxc/include/xc_dom.h      | 14 ++------------
 tools/libxc/xc_dom_boot.c         | 10 +++++-----
 tools/libxc/xc_dom_compat_linux.c |  4 ++--
 tools/libxc/xc_dom_x86.c          | 32 ++++++++++++++++----------------
 tools/libxl/libxl_dom.c           |  4 ++--
 6 files changed, 29 insertions(+), 39 deletions(-)
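
To make the merged lookup easier to follow, here is a reduced sketch of the
unified function as it appears in the xc_dom.h hunk below. The struct, the
type names and the auto_translated flag are stand-ins invented for this
example (the real code calls xc_dom_feature_translated(dom) and uses
struct xc_dom_image); only the control flow is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_pfn_t;                        /* stand-in for xen_pfn_t */
#define SKETCH_INVALID_MFN ((sketch_pfn_t)-1)         /* stand-in for INVALID_MFN */

/* Hypothetical, reduced stand-in for struct xc_dom_image. */
struct p2m_dom_sketch {
    bool shadow_enabled;
    bool auto_translated;     /* stands in for xc_dom_feature_translated(dom) */
    sketch_pfn_t rambase_pfn;
    sketch_pfn_t total_pages;
    const sketch_pfn_t *p2m_host;
};

/*
 * Same shape as the unified xc_dom_p2m(): translated or shadowed guests see
 * pfn == mfn, everything else goes through the p2m_host table with a bounds
 * check against the guest RAM range.
 */
static sketch_pfn_t p2m_sketch(const struct p2m_dom_sketch *dom,
                               sketch_pfn_t pfn)
{
    assert(dom != NULL);
    if ( dom->shadow_enabled || dom->auto_translated )
        return pfn;
    if ( pfn < dom->rambase_pfn ||
         pfn >= dom->rambase_pfn + dom->total_pages )
        return SKETCH_INVALID_MFN;
    return dom->p2m_host[pfn - dom->rambase_pfn];
}
```

Note that the merged condition returns the pfn unchanged when either flag is
set, whereas each of the old helpers checked only one of them.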

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 4c33b25..0b2f4f3 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -358,9 +358,9 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 #ifdef __x86_64__
                 MMUEXT_PIN_L4_TABLE,
 #endif
-                xc_dom_p2m_host(dom, dom->pgtables_seg.pfn),
+                xc_dom_p2m(dom, dom->pgtables_seg.pfn),
                 dom->guest_domid)) != 0 ) {
-        grub_printf("pin_table(%lx) returned %d\n", xc_dom_p2m_host(dom,
+        grub_printf("pin_table(%lx) returned %d\n", xc_dom_p2m(dom,
                     dom->pgtables_seg.pfn), rc);
         errnum = ERR_BOOT_FAILURE;
         goto out_remap;
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 600aef6..9cf13e2 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -375,19 +375,9 @@ static inline void *xc_dom_vaddr_to_ptr(struct xc_dom_image *dom,
     return ptr + offset;
 }
 
-static inline xen_pfn_t xc_dom_p2m_host(struct xc_dom_image *dom, xen_pfn_t pfn)
+static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
 {
-    if (dom->shadow_enabled)
-        return pfn;
-    if (pfn < dom->rambase_pfn || pfn >= dom->rambase_pfn + dom->total_pages)
-        return INVALID_MFN;
-    return dom->p2m_host[pfn - dom->rambase_pfn];
-}
-
-static inline xen_pfn_t xc_dom_p2m_guest(struct xc_dom_image *dom,
-                                         xen_pfn_t pfn)
-{
-    if (xc_dom_feature_translated(dom))
+    if ( dom->shadow_enabled || xc_dom_feature_translated(dom) )
         return pfn;
     if (pfn < dom->rambase_pfn || pfn >= dom->rambase_pfn + dom->total_pages)
         return INVALID_MFN;
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 8e06406..7c30f96 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -53,7 +53,7 @@ static int setup_hypercall_page(struct xc_dom_image *dom)
                   dom->parms.virt_hypercall, pfn);
     domctl.cmd = XEN_DOMCTL_hypercall_init;
     domctl.domain = dom->guest_domid;
-    domctl.u.hypercall_init.gmfn = xc_dom_p2m_guest(dom, pfn);
+    domctl.u.hypercall_init.gmfn = xc_dom_p2m(dom, pfn);
     rc = do_domctl(dom->xch, &domctl);
     if ( rc != 0 )
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
@@ -83,7 +83,7 @@ static int clear_page(struct xc_dom_image *dom, xen_pfn_t pfn)
     if ( pfn == 0 )
         return 0;
 
-    dst = xc_dom_p2m_host(dom, pfn);
+    dst = xc_dom_p2m(dom, pfn);
     DOMPRINTF("%s: pfn 0x%" PRIpfn ", mfn 0x%" PRIpfn "",
               __FUNCTION__, pfn, dst);
     rc = xc_clear_domain_page(dom->xch, dom->guest_domid, dst);
@@ -177,7 +177,7 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, xen_pfn_t pfn,
     }
 
     for ( i = 0; i < count; i++ )
-        entries[i].mfn = xc_dom_p2m_host(dom, pfn + i);
+        entries[i].mfn = xc_dom_p2m(dom, pfn + i);
 
     ptr = xc_map_foreign_ranges(dom->xch, dom->guest_domid,
                 count << page_shift, PROT_READ | PROT_WRITE, 1 << page_shift,
@@ -434,8 +434,8 @@ int xc_dom_gnttab_init(struct xc_dom_image *dom)
                                       dom->console_domid, dom->xenstore_domid);
     } else {
         return xc_dom_gnttab_seed(dom->xch, dom->guest_domid,
-                                  xc_dom_p2m_host(dom, dom->console_pfn),
-                                  xc_dom_p2m_host(dom, dom->xenstore_pfn),
+                                  xc_dom_p2m(dom, dom->console_pfn),
+                                  xc_dom_p2m(dom, dom->xenstore_pfn),
                                   dom->console_domid, dom->xenstore_domid);
     }
 }
diff --git a/tools/libxc/xc_dom_compat_linux.c b/tools/libxc/xc_dom_compat_linux.c
index a3abb99..5c1f043 100644
--- a/tools/libxc/xc_dom_compat_linux.c
+++ b/tools/libxc/xc_dom_compat_linux.c
@@ -64,8 +64,8 @@ static int xc_linux_build_internal(struct xc_dom_image *dom,
     if ( (rc = xc_dom_gnttab_init(dom)) != 0)
         goto out;
 
-    *console_mfn = xc_dom_p2m_host(dom, dom->console_pfn);
-    *store_mfn = xc_dom_p2m_host(dom, dom->xenstore_pfn);
+    *console_mfn = xc_dom_p2m(dom, dom->console_pfn);
+    *store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
 
  out:
     return rc;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 3d40fa4..dc2f4aa 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -247,7 +247,7 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
     unsigned long l3off, l2off = 0, l1off;
     xen_vaddr_t addr;
     xen_pfn_t pgpfn;
-    xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
+    xen_pfn_t l3mfn = xc_dom_p2m(dom, l3pfn);
 
     if ( dom->parms.pae == 1 )
     {
@@ -279,7 +279,7 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
                 goto pfn_error;
             l3off = l3_table_offset_pae(addr);
             l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
+                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
             l2pfn++;
         }
 
@@ -291,7 +291,7 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
                 goto pfn_error;
             l2off = l2_table_offset_pae(addr);
             l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
+                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
             l1pfn++;
         }
 
@@ -299,7 +299,7 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
         l1off = l1_table_offset_pae(addr);
         pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
         l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
+            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
         if ( (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
@@ -315,7 +315,7 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
     if ( dom->virt_pgtab_end <= 0xc0000000 )
     {
         DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
-        l3tab[3] = pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
+        l3tab[3] = pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
     }
     return 0;
 
@@ -374,7 +374,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
                 goto pfn_error;
             l4off = l4_table_offset_x86_64(addr);
             l4tab[l4off] =
-                pfn_to_paddr(xc_dom_p2m_guest(dom, l3pfn)) | L4_PROT;
+                pfn_to_paddr(xc_dom_p2m(dom, l3pfn)) | L4_PROT;
             l3pfn++;
         }
 
@@ -386,7 +386,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
                 goto pfn_error;
             l3off = l3_table_offset_x86_64(addr);
             l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
+                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
             l2pfn++;
         }
 
@@ -398,7 +398,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
                 goto pfn_error;
             l2off = l2_table_offset_x86_64(addr);
             l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
+                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
             l1pfn++;
         }
 
@@ -406,7 +406,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
         l1off = l1_table_offset_x86_64(addr);
         pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
         l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
+            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
         if ( (!dom->pvh_enabled)                &&
              (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
@@ -489,9 +489,9 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
-    start_info->store_mfn = xc_dom_p2m_guest(dom, dom->xenstore_pfn);
+    start_info->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
     start_info->store_evtchn = dom->xenstore_evtchn;
-    start_info->console.domU.mfn = xc_dom_p2m_guest(dom, dom->console_pfn);
+    start_info->console.domU.mfn = xc_dom_p2m(dom, dom->console_pfn);
     start_info->console.domU.evtchn = dom->console_evtchn;
 
     if ( dom->ramdisk_blob )
@@ -535,9 +535,9 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
-    start_info->store_mfn = xc_dom_p2m_guest(dom, dom->xenstore_pfn);
+    start_info->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
     start_info->store_evtchn = dom->xenstore_evtchn;
-    start_info->console.domU.mfn = xc_dom_p2m_guest(dom, dom->console_pfn);
+    start_info->console.domU.mfn = xc_dom_p2m(dom, dom->console_pfn);
     start_info->console.domU.evtchn = dom->console_evtchn;
 
     if ( dom->ramdisk_blob )
@@ -621,7 +621,7 @@ static int vcpu_x86_32(struct xc_dom_image *dom, void *ptr)
          dom->parms.pae == 3 /* bimodal */ )
         ctxt->vm_assist |= (1UL << VMASST_TYPE_pae_extended_cr3);
 
-    cr3_pfn = xc_dom_p2m_guest(dom, dom->pgtables_seg.pfn);
+    cr3_pfn = xc_dom_p2m(dom, dom->pgtables_seg.pfn);
     ctxt->ctrlreg[3] = xen_pfn_to_cr3_x86_32(cr3_pfn);
     DOMPRINTF("%s: cr3: pfn 0x%" PRIpfn " mfn 0x%" PRIpfn "",
               __FUNCTION__, dom->pgtables_seg.pfn, cr3_pfn);
@@ -647,7 +647,7 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
     ctxt->user_regs.rflags = 1 << 9; /* Interrupt Enable */
 
     ctxt->flags = VGCF_in_kernel_X86_64 | VGCF_online_X86_64;
-    cr3_pfn = xc_dom_p2m_guest(dom, dom->pgtables_seg.pfn);
+    cr3_pfn = xc_dom_p2m(dom, dom->pgtables_seg.pfn);
     ctxt->ctrlreg[3] = xen_pfn_to_cr3_x86_64(cr3_pfn);
     DOMPRINTF("%s: cr3: pfn 0x%" PRIpfn " mfn 0x%" PRIpfn "",
               __FUNCTION__, dom->pgtables_seg.pfn, cr3_pfn);
@@ -1020,7 +1020,7 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
         /* paravirtualized guest */
         xc_dom_unmap_one(dom, dom->pgtables_seg.pfn);
         rc = pin_table(dom->xch, pgd_type,
-                       xc_dom_p2m_host(dom, dom->pgtables_seg.pfn),
+                       xc_dom_p2m(dom, dom->pgtables_seg.pfn),
                        dom->guest_domid);
         if ( rc != 0 )
         {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index c2518a3..eab6b21 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -738,8 +738,8 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
         state->console_mfn = dom->console_pfn;
         state->store_mfn = dom->xenstore_pfn;
     } else {
-        state->console_mfn = xc_dom_p2m_host(dom, dom->console_pfn);
-        state->store_mfn = xc_dom_p2m_host(dom, dom->xenstore_pfn);
+        state->console_mfn = xc_dom_p2m(dom, dom->console_pfn);
+        state->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
     }
 
     libxl__file_reference_unmap(&state->pv_kernel);
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 03/29] libxc: introduce the notion of a container type
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 01/29] libxc: split x86 HVM setup_guest into smaller logical functions Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 02/29] libxc: unify xc_dom_p2m_{host/guest} Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 04/29] libxc: introduce a domain loader for HVM guest firmware Roger Pau Monne
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Introduce the notion of a container type into xc_dom_image. This will be
needed by later changes that will also use xc_dom_image in order to build
HVM guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Add Andrew Cooper Reviewed-by.
 - Add Wei Acked-by.
---
 tools/libxc/include/xc_dom.h | 6 ++++++
 tools/libxc/xc_dom_x86.c     | 4 ++++
 tools/libxl/libxl_dom.c      | 1 +
 3 files changed, 11 insertions(+)
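
As a quick illustration of what the new field changes, the sketch below
mirrors the check added to xc_dom_feature_translated() in the hunk further
down. All names here are invented for the example; elf_auto_translated
stands in for the elf_xen_feature_get(XENFEAT_auto_translated_physmap, ...)
lookup, and the struct is not the real xc_dom_image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for XC_DOM_PV_CONTAINER / XC_DOM_HVM_CONTAINER. */
enum container_sketch {
    SKETCH_PV_CONTAINER,
    SKETCH_HVM_CONTAINER,
};

/* Hypothetical, reduced stand-in for struct xc_dom_image. */
struct container_dom_sketch {
    enum container_sketch container_type;
    bool elf_auto_translated;   /* result of the ELF feature lookup */
};

/*
 * HVM containers are always auto-translated; for PV containers the answer
 * still depends on the features advertised by the kernel image.
 */
static bool feature_translated_sketch(const struct container_dom_sketch *dom)
{
    assert(dom != NULL);
    if ( dom->container_type == SKETCH_HVM_CONTAINER )
        return true;
    return dom->elf_auto_translated;
}
```

Because SKETCH_PV_CONTAINER is the first enumerator, a zero-initialised
structure defaults to the PV behaviour, which is also true of the enum in
the patch.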

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 9cf13e2..bc55ec9 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -179,6 +179,12 @@ struct xc_dom_image {
     struct xc_dom_arch *arch_hooks;
     /* allocate up to virt_alloc_end */
     int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
+
+    /* Container type (HVM or PV). */
+    enum {
+        XC_DOM_PV_CONTAINER,
+        XC_DOM_HVM_CONTAINER,
+    } container_type;
 };
 
 /* --- pluggable kernel loader ------------------------------------- */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index dc2f4aa..c7bfc0c 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -1071,6 +1071,10 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
 
 int xc_dom_feature_translated(struct xc_dom_image *dom)
 {
+    /* Guests running inside HVM containers are always auto-translated. */
+    if ( dom->container_type == XC_DOM_HVM_CONTAINER )
+        return 1;
+
     return elf_xen_feature_get(XENFEAT_auto_translated_physmap, dom->f_active);
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index eab6b21..6101e5c 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -619,6 +619,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     }
 
     dom->pvh_enabled = state->pvh_enabled;
+    dom->container_type = XC_DOM_PV_CONTAINER;
 
     LOG(DEBUG, "pv kernel mapped %d path %s", state->pv_kernel.mapped, state->pv_kernel.path);
 
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 04/29] libxc: introduce a domain loader for HVM guest firmware
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (2 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 03/29] libxc: introduce the notion of a container type Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 05/29] libxc: make arch_setup_meminit a xc_dom_arch hook Roger Pau Monne
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Introduce a very simple (and dummy) domain loader to be used to load the
firmware (hvmloader) into HVM guests. Since hvmloader is just a 32-bit ELF
executable, the loader is fairly simple.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - s/__FUNCTION__/__func__/g
 - Fix style errors in xc_dom_hvmloader.c.
 - Add Andrew Cooper Reviewed-by.
 - Add Wei Acked-by.
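
For anyone checking the module placement arithmetic in modules_init(), here
is a minimal standalone sketch of the same layout rule. It assumes the
MKALIGN rounding used in the patch; struct placement, place_modules() and
the sample sizes are made up for illustration and are not part of libxc.
Unlike the patch, it always fills in both addresses, even for empty modules.

```c
#include <stdint.h>

#define MODULE_ALIGN (1UL << 7)   /* 128-byte alignment between modules */
#define MB_ALIGN     (1UL << 20)  /* 1MB */
#define MKALIGN(x, a) (((uint64_t)(x) + (a) - 1) & ~(uint64_t)((a) - 1))

/* Hypothetical result type, for illustration only. */
struct placement {
    uint64_t mstart, mend;           /* guest range holding all modules */
    uint64_t acpi_addr, smbios_addr; /* where each module would land */
};

/*
 * ACPI goes first and SMBIOS follows it; the range starts 1MB past the
 * MB-aligned end of the loader image (pend). Returns -1 if the range
 * would run past the end of guest memory (vend), 0 otherwise.
 */
static int place_modules(uint64_t pend, uint64_t vend,
                         uint64_t acpi_len, uint64_t smbios_len,
                         struct placement *p)
{
    uint64_t offset1 = MKALIGN(acpi_len, MODULE_ALIGN);
    uint64_t total = offset1 + MKALIGN(smbios_len, MODULE_ALIGN);

    p->mstart = MKALIGN(pend, MB_ALIGN) + MB_ALIGN;
    p->mend = p->mstart + total;
    if ( p->mend > vend )
        return -1;

    p->acpi_addr = p->mstart;
    p->smbios_addr = p->mstart + offset1;
    return 0;
}
```

With an image ending at 0x1a3000, a 300-byte ACPI blob and a 100-byte SMBIOS
blob, this puts ACPI at 0x300000 and SMBIOS at 0x300180.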
---
 tools/libxc/Makefile           |   1 +
 tools/libxc/include/xc_dom.h   |   8 ++
 tools/libxc/xc_dom_hvmloader.c | 313 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/libelf.h       |   1 +
 4 files changed, 323 insertions(+)
 create mode 100644 tools/libxc/xc_dom_hvmloader.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index a0f899b..baaadd6 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -84,6 +84,7 @@ GUEST_SRCS-y                 += xc_dom_core.c xc_dom_boot.c
 GUEST_SRCS-y                 += xc_dom_elfloader.c
 GUEST_SRCS-$(CONFIG_X86)     += xc_dom_bzimageloader.c
 GUEST_SRCS-$(CONFIG_X86)     += xc_dom_decompress_lz4.c
+GUEST_SRCS-$(CONFIG_X86)     += xc_dom_hvmloader.c
 GUEST_SRCS-$(CONFIG_ARM)     += xc_dom_armzimageloader.c
 GUEST_SRCS-y                 += xc_dom_binloader.c
 GUEST_SRCS-y                 += xc_dom_compat_linux.c
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index bc55ec9..02d9d5c 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -14,6 +14,7 @@
  */
 
 #include <xen/libelf/libelf.h>
+#include <xenguest.h>
 
 #define INVALID_P2M_ENTRY   ((xen_pfn_t)-1)
 
@@ -185,6 +186,13 @@ struct xc_dom_image {
         XC_DOM_PV_CONTAINER,
         XC_DOM_HVM_CONTAINER,
     } container_type;
+
+    /* HVM specific fields. */
+    /* Extra ACPI tables passed to HVMLOADER */
+    struct xc_hvm_firmware_module acpi_module;
+
+    /* Extra SMBIOS structures passed to HVMLOADER */
+    struct xc_hvm_firmware_module smbios_module;
 };
 
 /* --- pluggable kernel loader ------------------------------------- */
diff --git a/tools/libxc/xc_dom_hvmloader.c b/tools/libxc/xc_dom_hvmloader.c
new file mode 100644
index 0000000..79a3b99
--- /dev/null
+++ b/tools/libxc/xc_dom_hvmloader.c
@@ -0,0 +1,313 @@
+/*
+ * Xen domain builder -- HVM specific bits.
+ *
+ * Parse and load ELF firmware images for HVM domains.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <assert.h>
+
+#include "xg_private.h"
+#include "xc_dom.h"
+#include "xc_bitops.h"
+
+/* ------------------------------------------------------------------------ */
+/* parse elf binary                                                         */
+
+static elf_negerrnoval check_elf_kernel(struct xc_dom_image *dom, bool verbose)
+{
+    if ( dom->kernel_blob == NULL )
+    {
+        if ( verbose )
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                         "%s: no kernel image loaded", __func__);
+        return -EINVAL;
+    }
+
+    if ( !elf_is_elfbinary(dom->kernel_blob, dom->kernel_size) )
+    {
+        if ( verbose )
+            xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
+                         "%s: kernel is not an ELF image", __func__);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static elf_negerrnoval xc_dom_probe_hvm_kernel(struct xc_dom_image *dom)
+{
+    struct elf_binary elf;
+    int rc;
+
+    /* This loader is designed for HVM guest firmware. */
+    if ( dom->container_type != XC_DOM_HVM_CONTAINER )
+        return -EINVAL;
+
+    rc = check_elf_kernel(dom, 0);
+    if ( rc != 0 )
+        return rc;
+
+    rc = elf_init(&elf, dom->kernel_blob, dom->kernel_size);
+    if ( rc != 0 )
+        return rc;
+
+    /*
+     * We need to check that there are no Xen ELFNOTES, or
+     * else we might be trying to load a PV kernel.
+     */
+    elf_parse_binary(&elf);
+    rc = elf_xen_parse(&elf, &dom->parms);
+    if ( rc == 0 )
+        return -EINVAL;
+
+    return 0;
+}
+
+static elf_errorstatus xc_dom_parse_hvm_kernel(struct xc_dom_image *dom)
+    /*
+     * This function sometimes returns -1 for error and sometimes
+     * an errno value.  ?!?!
+     */
+{
+    struct elf_binary *elf;
+    elf_errorstatus rc;
+
+    rc = check_elf_kernel(dom, 1);
+    if ( rc != 0 )
+        return rc;
+
+    elf = xc_dom_malloc(dom, sizeof(*elf));
+    if ( elf == NULL )
+        return -1;
+    dom->private_loader = elf;
+    rc = elf_init(elf, dom->kernel_blob, dom->kernel_size);
+    xc_elf_set_logfile(dom->xch, elf, 1);
+    if ( rc != 0 )
+    {
+        xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: corrupted ELF image",
+                     __func__);
+        return rc;
+    }
+
+    if ( !elf_32bit(elf) )
+    {
+        xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: ELF image is not 32bit",
+                     __func__);
+        return -EINVAL;
+    }
+
+    /* parse binary and get xen meta info */
+    elf_parse_binary(elf);
+
+    /* find kernel segment */
+    dom->kernel_seg.vstart = elf->pstart;
+    dom->kernel_seg.vend   = elf->pend;
+
+    dom->guest_type = "hvm-3.0-x86_32";
+
+    if ( elf_check_broken(elf) )
+        DOMPRINTF("%s: ELF broken: %s", __func__, elf_check_broken(elf));
+
+    return rc;
+}
+
+static int modules_init(struct xc_dom_image *dom,
+                        uint64_t vend, struct elf_binary *elf,
+                        uint64_t *mstart_out, uint64_t *mend_out)
+{
+#define MODULE_ALIGN (1UL << 7)
+#define MB_ALIGN     (1UL << 20)
+#define MKALIGN(x, a) (((uint64_t)(x) + (a) - 1) & ~(uint64_t)((a) - 1))
+    uint64_t total_len = 0, offset1 = 0;
+
+    if ( dom->acpi_module.length == 0 && dom->smbios_module.length == 0 )
+        return 0;
+
+    /* Find the total length of the firmware modules, aligning each module
+     * to a reasonably large boundary.
+     */
+    total_len = MKALIGN(dom->acpi_module.length, MODULE_ALIGN);
+    offset1 = total_len;
+    total_len += MKALIGN(dom->smbios_module.length, MODULE_ALIGN);
+
+    /* Place the modules 1MB (plus alignment slack) past the loader image. */
+    *mstart_out = MKALIGN(elf->pend, MB_ALIGN) + (MB_ALIGN);
+    *mend_out = *mstart_out + total_len;
+
+    if ( *mend_out > vend )
+        return -1;
+
+    if ( dom->acpi_module.length != 0 )
+        dom->acpi_module.guest_addr_out = *mstart_out;
+    if ( dom->smbios_module.length != 0 )
+        dom->smbios_module.guest_addr_out = *mstart_out + offset1;
+
+    return 0;
+}
+
+static int loadmodules(struct xc_dom_image *dom,
+                       uint64_t mstart, uint64_t mend,
+                       uint32_t domid)
+{
+    privcmd_mmap_entry_t *entries = NULL;
+    unsigned long pfn_start;
+    unsigned long pfn_end;
+    size_t pages;
+    uint32_t i;
+    uint8_t *dest;
+    int rc = -1;
+    xc_interface *xch = dom->xch;
+
+    if ( mstart == 0 || mend == 0 )
+        return 0;
+
+    pfn_start = (unsigned long)(mstart >> PAGE_SHIFT);
+    pfn_end = (unsigned long)((mend + PAGE_SIZE - 1) >> PAGE_SHIFT);
+    pages = pfn_end - pfn_start;
+
+    /* Map address space for module list. */
+    entries = calloc(pages, sizeof(privcmd_mmap_entry_t));
+    if ( entries == NULL )
+        goto error_out;
+
+    for ( i = 0; i < pages; i++ )
+        entries[i].mfn = (mstart >> PAGE_SHIFT) + i;
+
+    dest = xc_map_foreign_ranges(
+        xch, domid, pages << PAGE_SHIFT, PROT_READ | PROT_WRITE, 1 << PAGE_SHIFT,
+        entries, pages);
+    if ( dest == NULL )
+        goto error_out;
+
+    /* Zero the range so padding is clear between modules */
+    memset(dest, 0, pages << PAGE_SHIFT);
+
+    /* Load modules into range */
+    if ( dom->acpi_module.length != 0 )
+    {
+        memcpy(dest,
+               dom->acpi_module.data,
+               dom->acpi_module.length);
+    }
+    if ( dom->smbios_module.length != 0 )
+    {
+        memcpy(dest + (dom->smbios_module.guest_addr_out - mstart),
+               dom->smbios_module.data,
+               dom->smbios_module.length);
+    }
+
+    munmap(dest, pages << PAGE_SHIFT);
+    rc = 0;
+
+ error_out:
+    free(entries);
+
+    return rc;
+}
+
+static elf_errorstatus xc_dom_load_hvm_kernel(struct xc_dom_image *dom)
+{
+    struct elf_binary *elf = dom->private_loader;
+    privcmd_mmap_entry_t *entries = NULL;
+    size_t pages = (elf->pend - elf->pstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
+    elf_errorstatus rc;
+    uint64_t m_start = 0, m_end = 0;
+    int i;
+
+    /* Map address space for initial elf image. */
+    entries = calloc(pages, sizeof(privcmd_mmap_entry_t));
+    if ( entries == NULL )
+        return -ENOMEM;
+
+    for ( i = 0; i < pages; i++ )
+        entries[i].mfn = (elf->pstart >> PAGE_SHIFT) + i;
+
+    elf->dest_base = xc_map_foreign_ranges(
+        dom->xch, dom->guest_domid, pages << PAGE_SHIFT,
+        PROT_READ | PROT_WRITE, 1 << PAGE_SHIFT,
+        entries, pages);
+    if ( elf->dest_base == NULL )
+    {
+        DOMPRINTF("%s: unable to map guest memory space", __func__);
+        rc = -EFAULT;
+        goto error;
+    }
+
+    elf->dest_size = pages * XC_DOM_PAGE_SIZE(dom);
+
+    rc = elf_load_binary(elf);
+    if ( rc < 0 )
+    {
+        DOMPRINTF("%s: failed to load elf binary", __func__);
+        goto error;
+    }
+
+    munmap(elf->dest_base, elf->dest_size);
+
+    rc = modules_init(dom, dom->total_pages << PAGE_SHIFT, elf, &m_start,
+                      &m_end);
+    if ( rc != 0 )
+    {
+        DOMPRINTF("%s: insufficient space to load modules.", __func__);
+        goto error;
+    }
+
+    rc = loadmodules(dom, m_start, m_end, dom->guest_domid);
+    if ( rc != 0 )
+    {
+        DOMPRINTF("%s: unable to load modules.", __func__);
+        goto error;
+    }
+
+    dom->parms.phys_entry = elf_uval(elf, elf->ehdr, e_entry);
+
+    free(entries);
+    return 0;
+
+ error:
+    assert(rc != 0);
+    free(entries);
+    return rc;
+}
+
+/* ------------------------------------------------------------------------ */
+
+struct xc_dom_loader hvm_loader = {
+    .name = "HVM-generic",
+    .probe = xc_dom_probe_hvm_kernel,
+    .parser = xc_dom_parse_hvm_kernel,
+    .loader = xc_dom_load_hvm_kernel,
+};
+
+static void __init register_loader(void)
+{
+    xc_dom_register_loader(&hvm_loader);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 6393040..03d449e 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -424,6 +424,7 @@ struct elf_dom_parms {
     uint64_t elf_paddr_offset;
     uint32_t f_supported[XENFEAT_NR_SUBMAPS];
     uint32_t f_required[XENFEAT_NR_SUBMAPS];
+    uint32_t phys_entry;
 
     /* calculated */
     uint64_t virt_offset;
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 05/29] libxc: make arch_setup_meminit a xc_dom_arch hook
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (3 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 04/29] libxc: introduce a domain loader for HVM guest firmware Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 06/29] libxc: make arch_setup_boot{init/late} xc_dom_arch hooks Roger Pau Monne
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

This allows having different arch_setup_meminit implementations based on the
guest type. It should not introduce any functional changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Add Andrew Cooper Reviewed-by.
 - Move xc_dom_arch definitions to the end of the xc_dom_<arch>.c files in
   order to reduce the spurious diffs in the coming patches.
 - Add Wei Acked-by.
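
To illustrate what the hook buys us, here is a minimal sketch of selecting
a meminit implementation per guest type and calling it through a function
pointer. All types and names below are reduced stand-ins invented for the
example; in particular meminit_hvm and the "hvm-3.0-x86_32" table entry are
hypothetical here and are not added by this patch.

```c
#include <stddef.h>
#include <string.h>

struct image;                       /* stand-in for struct xc_dom_image */

/* Reduced stand-in for struct xc_dom_arch: lookup key plus one hook. */
struct arch_hooks {
    const char *guest_type;
    int (*meminit)(struct image *img);
};

struct image {
    const char *guest_type;
    const struct arch_hooks *hooks;
    int populated_by;               /* records which hook ran (demo only) */
};

static int meminit_pv(struct image *img)  { img->populated_by = 1; return 0; }
static int meminit_hvm(struct image *img) { img->populated_by = 2; return 0; }

static const struct arch_hooks hook_table[] = {
    { "xen-3.0-x86_64", meminit_pv  },
    { "hvm-3.0-x86_32", meminit_hvm },
};

/* Pick the hooks matching the guest type, then call through the pointer. */
static int boot_mem_init(struct image *img)
{
    size_t i;

    for ( i = 0; i < sizeof(hook_table) / sizeof(hook_table[0]); i++ )
        if ( strcmp(hook_table[i].guest_type, img->guest_type) == 0 )
        {
            img->hooks = &hook_table[i];
            return img->hooks->meminit(img);
        }

    return -1;                      /* no hooks registered for this type */
}
```

The caller never names a concrete implementation, which is why the change
to xc_dom_boot_mem_init() is a one-line swap.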
---
 tools/libxc/include/xc_dom.h |  4 ++-
 tools/libxc/xc_dom_arm.c     | 70 +++++++++++++++++++++++--------------------
 tools/libxc/xc_dom_boot.c    |  2 +-
 tools/libxc/xc_dom_x86.c     | 71 +++++++++++++++++++++++---------------------
 4 files changed, 78 insertions(+), 69 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 02d9d5c..c4b994f 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -223,6 +223,9 @@ struct xc_dom_arch {
     int (*shared_info) (struct xc_dom_image * dom, void *shared_info);
     int (*vcpu) (struct xc_dom_image * dom, void *vcpu_ctxt);
 
+    /* arch-specific memory initialization. */
+    int (*meminit) (struct xc_dom_image * dom);
+
     char *guest_type;
     char *native_protocol;
     int page_shift;
@@ -400,7 +403,6 @@ static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
 
 /* --- arch bits --------------------------------------------------- */
 
-int arch_setup_meminit(struct xc_dom_image *dom);
 int arch_setup_bootearly(struct xc_dom_image *dom);
 int arch_setup_bootlate(struct xc_dom_image *dom);
 
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index b00d667..24776ba 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -194,38 +194,6 @@ static int vcpu_arm64(struct xc_dom_image *dom, void *ptr)
 
 /* ------------------------------------------------------------------------ */
 
-static struct xc_dom_arch xc_dom_32 = {
-    .guest_type = "xen-3.0-armv7l",
-    .native_protocol = XEN_IO_PROTO_ABI_ARM,
-    .page_shift = PAGE_SHIFT_ARM,
-    .sizeof_pfn = 8,
-    .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
-    .setup_pgtables = setup_pgtables_arm,
-    .start_info = start_info_arm,
-    .shared_info = shared_info_arm,
-    .vcpu = vcpu_arm32,
-};
-
-static struct xc_dom_arch xc_dom_64 = {
-    .guest_type = "xen-3.0-aarch64",
-    .native_protocol = XEN_IO_PROTO_ABI_ARM,
-    .page_shift = PAGE_SHIFT_ARM,
-    .sizeof_pfn = 8,
-    .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
-    .setup_pgtables = setup_pgtables_arm,
-    .start_info = start_info_arm,
-    .shared_info = shared_info_arm,
-    .vcpu = vcpu_arm64,
-};
-
-static void __init register_arch_hooks(void)
-{
-    xc_dom_register_arch_hooks(&xc_dom_32);
-    xc_dom_register_arch_hooks(&xc_dom_64);
-}
-
 static int set_mode(xc_interface *xch, domid_t domid, char *guest_type)
 {
     static const struct {
@@ -384,7 +352,7 @@ out:
     return rc < 0 ? rc : 0;
 }
 
-int arch_setup_meminit(struct xc_dom_image *dom)
+static int meminit(struct xc_dom_image *dom)
 {
     int i, rc;
     xen_pfn_t pfn;
@@ -542,6 +510,42 @@ int xc_dom_feature_translated(struct xc_dom_image *dom)
     return 1;
 }
 
+/* ------------------------------------------------------------------------ */
+
+static struct xc_dom_arch xc_dom_32 = {
+    .guest_type = "xen-3.0-armv7l",
+    .native_protocol = XEN_IO_PROTO_ABI_ARM,
+    .page_shift = PAGE_SHIFT_ARM,
+    .sizeof_pfn = 8,
+    .alloc_magic_pages = alloc_magic_pages,
+    .count_pgtables = count_pgtables_arm,
+    .setup_pgtables = setup_pgtables_arm,
+    .start_info = start_info_arm,
+    .shared_info = shared_info_arm,
+    .vcpu = vcpu_arm32,
+    .meminit = meminit,
+};
+
+static struct xc_dom_arch xc_dom_64 = {
+    .guest_type = "xen-3.0-aarch64",
+    .native_protocol = XEN_IO_PROTO_ABI_ARM,
+    .page_shift = PAGE_SHIFT_ARM,
+    .sizeof_pfn = 8,
+    .alloc_magic_pages = alloc_magic_pages,
+    .count_pgtables = count_pgtables_arm,
+    .setup_pgtables = setup_pgtables_arm,
+    .start_info = start_info_arm,
+    .shared_info = shared_info_arm,
+    .vcpu = vcpu_arm64,
+    .meminit = meminit,
+};
+
+static void __init register_arch_hooks(void)
+{
+    xc_dom_register_arch_hooks(&xc_dom_32);
+    xc_dom_register_arch_hooks(&xc_dom_64);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 7c30f96..bf2cd7b 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -146,7 +146,7 @@ int xc_dom_boot_mem_init(struct xc_dom_image *dom)
 
     DOMPRINTF_CALLED(dom->xch);
 
-    rc = arch_setup_meminit(dom);
+    rc = dom->arch_hooks->meminit(dom);
     if ( rc != 0 )
     {
         xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index c7bfc0c..07170c1 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -670,38 +670,6 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
 
 /* ------------------------------------------------------------------------ */
 
-static struct xc_dom_arch xc_dom_32_pae = {
-    .guest_type = "xen-3.0-x86_32p",
-    .native_protocol = XEN_IO_PROTO_ABI_X86_32,
-    .page_shift = PAGE_SHIFT_X86,
-    .sizeof_pfn = 4,
-    .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_32_pae,
-    .setup_pgtables = setup_pgtables_x86_32_pae,
-    .start_info = start_info_x86_32,
-    .shared_info = shared_info_x86_32,
-    .vcpu = vcpu_x86_32,
-};
-
-static struct xc_dom_arch xc_dom_64 = {
-    .guest_type = "xen-3.0-x86_64",
-    .native_protocol = XEN_IO_PROTO_ABI_X86_64,
-    .page_shift = PAGE_SHIFT_X86,
-    .sizeof_pfn = 8,
-    .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_64,
-    .setup_pgtables = setup_pgtables_x86_64,
-    .start_info = start_info_x86_64,
-    .shared_info = shared_info_x86_64,
-    .vcpu = vcpu_x86_64,
-};
-
-static void __init register_arch_hooks(void)
-{
-    xc_dom_register_arch_hooks(&xc_dom_32_pae);
-    xc_dom_register_arch_hooks(&xc_dom_64);
-}
-
 static int x86_compat(xc_interface *xch, domid_t domid, char *guest_type)
 {
     static const struct {
@@ -733,7 +701,6 @@ static int x86_compat(xc_interface *xch, domid_t domid, char *guest_type)
     return rc;
 }
 
-
 static int x86_shadow(xc_interface *xch, domid_t domid)
 {
     int rc, mode;
@@ -757,7 +724,7 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
     return rc;
 }
 
-int arch_setup_meminit(struct xc_dom_image *dom)
+static int meminit_pv(struct xc_dom_image *dom)
 {
     int rc;
     xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
@@ -1078,6 +1045,42 @@ int xc_dom_feature_translated(struct xc_dom_image *dom)
     return elf_xen_feature_get(XENFEAT_auto_translated_physmap, dom->f_active);
 }
 
+/* ------------------------------------------------------------------------ */
+
+static struct xc_dom_arch xc_dom_32_pae = {
+    .guest_type = "xen-3.0-x86_32p",
+    .native_protocol = XEN_IO_PROTO_ABI_X86_32,
+    .page_shift = PAGE_SHIFT_X86,
+    .sizeof_pfn = 4,
+    .alloc_magic_pages = alloc_magic_pages,
+    .count_pgtables = count_pgtables_x86_32_pae,
+    .setup_pgtables = setup_pgtables_x86_32_pae,
+    .start_info = start_info_x86_32,
+    .shared_info = shared_info_x86_32,
+    .vcpu = vcpu_x86_32,
+    .meminit = meminit_pv,
+};
+
+static struct xc_dom_arch xc_dom_64 = {
+    .guest_type = "xen-3.0-x86_64",
+    .native_protocol = XEN_IO_PROTO_ABI_X86_64,
+    .page_shift = PAGE_SHIFT_X86,
+    .sizeof_pfn = 8,
+    .alloc_magic_pages = alloc_magic_pages,
+    .count_pgtables = count_pgtables_x86_64,
+    .setup_pgtables = setup_pgtables_x86_64,
+    .start_info = start_info_x86_64,
+    .shared_info = shared_info_x86_64,
+    .vcpu = vcpu_x86_64,
+    .meminit = meminit_pv,
+};
+
+static void __init register_arch_hooks(void)
+{
+    xc_dom_register_arch_hooks(&xc_dom_32_pae);
+    xc_dom_register_arch_hooks(&xc_dom_64);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 06/29] libxc: make arch_setup_boot{init/late} xc_dom_arch hooks
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (4 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 05/29] libxc: make arch_setup_meminit a xc_dom_arch hook Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 07/29] libxc: rework BSP initialization Roger Pau Monne
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

This should not introduce any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Add Andrew Cooper Reviewed-by.
 - Add Wei Acked-by.
---
 tools/libxc/include/xc_dom.h |  7 ++-----
 tools/libxc/xc_dom_arm.c     | 20 +++++++++++++-------
 tools/libxc/xc_dom_boot.c    |  4 ++--
 tools/libxc/xc_dom_x86.c     | 10 ++++++++--
 4 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index c4b994f..5c1bb0f 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -222,6 +222,8 @@ struct xc_dom_arch {
     int (*start_info) (struct xc_dom_image * dom);
     int (*shared_info) (struct xc_dom_image * dom, void *shared_info);
     int (*vcpu) (struct xc_dom_image * dom, void *vcpu_ctxt);
+    int (*bootearly) (struct xc_dom_image * dom);
+    int (*bootlate) (struct xc_dom_image * dom);
 
     /* arch-specific memory initialization. */
     int (*meminit) (struct xc_dom_image * dom);
@@ -401,11 +403,6 @@ static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
     return dom->p2m_host[pfn - dom->rambase_pfn];
 }
 
-/* --- arch bits --------------------------------------------------- */
-
-int arch_setup_bootearly(struct xc_dom_image *dom);
-int arch_setup_bootlate(struct xc_dom_image *dom);
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 24776ba..7548dae 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -489,13 +489,20 @@ static int meminit(struct xc_dom_image *dom)
     return 0;
 }
 
-int arch_setup_bootearly(struct xc_dom_image *dom)
+int xc_dom_feature_translated(struct xc_dom_image *dom)
+{
+    return 1;
+}
+
+/* ------------------------------------------------------------------------ */
+
+static int bootearly(struct xc_dom_image *dom)
 {
     DOMPRINTF("%s: doing nothing", __FUNCTION__);
     return 0;
 }
 
-int arch_setup_bootlate(struct xc_dom_image *dom)
+static int bootlate(struct xc_dom_image *dom)
 {
     /* XXX
      *   map shared info
@@ -505,11 +512,6 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
     return 0;
 }
 
-int xc_dom_feature_translated(struct xc_dom_image *dom)
-{
-    return 1;
-}
-
 /* ------------------------------------------------------------------------ */
 
 static struct xc_dom_arch xc_dom_32 = {
@@ -524,6 +526,8 @@ static struct xc_dom_arch xc_dom_32 = {
     .shared_info = shared_info_arm,
     .vcpu = vcpu_arm32,
     .meminit = meminit,
+    .bootearly = bootearly,
+    .bootlate = bootlate,
 };
 
 static struct xc_dom_arch xc_dom_64 = {
@@ -538,6 +542,8 @@ static struct xc_dom_arch xc_dom_64 = {
     .shared_info = shared_info_arm,
     .vcpu = vcpu_arm64,
     .meminit = meminit,
+    .bootearly = bootearly,
+    .bootlate = bootlate,
 };
 
 static void __init register_arch_hooks(void)
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index bf2cd7b..e6f7794 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -208,7 +208,7 @@ int xc_dom_boot_image(struct xc_dom_image *dom)
     DOMPRINTF_CALLED(dom->xch);
 
     /* misc stuff*/
-    if ( (rc = arch_setup_bootearly(dom)) != 0 )
+    if ( (rc = dom->arch_hooks->bootearly(dom)) != 0 )
         return rc;
 
     /* collect some info */
@@ -255,7 +255,7 @@ int xc_dom_boot_image(struct xc_dom_image *dom)
     xc_dom_log_memory_footprint(dom);
 
     /* misc x86 stuff */
-    if ( (rc = arch_setup_bootlate(dom)) != 0 )
+    if ( (rc = dom->arch_hooks->bootlate(dom)) != 0 )
         return rc;
 
     /* let the vm run */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 07170c1..0f49e27 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -924,7 +924,9 @@ static int meminit_pv(struct xc_dom_image *dom)
     return rc;
 }
 
-int arch_setup_bootearly(struct xc_dom_image *dom)
+/* ------------------------------------------------------------------------ */
+
+static int bootearly(struct xc_dom_image *dom)
 {
     DOMPRINTF("%s: doing nothing", __FUNCTION__);
     return 0;
@@ -963,7 +965,7 @@ static int map_grant_table_frames(struct xc_dom_image *dom)
     return 0;
 }
 
-int arch_setup_bootlate(struct xc_dom_image *dom)
+static int bootlate_pv(struct xc_dom_image *dom)
 {
     static const struct {
         char *guest;
@@ -1059,6 +1061,8 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .shared_info = shared_info_x86_32,
     .vcpu = vcpu_x86_32,
     .meminit = meminit_pv,
+    .bootearly = bootearly,
+    .bootlate = bootlate_pv,
 };
 
 static struct xc_dom_arch xc_dom_64 = {
@@ -1073,6 +1077,8 @@ static struct xc_dom_arch xc_dom_64 = {
     .shared_info = shared_info_x86_64,
     .vcpu = vcpu_x86_64,
     .meminit = meminit_pv,
+    .bootearly = bootearly,
+    .bootlate = bootlate_pv,
 };
 
 static void __init register_arch_hooks(void)
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 07/29] libxc: rework BSP initialization
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (5 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 06/29] libxc: make arch_setup_boot{init/late} xc_dom_arch hooks Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests Roger Pau Monne
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Place the calls to xc_vcpu_setcontext and the allocation of the hypercall
buffer into the arch-specific vcpu hooks. This is needed in order to
introduce a new builder, so x86 HVM guests can initialize the BSP using
XEN_DOMCTL_sethvmcontext instead of XEN_DOMCTL_setvcpucontext.

This patch should not introduce any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v5:
 - Reword commit message to remove a reference to "next patch".

Changes since v4:
 - Add Andrew Cooper Reviewed-by.
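
As a rough picture of the new contract for the vcpu hook: the hook now owns
the whole sequence (allocate the context, clear it, fill it, submit it) and
the generic code only looks at the return value. The sketch below uses
made-up stand-ins (union any_ctx, struct image, fake_setcontext) in place
of the real libxc types and of xc_vcpu_setcontext(); it is not the code in
this patch.

```c
#include <string.h>

/* Much reduced stand-in for vcpu_guest_context_any_t. */
union any_ctx {
    struct { unsigned long pc; unsigned long flags; } c;
};

/* Stand-in for struct xc_dom_image. */
struct image {
    unsigned long entry;        /* guest entry point */
    union any_ctx last_ctx;     /* what the fake hypervisor received */
    int fail_setcontext;        /* force the submit step to fail */
};

/* Stand-in for xc_vcpu_setcontext(): just records the context. */
static int fake_setcontext(struct image *img, unsigned int vcpu,
                           const union any_ctx *ctx)
{
    (void)vcpu;
    if ( img->fail_setcontext )
        return -1;
    img->last_ctx = *ctx;
    return 0;
}

/* The hook builds the context on its own stack and submits it itself. */
static int vcpu_hook(struct image *img)
{
    union any_ctx any;
    int rc;

    memset(&any, 0, sizeof(any));   /* the caller no longer clears it */
    any.c.pc = img->entry;

    rc = fake_setcontext(img, 0, &any);
    return rc;                      /* propagate failure to the caller */
}
```

An HVM builder can then provide a hook with the same signature that submits
its state through a different interface, without touching the caller.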
---
 tools/libxc/include/xc_dom.h |  2 +-
 tools/libxc/xc_dom_arm.c     | 26 ++++++++++++++++++++------
 tools/libxc/xc_dom_boot.c    | 23 +----------------------
 tools/libxc/xc_dom_x86.c     | 26 ++++++++++++++++++++------
 4 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 5c1bb0f..0245d24 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -221,7 +221,7 @@ struct xc_dom_arch {
     /* arch-specific data structs setup */
     int (*start_info) (struct xc_dom_image * dom);
     int (*shared_info) (struct xc_dom_image * dom, void *shared_info);
-    int (*vcpu) (struct xc_dom_image * dom, void *vcpu_ctxt);
+    int (*vcpu) (struct xc_dom_image * dom);
     int (*bootearly) (struct xc_dom_image * dom);
     int (*bootlate) (struct xc_dom_image * dom);
 
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 7548dae..8865097 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -119,9 +119,11 @@ static int shared_info_arm(struct xc_dom_image *dom, void *ptr)
 
 /* ------------------------------------------------------------------------ */
 
-static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
+static int vcpu_arm32(struct xc_dom_image *dom)
 {
-    vcpu_guest_context_t *ctxt = ptr;
+    vcpu_guest_context_any_t any_ctx;
+    vcpu_guest_context_t *ctxt = &any_ctx.c;
+    int rc;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -154,12 +156,19 @@ static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
     DOMPRINTF("Initial state CPSR %#"PRIx32" PC %#"PRIx32,
            ctxt->user_regs.cpsr, ctxt->user_regs.pc32);
 
-    return 0;
+    rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx);
+    if ( rc != 0 )
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc);
+
+    return rc;
 }
 
-static int vcpu_arm64(struct xc_dom_image *dom, void *ptr)
+static int vcpu_arm64(struct xc_dom_image *dom)
 {
-    vcpu_guest_context_t *ctxt = ptr;
+    vcpu_guest_context_any_t any_ctx;
+    vcpu_guest_context_t *ctxt = &any_ctx.c;
+    int rc;
 
     DOMPRINTF_CALLED(dom->xch);
     /* clear everything */
@@ -189,6 +198,11 @@ static int vcpu_arm64(struct xc_dom_image *dom, void *ptr)
     DOMPRINTF("Initial state CPSR %#"PRIx32" PC %#"PRIx64,
            ctxt->user_regs.cpsr, ctxt->user_regs.pc64);
 
+    rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx);
+    if ( rc != 0 )
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc);
+
-    return 0;
+    return rc;
 }
 
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index e6f7794..791041b 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -62,19 +62,6 @@ static int setup_hypercall_page(struct xc_dom_image *dom)
     return rc;
 }
 
-static int launch_vm(xc_interface *xch, domid_t domid,
-                     vcpu_guest_context_any_t *ctxt)
-{
-    int rc;
-
-    xc_dom_printf(xch, "%s: called, ctxt=%p", __FUNCTION__, ctxt);
-    rc = xc_vcpu_setcontext(xch, domid, 0, ctxt);
-    if ( rc != 0 )
-        xc_dom_panic(xch, XC_INTERNAL_ERROR,
-                     "%s: SETVCPUCONTEXT failed (rc=%d)", __FUNCTION__, rc);
-    return rc;
-}
-
 static int clear_page(struct xc_dom_image *dom, xen_pfn_t pfn)
 {
     xen_pfn_t dst;
@@ -197,14 +184,9 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, xen_pfn_t pfn,
 
 int xc_dom_boot_image(struct xc_dom_image *dom)
 {
-    DECLARE_HYPERCALL_BUFFER(vcpu_guest_context_any_t, ctxt);
     xc_dominfo_t info;
     int rc;
 
-    ctxt = xc_hypercall_buffer_alloc(dom->xch, ctxt, sizeof(*ctxt));
-    if ( ctxt == NULL )
-        return -1;
-
     DOMPRINTF_CALLED(dom->xch);
 
     /* misc stuff*/
@@ -259,13 +241,10 @@ int xc_dom_boot_image(struct xc_dom_image *dom)
         return rc;
 
     /* let the vm run */
-    memset(ctxt, 0, sizeof(*ctxt));
-    if ( (rc = dom->arch_hooks->vcpu(dom, ctxt)) != 0 )
+    if ( (rc = dom->arch_hooks->vcpu(dom)) != 0 )
         return rc;
     xc_dom_unmap_all(dom);
-    rc = launch_vm(dom->xch, dom->guest_domid, ctxt);
 
-    xc_hypercall_buffer_free(dom->xch, ctxt);
     return rc;
 }
 
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 0f49e27..ae8187f 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -583,10 +583,12 @@ static int shared_info_x86_64(struct xc_dom_image *dom, void *ptr)
 
 /* ------------------------------------------------------------------------ */
 
-static int vcpu_x86_32(struct xc_dom_image *dom, void *ptr)
+static int vcpu_x86_32(struct xc_dom_image *dom)
 {
-    vcpu_guest_context_x86_32_t *ctxt = ptr;
+    vcpu_guest_context_any_t any_ctx;
+    vcpu_guest_context_x86_32_t *ctxt = &any_ctx.x32;
     xen_pfn_t cr3_pfn;
+    int rc;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -626,13 +628,20 @@ static int vcpu_x86_32(struct xc_dom_image *dom, void *ptr)
     DOMPRINTF("%s: cr3: pfn 0x%" PRIpfn " mfn 0x%" PRIpfn "",
               __FUNCTION__, dom->pgtables_seg.pfn, cr3_pfn);
 
-    return 0;
+    rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx);
+    if ( rc != 0 )
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc);
+
+    return rc;
 }
 
-static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
+static int vcpu_x86_64(struct xc_dom_image *dom)
 {
-    vcpu_guest_context_x86_64_t *ctxt = ptr;
+    vcpu_guest_context_any_t any_ctx;
+    vcpu_guest_context_x86_64_t *ctxt = &any_ctx.x64;
     xen_pfn_t cr3_pfn;
+    int rc;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -665,7 +674,12 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
     ctxt->kernel_ss = ctxt->user_regs.ss;
     ctxt->kernel_sp = ctxt->user_regs.esp;
 
-    return 0;
+    rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx);
+    if ( rc != 0 )
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc);
+
+    return rc;
 }
 
 /* ------------------------------------------------------------------------ */
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (6 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 07/29] libxc: rework BSP initialization Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-18 15:53   ` Anthony PERARD
  2015-09-04 12:08 ` [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers Roger Pau Monne
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

This xc_dom_arch will be used to build HVM domains. The code is based on
the existing xc_hvm_populate_memory and xc_hvm_populate_params functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v5:
 - Set tr limit to 0x67.
 - Use "goto out" consistently in vcpu_hvm.
 - Unconditionally call free(full_ctx) before exiting vcpu_hvm.
 - Add Wei Liu Ack.

Changes since v4:
 - Replace a malloc+memset with a calloc.
 - Remove a != NULL check.
 - Add Andrew Cooper Reviewed-by.

Changes since v3:
 - Make sure c/s b9dbe33 is not reverted on this patch.
 - Set the initial BSP state using {get/set}hvmcontext.
---
 tools/libxc/include/xc_dom.h |   6 +
 tools/libxc/xc_dom_x86.c     | 618 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 613 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 0245d24..cda40d9 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -188,6 +188,12 @@ struct xc_dom_image {
     } container_type;
 
     /* HVM specific fields. */
+    xen_pfn_t target_pages;
+    xen_pfn_t mmio_start;
+    xen_pfn_t mmio_size;
+    xen_pfn_t lowmem_end;
+    xen_pfn_t highmem_end;
+
     /* Extra ACPI tables passed to HVMLOADER */
     struct xc_hvm_firmware_module acpi_module;
 
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index ae8187f..f36b6f6 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -39,10 +39,32 @@
 
 /* ------------------------------------------------------------------------ */
 
-#define SUPERPAGE_PFN_SHIFT  9
-#define SUPERPAGE_NR_PFNS    (1UL << SUPERPAGE_PFN_SHIFT)
 #define SUPERPAGE_BATCH_SIZE 512
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
+#define X86_CR0_PE 0x01
+#define X86_CR0_ET 0x10
+
+#define VGA_HOLE_SIZE (0x20)
+
+#define SPECIALPAGE_PAGING   0
+#define SPECIALPAGE_ACCESS   1
+#define SPECIALPAGE_SHARING  2
+#define SPECIALPAGE_BUFIOREQ 3
+#define SPECIALPAGE_XENSTORE 4
+#define SPECIALPAGE_IOREQ    5
+#define SPECIALPAGE_IDENT_PT 6
+#define SPECIALPAGE_CONSOLE  7
+#define NR_SPECIAL_PAGES     8
+#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+
+#define NR_IOREQ_SERVER_PAGES 8
+#define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
+
 #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
@@ -461,6 +483,135 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     return 0;
 }
 
+static void build_hvm_info(void *hvm_info_page, struct xc_dom_image *dom)
+{
+    struct hvm_info_table *hvm_info = (struct hvm_info_table *)
+        (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
+    uint8_t sum;
+    int i;
+
+    memset(hvm_info_page, 0, PAGE_SIZE);
+
+    /* Fill in the header. */
+    memcpy(hvm_info->signature, "HVM INFO", sizeof(hvm_info->signature));
+    hvm_info->length = sizeof(struct hvm_info_table);
+
+    /* Sensible defaults: these can be overridden by the caller. */
+    hvm_info->apic_mode = 1;
+    hvm_info->nr_vcpus = 1;
+    memset(hvm_info->vcpu_online, 0xff, sizeof(hvm_info->vcpu_online));
+
+    /* Memory parameters. */
+    hvm_info->low_mem_pgend = dom->lowmem_end >> PAGE_SHIFT;
+    hvm_info->high_mem_pgend = dom->highmem_end >> PAGE_SHIFT;
+    hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
+
+    /* Finish with the checksum. */
+    for ( i = 0, sum = 0; i < hvm_info->length; i++ )
+        sum += ((uint8_t *)hvm_info)[i];
+    hvm_info->checksum = -sum;
+}
+
+static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
+{
+    unsigned long i;
+    void *hvm_info_page;
+    uint32_t *ident_pt, domid = dom->guest_domid;
+    int rc;
+    xen_pfn_t special_array[NR_SPECIAL_PAGES];
+    xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
+    xc_interface *xch = dom->xch;
+
+    if ( (hvm_info_page = xc_map_foreign_range(
+              xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
+              HVM_INFO_PFN)) == NULL )
+        goto error_out;
+    build_hvm_info(hvm_info_page, dom);
+    munmap(hvm_info_page, PAGE_SIZE);
+
+    /* Allocate and clear special pages. */
+    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+        special_array[i] = special_pfn(i);
+
+    rc = xc_domain_populate_physmap_exact(xch, domid, NR_SPECIAL_PAGES, 0, 0,
+                                          special_array);
+    if ( rc != 0 )
+    {
+        DOMPRINTF("Could not allocate special pages.");
+        goto error_out;
+    }
+
+    if ( xc_clear_domain_pages(xch, domid, special_pfn(0), NR_SPECIAL_PAGES) )
+        goto error_out;
+
+    xc_hvm_param_set(xch, domid, HVM_PARAM_STORE_PFN,
+                     special_pfn(SPECIALPAGE_XENSTORE));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_BUFIOREQ_PFN,
+                     special_pfn(SPECIALPAGE_BUFIOREQ));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_CONSOLE_PFN,
+                     special_pfn(SPECIALPAGE_CONSOLE));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_PAGING_RING_PFN,
+                     special_pfn(SPECIALPAGE_PAGING));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_MONITOR_RING_PFN,
+                     special_pfn(SPECIALPAGE_ACCESS));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
+                     special_pfn(SPECIALPAGE_SHARING));
+
+    /*
+     * Allocate and clear additional ioreq server pages. The default
+     * server will use the IOREQ and BUFIOREQ special pages above.
+     */
+    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
+        ioreq_server_array[i] = ioreq_server_pfn(i);
+
+    rc = xc_domain_populate_physmap_exact(xch, domid, NR_IOREQ_SERVER_PAGES, 0,
+                                          0, ioreq_server_array);
+    if ( rc != 0 )
+    {
+        DOMPRINTF("Could not allocate ioreq server pages.");
+        goto error_out;
+    }
+
+    if ( xc_clear_domain_pages(xch, domid, ioreq_server_pfn(0),
+                               NR_IOREQ_SERVER_PAGES) )
+        goto error_out;
+
+    /* Tell the domain where the pages are and how many there are */
+    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_SERVER_PFN,
+                     ioreq_server_pfn(0));
+    xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     NR_IOREQ_SERVER_PAGES);
+
+    /*
+     * Identity-map page table is required for running with CR0.PG=0 when
+     * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
+     */
+    if ( (ident_pt = xc_map_foreign_range(
+              xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
+              special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
+        goto error_out;
+    for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
+        ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
+                       _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+    munmap(ident_pt, PAGE_SIZE);
+    xc_hvm_param_set(xch, domid, HVM_PARAM_IDENT_PT,
+                     special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
+
+    dom->console_pfn = special_pfn(SPECIALPAGE_CONSOLE);
+    dom->xenstore_pfn = special_pfn(SPECIALPAGE_XENSTORE);
+    dom->parms.virt_hypercall = -1;
+
+    rc = 0;
+    goto out;
+ error_out:
+    rc = -1;
+ out:
+
+    return rc;
+}
+
 /* ------------------------------------------------------------------------ */
 
 static int start_info_x86_32(struct xc_dom_image *dom)
@@ -682,6 +833,102 @@ static int vcpu_x86_64(struct xc_dom_image *dom)
     return rc;
 }
 
+static int vcpu_hvm(struct xc_dom_image *dom)
+{
+    struct {
+        struct hvm_save_descriptor header_d;
+        HVM_SAVE_TYPE(HEADER) header;
+        struct hvm_save_descriptor cpu_d;
+        HVM_SAVE_TYPE(CPU) cpu;
+        struct hvm_save_descriptor end_d;
+        HVM_SAVE_TYPE(END) end;
+    } bsp_ctx;
+    uint8_t *full_ctx = NULL;
+    int rc;
+
+    DOMPRINTF_CALLED(dom->xch);
+
+    /*
+     * Get the full HVM context in order to have the header: it is not
+     * possible to get the header with getcontext_partial, and crafting one
+     * from userspace is also not an option since cpuid is trapped and
+     * modified by Xen.
+     */
+
+    rc = xc_domain_hvm_getcontext(dom->xch, dom->guest_domid, NULL, 0);
+    if ( rc <= 0 )
+    {
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: unable to fetch HVM context size (rc=%d)",
+                     __func__, rc);
+        goto out;
+    }
+
+    full_ctx = calloc(1, rc);
+    if ( full_ctx == NULL )
+    {
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: unable to allocate memory for HVM context (rc=%d)",
+                     __func__, rc);
+        rc = -ENOMEM;
+        goto out;
+    }
+
+    rc = xc_domain_hvm_getcontext(dom->xch, dom->guest_domid, full_ctx, rc);
+    if ( rc <= 0 )
+    {
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: unable to fetch HVM context (rc=%d)",
+                     __func__, rc);
+        goto out;
+    }
+
+    /* Copy the header to our partial context. */
+    memset(&bsp_ctx, 0, sizeof(bsp_ctx));
+    memcpy(&bsp_ctx, full_ctx,
+           sizeof(struct hvm_save_descriptor) + HVM_SAVE_LENGTH(HEADER));
+
+    /* Set the CPU descriptor. */
+    bsp_ctx.cpu_d.typecode = HVM_SAVE_CODE(CPU);
+    bsp_ctx.cpu_d.instance = 0;
+    bsp_ctx.cpu_d.length = HVM_SAVE_LENGTH(CPU);
+
+    /* Set the cached part of the relevant segment registers. */
+    bsp_ctx.cpu.cs_base = 0;
+    bsp_ctx.cpu.ds_base = 0;
+    bsp_ctx.cpu.ss_base = 0;
+    bsp_ctx.cpu.tr_base = 0;
+    bsp_ctx.cpu.cs_limit = ~0u;
+    bsp_ctx.cpu.ds_limit = ~0u;
+    bsp_ctx.cpu.ss_limit = ~0u;
+    bsp_ctx.cpu.tr_limit = 0x67;
+    bsp_ctx.cpu.cs_arbytes = 0xc9b;
+    bsp_ctx.cpu.ds_arbytes = 0xc93;
+    bsp_ctx.cpu.ss_arbytes = 0xc93;
+    bsp_ctx.cpu.tr_arbytes = 0x8b;
+
+    /* Set the control registers. */
+    bsp_ctx.cpu.cr0 = X86_CR0_PE | X86_CR0_ET;
+
+    /* Set the IP. */
+    bsp_ctx.cpu.rip = dom->parms.phys_entry;
+
+    /* Set the end descriptor. */
+    bsp_ctx.end_d.typecode = HVM_SAVE_CODE(END);
+    bsp_ctx.end_d.instance = 0;
+    bsp_ctx.end_d.length = HVM_SAVE_LENGTH(END);
+
+    rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid,
+                                  (uint8_t *)&bsp_ctx, sizeof(bsp_ctx));
+    if ( rc != 0 )
+        xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                     "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc);
+
+ out:
+    free(full_ctx);
+    return rc;
+}
+
 /* ------------------------------------------------------------------------ */
 
 static int x86_compat(xc_interface *xch, domid_t domid, char *guest_type)
@@ -762,7 +1009,7 @@ static int meminit_pv(struct xc_dom_image *dom)
 
     if ( dom->superpages )
     {
-        int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
+        int count = dom->total_pages >> SUPERPAGE_2MB_SHIFT;
         xen_pfn_t extents[count];
 
         dom->p2m_size = dom->total_pages;
@@ -773,9 +1020,9 @@ static int meminit_pv(struct xc_dom_image *dom)
 
         DOMPRINTF("Populating memory with %d superpages", count);
         for ( pfn = 0; pfn < count; pfn++ )
-            extents[pfn] = pfn << SUPERPAGE_PFN_SHIFT;
+            extents[pfn] = pfn << SUPERPAGE_2MB_SHIFT;
         rc = xc_domain_populate_physmap_exact(dom->xch, dom->guest_domid,
-                                               count, SUPERPAGE_PFN_SHIFT, 0,
+                                               count, SUPERPAGE_2MB_SHIFT, 0,
                                                extents);
         if ( rc )
             return rc;
@@ -785,7 +1032,7 @@ static int meminit_pv(struct xc_dom_image *dom)
         for ( i = 0; i < count; i++ )
         {
             mfn = extents[i];
-            for ( j = 0; j < SUPERPAGE_NR_PFNS; j++, pfn++ )
+            for ( j = 0; j < SUPERPAGE_2MB_NR_PFNS; j++, pfn++ )
                 dom->p2m_host[pfn] = mfn + j;
         }
     }
@@ -870,7 +1117,7 @@ static int meminit_pv(struct xc_dom_image *dom)
 
             pages = (vmemranges[i].end - vmemranges[i].start)
                 >> PAGE_SHIFT;
-            super_pages = pages >> SUPERPAGE_PFN_SHIFT;
+            super_pages = pages >> SUPERPAGE_2MB_SHIFT;
             pfn_base = vmemranges[i].start >> PAGE_SHIFT;
 
             for ( pfn = pfn_base; pfn < pfn_base+pages; pfn++ )
@@ -883,11 +1130,11 @@ static int meminit_pv(struct xc_dom_image *dom)
                 super_pages -= count;
 
                 for ( pfn = pfn_base_idx, j = 0;
-                      pfn < pfn_base_idx + (count << SUPERPAGE_PFN_SHIFT);
-                      pfn += SUPERPAGE_NR_PFNS, j++ )
+                      pfn < pfn_base_idx + (count << SUPERPAGE_2MB_SHIFT);
+                      pfn += SUPERPAGE_2MB_NR_PFNS, j++ )
                     extents[j] = dom->p2m_host[pfn];
                 rc = xc_domain_populate_physmap(dom->xch, dom->guest_domid, count,
-                                                SUPERPAGE_PFN_SHIFT, memflags,
+                                                SUPERPAGE_2MB_SHIFT, memflags,
                                                 extents);
                 if ( rc < 0 )
                     return rc;
@@ -897,7 +1144,7 @@ static int meminit_pv(struct xc_dom_image *dom)
                 for ( j = 0; j < rc; j++ )
                 {
                     mfn = extents[j];
-                    for ( k = 0; k < SUPERPAGE_NR_PFNS; k++, pfn++ )
+                    for ( k = 0; k < SUPERPAGE_2MB_NR_PFNS; k++, pfn++ )
                         dom->p2m_host[pfn] = mfn + k;
                 }
                 pfn_base_idx = pfn;
@@ -938,6 +1185,332 @@ static int meminit_pv(struct xc_dom_image *dom)
     return rc;
 }
 
+/*
+ * Check whether an MMIO hole overlaps the specified memory range.
+ * Returns 1 if it does, 0 otherwise.
+ */
+static int check_mmio_hole(uint64_t start, uint64_t memsize,
+                           uint64_t mmio_start, uint64_t mmio_size)
+{
+    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
+        return 0;
+    else
+        return 1;
+}
+
+static int meminit_hvm(struct xc_dom_image *dom)
+{
+    unsigned long i, vmemid, nr_pages = dom->total_pages;
+    unsigned long p2m_size;
+    unsigned long target_pages = dom->target_pages;
+    unsigned long cur_pages, cur_pfn;
+    int rc;
+    xen_capabilities_info_t caps;
+    unsigned long stat_normal_pages = 0, stat_2mb_pages = 0,
+        stat_1gb_pages = 0;
+    unsigned int memflags = 0;
+    int claim_enabled = dom->claim_enabled;
+    uint64_t total_pages;
+    xen_vmemrange_t dummy_vmemrange[2];
+    unsigned int dummy_vnode_to_pnode[1];
+    xen_vmemrange_t *vmemranges;
+    unsigned int *vnode_to_pnode;
+    unsigned int nr_vmemranges, nr_vnodes;
+    xc_interface *xch = dom->xch;
+    uint32_t domid = dom->guest_domid;
+
+    if ( nr_pages > target_pages )
+        memflags |= XENMEMF_populate_on_demand;
+
+    if ( dom->nr_vmemranges == 0 )
+    {
+        /* Build dummy vnode information
+         *
+         * Guest physical address space layout:
+         * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
+         *
+         * Of course if there is no high memory, the second vmemrange
+         * has no effect on the actual result.
+         */
+
+        dummy_vmemrange[0].start = 0;
+        dummy_vmemrange[0].end   = dom->lowmem_end;
+        dummy_vmemrange[0].flags = 0;
+        dummy_vmemrange[0].nid   = 0;
+        nr_vmemranges = 1;
+
+        if ( dom->highmem_end > (1ULL << 32) )
+        {
+            dummy_vmemrange[1].start = 1ULL << 32;
+            dummy_vmemrange[1].end   = dom->highmem_end;
+            dummy_vmemrange[1].flags = 0;
+            dummy_vmemrange[1].nid   = 0;
+
+            nr_vmemranges++;
+        }
+
+        dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
+        nr_vnodes = 1;
+        vmemranges = dummy_vmemrange;
+        vnode_to_pnode = dummy_vnode_to_pnode;
+    }
+    else
+    {
+        if ( nr_pages > target_pages )
+        {
+            DOMPRINTF("Cannot enable vNUMA and PoD at the same time");
+            goto error_out;
+        }
+
+        nr_vmemranges = dom->nr_vmemranges;
+        nr_vnodes = dom->nr_vnodes;
+        vmemranges = dom->vmemranges;
+        vnode_to_pnode = dom->vnode_to_pnode;
+    }
+
+    total_pages = 0;
+    p2m_size = 0;
+    for ( i = 0; i < nr_vmemranges; i++ )
+    {
+        total_pages += ((vmemranges[i].end - vmemranges[i].start)
+                        >> PAGE_SHIFT);
+        p2m_size = p2m_size > (vmemranges[i].end >> PAGE_SHIFT) ?
+            p2m_size : (vmemranges[i].end >> PAGE_SHIFT);
+    }
+
+    if ( total_pages != nr_pages )
+    {
+        DOMPRINTF("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
+               total_pages, nr_pages);
+        goto error_out;
+    }
+
+    if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
+    {
+        DOMPRINTF("Could not get Xen capabilities");
+        goto error_out;
+    }
+
+    dom->p2m_size = p2m_size;
+    dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+                                      dom->p2m_size);
+    if ( dom->p2m_host == NULL )
+    {
+        DOMPRINTF("Could not allocate p2m");
+        goto error_out;
+    }
+
+    for ( i = 0; i < p2m_size; i++ )
+        dom->p2m_host[i] = ((xen_pfn_t)-1);
+    for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
+    {
+        uint64_t pfn;
+
+        for ( pfn = vmemranges[vmemid].start >> PAGE_SHIFT;
+              pfn < vmemranges[vmemid].end >> PAGE_SHIFT;
+              pfn++ )
+            dom->p2m_host[pfn] = pfn;
+    }
+
+    /*
+     * Try to claim pages for early warning of insufficient memory available.
+     * This should go before xc_domain_set_pod_target, because that function
+     * actually allocates memory for the guest. Claiming after memory has been
+     * allocated is pointless.
+     */
+    if ( claim_enabled ) {
+        rc = xc_domain_claim_pages(xch, domid, target_pages - VGA_HOLE_SIZE);
+        if ( rc != 0 )
+        {
+            DOMPRINTF("Could not allocate memory for HVM guest as we cannot claim memory!");
+            goto error_out;
+        }
+    }
+
+    if ( memflags & XENMEMF_populate_on_demand )
+    {
+        /*
+         * Subtract VGA_HOLE_SIZE from target_pages for the VGA
+         * "hole".  Xen will adjust the PoD cache size so that domain
+         * tot_pages will be target_pages - VGA_HOLE_SIZE after
+         * this call.
+         */
+        rc = xc_domain_set_pod_target(xch, domid, target_pages - VGA_HOLE_SIZE,
+                                      NULL, NULL, NULL);
+        if ( rc != 0 )
+        {
+            DOMPRINTF("Could not set PoD target for HVM guest.");
+            goto error_out;
+        }
+    }
+
+    /*
+     * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000.
+     *
+     * We attempt to allocate 1GB pages if possible, falling back to 2MB
+     * pages if the 1GB allocation fails, and finally to 4KB pages if
+     * both fail.
+     *
+     * Under 2MB mode, we allocate pages in batches of no more than 8MB to
+     * ensure that we can be preempted and hence dom0 remains responsive.
+     */
+    rc = xc_domain_populate_physmap_exact(
+        xch, domid, 0xa0, 0, memflags, &dom->p2m_host[0x00]);
+
+    stat_normal_pages = 0;
+    for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
+    {
+        unsigned int new_memflags = memflags;
+        uint64_t end_pages;
+        unsigned int vnode = vmemranges[vmemid].nid;
+        unsigned int pnode = vnode_to_pnode[vnode];
+
+        if ( pnode != XC_NUMA_NO_NODE )
+            new_memflags |= XENMEMF_exact_node(pnode);
+
+        end_pages = vmemranges[vmemid].end >> PAGE_SHIFT;
+        /*
+         * Treat the VGA hole as belonging to the vmemrange that covers
+         * 0xA0000-0xC0000. Note that 0x00000-0xA0000 is populated just
+         * before this loop.
+         */
+        if ( vmemranges[vmemid].start == 0 )
+        {
+            cur_pages = 0xc0;
+            stat_normal_pages += 0xc0;
+        }
+        else
+            cur_pages = vmemranges[vmemid].start >> PAGE_SHIFT;
+
+        while ( (rc == 0) && (end_pages > cur_pages) )
+        {
+            /* Clip count to maximum 1GB extent. */
+            unsigned long count = end_pages - cur_pages;
+            unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
+
+            if ( count > max_pages )
+                count = max_pages;
+
+            cur_pfn = dom->p2m_host[cur_pages];
+
+            /* Take care of the corner cases of superpage tails */
+            if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                 (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
+                count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
+            else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+                      (count > SUPERPAGE_1GB_NR_PFNS) )
+                count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
+
+            /* Attempt to allocate a 1GB superpage. Because in each pass
+             * we only allocate at most 1GB, we don't have to clip
+             * superpage boundaries.
+             */
+            if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
+                 /* Check whether an MMIO hole exists in the 1GB memory
+                  * range */
+                 !check_mmio_hole(cur_pfn << PAGE_SHIFT,
+                                  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
+                                  dom->mmio_start, dom->mmio_size) )
+            {
+                long done;
+                unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
+                xen_pfn_t sp_extents[nr_extents];
+
+                for ( i = 0; i < nr_extents; i++ )
+                    sp_extents[i] =
+                        dom->p2m_host[cur_pages+(i<<SUPERPAGE_1GB_SHIFT)];
+
+                done = xc_domain_populate_physmap(xch, domid, nr_extents,
+                                                  SUPERPAGE_1GB_SHIFT,
+                                                  new_memflags, sp_extents);
+
+                if ( done > 0 )
+                {
+                    stat_1gb_pages += done;
+                    done <<= SUPERPAGE_1GB_SHIFT;
+                    cur_pages += done;
+                    count -= done;
+                }
+            }
+
+            if ( count != 0 )
+            {
+                /* Clip count to maximum 8MB extent. */
+                max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
+                if ( count > max_pages )
+                    count = max_pages;
+
+                /* Clip partial superpage extents to superpage
+                 * boundaries. */
+                if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                     (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
+                    count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
+                else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
+                          (count > SUPERPAGE_2MB_NR_PFNS) )
+                    count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
+
+                /* Attempt to allocate superpage extents. */
+                if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+                {
+                    long done;
+                    unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+                    xen_pfn_t sp_extents[nr_extents];
+
+                    for ( i = 0; i < nr_extents; i++ )
+                        sp_extents[i] =
+                            dom->p2m_host[cur_pages+(i<<SUPERPAGE_2MB_SHIFT)];
+
+                    done = xc_domain_populate_physmap(xch, domid, nr_extents,
+                                                      SUPERPAGE_2MB_SHIFT,
+                                                      new_memflags, sp_extents);
+
+                    if ( done > 0 )
+                    {
+                        stat_2mb_pages += done;
+                        done <<= SUPERPAGE_2MB_SHIFT;
+                        cur_pages += done;
+                        count -= done;
+                    }
+                }
+            }
+
+            /* Fall back to 4kB extents. */
+            if ( count != 0 )
+            {
+                rc = xc_domain_populate_physmap_exact(
+                    xch, domid, count, 0, new_memflags, &dom->p2m_host[cur_pages]);
+                cur_pages += count;
+                stat_normal_pages += count;
+            }
+        }
+
+        if ( rc != 0 )
+            break;
+    }
+
+    if ( rc != 0 )
+    {
+        DOMPRINTF("Could not allocate memory for HVM guest.");
+        goto error_out;
+    }
+
+    DPRINTF("PHYSICAL MEMORY ALLOCATION:\n");
+    DPRINTF("  4KB PAGES: 0x%016lx\n", stat_normal_pages);
+    DPRINTF("  2MB PAGES: 0x%016lx\n", stat_2mb_pages);
+    DPRINTF("  1GB PAGES: 0x%016lx\n", stat_1gb_pages);
+
+    rc = 0;
+    goto out;
+ error_out:
+    rc = -1;
+ out:
+
+    /* ensure no unclaimed pages are left unused */
+    xc_domain_claim_pages(xch, domid, 0 /* cancels the claim */);
+
+    return rc;
+}
+
 /* ------------------------------------------------------------------------ */
 
 static int bootearly(struct xc_dom_image *dom)
@@ -1052,6 +1625,12 @@ static int bootlate_pv(struct xc_dom_image *dom)
     return 0;
 }
 
+static int bootlate_hvm(struct xc_dom_image *dom)
+{
+    DOMPRINTF("%s: doing nothing", __func__);
+    return 0;
+}
+
 int xc_dom_feature_translated(struct xc_dom_image *dom)
 {
     /* Guests running inside HVM containers are always auto-translated. */
@@ -1095,10 +1674,27 @@ static struct xc_dom_arch xc_dom_64 = {
     .bootlate = bootlate_pv,
 };
 
+static struct xc_dom_arch xc_hvm_32 = {
+    .guest_type = "hvm-3.0-x86_32",
+    .native_protocol = XEN_IO_PROTO_ABI_X86_32,
+    .page_shift = PAGE_SHIFT_X86,
+    .sizeof_pfn = 4,
+    .alloc_magic_pages = alloc_magic_pages_hvm,
+    .count_pgtables = NULL,
+    .setup_pgtables = NULL,
+    .start_info = NULL,
+    .shared_info = NULL,
+    .vcpu = vcpu_hvm,
+    .meminit = meminit_hvm,
+    .bootearly = bootearly,
+    .bootlate = bootlate_hvm,
+};
+
 static void __init register_arch_hooks(void)
 {
     xc_dom_register_arch_hooks(&xc_dom_32_pae);
     xc_dom_register_arch_hooks(&xc_dom_64);
+    xc_dom_register_arch_hooks(&xc_hvm_32);
 }
 
 /*
-- 
1.9.5 (Apple Git-50.3)


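Note on the identity-map page table in alloc_magic_pages_hvm: each of the
1024 entries is a 32-bit non-PAE page directory entry with the PSE bit set,
so entry i maps the 4MB region starting at i << 22 onto itself. The sketch
below rebuilds that table in ordinary memory and walks it. The DEMO_* flag
names and demo_* helpers are hypothetical; the flag values are the standard
x86 PDE bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard x86 page-directory-entry flag bits. */
#define DEMO_PAGE_PRESENT   0x001u
#define DEMO_PAGE_RW        0x002u
#define DEMO_PAGE_USER      0x004u
#define DEMO_PAGE_ACCESSED  0x020u
#define DEMO_PAGE_DIRTY     0x040u
#define DEMO_PAGE_PSE       0x080u

#define DEMO_NR_PDES        1024u   /* 4096-byte page / 4-byte entries */

/* Fill a page directory so that entry i maps the 4MB region at i << 22. */
static void demo_build_ident_pt(uint32_t *pd)
{
    uint32_t i;

    assert(pd != NULL);
    for ( i = 0; i < DEMO_NR_PDES; i++ )
        pd[i] = (i << 22) | DEMO_PAGE_PRESENT | DEMO_PAGE_RW |
                DEMO_PAGE_USER | DEMO_PAGE_ACCESSED | DEMO_PAGE_DIRTY |
                DEMO_PAGE_PSE;
}

/* Translate a linear address through a 4MB PSE mapping: the top 10 bits
 * select the entry, the low 22 bits are the offset inside the 4MB page. */
static uint32_t demo_translate(const uint32_t *pd, uint32_t addr)
{
    uint32_t pde = pd[addr >> 22];

    return (pde & 0xffc00000u) | (addr & 0x003fffffu);
}
```

With this layout every address translates to itself, which is the property
the comment in the patch relies on when the guest runs with CR0.PG=0 on
Intel EPT.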

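Note on the extent clipping in meminit_hvm: before each 1GB or 2MB attempt
the code trims count so that the extent either stops at the next superpage
boundary (unaligned start) or covers a whole number of superpages (longer
than one superpage with a partial tail). The sketch below isolates that
arithmetic as a pure function. demo_clip and demo_can_use_superpage are
hypothetical helpers written for illustration and do not exist in libxc.

```c
#include <assert.h>

#define DEMO_2MB_NR_PFNS (1UL << 9)
#define DEMO_1GB_NR_PFNS (1UL << 18)

/*
 * Clip count for an extent starting at cur_pfn:
 *  - if cur_pfn is unaligned and the extent crosses the next superpage
 *    boundary, stop at that boundary;
 *  - otherwise, if the extent is longer than one superpage and has a
 *    partial tail, drop the tail.
 */
static unsigned long demo_clip(unsigned long cur_pfn, unsigned long count,
                               unsigned long sp_nr_pfns)
{
    unsigned long mask = sp_nr_pfns - 1;

    /* sp_nr_pfns must be a non-zero power of two. */
    assert(sp_nr_pfns != 0 && (sp_nr_pfns & mask) == 0);

    if ( (cur_pfn & mask) != 0 && count > (-cur_pfn & mask) )
        count = -cur_pfn & mask;          /* stop at the next boundary */
    else if ( (count & mask) != 0 && count > sp_nr_pfns )
        count &= ~mask;                   /* drop the non-superpage tail */

    return count;
}

/* A superpage allocation is attempted only when both the start pfn and
 * the length are superpage aligned. */
static int demo_can_use_superpage(unsigned long cur_pfn, unsigned long count,
                                  unsigned long sp_nr_pfns)
{
    return ((count | cur_pfn) & (sp_nr_pfns - 1)) == 0;
}
```

For the first vmemrange, which starts at pfn 0xc0 after the VGA hole, the
2MB clip yields 0x140 pages. That extent is not superpage aligned, so it is
populated through the 4KB fallback, after which cur_pages sits on a 2MB
boundary.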

* [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (7 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-18 15:53   ` Anthony PERARD
  2015-09-04 12:08 ` [PATCH v6 10/29] libxc: remove dead HVM building code Roger Pau Monne
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Now that all the code is in place, HVM domain building in libxl can be
switched to use the xc_dom_* family of functions, just as is already done
for PV guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v4:
 - Add Wei Liu Acked-by.
---
 tools/libxl/libxl_arch.h     |   2 +-
 tools/libxl/libxl_dm.c       |  18 ++--
 tools/libxl/libxl_dom.c      | 227 +++++++++++++++++++++++++------------------
 tools/libxl/libxl_internal.h |   4 +-
 tools/libxl/libxl_vnuma.c    |  12 ++-
 tools/libxl/libxl_x86.c      |   8 +-
 6 files changed, 155 insertions(+), 116 deletions(-)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index bd030b6..34a853c 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -60,6 +60,6 @@ _hidden
 int libxl__arch_domain_construct_memmap(libxl__gc *gc,
                                         libxl_domain_config *d_config,
                                         uint32_t domid,
-                                        struct xc_hvm_build_args *args);
+                                        struct xc_dom_image *dom);
 
 #endif
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index c84085e..16ad47a 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -18,6 +18,8 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+
+#include <xc_dom.h>
 #include <xen/hvm/e820.h>
 
 static const char *libxl_tapif_script(libxl__gc *gc)
@@ -181,7 +183,7 @@ add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
 int libxl__domain_device_construct_rdm(libxl__gc *gc,
                                        libxl_domain_config *d_config,
                                        uint64_t rdm_mem_boundary,
-                                       struct xc_hvm_build_args *args)
+                                       struct xc_dom_image *dom)
 {
     int i, j, conflict, rc;
     struct xen_reserved_device_memory *xrdm = NULL;
@@ -189,7 +191,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
     uint16_t seg;
     uint8_t bus, devfn;
     uint64_t rdm_start, rdm_size;
-    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+    uint64_t highmem_end = dom->highmem_end ? dom->highmem_end : (1ull<<32);
 
     /*
      * We just want to construct RDM once since RDM is specific to the
@@ -303,7 +305,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
     for (i = 0; i < d_config->num_rdms; i++) {
         rdm_start = d_config->rdms[i].start;
         rdm_size = d_config->rdms[i].size;
-        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+        conflict = overlaps_rdm(0, dom->lowmem_end, rdm_start, rdm_size);
 
         if (!conflict)
             continue;
@@ -314,14 +316,14 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
              * We will move downwards lowmem_end so we have to expand
              * highmem_end.
              */
-            highmem_end += (args->lowmem_end - rdm_start);
+            highmem_end += (dom->lowmem_end - rdm_start);
             /* Now move downwards lowmem_end. */
-            args->lowmem_end = rdm_start;
+            dom->lowmem_end = rdm_start;
         }
     }
 
     /* Sync highmem_end. */
-    args->highmem_end = highmem_end;
+    dom->highmem_end = highmem_end;
 
     /*
      * Finally we can take same policy to check lowmem(< 2G) and
@@ -331,11 +333,11 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
         rdm_start = d_config->rdms[i].start;
         rdm_size = d_config->rdms[i].size;
         /* Does this entry conflict with lowmem? */
-        conflict = overlaps_rdm(0, args->lowmem_end,
+        conflict = overlaps_rdm(0, dom->lowmem_end,
                                 rdm_start, rdm_size);
         /* Does this entry conflict with highmem? */
         conflict |= overlaps_rdm((1ULL<<32),
-                                 args->highmem_end - (1ULL<<32),
+                                 dom->highmem_end - (1ULL<<32),
                                  rdm_start, rdm_size);
 
         if (!conflict)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 6101e5c..d2cf9e3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -602,6 +602,63 @@ static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
     return rc;
 }
 
+static int libxl__build_dom(libxl__gc *gc, uint32_t domid,
+             libxl_domain_build_info *info, libxl__domain_build_state *state,
+             struct xc_dom_image *dom)
+{
+    uint64_t mem_kb;
+    int ret;
+
+    if ( (ret = xc_dom_boot_xen_init(dom, CTX->xch, domid)) != 0 ) {
+        LOGE(ERROR, "xc_dom_boot_xen_init failed");
+        goto out;
+    }
+#ifdef GUEST_RAM_BASE
+    if ( (ret = xc_dom_rambase_init(dom, GUEST_RAM_BASE)) != 0 ) {
+        LOGE(ERROR, "xc_dom_rambase failed");
+        goto out;
+    }
+#endif
+    if ( (ret = xc_dom_parse_image(dom)) != 0 ) {
+        LOGE(ERROR, "xc_dom_parse_image failed");
+        goto out;
+    }
+    if ( (ret = libxl__arch_domain_init_hw_description(gc, info, state, dom)) != 0 ) {
+        LOGE(ERROR, "libxl__arch_domain_init_hw_description failed");
+        goto out;
+    }
+
+    mem_kb = dom->container_type == XC_DOM_HVM_CONTAINER ?
+             (info->max_memkb - info->video_memkb) : info->target_memkb;
+    if ( (ret = xc_dom_mem_init(dom, mem_kb / 1024)) != 0 ) {
+        LOGE(ERROR, "xc_dom_mem_init failed");
+        goto out;
+    }
+    if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
+        LOGE(ERROR, "xc_dom_boot_mem_init failed");
+        goto out;
+    }
+    if ( (ret = libxl__arch_domain_finalise_hw_description(gc, info, dom)) != 0 ) {
+        LOGE(ERROR, "libxl__arch_domain_finalise_hw_description failed");
+        goto out;
+    }
+    if ( (ret = xc_dom_build_image(dom)) != 0 ) {
+        LOGE(ERROR, "xc_dom_build_image failed");
+        goto out;
+    }
+    if ( (ret = xc_dom_boot_image(dom)) != 0 ) {
+        LOGE(ERROR, "xc_dom_boot_image failed");
+        goto out;
+    }
+    if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) {
+        LOGE(ERROR, "xc_dom_gnttab_init failed");
+        goto out;
+    }
+
+out:
+    return ret != 0 ? ERROR_FAIL : 0;
+}
+
 int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state)
 {
@@ -692,48 +749,9 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
             dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
     }
 
-    if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
-        LOGE(ERROR, "xc_dom_boot_xen_init failed");
-        goto out;
-    }
-#ifdef GUEST_RAM_BASE
-    if ( (ret = xc_dom_rambase_init(dom, GUEST_RAM_BASE)) != 0 ) {
-        LOGE(ERROR, "xc_dom_rambase failed");
-        goto out;
-    }
-#endif
-    if ( (ret = xc_dom_parse_image(dom)) != 0 ) {
-        LOGE(ERROR, "xc_dom_parse_image failed");
-        goto out;
-    }
-    if ( (ret = libxl__arch_domain_init_hw_description(gc, info, state, dom)) != 0 ) {
-        LOGE(ERROR, "libxl__arch_domain_init_hw_description failed");
-        goto out;
-    }
-    if ( (ret = xc_dom_mem_init(dom, info->target_memkb / 1024)) != 0 ) {
-        LOGE(ERROR, "xc_dom_mem_init failed");
-        goto out;
-    }
-    if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) {
-        LOGE(ERROR, "xc_dom_boot_mem_init failed");
-        goto out;
-    }
-    if ( (ret = libxl__arch_domain_finalise_hw_description(gc, info, dom)) != 0 ) {
-        LOGE(ERROR, "libxl__arch_domain_finalise_hw_description failed");
-        goto out;
-    }
-    if ( (ret = xc_dom_build_image(dom)) != 0 ) {
-        LOGE(ERROR, "xc_dom_build_image failed");
-        goto out;
-    }
-    if ( (ret = xc_dom_boot_image(dom)) != 0 ) {
-        LOGE(ERROR, "xc_dom_boot_image failed");
+    ret = libxl__build_dom(gc, domid, info, state, dom);
+    if (ret != 0)
         goto out;
-    }
-    if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) {
-        LOGE(ERROR, "xc_dom_gnttab_init failed");
-        goto out;
-    }
 
     if (xc_dom_feature_translated(dom)) {
         state->console_mfn = dom->console_pfn;
@@ -793,39 +811,39 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
 
 static int hvm_build_set_xs_values(libxl__gc *gc,
                                    uint32_t domid,
-                                   struct xc_hvm_build_args *args)
+                                   struct xc_dom_image *dom)
 {
     char *path = NULL;
     int ret = 0;
 
-    if (args->smbios_module.guest_addr_out) {
+    if (dom->smbios_module.guest_addr_out) {
         path = GCSPRINTF("/local/domain/%d/"HVM_XS_SMBIOS_PT_ADDRESS, domid);
 
         ret = libxl__xs_write(gc, XBT_NULL, path, "0x%"PRIx64,
-                              args->smbios_module.guest_addr_out);
+                              dom->smbios_module.guest_addr_out);
         if (ret)
             goto err;
 
         path = GCSPRINTF("/local/domain/%d/"HVM_XS_SMBIOS_PT_LENGTH, domid);
 
         ret = libxl__xs_write(gc, XBT_NULL, path, "0x%x",
-                              args->smbios_module.length);
+                              dom->smbios_module.length);
         if (ret)
             goto err;
     }
 
-    if (args->acpi_module.guest_addr_out) {
+    if (dom->acpi_module.guest_addr_out) {
         path = GCSPRINTF("/local/domain/%d/"HVM_XS_ACPI_PT_ADDRESS, domid);
 
         ret = libxl__xs_write(gc, XBT_NULL, path, "0x%"PRIx64,
-                              args->acpi_module.guest_addr_out);
+                              dom->acpi_module.guest_addr_out);
         if (ret)
             goto err;
 
         path = GCSPRINTF("/local/domain/%d/"HVM_XS_ACPI_PT_LENGTH, domid);
 
         ret = libxl__xs_write(gc, XBT_NULL, path, "0x%x",
-                              args->acpi_module.length);
+                              dom->acpi_module.length);
         if (ret)
             goto err;
     }
@@ -839,7 +857,7 @@ err:
 
 static int libxl__domain_firmware(libxl__gc *gc,
                                   libxl_domain_build_info *info,
-                                  struct xc_hvm_build_args *args)
+                                  struct xc_dom_image *dom)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     const char *firmware;
@@ -865,8 +883,13 @@ static int libxl__domain_firmware(libxl__gc *gc,
             break;
         }
     }
-    args->image_file_name = libxl__abs_path(gc, firmware,
-                                            libxl__xenfirmwaredir_path());
+
+    rc = xc_dom_kernel_file(dom, libxl__abs_path(gc, firmware,
+                                                 libxl__xenfirmwaredir_path()));
+    if (rc != 0) {
+        LOGE(ERROR, "xc_dom_kernel_file failed");
+        goto out;
+    }
 
     if (info->u.hvm.smbios_firmware) {
         data = NULL;
@@ -880,8 +903,8 @@ static int libxl__domain_firmware(libxl__gc *gc,
         libxl__ptr_add(gc, data);
         if (datalen) {
             /* Only accept non-empty files */
-            args->smbios_module.data = data;
-            args->smbios_module.length = (uint32_t)datalen;
+            dom->smbios_module.data = data;
+            dom->smbios_module.length = (uint32_t)datalen;
         }
     }
 
@@ -897,8 +920,8 @@ static int libxl__domain_firmware(libxl__gc *gc,
         libxl__ptr_add(gc, data);
         if (datalen) {
             /* Only accept non-empty files */
-            args->acpi_module.data = data;
-            args->acpi_module.length = (uint32_t)datalen;
+            dom->acpi_module.data = data;
+            dom->acpi_module.length = (uint32_t)datalen;
         }
     }
 
@@ -912,52 +935,62 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    struct xc_hvm_build_args args = {};
-    int ret, rc;
-    uint64_t mmio_start, lowmem_end, highmem_end;
+    int rc;
+    uint64_t mmio_start, lowmem_end, highmem_end, mem_size;
     libxl_domain_build_info *const info = &d_config->b_info;
+    struct xc_dom_image *dom = NULL;
+
+    xc_dom_loginit(ctx->xch);
+
+    dom = xc_dom_allocate(ctx->xch, NULL, NULL);
+    if (!dom) {
+        LOGE(ERROR, "xc_dom_allocate failed");
+        goto out;
+    }
+
+    dom->container_type = XC_DOM_HVM_CONTAINER;
 
-    memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
      * multiplied by 1 Kb. This was then divided off when calling
      * the old xc_hvm_build_target_mem() which then turned them to bytes.
      * Do all this in one step here...
      */
-    args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
-    args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
-    args.claim_enabled = libxl_defbool_val(info->claim_mode);
+    mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
+    dom->target_pages = (uint64_t)(info->target_memkb - info->video_memkb) >> 2;
+    dom->claim_enabled = libxl_defbool_val(info->claim_mode);
     if (info->u.hvm.mmio_hole_memkb) {
         uint64_t max_ram_below_4g = (1ULL << 32) -
             (info->u.hvm.mmio_hole_memkb << 10);
 
         if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
-            args.mmio_size = info->u.hvm.mmio_hole_memkb << 10;
+            dom->mmio_size = info->u.hvm.mmio_hole_memkb << 10;
     }
 
-    rc = libxl__domain_firmware(gc, info, &args);
+    rc = libxl__domain_firmware(gc, info, dom);
     if (rc != 0) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
     }
-    if (args.mem_target == 0)
-        args.mem_target = args.mem_size;
-    if (args.mmio_size == 0)
-        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
-    lowmem_end = args.mem_size;
+
+    if (dom->target_pages == 0)
+        dom->target_pages = mem_size >> XC_PAGE_SHIFT;
+    if (dom->mmio_size == 0)
+        dom->mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
+    lowmem_end = mem_size;
     highmem_end = 0;
-    mmio_start = (1ull << 32) - args.mmio_size;
+    mmio_start = (1ull << 32) - dom->mmio_size;
     if (lowmem_end > mmio_start)
     {
         highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
         lowmem_end = mmio_start;
     }
-    args.lowmem_end = lowmem_end;
-    args.highmem_end = highmem_end;
-    args.mmio_start = mmio_start;
+    dom->lowmem_end = lowmem_end;
+    dom->highmem_end = highmem_end;
+    dom->mmio_start = mmio_start;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
                                             info->u.hvm.rdm_mem_boundary_memkb*1024,
-                                            &args);
+                                            dom);
     if (rc) {
         LOG(ERROR, "checking reserved device memory failed");
         goto out;
@@ -966,7 +999,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     if (info->num_vnuma_nodes != 0) {
         int i;
 
-        rc = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+        rc = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, dom);
         if (rc != 0) {
             LOG(ERROR, "hvm build vmemranges failed");
             goto out;
@@ -976,37 +1009,34 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         rc = set_vnuma_info(gc, domid, info, state);
         if (rc != 0) goto out;
 
-        args.nr_vmemranges = state->num_vmemranges;
-        args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
-                                        args.nr_vmemranges);
+        dom->nr_vmemranges = state->num_vmemranges;
+        dom->vmemranges = libxl__malloc(gc, sizeof(*dom->vmemranges) *
+                                        dom->nr_vmemranges);
 
-        for (i = 0; i < args.nr_vmemranges; i++) {
-            args.vmemranges[i].start = state->vmemranges[i].start;
-            args.vmemranges[i].end   = state->vmemranges[i].end;
-            args.vmemranges[i].flags = state->vmemranges[i].flags;
-            args.vmemranges[i].nid   = state->vmemranges[i].nid;
+        for (i = 0; i < dom->nr_vmemranges; i++) {
+            dom->vmemranges[i].start = state->vmemranges[i].start;
+            dom->vmemranges[i].end   = state->vmemranges[i].end;
+            dom->vmemranges[i].flags = state->vmemranges[i].flags;
+            dom->vmemranges[i].nid   = state->vmemranges[i].nid;
         }
 
-        args.nr_vnodes = info->num_vnuma_nodes;
-        args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) *
-                                            args.nr_vnodes);
-        for (i = 0; i < args.nr_vnodes; i++)
-            args.vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
-    }
-
-    ret = xc_hvm_build(ctx->xch, domid, &args);
-    if (ret) {
-        LOGEV(ERROR, ret, "hvm building failed");
-        rc = ERROR_FAIL;
-        goto out;
+        dom->nr_vnodes = info->num_vnuma_nodes;
+        dom->vnode_to_pnode = libxl__malloc(gc, sizeof(*dom->vnode_to_pnode) *
+                                            dom->nr_vnodes);
+        for (i = 0; i < dom->nr_vnodes; i++)
+            dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
     }
 
-    rc = libxl__arch_domain_construct_memmap(gc, d_config, domid, &args);
+    rc = libxl__arch_domain_construct_memmap(gc, d_config, domid, dom);
     if (rc != 0) {
         LOG(ERROR, "setting domain memory map failed");
         goto out;
     }
 
+    rc = libxl__build_dom(gc, domid, info, state, dom);
+    if (rc != 0)
+        goto out;
+
     rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
@@ -1016,15 +1046,18 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
-    rc = hvm_build_set_xs_values(gc, domid, &args);
+    rc = hvm_build_set_xs_values(gc, domid, dom);
     if (rc != 0) {
         LOG(ERROR, "hvm build set xenstore values failed");
         goto out;
     }
 
+    xc_dom_release(dom);
     return 0;
+
 out:
     assert(rc != 0);
+    if (dom != NULL) xc_dom_release(dom);
     return rc;
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6ea6c83..ea89f1f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1594,7 +1594,7 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
 _hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
                                    libxl_domain_config *d_config,
                                    uint64_t rdm_mem_guard,
-                                   struct xc_hvm_build_args *args);
+                                   struct xc_dom_image *dom);
 
 /*
  * This function will cause the whole libxl process to hang
@@ -3752,7 +3752,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
                                      uint32_t domid,
                                      libxl_domain_build_info *b_info,
                                      libxl__domain_build_state *state,
-                                     struct xc_hvm_build_args *args);
+                                     struct xc_dom_image *dom);
 bool libxl__vnuma_configured(const libxl_domain_build_info *b_info);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index 56856d2..db22799 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -17,6 +17,8 @@
 #include "libxl_arch.h"
 #include <stdlib.h>
 
+#include <xc_dom.h>
+
 bool libxl__vnuma_configured(const libxl_domain_build_info *b_info)
 {
     return b_info->num_vnuma_nodes != 0;
@@ -252,7 +254,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
                                      uint32_t domid,
                                      libxl_domain_build_info *b_info,
                                      libxl__domain_build_state *state,
-                                     struct xc_hvm_build_args *args)
+                                     struct xc_dom_image *dom)
 {
     uint64_t hole_start, hole_end, next;
     int nid, nr_vmemrange;
@@ -264,10 +266,10 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
      * Guest physical address space layout:
      * [0, hole_start) [hole_start, hole_end) [hole_end, highmem_end)
      */
-    hole_start = args->lowmem_end < args->mmio_start ?
-        args->lowmem_end : args->mmio_start;
-    hole_end = (args->mmio_start + args->mmio_size) > (1ULL << 32) ?
-        (args->mmio_start + args->mmio_size) : (1ULL << 32);
+    hole_start = dom->lowmem_end < dom->mmio_start ?
+        dom->lowmem_end : dom->mmio_start;
+    hole_end = (dom->mmio_start + dom->mmio_size) > (1ULL << 32) ?
+        (dom->mmio_start + dom->mmio_size) : (1ULL << 32);
 
     assert(state->vmemranges == NULL);
 
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 896f34c..9276126 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -1,6 +1,8 @@
 #include "libxl_internal.h"
 #include "libxl_arch.h"
 
+#include <xc_dom.h>
+
 int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       xc_domain_configuration_t *xc_config)
@@ -473,7 +475,7 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 int libxl__arch_domain_construct_memmap(libxl__gc *gc,
                                         libxl_domain_config *d_config,
                                         uint32_t domid,
-                                        struct xc_hvm_build_args *args)
+                                        struct xc_dom_image *dom)
 {
     int rc = 0;
     unsigned int nr = 0, i;
@@ -481,7 +483,7 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
     unsigned int e820_entries = 1;
     struct e820entry *e820 = NULL;
     uint64_t highmem_size =
-                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+                    dom->highmem_end ? dom->highmem_end - (1ull << 32) : 0;
 
     /* Add all rdm entries. */
     for (i = 0; i < d_config->num_rdms; i++)
@@ -503,7 +505,7 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
 
     /* Low memory */
     e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
-    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = dom->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
     e820[nr].type = E820_RAM;
     nr++;
 
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 10/29] libxc: remove dead HVM building code
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (8 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Remove xc_hvm_build_x86.c and xc_hvm_build_arm.c since xc_hvm_build is no
longer used to create HVM guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v4:
 - Add Wei Liu Acked-by and Andrew Cooper Reviewed-by.
---
 tools/libxc/Makefile              |   2 -
 tools/libxc/include/xenguest.h    |  44 ---
 tools/libxc/xc_hvm_build_arm.c    |  48 ---
 tools/libxc/xc_hvm_build_x86.c    | 808 --------------------------------------
 tools/libxc/xg_private.c          |   9 -
 tools/python/xen/lowlevel/xc/xc.c |  81 ----
 6 files changed, 992 deletions(-)
 delete mode 100644 tools/libxc/xc_hvm_build_arm.c
 delete mode 100644 tools/libxc/xc_hvm_build_x86.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index baaadd6..818f2e4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -91,9 +91,7 @@ GUEST_SRCS-y                 += xc_dom_compat_linux.c
 
 GUEST_SRCS-$(CONFIG_X86)     += xc_dom_x86.c
 GUEST_SRCS-$(CONFIG_X86)     += xc_cpuid_x86.c
-GUEST_SRCS-$(CONFIG_X86)     += xc_hvm_build_x86.c
 GUEST_SRCS-$(CONFIG_ARM)     += xc_dom_arm.c
-GUEST_SRCS-$(CONFIG_ARM)     += xc_hvm_build_arm.c
 
 ifeq ($(CONFIG_LIBXC_MINIOS),y)
 GUEST_SRCS-y                 += xc_dom_decompress_unsafe.c
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 1a1a185..ec67fbd 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -205,50 +205,6 @@ struct xc_hvm_firmware_module {
     uint64_t  guest_addr_out;
 };
 
-struct xc_hvm_build_args {
-    uint64_t mem_size;           /* Memory size in bytes. */
-    uint64_t mem_target;         /* Memory target in bytes. */
-    uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
-    const char *image_file_name; /* File name of the image to load. */
-
-    /* Extra ACPI tables passed to HVMLOADER */
-    struct xc_hvm_firmware_module acpi_module;
-
-    /* Extra SMBIOS structures passed to HVMLOADER */
-    struct xc_hvm_firmware_module smbios_module;
-    /* Whether to use claim hypercall (1 - enable, 0 - disable). */
-    int claim_enabled;
-
-    /* vNUMA information*/
-    xen_vmemrange_t *vmemranges;
-    unsigned int nr_vmemranges;
-    unsigned int *vnode_to_pnode;
-    unsigned int nr_vnodes;
-
-    /* Out parameters  */
-    uint64_t lowmem_end;
-    uint64_t highmem_end;
-    uint64_t mmio_start;
-};
-
-/**
- * Build a HVM domain.
- * @parm xch      libxc context handle.
- * @parm domid    domain ID for the new domain.
- * @parm hvm_args parameters for the new domain.
- *
- * The memory size and image file parameters are required, the rest
- * are optional.
- */
-int xc_hvm_build(xc_interface *xch, uint32_t domid,
-                 struct xc_hvm_build_args *hvm_args);
-
-int xc_hvm_build_target_mem(xc_interface *xch,
-                            uint32_t domid,
-                            int memsize,
-                            int target,
-                            const char *image_name);
-
 /*
  * Sets *lockfd to -1.
  * Has deallocated everything even on error.
diff --git a/tools/libxc/xc_hvm_build_arm.c b/tools/libxc/xc_hvm_build_arm.c
deleted file mode 100644
index 14f7c45..0000000
--- a/tools/libxc/xc_hvm_build_arm.c
+++ /dev/null
@@ -1,48 +0,0 @@
-/******************************************************************************
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; If not, see <http://www.gnu.org/licenses/>.
- *
- * Copyright (c) 2011, Citrix Systems
- */
-
-#include <inttypes.h>
-#include <errno.h>
-#include <xenctrl.h>
-#include <xenguest.h>
-
-int xc_hvm_build(xc_interface *xch, uint32_t domid,
-                 struct xc_hvm_build_args *hvm_args)
-{
-    errno = ENOSYS;
-    return -1;
-}
-
-int xc_hvm_build_target_mem(xc_interface *xch,
-                           uint32_t domid,
-                           int memsize,
-                           int target,
-                           const char *image_name)
-{
-    errno = ENOSYS;
-    return -1;
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
deleted file mode 100644
index 4d3736b..0000000
--- a/tools/libxc/xc_hvm_build_x86.c
+++ /dev/null
@@ -1,808 +0,0 @@
-/******************************************************************************
- * xc_hvm_build.c
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <stddef.h>
-#include <inttypes.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <zlib.h>
-
-#include "xg_private.h"
-#include "xc_private.h"
-
-#include <xen/foreign/x86_32.h>
-#include <xen/foreign/x86_64.h>
-#include <xen/hvm/hvm_info_table.h>
-#include <xen/hvm/params.h>
-#include <xen/hvm/e820.h>
-
-#include <xen/libelf/libelf.h>
-
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
-#define SPECIALPAGE_PAGING   0
-#define SPECIALPAGE_ACCESS   1
-#define SPECIALPAGE_SHARING  2
-#define SPECIALPAGE_BUFIOREQ 3
-#define SPECIALPAGE_XENSTORE 4
-#define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
-
-#define NR_IOREQ_SERVER_PAGES 8
-#define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
-
-#define VGA_HOLE_SIZE (0x20)
-
-static int modules_init(struct xc_hvm_build_args *args,
-                        uint64_t vend, struct elf_binary *elf,
-                        uint64_t *mstart_out, uint64_t *mend_out)
-{
-#define MODULE_ALIGN 1UL << 7
-#define MB_ALIGN     1UL << 20
-#define MKALIGN(x, a) (((uint64_t)(x) + (a) - 1) & ~(uint64_t)((a) - 1))
-    uint64_t total_len = 0, offset1 = 0;
-
-    if ( (args->acpi_module.length == 0)&&(args->smbios_module.length == 0) )
-        return 0;
-
-    /* Find the total length for the firmware modules with a reasonable large
-     * alignment size to align each the modules.
-     */
-    total_len = MKALIGN(args->acpi_module.length, MODULE_ALIGN);
-    offset1 = total_len;
-    total_len += MKALIGN(args->smbios_module.length, MODULE_ALIGN);
-
-    /* Want to place the modules 1Mb+change behind the loader image. */
-    *mstart_out = MKALIGN(elf->pend, MB_ALIGN) + (MB_ALIGN);
-    *mend_out = *mstart_out + total_len;
-
-    if ( *mend_out > vend )    
-        return -1;
-
-    if ( args->acpi_module.length != 0 )
-        args->acpi_module.guest_addr_out = *mstart_out;
-    if ( args->smbios_module.length != 0 )
-        args->smbios_module.guest_addr_out = *mstart_out + offset1;
-
-    return 0;
-}
-
-static void build_hvm_info(void *hvm_info_page,
-                           struct xc_hvm_build_args *args)
-{
-    struct hvm_info_table *hvm_info = (struct hvm_info_table *)
-        (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
-    uint8_t sum;
-    int i;
-
-    memset(hvm_info_page, 0, PAGE_SIZE);
-
-    /* Fill in the header. */
-    memcpy(hvm_info->signature, "HVM INFO", sizeof(hvm_info->signature));
-    hvm_info->length = sizeof(struct hvm_info_table);
-
-    /* Sensible defaults: these can be overridden by the caller. */
-    hvm_info->apic_mode = 1;
-    hvm_info->nr_vcpus = 1;
-    memset(hvm_info->vcpu_online, 0xff, sizeof(hvm_info->vcpu_online));
-
-    /* Memory parameters. */
-    hvm_info->low_mem_pgend = args->lowmem_end >> PAGE_SHIFT;
-    hvm_info->high_mem_pgend = args->highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
-
-    /* Finish with the checksum. */
-    for ( i = 0, sum = 0; i < hvm_info->length; i++ )
-        sum += ((uint8_t *)hvm_info)[i];
-    hvm_info->checksum = -sum;
-}
-
-static int loadelfimage(xc_interface *xch, struct elf_binary *elf,
-                        uint32_t dom, unsigned long *parray)
-{
-    privcmd_mmap_entry_t *entries = NULL;
-    unsigned long pfn_start = elf->pstart >> PAGE_SHIFT;
-    unsigned long pfn_end = (elf->pend + PAGE_SIZE - 1) >> PAGE_SHIFT;
-    size_t pages = pfn_end - pfn_start;
-    int i, rc = -1;
-
-    /* Map address space for initial elf image. */
-    entries = calloc(pages, sizeof(privcmd_mmap_entry_t));
-    if ( entries == NULL )
-        goto err;
-
-    for ( i = 0; i < pages; i++ )
-        entries[i].mfn = parray[(elf->pstart >> PAGE_SHIFT) + i];
-
-    elf->dest_base = xc_map_foreign_ranges(
-        xch, dom, pages << PAGE_SHIFT, PROT_READ | PROT_WRITE, 1 << PAGE_SHIFT,
-        entries, pages);
-    if ( elf->dest_base == NULL )
-        goto err;
-    elf->dest_size = pages * PAGE_SIZE;
-
-    ELF_ADVANCE_DEST(elf, elf->pstart & (PAGE_SIZE - 1));
-
-    /* Load the initial elf image. */
-    rc = elf_load_binary(elf);
-    if ( rc < 0 )
-        PERROR("Failed to load elf binary\n");
-
-    munmap(elf->dest_base, pages << PAGE_SHIFT);
-    elf->dest_base = NULL;
-    elf->dest_size = 0;
-
- err:
-    free(entries);
-
-    return rc;
-}
-
-static int loadmodules(xc_interface *xch,
-                       struct xc_hvm_build_args *args,
-                       uint64_t mstart, uint64_t mend,
-                       uint32_t dom, unsigned long *parray)
-{
-    privcmd_mmap_entry_t *entries = NULL;
-    unsigned long pfn_start;
-    unsigned long pfn_end;
-    size_t pages;
-    uint32_t i;
-    uint8_t *dest;
-    int rc = -1;
-
-    if ( (mstart == 0)||(mend == 0) )
-        return 0;
-
-    pfn_start = (unsigned long)(mstart >> PAGE_SHIFT);
-    pfn_end = (unsigned long)((mend + PAGE_SIZE - 1) >> PAGE_SHIFT);
-    pages = pfn_end - pfn_start;
-
-    /* Map address space for module list. */
-    entries = calloc(pages, sizeof(privcmd_mmap_entry_t));
-    if ( entries == NULL )
-        goto error_out;
-
-    for ( i = 0; i < pages; i++ )
-        entries[i].mfn = parray[(mstart >> PAGE_SHIFT) + i];
-
-    dest = xc_map_foreign_ranges(
-        xch, dom, pages << PAGE_SHIFT, PROT_READ | PROT_WRITE, 1 << PAGE_SHIFT,
-        entries, pages);
-    if ( dest == NULL )
-        goto error_out;
-
-    /* Zero the range so padding is clear between modules */
-    memset(dest, 0, pages << PAGE_SHIFT);
-
-    /* Load modules into range */    
-    if ( args->acpi_module.length != 0 )
-    {
-        memcpy(dest,
-               args->acpi_module.data,
-               args->acpi_module.length);
-    }
-    if ( args->smbios_module.length != 0 )
-    {
-        memcpy(dest + (args->smbios_module.guest_addr_out - mstart),
-               args->smbios_module.data,
-               args->smbios_module.length);
-    }
-
-    munmap(dest, pages << PAGE_SHIFT);
-    rc = 0;
-
- error_out:
-    free(entries);
-
-    return rc;
-}
-
-/*
- * Check whether there exists mmio hole in the specified memory range.
- * Returns 1 if exists, else returns 0.
- */
-static int check_mmio_hole(uint64_t start, uint64_t memsize,
-                           uint64_t mmio_start, uint64_t mmio_size)
-{
-    if ( start + memsize <= mmio_start || start >= mmio_start + mmio_size )
-        return 0;
-    else
-        return 1;
-}
-
-static int xc_hvm_populate_memory(xc_interface *xch, uint32_t dom,
-                                  struct xc_hvm_build_args *args,
-                                  xen_pfn_t *page_array)
-{
-    unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
-    unsigned long p2m_size;
-    unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
-    unsigned long cur_pages, cur_pfn;
-    int rc;
-    xen_capabilities_info_t caps;
-    unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
-        stat_1gb_pages = 0;
-    unsigned int memflags = 0;
-    int claim_enabled = args->claim_enabled;
-    uint64_t total_pages;
-    xen_vmemrange_t dummy_vmemrange[2];
-    unsigned int dummy_vnode_to_pnode[1];
-    xen_vmemrange_t *vmemranges;
-    unsigned int *vnode_to_pnode;
-    unsigned int nr_vmemranges, nr_vnodes;
-
-    if ( nr_pages > target_pages )
-        memflags |= XENMEMF_populate_on_demand;
-
-    if ( args->nr_vmemranges == 0 )
-    {
-        /* Build dummy vnode information
-         *
-         * Guest physical address space layout:
-         * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
-         *
-         * Of course if there is no high memory, the second vmemrange
-         * has no effect on the actual result.
-         */
-
-        dummy_vmemrange[0].start = 0;
-        dummy_vmemrange[0].end   = args->lowmem_end;
-        dummy_vmemrange[0].flags = 0;
-        dummy_vmemrange[0].nid   = 0;
-        nr_vmemranges = 1;
-
-        if ( args->highmem_end > (1ULL << 32) )
-        {
-            dummy_vmemrange[1].start = 1ULL << 32;
-            dummy_vmemrange[1].end   = args->highmem_end;
-            dummy_vmemrange[1].flags = 0;
-            dummy_vmemrange[1].nid   = 0;
-
-            nr_vmemranges++;
-        }
-
-        dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
-        nr_vnodes = 1;
-        vmemranges = dummy_vmemrange;
-        vnode_to_pnode = dummy_vnode_to_pnode;
-    }
-    else
-    {
-        if ( nr_pages > target_pages )
-        {
-            PERROR("Cannot enable vNUMA and PoD at the same time");
-            goto error_out;
-        }
-
-        nr_vmemranges = args->nr_vmemranges;
-        nr_vnodes = args->nr_vnodes;
-        vmemranges = args->vmemranges;
-        vnode_to_pnode = args->vnode_to_pnode;
-    }
-
-    total_pages = 0;
-    p2m_size = 0;
-    for ( i = 0; i < nr_vmemranges; i++ )
-    {
-        total_pages += ((vmemranges[i].end - vmemranges[i].start)
-                        >> PAGE_SHIFT);
-        p2m_size = p2m_size > (vmemranges[i].end >> PAGE_SHIFT) ?
-            p2m_size : (vmemranges[i].end >> PAGE_SHIFT);
-    }
-
-    if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
-    {
-        PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
-               total_pages, args->mem_size >> PAGE_SHIFT);
-        goto error_out;
-    }
-
-    if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
-    {
-        PERROR("Could not get Xen capabilities");
-        goto error_out;
-    }
-
-    for ( i = 0; i < p2m_size; i++ )
-        page_array[i] = ((xen_pfn_t)-1);
-    for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
-    {
-        uint64_t pfn;
-
-        for ( pfn = vmemranges[vmemid].start >> PAGE_SHIFT;
-              pfn < vmemranges[vmemid].end >> PAGE_SHIFT;
-              pfn++ )
-            page_array[pfn] = pfn;
-    }
-
-    /*
-     * Try to claim pages for early warning of insufficient memory available.
-     * This should go before xc_domain_set_pod_target, becuase that function
-     * actually allocates memory for the guest. Claiming after memory has been
-     * allocated is pointless.
-     */
-    if ( claim_enabled ) {
-        rc = xc_domain_claim_pages(xch, dom, target_pages - VGA_HOLE_SIZE);
-        if ( rc != 0 )
-        {
-            PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
-            goto error_out;
-        }
-    }
-
-    if ( memflags & XENMEMF_populate_on_demand )
-    {
-        /*
-         * Subtract VGA_HOLE_SIZE from target_pages for the VGA
-         * "hole".  Xen will adjust the PoD cache size so that domain
-         * tot_pages will be target_pages - VGA_HOLE_SIZE after
-         * this call.
-         */
-        rc = xc_domain_set_pod_target(xch, dom,
-                                      target_pages - VGA_HOLE_SIZE,
-                                      NULL, NULL, NULL);
-        if ( rc != 0 )
-        {
-            PERROR("Could not set PoD target for HVM guest.\n");
-            goto error_out;
-        }
-    }
-
-    /*
-     * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000.
-     *
-     * We attempt to allocate 1GB pages if possible. It falls back on 2MB
-     * pages if 1GB allocation fails. 4KB pages will be used eventually if
-     * both fail.
-     * 
-     * Under 2MB mode, we allocate pages in batches of no more than 8MB to 
-     * ensure that we can be preempted and hence dom0 remains responsive.
-     */
-    rc = xc_domain_populate_physmap_exact(
-        xch, dom, 0xa0, 0, memflags, &page_array[0x00]);
-
-    stat_normal_pages = 0;
-    for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
-    {
-        unsigned int new_memflags = memflags;
-        uint64_t end_pages;
-        unsigned int vnode = vmemranges[vmemid].nid;
-        unsigned int pnode = vnode_to_pnode[vnode];
-
-        if ( pnode != XC_NUMA_NO_NODE )
-            new_memflags |= XENMEMF_exact_node(pnode);
-
-        end_pages = vmemranges[vmemid].end >> PAGE_SHIFT;
-        /*
-         * Consider vga hole belongs to the vmemrange that covers
-         * 0xA0000-0xC0000. Note that 0x00000-0xA0000 is populated just
-         * before this loop.
-         */
-        if ( vmemranges[vmemid].start == 0 )
-        {
-            cur_pages = 0xc0;
-            stat_normal_pages += 0xc0;
-        }
-        else
-            cur_pages = vmemranges[vmemid].start >> PAGE_SHIFT;
-
-        while ( (rc == 0) && (end_pages > cur_pages) )
-        {
-            /* Clip count to maximum 1GB extent. */
-            unsigned long count = end_pages - cur_pages;
-            unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
-
-            if ( count > max_pages )
-                count = max_pages;
-
-            cur_pfn = page_array[cur_pages];
-
-            /* Take care the corner cases of super page tails */
-            if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-                 (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
-                count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
-            else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-                      (count > SUPERPAGE_1GB_NR_PFNS) )
-                count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
-
-            /* Attemp to allocate 1GB super page. Because in each pass
-             * we only allocate at most 1GB, we don't have to clip
-             * super page boundaries.
-             */
-            if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
-                 /* Check if there exists MMIO hole in the 1GB memory
-                  * range */
-                 !check_mmio_hole(cur_pfn << PAGE_SHIFT,
-                                  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
-                                  args->mmio_start, args->mmio_size) )
-            {
-                long done;
-                unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
-                xen_pfn_t sp_extents[nr_extents];
-
-                for ( i = 0; i < nr_extents; i++ )
-                    sp_extents[i] =
-                        page_array[cur_pages+(i<<SUPERPAGE_1GB_SHIFT)];
-
-                done = xc_domain_populate_physmap(xch, dom, nr_extents,
-                                                  SUPERPAGE_1GB_SHIFT,
-                                                  new_memflags,
-                                                  sp_extents);
-
-                if ( done > 0 )
-                {
-                    stat_1gb_pages += done;
-                    done <<= SUPERPAGE_1GB_SHIFT;
-                    cur_pages += done;
-                    count -= done;
-                }
-            }
-
-            if ( count != 0 )
-            {
-                /* Clip count to maximum 8MB extent. */
-                max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
-                if ( count > max_pages )
-                    count = max_pages;
-
-                /* Clip partial superpage extents to superpage
-                 * boundaries. */
-                if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                     (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
-                    count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
-                else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-                          (count > SUPERPAGE_2MB_NR_PFNS) )
-                    count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
-
-                /* Attempt to allocate superpage extents. */
-                if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
-                {
-                    long done;
-                    unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
-                    xen_pfn_t sp_extents[nr_extents];
-
-                    for ( i = 0; i < nr_extents; i++ )
-                        sp_extents[i] =
-                            page_array[cur_pages+(i<<SUPERPAGE_2MB_SHIFT)];
-
-                    done = xc_domain_populate_physmap(xch, dom, nr_extents,
-                                                      SUPERPAGE_2MB_SHIFT,
-                                                      new_memflags,
-                                                      sp_extents);
-
-                    if ( done > 0 )
-                    {
-                        stat_2mb_pages += done;
-                        done <<= SUPERPAGE_2MB_SHIFT;
-                        cur_pages += done;
-                        count -= done;
-                    }
-                }
-            }
-
-            /* Fall back to 4kB extents. */
-            if ( count != 0 )
-            {
-                rc = xc_domain_populate_physmap_exact(
-                    xch, dom, count, 0, new_memflags, &page_array[cur_pages]);
-                cur_pages += count;
-                stat_normal_pages += count;
-            }
-        }
-
-        if ( rc != 0 )
-            break;
-    }
-
-    if ( rc != 0 )
-    {
-        PERROR("Could not allocate memory for HVM guest.");
-        goto error_out;
-    }
-
-    DPRINTF("PHYSICAL MEMORY ALLOCATION:\n");
-    DPRINTF("  4KB PAGES: 0x%016lx\n", stat_normal_pages);
-    DPRINTF("  2MB PAGES: 0x%016lx\n", stat_2mb_pages);
-    DPRINTF("  1GB PAGES: 0x%016lx\n", stat_1gb_pages);
-
-    rc = 0;
-    goto out;
- error_out:
-    rc = -1;
- out:
-
-    /* ensure no unclaimed pages are left unused */
-    xc_domain_claim_pages(xch, dom, 0 /* cancels the claim */);
-
-    return rc;
-}
-
-static int xc_hvm_load_image(xc_interface *xch,
-                       uint32_t dom, struct xc_hvm_build_args *args,
-                       xen_pfn_t *page_array)
-{
-    unsigned long entry_eip, image_size;
-    struct elf_binary elf;
-    uint64_t v_start, v_end;
-    uint64_t m_start = 0, m_end = 0;
-    char *image;
-    int rc;
-
-    image = xc_read_image(xch, args->image_file_name, &image_size);
-    if ( image == NULL )
-        return -1;
-
-    memset(&elf, 0, sizeof(elf));
-    if ( elf_init(&elf, image, image_size) != 0 )
-        goto error_out;
-
-    xc_elf_set_logfile(xch, &elf, 1);
-
-    elf_parse_binary(&elf);
-    v_start = 0;
-    v_end = args->mem_size;
-
-    if ( modules_init(args, v_end, &elf, &m_start, &m_end) != 0 )
-    {
-        ERROR("Insufficient space to load modules.");
-        goto error_out;
-    }
-
-    DPRINTF("VIRTUAL MEMORY ARRANGEMENT:\n");
-    DPRINTF("  Loader:   %016"PRIx64"->%016"PRIx64"\n", elf.pstart, elf.pend);
-    DPRINTF("  Modules:  %016"PRIx64"->%016"PRIx64"\n", m_start, m_end);
-
-    if ( loadelfimage(xch, &elf, dom, page_array) != 0 )
-    {
-        PERROR("Could not load ELF image");
-        goto error_out;
-    }
-
-    if ( loadmodules(xch, args, m_start, m_end, dom, page_array) != 0 )
-    {
-        PERROR("Could not load ACPI modules");
-        goto error_out;
-    }
-
-    /* Insert JMP <rel32> instruction at address 0x0 to reach entry point. */
-    entry_eip = elf_uval(&elf, elf.ehdr, e_entry);
-    if ( entry_eip != 0 )
-    {
-        char *page0 = xc_map_foreign_range(
-            xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0);
-        if ( page0 == NULL )
-            goto error_out;
-        page0[0] = 0xe9;
-        *(uint32_t *)&page0[1] = entry_eip - 5;
-        munmap(page0, PAGE_SIZE);
-    }
-
-    rc = 0;
-    goto out;
- error_out:
-    rc = -1;
- out:
-    if ( elf_check_broken(&elf) )
-        ERROR("HVM ELF broken: %s", elf_check_broken(&elf));
-    free(image);
-
-    return rc;
-}
-
-static int xc_hvm_populate_params(xc_interface *xch, uint32_t dom,
-                                  struct xc_hvm_build_args *args)
-{
-    unsigned long i;
-    void *hvm_info_page;
-    uint32_t *ident_pt;
-    uint64_t v_end;
-    int rc;
-    xen_pfn_t special_array[NR_SPECIAL_PAGES];
-    xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
-
-    v_end = args->mem_size;
-
-    if ( (hvm_info_page = xc_map_foreign_range(
-              xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
-              HVM_INFO_PFN)) == NULL )
-    {
-        PERROR("Could not map hvm info page");
-        goto error_out;
-    }
-    build_hvm_info(hvm_info_page, args);
-    munmap(hvm_info_page, PAGE_SIZE);
-
-    /* Allocate and clear special pages. */
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
-        special_array[i] = special_pfn(i);
-
-    rc = xc_domain_populate_physmap_exact(xch, dom, NR_SPECIAL_PAGES, 0, 0,
-                                          special_array);
-    if ( rc != 0 )
-    {
-        PERROR("Could not allocate special pages.");
-        goto error_out;
-    }
-
-    if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) )
-    {
-        PERROR("Could not clear special pages");
-        goto error_out;
-    }
-
-    xc_hvm_param_set(xch, dom, HVM_PARAM_STORE_PFN,
-                     special_pfn(SPECIALPAGE_XENSTORE));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_BUFIOREQ));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_CONSOLE_PFN,
-                     special_pfn(SPECIALPAGE_CONSOLE));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_PAGING_RING_PFN,
-                     special_pfn(SPECIALPAGE_PAGING));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_MONITOR_RING_PFN,
-                     special_pfn(SPECIALPAGE_ACCESS));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_SHARING_RING_PFN,
-                     special_pfn(SPECIALPAGE_SHARING));
-
-    /*
-     * Allocate and clear additional ioreq server pages. The default
-     * server will use the IOREQ and BUFIOREQ special pages above.
-     */
-    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
-        ioreq_server_array[i] = ioreq_server_pfn(i);
-
-    rc = xc_domain_populate_physmap_exact(xch, dom, NR_IOREQ_SERVER_PAGES, 0, 0,
-                                          ioreq_server_array);
-    if ( rc != 0 )
-    {
-        PERROR("Could not allocate ioreq server pages.");
-        goto error_out;
-    }
-
-    if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) )
-    {
-        PERROR("Could not clear ioreq page");
-        goto error_out;
-    }
-
-    /* Tell the domain where the pages are and how many there are */
-    xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
-                     ioreq_server_pfn(0));
-    xc_hvm_param_set(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
-                     NR_IOREQ_SERVER_PAGES);
-
-    /*
-     * Identity-map page table is required for running with CR0.PG=0 when
-     * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
-     */
-    if ( (ident_pt = xc_map_foreign_range(
-              xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
-              special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
-    {
-        PERROR("Could not map special page ident_pt");
-        goto error_out;
-    }
-    for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
-        ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
-                       _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-    munmap(ident_pt, PAGE_SIZE);
-    xc_hvm_param_set(xch, dom, HVM_PARAM_IDENT_PT,
-                     special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
-
-    rc = 0;
-    goto out;
- error_out:
-    rc = -1;
- out:
-
-    return rc;
-}
-
-/* xc_hvm_build:
- * Create a domain for a virtualized Linux, using files/filenames.
- */
-int xc_hvm_build(xc_interface *xch, uint32_t domid,
-                 struct xc_hvm_build_args *hvm_args)
-{
-    struct xc_hvm_build_args args = *hvm_args;
-    xen_pfn_t *parray = NULL;
-    int rc;
-
-    if ( domid == 0 )
-        return -1;
-    if ( args.image_file_name == NULL )
-        return -1;
-
-    /* An HVM guest must be initialised with at least 2MB memory. */
-    if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
-        return -1;
-
-    parray = malloc((args.mem_size >> PAGE_SHIFT) * sizeof(xen_pfn_t));
-    if ( parray == NULL )
-        return -1;
-
-    rc = xc_hvm_populate_memory(xch, domid, &args, parray);
-    if ( rc != 0 )
-    {
-        PERROR("xc_hvm_populate_memory failed");
-        goto out;
-    }
-    rc = xc_hvm_load_image(xch, domid, &args, parray);
-    if ( rc != 0 )
-    {
-        PERROR("xc_hvm_load_image failed");
-        goto out;
-    }
-    rc = xc_hvm_populate_params(xch, domid, &args);
-    if ( rc != 0 )
-    {
-        PERROR("xc_hvm_populate_params failed");
-        goto out;
-    }
-
-    /* Return module load addresses to caller */
-    hvm_args->acpi_module.guest_addr_out = args.acpi_module.guest_addr_out;
-    hvm_args->smbios_module.guest_addr_out = args.smbios_module.guest_addr_out;
-
-out:
-    free(parray);
-
-    return rc;
-}
-
-/* xc_hvm_build_target_mem: 
- * Create a domain for a pre-ballooned virtualized Linux, using
- * files/filenames.  If target < memsize, domain is created with
- * memsize pages marked populate-on-demand, 
- * calculating pod cache size based on target.
- * If target == memsize, pages are populated normally.
- */
-int xc_hvm_build_target_mem(xc_interface *xch,
-                           uint32_t domid,
-                           int memsize,
-                           int target,
-                           const char *image_name)
-{
-    struct xc_hvm_build_args args = {};
-
-    memset(&args, 0, sizeof(struct xc_hvm_build_args));
-    args.mem_size = (uint64_t)memsize << 20;
-    args.mem_target = (uint64_t)target << 20;
-    args.image_file_name = image_name;
-    if ( args.mmio_size == 0 )
-        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
-
-    return xc_hvm_build(xch, domid, &args);
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/tools/libxc/xg_private.c b/tools/libxc/xg_private.c
index 67946e1..d98f282 100644
--- a/tools/libxc/xg_private.c
+++ b/tools/libxc/xg_private.c
@@ -187,15 +187,6 @@ unsigned long csum_page(void *page)
     return sum ^ (sum>>32);
 }
 
-__attribute__((weak)) 
-    int xc_hvm_build(xc_interface *xch,
-                     uint32_t domid,
-                     struct xc_hvm_build_args *hvm_args)
-{
-    errno = ENOSYS;
-    return -1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 9ab53fb..4ad8fc0 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -910,77 +910,6 @@ static PyObject *pyxc_dom_suppress_spurious_page_faults(XcObject *self,
 }
 #endif /* __i386__ || __x86_64__ */
 
-static PyObject *pyxc_hvm_build(XcObject *self,
-                                PyObject *args,
-                                PyObject *kwds)
-{
-    uint32_t dom;
-    struct hvm_info_table *va_hvm;
-    uint8_t *va_map, sum;
-    int i;
-    char *image;
-    int memsize, target=-1, vcpus = 1, acpi = 0, apic = 1;
-    PyObject *vcpu_avail_handle = NULL;
-    uint8_t vcpu_avail[(HVM_MAX_VCPUS + 7)/8];
-
-    static char *kwd_list[] = { "domid",
-                                "memsize", "image", "target", "vcpus", 
-                                "vcpu_avail", "acpi", "apic", NULL };
-    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iis|iiOii", kwd_list,
-                                      &dom, &memsize, &image, &target, &vcpus,
-                                      &vcpu_avail_handle, &acpi, &apic) )
-        return NULL;
-
-    memset(vcpu_avail, 0, sizeof(vcpu_avail));
-    vcpu_avail[0] = 1;
-    if ( vcpu_avail_handle != NULL )
-    {
-        if ( PyInt_Check(vcpu_avail_handle) )
-        {
-            unsigned long v = PyInt_AsLong(vcpu_avail_handle);
-            for ( i = 0; i < sizeof(long); i++ )
-                vcpu_avail[i] = (uint8_t)(v>>(i*8));
-        }
-        else if ( PyLong_Check(vcpu_avail_handle) )
-        {
-            if ( _PyLong_AsByteArray((PyLongObject *)vcpu_avail_handle,
-                                     (unsigned char *)vcpu_avail,
-                                     sizeof(vcpu_avail), 1, 0) )
-                return NULL;
-        }
-        else
-        {
-            errno = EINVAL;
-            PyErr_SetFromErrno(xc_error_obj);
-            return NULL;
-        }
-    }
-
-    if ( target == -1 )
-        target = memsize;
-
-    if ( xc_hvm_build_target_mem(self->xc_handle, dom, memsize,
-                                 target, image) != 0 )
-        return pyxc_error_to_exception(self->xc_handle);
-
-    /* Fix up the HVM info table. */
-    va_map = xc_map_foreign_range(self->xc_handle, dom, XC_PAGE_SIZE,
-                                  PROT_READ | PROT_WRITE,
-                                  HVM_INFO_PFN);
-    if ( va_map == NULL )
-        return PyErr_SetFromErrno(xc_error_obj);
-    va_hvm = (struct hvm_info_table *)(va_map + HVM_INFO_OFFSET);
-    va_hvm->apic_mode    = apic;
-    va_hvm->nr_vcpus     = vcpus;
-    memcpy(va_hvm->vcpu_online, vcpu_avail, sizeof(vcpu_avail));
-    for ( i = 0, sum = 0; i < va_hvm->length; i++ )
-        sum += ((uint8_t *)va_hvm)[i];
-    va_hvm->checksum -= sum;
-    munmap(va_map, XC_PAGE_SIZE);
-
-    return Py_BuildValue("{}");
-}
-
 static PyObject *pyxc_gnttab_hvm_seed(XcObject *self,
 				      PyObject *args,
 				      PyObject *kwds)
@@ -2366,16 +2295,6 @@ static PyMethodDef pyxc_methods[] = {
       " image   [str]:      Name of kernel image file. May be gzipped.\n"
       " cmdline [str, n/a]: Kernel parameters, if any.\n\n"},
 
-    { "hvm_build", 
-      (PyCFunction)pyxc_hvm_build, 
-      METH_VARARGS | METH_KEYWORDS, "\n"
-      "Build a new HVM guest OS.\n"
-      " dom     [int]:      Identifier of domain to build into.\n"
-      " image   [str]:      Name of HVM loader image file.\n"
-      " vcpus   [int, 1]:   Number of Virtual CPUS in domain.\n\n"
-      " vcpu_avail [long, 1]: Which Virtual CPUS available.\n\n"
-      "Returns: [int] 0 on success; -1 on error.\n" },
-
     { "gnttab_hvm_seed",
       (PyCFunction)pyxc_gnttab_hvm_seed,
       METH_KEYWORDS, "\n"
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (9 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 10/29] libxc: remove dead HVM building code Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:25   ` Wei Liu
                     ` (3 more replies)
  2015-09-04 12:08 ` [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic Roger Pau Monne
                   ` (18 subsequent siblings)
  29 siblings, 4 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Roger Pau Monne

Introduce a bitmap in x86 xen_arch_domainconfig that allows enabling or
disabling specific devices emulated inside of Xen for HVM guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add a check to make sure the emulation bitmap is sane (undefined bits are
   all 0s).
 - Add Andrew Cooper Reviewed-by.

Changes since v3:
 - Return EOPNOTSUPP instead of ENOPERM if an invalid emulation mask is
   used.
 - Fix error messages (prefix them with d%d and use %#x instead of 0x%x).
 - Clearly state in the public header that emulation_flags should only be
   used with HVM guests.
 - Add a XEN_X86 prefix to the emulation flags defines.
 - Properly parenthesize the has_* macros.
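
For reviewers, the bitmap validation added to arch_domain_create() can be
sketched in isolation as below. This is only an illustration, not the code
in the patch: check_emulation_flags() and EMU_KNOWN_MASK are hypothetical
names, and the mask is shortened to the three flags whose bit positions
appear in the public header hunk (the patch itself ORs together nine flags).
Bits outside the known mask give -EINVAL; a known but partial selection
gives -EOPNOTSUPP, since at this point in the series Xen only accepts HVM
domains with every emulated device enabled.

```c
#include <errno.h>
#include <stdint.h>

/* Bit positions of these three flags follow the public header hunk. */
#define XEN_X86_EMU_LAPIC    (1U << 0)
#define XEN_X86_EMU_HPET     (1U << 1)
#define XEN_X86_EMU_PMTIMER  (1U << 2)

/* Assumption: shortened mask for illustration; the patch uses nine flags. */
#define EMU_KNOWN_MASK \
    (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET | XEN_X86_EMU_PMTIMER)

/*
 * Hypothetical helper mirroring the two checks in the patch:
 *  - any bit outside the known mask is an invalid bitmap (-EINVAL);
 *  - a valid bitmap that is not the full set is unsupported (-EOPNOTSUPP).
 */
static int check_emulation_flags(uint32_t flags)
{
    if ( (flags & ~EMU_KNOWN_MASK) != 0 )
        return -EINVAL;

    if ( flags != EMU_KNOWN_MASK )
        return -EOPNOTSUPP;

    return 0;
}
```

With this sketch, passing the full mask returns 0, passing only
XEN_X86_EMU_LAPIC returns -EOPNOTSUPP, and setting an undefined bit such as
bit 31 returns -EINVAL.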
---
 tools/libxl/libxl_x86.c           |  8 ++++++--
 xen/arch/x86/domain.c             | 23 +++++++++++++++++++++++
 xen/include/asm-x86/domain.h      | 13 +++++++++++++
 xen/include/public/arch-x86/xen.h | 21 ++++++++++++++++++++-
 4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9276126..9ecd85d 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -7,8 +7,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       xc_domain_configuration_t *xc_config)
 {
-    /* No specific configuration right now */
-
+    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
+        xc_config->emulation_flags = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
+                                      XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
+                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
+                                      XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
+                                      XEN_X86_EMU_IOMMU);
     return 0;
 }
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..fe9504f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
                d->domain_id);
     }
 
+    if ( is_hvm_domain(d) )
+    {
+        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
+                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
+                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
+                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
+                                   XEN_X86_EMU_IOMMU);
+        if ( (config->emulation_flags & ~emulation_mask) != 0 )
+        {
+            printk(XENLOG_G_ERR "d%d: Invalid emulation bitmap: %#x.\n",
+                   d->domain_id, config->emulation_flags);
+            return -EINVAL;
+        }
+        if ( config->emulation_flags != emulation_mask )
+        {
+            printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
+                   "current selection of emulators: %#x.\n", d->domain_id,
+                   config->emulation_flags);
+            return -EOPNOTSUPP;
+        }
+        d->arch.emulation_flags = config->emulation_flags;
+    }
+
     if ( has_hvm_container_domain(d) )
     {
         d->arch.hvm_domain.hap_enabled =
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..2527637 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -387,8 +387,21 @@ struct arch_domain
     bool_t mem_access_emulate_enabled;
 
     struct monitor_write_data *event_write_data;
+
+    /* Emulated devices enabled bitmap. */
+    uint32_t emulation_flags;
 } __cacheline_aligned;
 
+#define has_vlapic(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_LAPIC)
+#define has_vhpet(d)        ((d)->arch.emulation_flags & XEN_X86_EMU_HPET)
+#define has_vpmtimer(d)     ((d)->arch.emulation_flags & XEN_X86_EMU_PMTIMER)
+#define has_vrtc(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_RTC)
+#define has_vioapic(d)      ((d)->arch.emulation_flags & XEN_X86_EMU_IOAPIC)
+#define has_vpic(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PIC)
+#define has_vpmu(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PMU)
+#define has_vvga(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_VGA)
+#define has_viommu(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU)
+
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
 
 #define gdt_ldt_pt_idx(v) \
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 2ecc9c9..98cae41 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -268,7 +268,26 @@ typedef struct arch_shared_info arch_shared_info_t;
  * XEN_DOMCTL_INTERFACE_VERSION.
  */
 struct xen_arch_domainconfig {
-    char dummy;
+#define _XEN_X86_EMU_LAPIC              0
+#define XEN_X86_EMU_LAPIC               (1U<<_XEN_X86_EMU_LAPIC)
+#define _XEN_X86_EMU_HPET               1
+#define XEN_X86_EMU_HPET                (1U<<_XEN_X86_EMU_HPET)
+#define _XEN_X86_EMU_PMTIMER            2
+#define XEN_X86_EMU_PMTIMER             (1U<<_XEN_X86_EMU_PMTIMER)
+#define _XEN_X86_EMU_RTC                3
+#define XEN_X86_EMU_RTC                 (1U<<_XEN_X86_EMU_RTC)
+#define _XEN_X86_EMU_IOAPIC             4
+#define XEN_X86_EMU_IOAPIC              (1U<<_XEN_X86_EMU_IOAPIC)
+#define _XEN_X86_EMU_PIC                5
+#define XEN_X86_EMU_PIC                 (1U<<_XEN_X86_EMU_PIC)
+#define _XEN_X86_EMU_PMU                6
+#define XEN_X86_EMU_PMU                 (1U<<_XEN_X86_EMU_PMU)
+#define _XEN_X86_EMU_VGA                7
+#define XEN_X86_EMU_VGA                 (1U<<_XEN_X86_EMU_VGA)
+#define _XEN_X86_EMU_IOMMU              8
+#define XEN_X86_EMU_IOMMU               (1U<<_XEN_X86_EMU_IOMMU)
+    /* For HVM guests only; this field is ignored for PV guests. */
+    uint32_t emulation_flags;
 };
 #endif
 
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (10 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-16 10:05   ` Jan Beulich
  2015-09-04 12:08 ` [PATCH v6 13/29] xen/x86: allow disabling the emulated HPET Roger Pau Monne
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Suravee Suthikulpanit, Jun Nakajima, Andrew Cooper,
	Eddie Dong, Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky,
	Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Eddie Dong <eddie.dong@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
Changes since v5:
 - Add Boris Ostrovsky Reviewed-by.

Changes since v4:
 - Split the is_pvh_domain check into two, so part of the code can be shared
   with the !has_vlapic case.
 - Add Andrew Cooper Acked-by.
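
For illustration only, the split mentioned in the v4 entry above can be
pictured with the toy model below. The CTL_* bits and adjust_controls() are
hypothetical stand-ins, not the real VMCS control names or any function in
this patch. The sketch only assumes what the construct_vmcs() hunk shows:
the APIC-related controls are cleared for both PVH domains and HVM domains
without a local APIC, while the remaining adjustments stay PVH-only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control bits, used only to model the two branches. */
#define CTL_VIRT_APIC     (1U << 0)  /* virtual APIC / TPR shadow features */
#define CTL_POSTED_INTR   (1U << 1)  /* posted interrupts */
#define CTL_UNRESTRICTED  (1U << 2)  /* unrestricted guest */

/* Hypothetical helper: returns the controls left enabled for a domain. */
uint32_t adjust_controls(uint32_t ctl, bool is_pvh, bool has_vlapic)
{
    /* Shared branch: applies to PVH and to HVM without a local APIC. */
    if ( is_pvh || !has_vlapic )
        ctl &= ~(CTL_VIRT_APIC | CTL_POSTED_INTR);

    /* PVH-only branch: not taken for HVM guests that merely lack a lapic. */
    if ( is_pvh )
        ctl &= ~CTL_UNRESTRICTED;

    return ctl;
}
```

With all three bits set on entry, an HVM domain without a local APIC keeps
only CTL_UNRESTRICTED, whereas a PVH domain ends up with none of them.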
---
 xen/arch/x86/hvm/svm/svm.c  | 16 +++++++++-------
 xen/arch/x86/hvm/vlapic.c   | 30 +++++++++++++++++++++++++-----
 xen/arch/x86/hvm/vmsi.c     |  6 ++++++
 xen/arch/x86/hvm/vmx/vmcs.c |  5 ++++-
 xen/arch/x86/hvm/vmx/vmx.c  |  9 ++++++++-
 5 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 8de41fa..97dc507 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1035,6 +1035,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     bool_t debug_state = v->domain->debugger_attached;
     bool_t vcpu_guestmode = 0;
+    struct vlapic *vlapic = vcpu_vlapic(v);
 
     if ( nestedhvm_enabled(v->domain) && nestedhvm_vcpu_in_guestmode(v) )
         vcpu_guestmode = 1;
@@ -1058,14 +1059,14 @@ static void noreturn svm_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
-    if ( !vcpu_guestmode )
+    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) )
     {
         vintr_t intr;
 
         /* Reflect the vlapic's TPR in the hardware vtpr */
         intr = vmcb_get_vintr(vmcb);
         intr.fields.tpr =
-            (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
+            (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0xFF) >> 4;
         vmcb_set_vintr(vmcb, intr);
     }
 
@@ -2294,6 +2295,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
     int inst_len, rc;
     vintr_t intr;
     bool_t vcpu_guestmode = 0;
+    struct vlapic *vlapic = vcpu_vlapic(v);
 
     hvm_invalidate_regs_fields(regs);
 
@@ -2311,11 +2313,11 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
      * NB. We need to preserve the low bits of the TPR to make checked builds
      * of Windows work, even though they don't actually do anything.
      */
-    if ( !vcpu_guestmode ) {
+    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) ) {
         intr = vmcb_get_vintr(vmcb);
-        vlapic_set_reg(vcpu_vlapic(v), APIC_TASKPRI,
+        vlapic_set_reg(vlapic, APIC_TASKPRI,
                    ((intr.fields.tpr & 0x0F) << 4) |
-                   (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0x0F));
+                   (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0x0F));
     }
 
     exit_reason = vmcb->exitcode;
@@ -2697,14 +2699,14 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
     }
 
   out:
-    if ( vcpu_guestmode )
+    if ( vcpu_guestmode || vlapic_hw_disabled(vlapic) )
         /* Don't clobber TPR of the nested guest. */
         return;
 
     /* The exit may have updated the TPR: reflect this in the hardware vtpr */
     intr = vmcb_get_vintr(vmcb);
     intr.fields.tpr =
-        (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
+        (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0xFF) >> 4;
     vmcb_set_vintr(vmcb, intr);
 }
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index b893b40..e355679 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -993,6 +993,9 @@ static void set_x2apic_id(struct vlapic *vlapic)
 
 bool_t vlapic_msr_set(struct vlapic *vlapic, uint64_t value)
 {
+    if ( !has_vlapic(vlapic_domain(vlapic)) )
+        return 0;
+
     if ( (vlapic->hw.apic_base_msr ^ value) & MSR_IA32_APICBASE_ENABLE )
     {
         if ( unlikely(value & MSR_IA32_APICBASE_EXTD) )
@@ -1042,8 +1045,7 @@ void vlapic_tdt_msr_set(struct vlapic *vlapic, uint64_t value)
     uint64_t guest_tsc;
     struct vcpu *v = vlapic_vcpu(vlapic);
 
-    /* may need to exclude some other conditions like vlapic->hw.disabled */
-    if ( !vlapic_lvtt_tdt(vlapic) )
+    if ( !vlapic_lvtt_tdt(vlapic) || vlapic_hw_disabled(vlapic) )
     {
         HVM_DBG_LOG(DBG_LEVEL_VLAPIC_TIMER, "ignore tsc deadline msr write");
         return;
@@ -1118,6 +1120,9 @@ static int __vlapic_accept_pic_intr(struct vcpu *v)
 
 int vlapic_accept_pic_intr(struct vcpu *v)
 {
+    if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
+        return 0;
+
     TRACE_2D(TRC_HVM_EMUL_LAPIC_PIC_INTR,
              (v == v->domain->arch.hvm_domain.i8259_target),
              v ? __vlapic_accept_pic_intr(v) : -1);
@@ -1265,6 +1270,9 @@ static int lapic_save_hidden(struct domain *d, hvm_domain_context_t *h)
     struct vlapic *s;
     int rc = 0;
 
+    if ( !has_vlapic(d) )
+        return 0;
+
     for_each_vcpu ( d, v )
     {
         s = vcpu_vlapic(v);
@@ -1281,6 +1289,9 @@ static int lapic_save_regs(struct domain *d, hvm_domain_context_t *h)
     struct vlapic *s;
     int rc = 0;
 
+    if ( !has_vlapic(d) )
+        return 0;
+
     for_each_vcpu ( d, v )
     {
         if ( hvm_funcs.sync_pir_to_irr )
@@ -1328,7 +1339,10 @@ static int lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
     uint16_t vcpuid;
     struct vcpu *v;
     struct vlapic *s;
-    
+
+    if ( !has_vlapic(d) )
+        return 0;
+
     /* Which vlapic to load? */
     vcpuid = hvm_load_instance(h); 
     if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
@@ -1360,7 +1374,10 @@ static int lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
     uint16_t vcpuid;
     struct vcpu *v;
     struct vlapic *s;
-    
+
+    if ( !has_vlapic(d) )
+        return 0;
+
     /* Which vlapic to load? */
     vcpuid = hvm_load_instance(h); 
     if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
@@ -1399,7 +1416,7 @@ int vlapic_init(struct vcpu *v)
 
     HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "%d", v->vcpu_id);
 
-    if ( is_pvh_vcpu(v) )
+    if ( is_pvh_vcpu(v) || !has_vlapic(v->domain) )
     {
         vlapic->hw.disabled = VLAPIC_HW_DISABLED;
         return 0;
@@ -1452,6 +1469,9 @@ void vlapic_destroy(struct vcpu *v)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
 
+    if ( !has_vlapic(vlapic_domain(vlapic)) )
+        return;
+
     tasklet_kill(&vlapic->init_sipi.tasklet);
     TRACE_0D(TRC_HVM_EMUL_LAPIC_STOP_TIMER);
     destroy_periodic_time(&vlapic->pt);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index ac838a9..e6a145d 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -482,6 +482,9 @@ found:
 
 void msixtbl_init(struct domain *d)
 {
+    if ( !has_vlapic(d) )
+        return;
+
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
     spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
 
@@ -493,6 +496,9 @@ void msixtbl_pt_cleanup(struct domain *d)
     struct msixtbl_entry *entry, *temp;
     unsigned long flags;
 
+    if ( !has_vlapic(d) )
+        return;
+
     /* msixtbl_list_lock must be acquired with irq_disabled for check_lock() */
     local_irq_save(flags); 
     spin_lock(&d->arch.hvm_domain.msixtbl_list_lock);
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a0a97e7..4251b2c 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1002,7 +1002,7 @@ static int construct_vmcs(struct vcpu *v)
         ~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
           SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 
-    if ( is_pvh_domain(d) )
+    if ( is_pvh_domain(d) || !has_vlapic(d) )
     {
         /* Disable virtual apics, TPR */
         v->arch.hvm_vmx.secondary_exec_control &=
@@ -1014,7 +1014,10 @@ static int construct_vmcs(struct vcpu *v)
         /* In turn, disable posted interrupts. */
         __vmwrite(PIN_BASED_VM_EXEC_CONTROL,
                   vmx_pin_based_exec_control & ~PIN_BASED_POSTED_INTERRUPT);
+    }
 
+    if ( is_pvh_domain(d) )
+    {
         /* Unrestricted guest (real mode for EPT) */
         v->arch.hvm_vmx.secondary_exec_control &=
             ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2582cdd..763b589 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -89,6 +89,9 @@ static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
+    if ( !has_vlapic(d) )
+        return 0;
+
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
         return rc;
 
@@ -97,6 +100,9 @@ static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    if ( !has_vlapic(d) )
+        return;
+
     vmx_free_vlapic_mapping(d);
 }
 
@@ -2393,7 +2399,8 @@ static void vmx_install_vlapic_mapping(struct vcpu *v)
 {
     paddr_t virt_page_ma, apic_page_ma;
 
-    if ( !cpu_has_vmx_virtualize_apic_accesses )
+    if ( !cpu_has_vmx_virtualize_apic_accesses ||
+         v->domain->arch.hvm_domain.vmx.apic_access_mfn == 0 )
         return;
 
     virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 13/29] xen/x86: allow disabling the emulated HPET
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (11 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 14/29] xen/x86: allow disabling the pmtimer Roger Pau Monne
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes in v4:
 - Add Andrew Cooper Acked-by.
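
As a rough sketch of the pattern this patch applies (early return when the
device is absent, and HVM_PARAM_HPET_ENABLED set from hpet_init() rather than
unconditionally), consider the standalone model below. struct toy_domain and
the toy_* functions are hypothetical and exist only for this example; only
the XEN_X86_EMU_HPET bit value and the shape of has_vhpet() are taken from
the series.

```c
#include <stdint.h>

#define XEN_X86_EMU_HPET (1U << 1)   /* bit 1, as defined in patch 11 */

/* Hypothetical stand-in for struct domain, reduced to what the sketch needs. */
struct toy_domain {
    uint32_t emulation_flags;
    int hpet_enabled_param;      /* models HVM_PARAM_HPET_ENABLED */
    int hpet_state_initialised;  /* models the HPETState setup */
};

#define has_vhpet(d) ((d)->emulation_flags & XEN_X86_EMU_HPET)

/* Init does nothing, and leaves the param at 0, when the HPET is disabled. */
void toy_hpet_init(struct toy_domain *d)
{
    if ( !has_vhpet(d) )
        return;

    d->hpet_state_initialised = 1;
    d->hpet_enabled_param = 1;
}

/* Save reports success but writes no record when the HPET is disabled. */
int toy_hpet_save(const struct toy_domain *d, int *records_written)
{
    if ( !has_vhpet(d) )
        return 0;

    (*records_written)++;
    return 0;
}
```

The same guard shape is repeated for load and deinit in the diff below, so a
domain created without the HPET never touches the uninitialised state.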
---
 xen/arch/x86/hvm/hpet.c | 13 +++++++++++++
 xen/arch/x86/hvm/hvm.c  |  1 -
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index edf9a17..266b587 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -516,6 +516,9 @@ static int hpet_save(struct domain *d, hvm_domain_context_t *h)
     int rc;
     uint64_t guest_time;
 
+    if ( !has_vhpet(d) )
+        return 0;
+
     write_lock(&hp->lock);
     guest_time = guest_time_hpet(hp);
 
@@ -575,6 +578,9 @@ static int hpet_load(struct domain *d, hvm_domain_context_t *h)
     uint64_t guest_time;
     int i;
 
+    if ( !has_vhpet(d) )
+        return 0;
+
     write_lock(&hp->lock);
 
     /* Reload the HPET registers */
@@ -633,6 +639,9 @@ void hpet_init(struct domain *d)
     HPETState *h = domain_vhpet(d);
     int i;
 
+    if ( !has_vhpet(d) )
+        return;
+
     memset(h, 0, sizeof(HPETState));
 
     rwlock_init(&h->lock);
@@ -660,6 +669,7 @@ void hpet_init(struct domain *d)
     }
 
     register_mmio_handler(d, &hpet_mmio_ops);
+    d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
 }
 
 void hpet_deinit(struct domain *d)
@@ -667,6 +677,9 @@ void hpet_deinit(struct domain *d)
     int i;
     HPETState *h = domain_vhpet(d);
 
+    if ( !has_vhpet(d) )
+        return;
+
     write_lock(&h->lock);
 
     if ( hpet_enabled(h) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 615fa89..1640b58 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1594,7 +1594,6 @@ int hvm_domain_initialise(struct domain *d)
 
     hvm_init_guest_time(d);
 
-    d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
     d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
 
     vpic_init(d);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 14/29] xen/x86: allow disabling the pmtimer
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (12 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 13/29] xen/x86: allow disabling the emulated HPET Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 15/29] xen/x86: allow disabling the emulated RTC Roger Pau Monne
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/hvm/pmtimer.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 07594e1..199a15e 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -247,6 +247,9 @@ static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
     uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
     int rc;
 
+    if ( !has_vpmtimer(d) )
+        return 0;
+
     spin_lock(&s->lock);
 
     /* Update the counter to the guest's current time.  We always save
@@ -271,6 +274,9 @@ static int pmtimer_load(struct domain *d, hvm_domain_context_t *h)
 {
     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
 
+    if ( !has_vpmtimer(d) )
+        return 0;
+
     spin_lock(&s->lock);
 
     /* Reload the registers */
@@ -328,6 +334,9 @@ void pmtimer_init(struct vcpu *v)
 {
     PMTState *s = &v->domain->arch.hvm_domain.pl_time.vpmt;
 
+    if ( !has_vpmtimer(v->domain) )
+        return;
+
     spin_lock_init(&s->lock);
 
     s->scale = ((uint64_t)FREQUENCE_PMTIMER << 32) / SYSTEM_TIME_HZ;
@@ -348,6 +357,10 @@ void pmtimer_init(struct vcpu *v)
 void pmtimer_deinit(struct domain *d)
 {
     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
+
+    if ( !has_vpmtimer(d) )
+        return;
+
     kill_timer(&s->timer);
 }
 
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 15/29] xen/x86: allow disabling the emulated RTC
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (13 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 14/29] xen/x86: allow disabling the pmtimer Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 16/29] xen/x86: allow disabling the emulated IO APIC Roger Pau Monne
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/hvm/rtc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index a9efeaf..bc93f66 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -726,6 +726,9 @@ void rtc_migrate_timers(struct vcpu *v)
 {
     RTCState *s = vcpu_vrtc(v);
 
+    if ( !has_vrtc(v->domain) )
+        return;
+
     if ( v->vcpu_id == 0 )
     {
         migrate_timer(&s->update_timer, v->processor);;
@@ -739,6 +742,10 @@ static int rtc_save(struct domain *d, hvm_domain_context_t *h)
 {
     RTCState *s = domain_vrtc(d);
     int rc;
+
+    if ( !has_vrtc(d) )
+        return 0;
+
     spin_lock(&s->lock);
     rc = hvm_save_entry(RTC, 0, h, &s->hw);
     spin_unlock(&s->lock);
@@ -750,6 +757,9 @@ static int rtc_load(struct domain *d, hvm_domain_context_t *h)
 {
     RTCState *s = domain_vrtc(d);
 
+    if ( !has_vrtc(d) )
+        return 0;
+
     spin_lock(&s->lock);
 
     /* Restore the registers */
@@ -790,6 +800,9 @@ void rtc_init(struct domain *d)
 {
     RTCState *s = domain_vrtc(d);
 
+    if ( !has_vrtc(d) )
+        return;
+
     spin_lock_init(&s->lock);
 
     init_timer(&s->update_timer, rtc_update_timer, s, smp_processor_id());
@@ -820,6 +833,9 @@ void rtc_deinit(struct domain *d)
 {
     RTCState *s = domain_vrtc(d);
 
+    if ( !has_vrtc(d) )
+        return;
+
     spin_barrier(&s->lock);
 
     TRACE_0D(TRC_HVM_EMUL_RTC_STOP_TIMER);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 16/29] xen/x86: allow disabling the emulated IO APIC
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (14 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 15/29] xen/x86: allow disabling the emulated RTC Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC Roger Pau Monne
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/hvm/vioapic.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index d348235..30a4a0f 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -424,12 +424,20 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
 static int ioapic_save(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_hw_vioapic *s = domain_vioapic(d);
+
+    if ( !has_vioapic(d) )
+        return 0;
+
     return hvm_save_entry(IOAPIC, 0, h, s);
 }
 
 static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_hw_vioapic *s = domain_vioapic(d);
+
+    if ( !has_vioapic(d) )
+        return 0;
+
     return hvm_load_entry(IOAPIC, h, s);
 }
 
@@ -448,6 +456,9 @@ void vioapic_reset(struct domain *d)
 
 int vioapic_init(struct domain *d)
 {
+    if ( !has_vioapic(d) )
+        return 0;
+
     if ( (d->arch.hvm_domain.vioapic == NULL) &&
          ((d->arch.hvm_domain.vioapic = xmalloc(struct hvm_vioapic)) == NULL) )
         return -ENOMEM;
@@ -462,6 +473,9 @@ int vioapic_init(struct domain *d)
 
 void vioapic_deinit(struct domain *d)
 {
+    if ( !has_vioapic(d) )
+        return;
+
     xfree(d->arch.hvm_domain.vioapic);
     d->arch.hvm_domain.vioapic = NULL;
 }
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (15 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 16/29] xen/x86: allow disabling the emulated IO APIC Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-21 14:34   ` Jan Beulich
  2015-09-04 12:08 ` [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu Roger Pau Monne
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/hvm/vpic.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index 7c2edc8..5938f40 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -377,6 +377,9 @@ static int vpic_save(struct domain *d, hvm_domain_context_t *h)
     struct hvm_hw_vpic *s;
     int i;
 
+    if ( !has_vpic(d) )
+        return 0;
+
     /* Save the state of both PICs */
     for ( i = 0; i < 2 ; i++ )
     {
@@ -392,7 +395,10 @@ static int vpic_load(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_hw_vpic *s;
     uint16_t inst;
-    
+
+    if ( !has_vpic(d) )
+        return 0;
+
     /* Which PIC is this? */
     inst = hvm_load_instance(h);
     if ( inst > 1 )
@@ -425,6 +431,9 @@ void vpic_reset(struct domain *d)
 
 void vpic_init(struct domain *d)
 {
+    if ( !has_vpic(d) )
+        return;
+
     vpic_reset(d);
 
     register_portio_handler(d, 0x20, 2, vpic_intercept_pic_io);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (16 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-21 14:36   ` Jan Beulich
  2015-09-04 12:08 ` [PATCH v6 19/29] xen/x86: allow disabling the emulated VGA Roger Pau Monne
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/cpu/vpmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 8af3df1..d5bb77d 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -439,6 +439,9 @@ void vpmu_initialise(struct vcpu *v)
     int ret;
     bool_t is_priv_vpmu = is_hardware_domain(v->domain);
 
+    if ( !has_vpmu(v->domain) )
+        return;
+
     BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
     BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
     BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 19/29] xen/x86: allow disabling the emulated VGA
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (17 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-04 12:08 ` [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU Roger Pau Monne
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/arch/x86/hvm/stdvga.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index f50bff7..a3296bd 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -555,6 +555,9 @@ void stdvga_init(struct domain *d)
     void *p;
     int i;
 
+    if ( !has_vvga(d) )
+        return;
+
     memset(s, 0, sizeof(*s));
     spin_lock_init(&s->lock);
     
@@ -594,6 +597,9 @@ void stdvga_deinit(struct domain *d)
     struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
     int i;
 
+    if ( !has_vvga(d) )
+        return;
+
     for ( i = 0; i != ARRAY_SIZE(s->vram_page); i++ )
     {
         if ( s->vram_page[i] == NULL )
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (18 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 19/29] xen/x86: allow disabling the emulated VGA Roger Pau Monne
@ 2015-09-04 12:08 ` Roger Pau Monne
  2015-09-28 13:58   ` Aravind Gopalakrishnan
  2015-09-04 12:09 ` [PATCH v6 21/29] xen/x86: allow disabling all emulated devices inside of Xen Roger Pau Monne
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Aravind Gopalakrishnan, Suravee Suthikulpanit, Roger Pau Monne

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
Changes since v4:
 - Add Andrew Cooper Acked-by.
---
 xen/drivers/passthrough/amd/iommu_guest.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index e74f469..b4e75ac 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -887,7 +887,8 @@ int guest_iommu_init(struct domain* d)
     struct guest_iommu *iommu;
     struct hvm_iommu *hd  = domain_hvm_iommu(d);
 
-    if ( !is_hvm_domain(d) || !iommu_enabled || !iommuv2_enabled )
+    if ( !is_hvm_domain(d) || !iommu_enabled || !iommuv2_enabled ||
+         !has_viommu(d) )
         return 0;
 
     iommu = xzalloc(struct guest_iommu);
-- 
1.9.5 (Apple Git-50.3)


* [PATCH v6 21/29] xen/x86: allow disabling all emulated devices inside of Xen
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (19 preceding siblings ...)
  2015-09-04 12:08 ` [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-04 12:09 ` [PATCH v6 22/29] elfnotes: introduce a new PHYS_ENTRY elfnote Roger Pau Monne
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Only allow enabling or disabling all of the emulated devices inside of Xen
at once, since Xen currently doesn't support enabling only a subset of them.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Add Andrew Cooper Reviewed-by.
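
For reference, the set of configurations accepted after this change can be
sketched as a standalone predicate. This is an illustration only: EMU_ALL
stands in for Xen's emulation_mask and its value here is an assumption,
emulation_flags_ok() is a hypothetical helper that does not exist in the
tree, and the unknown-bits check is inferred from the context lines of the
hunk below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for Xen's emulation_mask; the real value may differ. */
#define EMU_ALL 0x7ffu

/*
 * Hypothetical helper mirroring the checks in arch_domain_create():
 * reject unknown bits, then accept only "all emulators" or "none".
 */
static bool emulation_flags_ok(uint32_t flags)
{
    if ( flags & ~EMU_ALL )
        return false;              /* unknown emulator bits set */

    return flags == EMU_ALL || flags == 0;
}
```

Any partial selection, such as a single emulator bit, is still rejected.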
---
 xen/arch/x86/domain.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe9504f..8fe95f7 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -568,7 +568,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
                    d->domain_id, config->emulation_flags);
             return -EINVAL;
         }
-        if ( config->emulation_flags != emulation_mask )
+        if ( config->emulation_flags != emulation_mask &&
+             config->emulation_flags != 0 )
         {
             printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
                    "current selection of emulators: %#x.\n", d->domain_id,
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 22/29] elfnotes: introduce a new PHYS_ENTRY elfnote
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (20 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 21/29] xen/x86: allow disabling all emulated devices inside of Xen Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-21 14:47   ` Jan Beulich
  2015-09-04 12:09 ` [PATCH v6 23/29] libxc: allow creating domains without emulated devices Roger Pau Monne
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

This new elfnote contains the 32bit entry point into the kernel. Xen will
use this entry point in order to launch the guest kernel in 32bit protected
mode with paging disabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v4:
 - Add Andrew Cooper Reviewed-by and Wei Liu Acked-by.
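
As an illustration of what such a note looks like on the wire, the sketch
below serialises and parses a PHYS32_ENTRY note using the generic ELF note
layout (header, padded name, padded descriptor). The helper names and
struct note_hdr are hypothetical, the 4-byte descriptor size is an
assumption, and the code uses host byte order; it is not the libelf
implementation.

```c
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18   /* value defined by this patch */

/* Generic ELF note header (same layout as Elf32_Nhdr / Elf64_Nhdr). */
struct note_hdr {
    uint32_t namesz;   /* size of the name, including the trailing NUL */
    uint32_t descsz;   /* size of the descriptor payload */
    uint32_t type;     /* note type, here XEN_ELFNOTE_* */
};

/* Hypothetical helper: serialise the note into buf (at least 20 bytes). */
static size_t build_phys32_note(uint8_t *buf, uint32_t entry)
{
    const struct note_hdr hdr = { 4, sizeof(entry), XEN_ELFNOTE_PHYS32_ENTRY };

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), "Xen", 4);   /* name is already 4-byte aligned */
    memcpy(buf + sizeof(hdr) + 4, &entry, sizeof(entry));
    return sizeof(hdr) + 4 + sizeof(entry);
}

/* Hypothetical helper: return 0 and fill *entry if buf holds such a note. */
static int read_phys32_note(const uint8_t *buf, size_t len, uint32_t *entry)
{
    struct note_hdr hdr;

    if ( len < sizeof(hdr) + 4 + sizeof(*entry) )
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if ( hdr.type != XEN_ELFNOTE_PHYS32_ENTRY || hdr.namesz != 4 ||
         hdr.descsz != sizeof(*entry) ||
         memcmp(buf + sizeof(hdr), "Xen", 4) != 0 )
        return -1;
    memcpy(entry, buf + sizeof(hdr) + 4, sizeof(*entry));
    return 0;
}
```

A guest kernel would normally emit the note from its linker script or
assembly entry code rather than build it at run time; the round trip above
only demonstrates the layout.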
---
 tools/xcutils/readnotes.c          |  3 +++
 xen/common/libelf/libelf-dominfo.c |  4 ++++
 xen/include/public/elfnote.h       | 11 ++++++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/xcutils/readnotes.c b/tools/xcutils/readnotes.c
index 5fa445e..e682dd1 100644
--- a/tools/xcutils/readnotes.c
+++ b/tools/xcutils/readnotes.c
@@ -159,6 +159,9 @@ static unsigned print_notes(struct elf_binary *elf, ELF_HANDLE_DECL(elf_note) st
 		case XEN_ELFNOTE_L1_MFN_VALID:
 			print_l1_mfn_valid_note("L1_MFN_VALID", elf , note);
 			break;
+		case XEN_ELFNOTE_PHYS32_ENTRY:
+			print_numeric_note("PHYS32_ENTRY", elf , note);
+			break;
 		default:
 			printf("unknown note type %#x\n",
 			       (unsigned)elf_uval(elf, note, type));
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index f929968..365e058 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -119,6 +119,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", 1},
         [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", 0 },
         [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", 0 },
+        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", 0 },
     };
 /* *INDENT-ON* */
 
@@ -212,6 +213,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
                 elf, note, sizeof(*parms->f_supported), i);
         break;
 
+    case XEN_ELFNOTE_PHYS32_ENTRY:
+        parms->phys_entry = val;
+        break;
     }
     return 0;
 }
diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 3824a94..e6fc596 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -200,9 +200,18 @@
 #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
 
 /*
+ * Physical entry point into the kernel.
+ *
+ * 32bit entry point into the kernel. Xen will use this entry point
+ * in order to launch the guest kernel in 32bit protected mode
+ * with paging disabled.
+ */
+#define XEN_ELFNOTE_PHYS32_ENTRY 18
+
+/*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_SUPPORTED_FEATURES
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
 
 /*
  * System information exported through crash notes.
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 23/29] libxc: allow creating domains without emulated devices.
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (21 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 22/29] elfnotes: introduce a new PHYS_ENTRY elfnote Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-04 12:09 ` [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs Roger Pau Monne
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Introduce a new flag in xc_dom_image that turns the emulated devices on and
off. When the flag is unset, libxc does not create the VGA hole, the hvm_info
page or the ioreq server pages. libxl unconditionally sets it to true for all
HVM domains at the moment.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v5:
 - Add Andrew Cooper Reviewed-by.

Changes since v4:
 - Store the size of the VGA hole inside of the xc_dom_image struct and set
   it from libxl.
 - Rename dom->emulation to dom->device_model (no functional change).
 - Add Wei Liu Acked-by.

Changes since v3:
 - Explain the meaning of the "emulation" xc_dom_image field.
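
The page accounting that changes in meminit_hvm() can be sketched in
isolation as follows. struct dom_sketch is a reduced, hypothetical stand-in
for the two new xc_dom_image fields, pages_to_claim() is not a libxc
function, and a hole size of 0 for a domain without a device model is an
assumption (this patch only ever sets 0x20 from libxl).

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT         12       /* 4 KiB pages, as on x86 */
#define VGA_HOLE_START_PFN 0xa0u    /* 0xA0000 >> PAGE_SHIFT */
#define VGA_HOLE_END_PFN   0xc0u    /* 0xC0000 >> PAGE_SHIFT */

/* Reduced, hypothetical stand-in for the new xc_dom_image fields. */
struct dom_sketch {
    uint64_t vga_hole_size;   /* in pages */
    bool device_model;
};

/* Pages to claim (or to use as the PoD target), as in meminit_hvm(). */
static uint64_t pages_to_claim(const struct dom_sketch *dom,
                               uint64_t target_pages)
{
    return target_pages - dom->vga_hole_size;
}
```

With the libxl value of 0x20 pages the hole covers exactly
0xA0000-0xC0000, i.e. 128 KiB.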
---
 tools/libxc/include/xc_dom.h |  4 +++
 tools/libxc/xc_dom_x86.c     | 73 ++++++++++++++++++++++++--------------------
 tools/libxl/libxl_dom.c      |  2 ++
 tools/libxl/libxl_internal.h |  1 +
 4 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index cda40d9..507b323 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -193,6 +193,10 @@ struct xc_dom_image {
     xen_pfn_t mmio_size;
     xen_pfn_t lowmem_end;
     xen_pfn_t highmem_end;
+    xen_pfn_t vga_hole_size;
+
+    /* If unset disables the setup of the IOREQ pages. */
+    bool device_model;
 
     /* Extra ACPI tables passed to HVMLOADER */
     struct xc_hvm_firmware_module acpi_module;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index f36b6f6..2ef833e 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -49,8 +49,6 @@
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
-#define VGA_HOLE_SIZE (0x20)
-
 #define SPECIALPAGE_PAGING   0
 #define SPECIALPAGE_ACCESS   1
 #define SPECIALPAGE_SHARING  2
@@ -522,12 +520,15 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
     xc_interface *xch = dom->xch;
 
-    if ( (hvm_info_page = xc_map_foreign_range(
-              xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
-              HVM_INFO_PFN)) == NULL )
-        goto error_out;
-    build_hvm_info(hvm_info_page, dom);
-    munmap(hvm_info_page, PAGE_SIZE);
+    if ( dom->device_model )
+    {
+        if ( (hvm_info_page = xc_map_foreign_range(
+                  xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
+                  HVM_INFO_PFN)) == NULL )
+            goto error_out;
+        build_hvm_info(hvm_info_page, dom);
+        munmap(hvm_info_page, PAGE_SIZE);
+    }
 
     /* Allocate and clear special pages. */
     for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
@@ -559,30 +560,33 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
 
-    /*
-     * Allocate and clear additional ioreq server pages. The default
-     * server will use the IOREQ and BUFIOREQ special pages above.
-     */
-    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
-        ioreq_server_array[i] = ioreq_server_pfn(i);
-
-    rc = xc_domain_populate_physmap_exact(xch, domid, NR_IOREQ_SERVER_PAGES, 0,
-                                          0, ioreq_server_array);
-    if ( rc != 0 )
+    if ( dom->device_model )
     {
-        DOMPRINTF("Could not allocate ioreq server pages.");
-        goto error_out;
-    }
+        /*
+         * Allocate and clear additional ioreq server pages. The default
+         * server will use the IOREQ and BUFIOREQ special pages above.
+         */
+        for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
+            ioreq_server_array[i] = ioreq_server_pfn(i);
 
-    if ( xc_clear_domain_pages(xch, domid, ioreq_server_pfn(0),
-                               NR_IOREQ_SERVER_PAGES) )
+        rc = xc_domain_populate_physmap_exact(xch, domid, NR_IOREQ_SERVER_PAGES, 0,
+                                              0, ioreq_server_array);
+        if ( rc != 0 )
+        {
+            DOMPRINTF("Could not allocate ioreq server pages.");
             goto error_out;
+        }
 
-    /* Tell the domain where the pages are and how many there are */
-    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_SERVER_PFN,
-                     ioreq_server_pfn(0));
-    xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
-                     NR_IOREQ_SERVER_PAGES);
+        if ( xc_clear_domain_pages(xch, domid, ioreq_server_pfn(0),
+                                   NR_IOREQ_SERVER_PAGES) )
+            goto error_out;
+
+        /* Tell the domain where the pages are and how many there are */
+        xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_SERVER_PFN,
+                         ioreq_server_pfn(0));
+        xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                         NR_IOREQ_SERVER_PAGES);
+    }
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
@@ -1319,7 +1323,8 @@ static int meminit_hvm(struct xc_dom_image *dom)
      * allocated is pointless.
      */
     if ( claim_enabled ) {
-        rc = xc_domain_claim_pages(xch, domid, target_pages - VGA_HOLE_SIZE);
+        rc = xc_domain_claim_pages(xch, domid,
+                                   target_pages - dom->vga_hole_size);
         if ( rc != 0 )
         {
             DOMPRINTF("Could not allocate memory for HVM guest as we cannot claim memory!");
@@ -1335,7 +1340,8 @@ static int meminit_hvm(struct xc_dom_image *dom)
          * tot_pages will be target_pages - VGA_HOLE_SIZE after
          * this call.
          */
-        rc = xc_domain_set_pod_target(xch, domid, target_pages - VGA_HOLE_SIZE,
+        rc = xc_domain_set_pod_target(xch, domid,
+                                      target_pages - dom->vga_hole_size,
                                       NULL, NULL, NULL);
         if ( rc != 0 )
         {
@@ -1354,8 +1360,9 @@ static int meminit_hvm(struct xc_dom_image *dom)
      * Under 2MB mode, we allocate pages in batches of no more than 8MB to 
      * ensure that we can be preempted and hence dom0 remains responsive.
      */
-    rc = xc_domain_populate_physmap_exact(
-        xch, domid, 0xa0, 0, memflags, &dom->p2m_host[0x00]);
+    if ( dom->device_model )
+        rc = xc_domain_populate_physmap_exact(
+            xch, domid, 0xa0, 0, memflags, &dom->p2m_host[0x00]);
 
     stat_normal_pages = 0;
     for ( vmemid = 0; vmemid < nr_vmemranges; vmemid++ )
@@ -1374,7 +1381,7 @@ static int meminit_hvm(struct xc_dom_image *dom)
          * 0xA0000-0xC0000. Note that 0x00000-0xA0000 is populated just
          * before this loop.
          */
-        if ( vmemranges[vmemid].start == 0 )
+        if ( vmemranges[vmemid].start == 0 && dom->device_model )
         {
             cur_pages = 0xc0;
             stat_normal_pages += 0xc0;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d2cf9e3..4a214e7 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -987,6 +987,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     dom->lowmem_end = lowmem_end;
     dom->highmem_end = highmem_end;
     dom->mmio_start = mmio_start;
+    dom->vga_hole_size = LIBXL_VGA_HOLE_SIZE;
+    dom->device_model = true;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
                                             info->u.hvm.rdm_mem_boundary_memkb*1024,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ea89f1f..294d442 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -100,6 +100,7 @@
 #define LIBXL_HVM_EXTRA_MEMORY 2048
 #define LIBXL_MIN_DOM0_MEM (128*1024)
 #define LIBXL_INVALID_GFN (~(uint64_t)0)
+#define LIBXL_VGA_HOLE_SIZE 0x20
 /* use 0 as the domid of the toolstack domain for now */
 #define LIBXL_TOOLSTACK_DOMID 0
 #define QEMU_SIGNATURE "DeviceModelRecord0002"
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (22 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 23/29] libxc: allow creating domains without emulated devices Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-21 15:44   ` Jan Beulich
  2015-09-04 12:09 ` [PATCH v6 25/29] xenconsole: try to attach to PV console if HVM fails Roger Pau Monne
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, Jan Beulich,
	Roger Pau Monne

Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and
VCPUOP_is_up hypercalls from HVM guests.

This patch introduces a new structure (vcpu_hvm_context) that should be used
in conjunction with the VCPUOP_initialise hypercall in order to initialize
vCPUs for HVM guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
---
Changes since v5:
 - Fix a coding style issue.
 - Merge the code from wip-dmlite-v5-refactor by Andrew in order to reduce
   bloat.
 - Print the offending %cr3 in case of error when using shadow.
 - Reduce the scope of local variables in arch_initialize_vcpu.
 - s/current->domain/v->domain/g in arch_initialize_vcpu.
 - Expand the comment in public/vcpu.h to document the usage of
   vcpu_hvm_context for HVM guests.
 - Add myself as the copyright holder for the public hvm_vcpu.h header.

Changes since v4:
 - Don't assume mode is 64B, add an explicit check.
 - Don't set TF_kernel_mode, it is only needed for PV guests.
 - Don't set CR0_ET unconditionally.
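
To make the layout of the *_ar fields documented in hvm_vcpu.h easier to
check, here is a small sketch that packs the individual attribute bits into
the 12-bit value a guest would store in cs_ar, ds_ar, ss_ar or tr_ar.
pack_ar() is a hypothetical helper for illustration and is not part of the
proposed interface.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack segment attributes using the _ar layout
 * described in hvm_vcpu.h (type in bits 0-3, s in bit 4, dpl in bits 5-6,
 * p in bit 7, avl in bit 8, l in bit 9, db in bit 10, g in bit 11).
 */
static uint16_t pack_ar(unsigned int type, unsigned int s, unsigned int dpl,
                        unsigned int p, unsigned int avl, unsigned int l,
                        unsigned int db, unsigned int g)
{
    return (type & 0xf) | ((s & 1) << 4) | ((dpl & 3) << 5) |
           ((p & 1) << 7) | ((avl & 1) << 8) | ((l & 1) << 9) |
           ((db & 1) << 10) | ((g & 1) << 11);
}
```

For example, a flat 32-bit ring 0 code segment (type 0xb, s, p, db and g
set) packs to 0xc9b, and the matching read/write data segment (type 0x3)
packs to 0xc93.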
---
 xen/arch/arm/domain.c             |  24 +++++
 xen/arch/x86/domain.c             | 196 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c            |   8 ++
 xen/common/domain.c               |  17 +---
 xen/include/public/hvm/hvm_vcpu.h | 170 +++++++++++++++++++++++++++++++++
 xen/include/public/vcpu.h         |   6 +-
 xen/include/xen/domain.h          |   2 +
 7 files changed, 405 insertions(+), 18 deletions(-)
 create mode 100644 xen/include/public/hvm/hvm_vcpu.h

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b2bfc7d..b554e00 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -752,6 +752,30 @@ int arch_set_info_guest(
     return 0;
 }
 
+int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct vcpu_guest_context *ctxt;
+    struct domain *d = v->domain;
+    int rc;
+
+    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
+        return -ENOMEM;
+
+    if ( copy_from_guest(ctxt, arg, 1) )
+    {
+        free_vcpu_guest_context(ctxt);
+        return -EFAULT;
+    }
+
+    domain_lock(d);
+    rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
+    domain_unlock(d);
+
+    free_vcpu_guest_context(ctxt);
+
+    return rc;
+}
+
 int arch_vcpu_reset(struct vcpu *v)
 {
     vcpu_end_shutdown_deferral(v);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8fe95f7..2410517 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -37,6 +37,7 @@
 #include <xen/wait.h>
 #include <xen/guest_access.h>
 #include <public/sysctl.h>
+#include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
 #include <asm/mc146818rtc.h>
 #include <asm/system.h>
@@ -1140,6 +1141,201 @@ int arch_set_info_guest(
 #undef c
 }
 
+/* Called by VCPUOP_initialise for HVM guests. */
+static int arch_set_info_hvm_guest(struct vcpu *v, vcpu_hvm_context_t *ctx)
+{
+    struct cpu_user_regs *uregs = &v->arch.user_regs;
+    struct segment_register cs, ds, ss, tr;
+
+#define SEG(s, r)                                                       \
+    (struct segment_register){ .base = (r)->s ## _base,                 \
+            .limit = (r)->s ## _limit, .attr.bytes = (r)->s ## _ar }
+
+    switch ( ctx->mode )
+    {
+    default:
+        return -EINVAL;
+
+    case VCPU_HVM_MODE_16B:
+    {
+        const struct vcpu_hvm_x86_16 *regs = &ctx->cpu_regs.x86_16;
+
+        uregs->rax    = regs->ax;
+        uregs->rcx    = regs->cx;
+        uregs->rdx    = regs->dx;
+        uregs->rbx    = regs->bx;
+        uregs->rsp    = regs->sp;
+        uregs->rbp    = regs->bp;
+        uregs->rsi    = regs->si;
+        uregs->rdi    = regs->di;
+        uregs->rip    = regs->ip;
+        uregs->rflags = regs->flags;
+
+        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
+        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
+
+        cs = SEG(cs, regs);
+        ds = SEG(ds, regs);
+        ss = SEG(ss, regs);
+        tr = SEG(tr, regs);
+    }
+    break;
+
+    case VCPU_HVM_MODE_32B:
+    {
+        const struct vcpu_hvm_x86_32 *regs = &ctx->cpu_regs.x86_32;
+
+        uregs->rax    = regs->eax;
+        uregs->rcx    = regs->ecx;
+        uregs->rdx    = regs->edx;
+        uregs->rbx    = regs->ebx;
+        uregs->rsp    = regs->esp;
+        uregs->rbp    = regs->ebp;
+        uregs->rsi    = regs->esi;
+        uregs->rdi    = regs->edi;
+        uregs->rip    = regs->eip;
+        uregs->rflags = regs->eflags;
+
+        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
+        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
+        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
+        v->arch.hvm_vcpu.guest_efer  = regs->efer;
+
+        cs = SEG(cs, regs);
+        ds = SEG(ds, regs);
+        ss = SEG(ss, regs);
+        tr = SEG(tr, regs);
+    }
+    break;
+
+    case VCPU_HVM_MODE_64B:
+    {
+        const struct vcpu_hvm_x86_64 *regs = &ctx->cpu_regs.x86_64;
+
+        uregs->rax    = regs->rax;
+        uregs->rcx    = regs->rcx;
+        uregs->rdx    = regs->rdx;
+        uregs->rbx    = regs->rbx;
+        uregs->rsp    = regs->rsp;
+        uregs->rbp    = regs->rbp;
+        uregs->rsi    = regs->rsi;
+        uregs->rdi    = regs->rdi;
+        uregs->rip    = regs->rip;
+        uregs->rflags = regs->rflags;
+        uregs->r8     = regs->r8;
+        uregs->r9     = regs->r9;
+        uregs->r10    = regs->r10;
+        uregs->r11    = regs->r11;
+        uregs->r12    = regs->r12;
+        uregs->r13    = regs->r13;
+        uregs->r14    = regs->r14;
+        uregs->r15    = regs->r15;
+
+        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
+        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
+        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
+        v->arch.hvm_vcpu.guest_efer  = regs->efer;
+
+        cs = SEG(cs, regs);
+        ds = SEG(ds, regs);
+        ss = SEG(ss, regs);
+        tr = SEG(tr, regs);
+    }
+    break;
+
+    }
+
+    if ( !paging_mode_hap(v->domain) )
+        v->arch.guest_table = pagetable_null();
+
+    hvm_update_guest_cr(v, 0);
+    hvm_update_guest_cr(v, 4);
+
+    if ( (ctx->mode == VCPU_HVM_MODE_32B) ||
+         (ctx->mode == VCPU_HVM_MODE_64B) )
+    {
+        hvm_update_guest_cr(v, 3);
+        hvm_update_guest_efer(v);
+    }
+
+    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) )
+    {
+        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
+        struct page_info *page = get_page_from_gfn(v->domain,
+                                 v->arch.hvm_vcpu.guest_cr[3] >> PAGE_SHIFT,
+                                 NULL, P2M_ALLOC);
+        if ( !page )
+        {
+            gdprintk(XENLOG_ERR, "Invalid CR3: %#lx\n",
+                     v->arch.hvm_vcpu.guest_cr[3]);
+            domain_crash(v->domain);
+            return -EINVAL;
+        }
+
+        v->arch.guest_table = pagetable_from_page(page);
+    }
+
+    hvm_set_segment_register(v, x86_seg_cs, &cs);
+    hvm_set_segment_register(v, x86_seg_ds, &ds);
+    hvm_set_segment_register(v, x86_seg_ss, &ss);
+    hvm_set_segment_register(v, x86_seg_tr, &tr);
+
+    /* Sync AP's TSC with BSP's. */
+    v->arch.hvm_vcpu.cache_tsc_offset =
+        v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset,
+                             v->domain->arch.hvm_domain.sync_tsc);
+
+    v->arch.hvm_vcpu.msr_tsc_adjust = 0;
+
+    paging_update_paging_modes(v);
+
+    v->is_initialised = 1;
+    set_bit(_VPF_down, &v->pause_flags);
+
+    return 0;
+#undef SEG
+}
+
+int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct domain *d = v->domain;
+    int rc;
+
+    if ( is_hvm_vcpu(v) )
+    {
+        struct vcpu_hvm_context hvm_ctx;
+
+        if ( copy_from_guest(&hvm_ctx, arg, 1) )
+            return -EFAULT;
+
+        domain_lock(d);
+        rc = v->is_initialised ? -EEXIST : arch_set_info_hvm_guest(v, &hvm_ctx);
+        domain_unlock(d);
+    }
+    else
+    {
+        struct vcpu_guest_context *ctxt;
+
+        if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
+            return -ENOMEM;
+
+        if ( copy_from_guest(ctxt, arg, 1) )
+        {
+            free_vcpu_guest_context(ctxt);
+            return -EFAULT;
+        }
+
+        domain_lock(d);
+        rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
+        domain_unlock(d);
+
+        free_vcpu_guest_context(ctxt);
+    }
+
+    return rc;
+}
+
 int arch_vcpu_reset(struct vcpu *v)
 {
     if ( is_pv_vcpu(v) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1640b58..8856c72 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4980,6 +4980,10 @@ static long hvm_vcpu_op(
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
+    case VCPUOP_initialise:
+    case VCPUOP_up:
+    case VCPUOP_down:
+    case VCPUOP_is_up:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
     default:
@@ -5038,6 +5042,10 @@ static long hvm_vcpu_op_compat32(
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
+    case VCPUOP_initialise:
+    case VCPUOP_up:
+    case VCPUOP_down:
+    case VCPUOP_is_up:
         rc = compat_vcpu_op(cmd, vcpuid, arg);
         break;
     default:
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 1b9fcfc..2b41741 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1173,7 +1173,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d = current->domain;
     struct vcpu *v;
-    struct vcpu_guest_context *ctxt;
     long rc = 0;
 
     if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
@@ -1185,21 +1184,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( v->vcpu_info == &dummy_vcpu_info )
             return -EINVAL;
 
-        if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
-            return -ENOMEM;
-
-        if ( copy_from_guest(ctxt, arg, 1) )
-        {
-            free_vcpu_guest_context(ctxt);
-            return -EFAULT;
-        }
-
-        domain_lock(d);
-        rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
-        domain_unlock(d);
-
-        free_vcpu_guest_context(ctxt);
-
+        rc = arch_initialize_vcpu(v, arg);
         if ( rc == -ERESTART )
             rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iuh",
                                                cmd, vcpuid, arg);
diff --git a/xen/include/public/hvm/hvm_vcpu.h b/xen/include/public/hvm/hvm_vcpu.h
new file mode 100644
index 0000000..6ff4ad4
--- /dev/null
+++ b/xen/include/public/hvm/hvm_vcpu.h
@@ -0,0 +1,170 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2015, Roger Pau Monne <roger.pau@citrix.com>
+ */
+
+#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
+#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
+
+#include "../xen.h"
+
+struct vcpu_hvm_x86_16 {
+    uint16_t ax;
+    uint16_t cx;
+    uint16_t dx;
+    uint16_t bx;
+    uint16_t sp;
+    uint16_t bp;
+    uint16_t si;
+    uint16_t di;
+    uint16_t ip;
+    uint16_t flags;
+
+    uint32_t cr0;
+    uint32_t cr4;
+
+    uint32_t cs_base;
+    uint32_t ds_base;
+    uint32_t ss_base;
+    uint32_t tr_base;
+    uint32_t cs_limit;
+    uint32_t ds_limit;
+    uint32_t ss_limit;
+    uint32_t tr_limit;
+    uint16_t cs_ar;
+    uint16_t ds_ar;
+    uint16_t ss_ar;
+    uint16_t tr_ar;
+};
+
+struct vcpu_hvm_x86_32 {
+    uint32_t eax;
+    uint32_t ecx;
+    uint32_t edx;
+    uint32_t ebx;
+    uint32_t esp;
+    uint32_t ebp;
+    uint32_t esi;
+    uint32_t edi;
+    uint32_t eip;
+    uint32_t eflags;
+
+    uint32_t cr0;
+    uint32_t cr3;
+    uint32_t cr4;
+    uint64_t efer;
+
+    uint32_t cs_base;
+    uint32_t ds_base;
+    uint32_t ss_base;
+    uint32_t tr_base;
+    uint32_t cs_limit;
+    uint32_t ds_limit;
+    uint32_t ss_limit;
+    uint32_t tr_limit;
+    uint16_t cs_ar;
+    uint16_t ds_ar;
+    uint16_t ss_ar;
+    uint16_t tr_ar;
+};
+
+struct vcpu_hvm_x86_64 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t r8;
+    uint64_t r9;
+    uint64_t r10;
+    uint64_t r11;
+    uint64_t r12;
+    uint64_t r13;
+    uint64_t r14;
+    uint64_t r15;
+    uint64_t rip;
+    uint64_t rflags;
+
+    uint64_t cr0;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t efer;
+
+    uint32_t cs_base;
+    uint32_t ds_base;
+    uint32_t ss_base;
+    uint32_t tr_base;
+    uint32_t cs_limit;
+    uint32_t ds_limit;
+    uint32_t ss_limit;
+    uint32_t tr_limit;
+    uint16_t cs_ar;
+    uint16_t ds_ar;
+    uint16_t ss_ar;
+    uint16_t tr_ar;
+};
+
+/*
+ * The layout of the _ar fields of the segment registers is the
+ * following:
+ *
+ * Bits [0,3]: type (bits 40-43).
+ * Bit      4: s    (descriptor type, bit 44).
+ * Bits [5,6]: dpl  (descriptor privilege level, bits 45-46).
+ * Bit      7: p    (segment-present, bit 47).
+ * Bit      8: avl  (available for system software, bit 52).
+ * Bit      9: l    (64-bit code segment, bit 53).
+ * Bit     10: db   (meaning depends on the segment, bit 54).
+ * Bit     11: g    (granularity, bit 55).
+ *
+ * A more complete description of the meaning of these fields can be
+ * obtained from the Intel SDM, Volume 3, section 3.4.5.
+ */
+
+struct vcpu_hvm_context {
+#define VCPU_HVM_MODE_16B 0  /* 16bit fields of the structure will be used. */
+#define VCPU_HVM_MODE_32B 1  /* 32bit fields of the structure will be used. */
+#define VCPU_HVM_MODE_64B 2  /* 64bit fields of the structure will be used. */
+    uint32_t mode;
+
+    /* CPU registers. */
+    union {
+        struct vcpu_hvm_x86_16 x86_16;
+        struct vcpu_hvm_x86_32 x86_32;
+        struct vcpu_hvm_x86_64 x86_64;
+    } cpu_regs;
+};
+typedef struct vcpu_hvm_context vcpu_hvm_context_t;
+DEFINE_XEN_GUEST_HANDLE(vcpu_hvm_context_t);
+
+#endif /* __XEN_PUBLIC_HVM_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 898b89f..692b87a 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -41,8 +41,10 @@
  * Initialise a VCPU. Each VCPU can be initialised only once. A 
  * newly-initialised VCPU will not run until it is brought up by VCPUOP_up.
  * 
- * @extra_arg == pointer to vcpu_guest_context structure containing initial
- *               state for the VCPU.
+ * @extra_arg == For PV or ARM guests this is a pointer to a vcpu_guest_context
+ *               structure containing the initial state for the VCPU. For x86
+ *               HVM based guests this is a pointer to a vcpu_hvm_context
+ *               structure.
  */
 #define VCPUOP_initialise            0
 
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 848db8a..21690be 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -68,6 +68,8 @@ void arch_domain_unpause(struct domain *d);
 int arch_set_info_guest(struct vcpu *, vcpu_guest_context_u);
 void arch_get_info_guest(struct vcpu *, vcpu_guest_context_u);
 
+int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg);
+
 int domain_relinquish_resources(struct domain *d);
 
 void dump_pageframe_info(struct domain *d);
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v6 25/29] xenconsole: try to attach to PV console if HVM fails
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (23 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-04 12:09 ` [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests Roger Pau Monne
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

HVM guests have always used the emulated serial console by default, but if
the emulated serial pty cannot be fetched from xenstore, try to use the PV
console instead.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v4:
 - Add Wei Liu Acked-by.

Changes since v3:
 - Drop the usage of a label and instead use if conditions.
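
For readers following the thread, the fallback described in the commit
message can be sketched roughly as below. This is a standalone
illustration, not the code in tools/console/client/main.c: the
store_read_fn callback is a hypothetical stand-in for xs_read(), the two
demo stores are made up, and the "/device/console/%d/tty" path for
secondary PV consoles is an assumption based on the buffer sizing visible
in the hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum console_type { CONSOLE_PV, CONSOLE_SERIAL };

/* Hypothetical stand-in for xs_read(): returns a malloc'd copy of the value
 * stored at `path`, or NULL if the node does not exist. */
typedef char *(*store_read_fn)(const char *path);

/* Demo stores (illustration only): one with no nodes, one where every node
 * exists and holds a pty name. */
static char *empty_store(const char *path)
{
    (void)path;
    return NULL;
}

static char *full_store(const char *path)
{
    char *v = malloc(sizeof("/dev/pts/3"));

    (void)path;
    if (v != NULL)
        strcpy(v, "/dev/pts/3");
    return v;
}

/*
 * Write the xenstore path of the console tty into `path`. If the serial
 * console was requested but its node is missing, fall back to the PV
 * console. Returns the console type actually selected.
 */
static enum console_type
pick_console_path(store_read_fn read_node, const char *dom_path,
                  enum console_type type, int num, char *path, size_t len)
{
    assert(path != NULL && len > 0);

    if (type == CONSOLE_SERIAL) {
        char *test;

        snprintf(path, len, "%s/serial/%d/tty", dom_path, num);
        test = read_node(path);
        if (test == NULL)
            type = CONSOLE_PV;  /* no emulated serial: try the PV console */
        free(test);
    }
    if (type == CONSOLE_PV) {
        if (num == 0)
            snprintf(path, len, "%s/console/tty", dom_path);
        else  /* assumed layout for secondary PV consoles */
            snprintf(path, len, "%s/device/console/%d/tty", dom_path, num);
    }
    return type;
}
```

Using two separate if conditions instead of if/else is what lets the
serial branch hand over to the PV branch without a label.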
---
 tools/console/client/main.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index f130a60..d006fdc 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -333,7 +333,7 @@ int main(int argc, char **argv)
 		{ 0 },
 
 	};
-	char *dom_path = NULL, *path = NULL;
+	char *dom_path = NULL, *path = NULL, *test = NULL;
 	int spty, xsfd;
 	struct xs_handle *xs;
 	char *end;
@@ -415,9 +415,15 @@ int main(int argc, char **argv)
 	path = malloc(strlen(dom_path) + strlen("/device/console/0/tty") + 5);
 	if (path == NULL)
 		err(ENOMEM, "malloc");
-	if (type == CONSOLE_SERIAL)
+	if (type == CONSOLE_SERIAL) {
 		snprintf(path, strlen(dom_path) + strlen("/serial/0/tty") + 5, "%s/serial/%d/tty", dom_path, num);
-	else {
+		test = xs_read(xs, XBT_NULL, path, NULL);
+		free(test);
+		if (test == NULL)
+			type = CONSOLE_PV;
+	}
+	if (type == CONSOLE_PV) {
+
 		if (num == 0)
 			snprintf(path, strlen(dom_path) + strlen("/console/tty") + 1, "%s/console/tty", dom_path);
 		else
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (24 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 25/29] xenconsole: try to attach to PV console if HVM fails Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-10 16:00   ` Wei Liu
  2015-09-21 15:53   ` Jan Beulich
  2015-09-04 12:09 ` [PATCH v6 27/29] libxc: switch xc_dom_elfloader to be used with HVMlite domains Roger Pau Monne
                   ` (3 subsequent siblings)
  29 siblings, 2 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, Roger Pau Monne

This structure contains the physical address of the command line, as well as
the physical address of the list of loaded modules. The physical address of
this structure is passed to the guest at boot time in the %ebx register.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v5:
 - Change some of the calculations performed to get the total size of the
   start_info region.
 - Replace the mention of HVMlite in a comment with PVH.
 - Don't use 64bit integers in hvm_modlist_entry.
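
As an illustration of the layout described in the commit message, here is
a minimal sketch of how the start info region could be sized, and of how
the magic value relates to the "xEn3" string. The two structures mirror
the ones added to xen.h by this patch. Everything else (struct
start_info_layout, compute_layout(), align8(), start_magic_from_chars())
is hypothetical, and the 8-byte alignment of the command line is an
assumption about the intent of the ROUNDUP call, whose exact semantics
are not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors of the structures added to xen/include/public/xen.h. */
struct hvm_start_info {
    uint32_t magic;
    uint32_t flags;
    uint32_t cmdline_paddr;
    uint32_t nr_modules;
    uint32_t modlist_paddr;
};

struct hvm_modlist_entry {
    uint32_t paddr;
    uint32_t size;
};

#define HVM_START_MAGIC_VALUE 0x336ec578u

/* Five 32-bit fields, so no padding is expected. */
static_assert(sizeof(struct hvm_start_info) == 20, "unexpected padding");

/* Hypothetical helper type: byte offsets inside the start info region. */
struct start_info_layout {
    size_t cmdline_off;  /* 0 when there is no command line */
    size_t modlist_off;  /* 0 when there is no module */
    size_t total_size;
};

/* Round up to a multiple of 8 bytes (assumed alignment, see lead-in). */
static size_t align8(size_t v)
{
    return (v + 7) & ~(size_t)7;
}

/* Header first, then the optional command line, then the module list
 * (limited to one entry, as in the patch). */
static struct start_info_layout
compute_layout(const char *cmdline, int have_ramdisk)
{
    struct start_info_layout l = { 0, 0, sizeof(struct hvm_start_info) };

    if (cmdline != NULL) {
        l.cmdline_off = l.total_size;
        l.total_size += align8(strlen(cmdline) + 1);
    }
    if (have_ramdisk) {
        l.modlist_off = l.total_size;
        l.total_size += sizeof(struct hvm_modlist_entry);
    }
    return l;
}

/* "xEn3" read as a little-endian 32-bit value, with the 0x80 bit of the
 * 'E' set. */
static uint32_t start_magic_from_chars(void)
{
    return (uint32_t)'x' |
           ((uint32_t)('E' | 0x80) << 8) |
           ((uint32_t)'n' << 16) |
           ((uint32_t)'3' << 24);
}
```

A guest would add these offsets to the physical address received in %ebx
to find the command line and the module list, after checking the magic.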
---
 tools/libxc/xc_dom_x86.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/xen.h | 19 ++++++++++++++
 2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 2ef833e..3fffcba 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -560,7 +560,70 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
 
-    if ( dom->device_model )
+    if ( !dom->device_model )
+    {
+        struct xc_dom_seg seg;
+        struct hvm_start_info *start_info;
+        char *cmdline;
+        struct hvm_modlist_entry *modlist;
+        void *start_page;
+        size_t cmdline_size = 0;
+        size_t start_info_size = sizeof(*start_info);
+
+        if ( dom->cmdline )
+        {
+            cmdline_size = ROUNDUP(strlen(dom->cmdline) + 1, 8);
+            start_info_size += cmdline_size;
+
+        }
+        if ( dom->ramdisk_blob )
+            start_info_size += sizeof(*modlist); /* Limited to one module. */
+
+        rc = xc_dom_alloc_segment(dom, &seg, "HVMlite start info", 0,
+                                  start_info_size);
+        if ( rc != 0 )
+        {
+            DOMPRINTF("Unable to reserve memory for the start info");
+            goto out;
+        }
+
+        start_page = xc_map_foreign_range(xch, domid, start_info_size,
+                                          PROT_READ | PROT_WRITE,
+                                          seg.pfn);
+        if ( start_page == NULL )
+        {
+            DOMPRINTF("Unable to map HVM start info page");
+            goto error_out;
+        }
+
+        start_info = start_page;
+        cmdline = start_page + sizeof(*start_info);
+        modlist = start_page + sizeof(*start_info) + cmdline_size;
+
+        if ( dom->cmdline )
+        {
+            strncpy(cmdline, dom->cmdline, MAX_GUEST_CMDLINE);
+            cmdline[MAX_GUEST_CMDLINE - 1] = '\0';
+            start_info->cmdline_paddr = (seg.pfn << PAGE_SHIFT) +
+                                ((xen_pfn_t)cmdline - (xen_pfn_t)start_info);
+        }
+
+        if ( dom->ramdisk_blob )
+        {
+            modlist[0].paddr = dom->ramdisk_seg.vstart - dom->parms.virt_base;
+            modlist[0].size = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+            start_info->modlist_paddr = (seg.pfn << PAGE_SHIFT) +
+                                ((xen_pfn_t)modlist - (xen_pfn_t)start_info);
+            start_info->nr_modules = 1;
+        }
+
+        start_info->magic = HVM_START_MAGIC_VALUE;
+
+        munmap(start_page, start_info_size);
+
+        dom->start_info_pfn = seg.pfn;
+    }
+    else
     {
         /*
          * Allocate and clear additional ioreq server pages. The default
@@ -917,6 +980,9 @@ static int vcpu_hvm(struct xc_dom_image *dom)
     /* Set the IP. */
     bsp_ctx.cpu.rip = dom->parms.phys_entry;
 
+    if ( dom->start_info_pfn )
+        bsp_ctx.cpu.rbx = dom->start_info_pfn << PAGE_SHIFT;
+
     /* Set the end descriptor. */
     bsp_ctx.end_d.typecode = HVM_SAVE_CODE(END);
     bsp_ctx.end_d.instance = 0;
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index ff5547e..0326ba6 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -784,6 +784,25 @@ struct start_info {
 };
 typedef struct start_info start_info_t;
 
+/*
+ * Start of day structure passed to PVH guests in %ebx.
+ */
+struct hvm_start_info {
+#define HVM_START_MAGIC_VALUE 0x336ec578
+    uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
+    uint32_t flags;             /* SIF_xxx flags.                            */
+    uint32_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+    uint32_t modlist_paddr;     /* Physical address of an array of           */
+                                /* hvm_modlist_entry.                        */
+};
+
+struct hvm_modlist_entry {
+    uint32_t paddr;             /* Physical address of the module.           */
+    uint32_t size;              /* Size of the module in bytes.              */
+};
+
 /* New console union for dom0 introduced in 0x00030203. */
 #if __XEN_INTERFACE_VERSION__ < 0x00030203
 #define console_mfn    console.domU.mfn
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 27/29] libxc: switch xc_dom_elfloader to be used with HVMlite domains
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (25 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-04 12:09 ` [PATCH v6 28/29] libxl: allow the creation of HVM domains without a device model Roger Pau Monne
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Allow xc_dom_elfloader to report a guest type as hvm-3.0-x86_32 if it's
running inside of an HVM container and has the PHYS32_ENTRY elfnote set.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Only xc_dom_elfloader has been switched to support HVMlite; other loaders
should also be switched once we have an HVMlite-compatible kernel that uses
them.
---
Changes since v5:
 - Add Wei Liu Ack.

Changes since v4:
 - Add Andrew Cooper Reviewed-by.
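
The selection rule from the commit message can be sketched in isolation
as follows. The enum names and the guest_type() helper are hypothetical
simplifications of the libxc types, and UNSET_ADDR is assumed to be the
all-ones 64-bit value. The pv_type argument stands for whatever the
existing e_machine switch would have returned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the "no address set" marker used by the ELF parser. */
#define UNSET_ADDR ((uint64_t)-1)

/* Hypothetical simplification of dom->container_type. */
enum container_type { PV_CONTAINER, HVM_CONTAINER };

/*
 * An ELF kernel built inside an HVM container that carries the
 * PHYS32_ENTRY note (that is, phys_entry is set) is reported as
 * hvm-3.0-x86_32. Anything else keeps the type the PV path picked.
 */
static const char *guest_type(enum container_type container,
                              uint64_t phys_entry, const char *pv_type)
{
    assert(pv_type != NULL);

    if (container == HVM_CONTAINER && phys_entry != UNSET_ADDR)
        return "hvm-3.0-x86_32";
    return pv_type;
}
```

Note that the check runs before the e_machine switch, so the reported
type does not depend on whether the ELF itself is 32 or 64 bit.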
---
 tools/libxc/xc_dom_elfloader.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/libxc/xc_dom_elfloader.c b/tools/libxc/xc_dom_elfloader.c
index 66ea9d6..f3a0ed7 100644
--- a/tools/libxc/xc_dom_elfloader.c
+++ b/tools/libxc/xc_dom_elfloader.c
@@ -56,6 +56,10 @@ static char *xc_dom_guest_type(struct xc_dom_image *dom,
 {
     uint64_t machine = elf_uval(elf, elf->ehdr, e_machine);
 
+    if ( dom->container_type == XC_DOM_HVM_CONTAINER &&
+         dom->parms.phys_entry != UNSET_ADDR )
+        return "hvm-3.0-x86_32";
+
     switch ( machine )
     {
     case EM_386:
-- 
1.9.5 (Apple Git-50.3)



* [PATCH v6 28/29] libxl: allow the creation of HVM domains without a device model.
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (26 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 27/29] libxc: switch xc_dom_elfloader to be used with HVMlite domains Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-04 12:09 ` [PATCH v6 29/29] libxl: add support for migrating HVM guests " Roger Pau Monne
  2015-09-11 13:04 ` [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Ian Campbell
  29 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, Roger Pau Monne

Replace the firmware loaded into HVM guests with an OS kernel. Since the HVM
builder now uses the PV xc_dom_* set of functions, this kernel will be parsed
and loaded inside the guest like on PV, but the container is a pure HVM
guest.

Also, if device_model_version is set to none or a device model for the
specified domain is not present, unconditionally set the nic type to
LIBXL_NIC_TYPE_VIF.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v5:
 - Add Wei Liu Acked-by.

Changes since v4:
 - Set dom->mmio_size to match the size of the special pages if there's no
   device model for the guest. This implies moving NR_SPECIAL_PAGES and
   X86_HVM_END_SPECIAL_REGION to a public header so they can be known by
   libxl when creating the memory map.
 - Reword the xl.cfg man page description of the "none" device model option.
 - Use libxl__device_model_version_running instead of creating a new
   function.
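
To make the mmio_size change above concrete, here is the arithmetic
worked through as a small standalone sketch. The two X86_HVM_* constants
are the ones this patch moves into xc_dom.h. The helper function names
are hypothetical, and XC_PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

#define X86_HVM_NR_SPECIAL_PAGES    8
#define X86_HVM_END_SPECIAL_REGION  0xff000u
#define XC_PAGE_SHIFT               12  /* assumed: 4 KiB pages */
#define GB(_gb)                     ((uint64_t)(_gb) << 30)

/* Physical address of the first special page. */
static uint64_t special_region_start(void)
{
    return (uint64_t)(X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES)
           << XC_PAGE_SHIFT;
}

/* Without a device model the hole below 4GB only needs to cover the
 * range from the first special page up to 4GB. */
static uint64_t no_dm_mmio_size(void)
{
    return GB(4) - special_region_start();
}

/* Same formula libxl__build_hvm uses to derive the hole start. */
static uint64_t mmio_start(uint64_t mmio_size)
{
    assert(mmio_size <= GB(4));
    return (1ull << 32) - mmio_size;
}
```

With these values the special pages start at 0xfeff8000, so the hole is
0x1008000 bytes (16 MiB plus 32 KiB) and mmio_start ends up equal to the
address of the first special page.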

Changes since v3:
 - Add explicit /* fall through */ comments.
 - Expand libxl__device_nic_setdefault so that it sets the right nic type
   for HVMlite guests.
 - Remove stray space in hvm_build_set_params.
 - Fix the error paths of libxl__domain_firmware.
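
The nic type defaulting for HVM guests, as expanded in
libxl__device_nic_setdefault, boils down to the decision sketched below.
The enums and hvm_default_nic_type() are simplified, hypothetical
stand-ins for the libxl types. The creation_dm pointer plays the role of
the new info argument (NULL when hot-adding a nic), and running_dm the
role of the value returned by libxl__device_model_version_running.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for libxl_nic_type / libxl_device_model_version. */
enum nic_type { NIC_UNSET = 0, NIC_VIF_IOEMU, NIC_VIF };
enum dm_version {
    DM_UNKNOWN = 0,
    DM_QEMU_XEN_TRADITIONAL = 1,
    DM_QEMU_XEN = 2,
    DM_NONE = 3,
};

static enum nic_type
hvm_default_nic_type(enum nic_type requested,
                     const enum dm_version *creation_dm,
                     enum dm_version running_dm)
{
    enum dm_version dm;

    if (requested != NIC_UNSET)
        return requested;  /* an explicit choice is left untouched */

    /* At creation time the build info is authoritative; when hot-adding
     * a nic, fall back to the device model version that is running. */
    dm = creation_dm != NULL ? *creation_dm : running_dm;
    assert(dm != DM_UNKNOWN);

    return dm == DM_NONE ? NIC_VIF : NIC_VIF_IOEMU;
}
```

Both paths reach the same conclusion for a guest without a device model:
there is nothing to emulate the nic, so only the PV interface makes sense.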
---
 docs/man/xl.cfg.pod.5        |  5 +++
 tools/libxc/include/xc_dom.h |  2 ++
 tools/libxc/xc_dom_x86.c     | 15 ++++-----
 tools/libxl/libxl.c          | 44 ++++++++++++++++++--------
 tools/libxl/libxl_create.c   | 16 +++++++++-
 tools/libxl/libxl_dm.c       |  3 +-
 tools/libxl/libxl_dom.c      | 74 ++++++++++++++++++++++++++++++--------------
 tools/libxl/libxl_internal.h |  9 +++++-
 tools/libxl/libxl_types.idl  |  1 +
 tools/libxl/libxl_x86.c      |  9 ++++--
 tools/libxl/xl_cmdimpl.c     |  2 ++
 11 files changed, 131 insertions(+), 49 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 80e51bb..75d9949 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1741,6 +1741,11 @@ This device-model is the default for Linux dom0.
 Use the device-model based upon the historical Xen fork of Qemu.
 This device-model is still the default for NetBSD dom0.
 
+=item B<none>
+
+Don't use any device model. This requires a kernel capable of booting
+without emulated devices.
+
 =back
 
 It is recommended to accept the default value for new guests.  If
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 507b323..b89f8e3 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -17,6 +17,8 @@
 #include <xenguest.h>
 
 #define INVALID_P2M_ENTRY   ((xen_pfn_t)-1)
+#define X86_HVM_NR_SPECIAL_PAGES    8
+#define X86_HVM_END_SPECIAL_REGION  0xff000u
 
 /* --- typedefs and structs ---------------------------------------- */
 
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 3fffcba..09d70fc 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -57,8 +57,8 @@
 #define SPECIALPAGE_IOREQ    5
 #define SPECIALPAGE_IDENT_PT 6
 #define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+#define special_pfn(x) \
+    (X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES + (x))
 
 #define NR_IOREQ_SERVER_PAGES 8
 #define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
@@ -516,7 +516,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     void *hvm_info_page;
     uint32_t *ident_pt, domid = dom->guest_domid;
     int rc;
-    xen_pfn_t special_array[NR_SPECIAL_PAGES];
+    xen_pfn_t special_array[X86_HVM_NR_SPECIAL_PAGES];
     xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
     xc_interface *xch = dom->xch;
 
@@ -531,18 +531,19 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     }
 
     /* Allocate and clear special pages. */
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+    for ( i = 0; i < X86_HVM_NR_SPECIAL_PAGES; i++ )
         special_array[i] = special_pfn(i);
 
-    rc = xc_domain_populate_physmap_exact(xch, domid, NR_SPECIAL_PAGES, 0, 0,
-                                          special_array);
+    rc = xc_domain_populate_physmap_exact(xch, domid, X86_HVM_NR_SPECIAL_PAGES,
+                                          0, 0, special_array);
     if ( rc != 0 )
     {
         DOMPRINTF("Could not allocate special pages.");
         goto error_out;
     }
 
-    if ( xc_clear_domain_pages(xch, domid, special_pfn(0), NR_SPECIAL_PAGES) )
+    if ( xc_clear_domain_pages(xch, domid, special_pfn(0),
+                               X86_HVM_NR_SPECIAL_PAGES) )
             goto error_out;
 
     xc_hvm_param_set(xch, domid, HVM_PARAM_STORE_PFN,
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 10d1909..efa66d6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1033,11 +1033,14 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_resume_device_model(gc, domid);
-        if (rc < 0) {
-            LOG(ERROR, "failed to unpause device model for domain %u:%d",
-                domid, rc);
-            goto out;
+        if (libxl__device_model_version_running(gc, domid) !=
+            LIBXL_DEVICE_MODEL_VERSION_NONE) {
+            rc = libxl__domain_resume_device_model(gc, domid);
+            if (rc < 0) {
+                LOG(ERROR, "failed to unpause device model for domain %u:%d",
+                    domid, rc);
+                goto out;
+            }
         }
     }
     ret = xc_domain_unpause(ctx->xch, domid);
@@ -1584,11 +1587,11 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis)
 
     switch (libxl__domain_type(gc, domid)) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        if (!libxl_get_stubdom_id(CTX, domid))
-            dm_present = 1;
-        else
+        if (libxl_get_stubdom_id(CTX, domid)) {
             dm_present = 0;
-        break;
+            break;
+        }
+        /* fall through */
     case LIBXL_DOMAIN_TYPE_PV:
         pid = libxl__xs_read(gc, XBT_NULL, libxl__sprintf(gc, "/local/domain/%d/image/device-model-pid", domid));
         dm_present = (pid != NULL);
@@ -3205,7 +3208,7 @@ out:
 /******************************************************************************/
 
 int libxl__device_nic_setdefault(libxl__gc *gc, libxl_device_nic *nic,
-                                 uint32_t domid)
+                                 uint32_t domid, libxl_domain_build_info *info)
 {
     int rc;
 
@@ -3242,8 +3245,23 @@ int libxl__device_nic_setdefault(libxl__gc *gc, libxl_device_nic *nic,
 
     switch (libxl__domain_type(gc, domid)) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        if (!nic->nictype)
-            nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU;
+        if (!nic->nictype) {
+            if (info != NULL) {
+                /* Path taken at creation time. */
+                if (info->device_model_version ==
+                    LIBXL_DEVICE_MODEL_VERSION_NONE)
+                    nic->nictype = LIBXL_NIC_TYPE_VIF;
+                else
+                    nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU;
+            } else {
+                /* Path taken when hot-adding a nic. */
+                if (libxl__device_model_version_running(gc, domid) ==
+                    LIBXL_DEVICE_MODEL_VERSION_NONE)
+                    nic->nictype = LIBXL_NIC_TYPE_VIF;
+                else
+                    nic->nictype = LIBXL_NIC_TYPE_VIF_IOEMU;
+            }
+        }
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         if (nic->nictype == LIBXL_NIC_TYPE_VIF_IOEMU) {
@@ -3292,7 +3310,7 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
     libxl_device_nic_init(&nic_saved);
     libxl_device_nic_copy(CTX, &nic_saved, nic);
 
-    rc = libxl__device_nic_setdefault(gc, nic, domid);
+    rc = libxl__device_nic_setdefault(gc, nic, domid, NULL);
     if (rc) goto out;
 
     front = flexarray_make(gc, 16, 1);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 5128160..00fe462 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -119,6 +119,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                 b_info->u.hvm.bios = LIBXL_BIOS_TYPE_ROMBIOS; break;
             case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
                 b_info->u.hvm.bios = LIBXL_BIOS_TYPE_SEABIOS; break;
+            case LIBXL_DEVICE_MODEL_VERSION_NONE:
+                break;
             default:return ERROR_INVAL;
             }
 
@@ -132,6 +134,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
             if (b_info->u.hvm.bios == LIBXL_BIOS_TYPE_ROMBIOS)
                 return ERROR_INVAL;
             break;
+        case LIBXL_DEVICE_MODEL_VERSION_NONE:
+            break;
         default:abort();
         }
 
@@ -236,6 +240,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                 break;
             }
             break;
+        case LIBXL_DEVICE_MODEL_VERSION_NONE:
+            b_info->video_memkb = 0;
+            break;
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
         default:
             switch (b_info->u.hvm.vga.kind) {
@@ -923,7 +930,8 @@ static void initiate_domain_create(libxl__egc *egc,
          * called libxl_device_nic_add when domcreate_launch_dm gets called,
          * but qemu needs the nic information to be complete.
          */
-        ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid);
+        ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid,
+                                           &d_config->b_info);
         if (ret) {
             LOG(ERROR, "Unable to set nic defaults for nic %d", i);
             goto error_out;
@@ -1260,6 +1268,12 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
         libxl__device_console_add(gc, domid, &console, state, &device);
         libxl__device_console_dispose(&console);
 
+        if (d_config->b_info.device_model_version ==
+            LIBXL_DEVICE_MODEL_VERSION_NONE) {
+            domcreate_devmodel_started(egc, &dcs->dmss.dm, 0);
+            return;
+        }
+
         libxl_device_vkb_init(&vkb);
         libxl__device_vkb_add(gc, domid, &vkb);
         libxl_device_vkb_dispose(&vkb);
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 16ad47a..7ae10d3 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1501,7 +1501,8 @@ static void spawn_stub_launch_dm(libxl__egc *egc,
          * called libxl_device_nic_add at this point, but qemu needs
          * the nic information to be complete.
          */
-        ret = libxl__device_nic_setdefault(gc, &dm_config->nics[i], dm_domid);
+        ret = libxl__device_nic_setdefault(gc, &dm_config->nics[i], dm_domid,
+                                           &dm_config->b_info);
         if (ret)
             goto out;
     }
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4a214e7..b82b938 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -781,21 +781,23 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
     uint64_t str_mfn, cons_mfn;
     int i;
 
-    va_map = xc_map_foreign_range(handle, domid,
-                                  XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
-                                  HVM_INFO_PFN);
-    if (va_map == NULL)
-        return ERROR_FAIL;
+    if (info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE) {
+        va_map = xc_map_foreign_range(handle, domid,
+                                      XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
+                                      HVM_INFO_PFN);
+        if (va_map == NULL)
+            return ERROR_FAIL;
 
-    va_hvm = (struct hvm_info_table *)(va_map + HVM_INFO_OFFSET);
-    va_hvm->apic_mode = libxl_defbool_val(info->u.hvm.apic);
-    va_hvm->nr_vcpus = info->max_vcpus;
-    memset(va_hvm->vcpu_online, 0, sizeof(va_hvm->vcpu_online));
-    memcpy(va_hvm->vcpu_online, info->avail_vcpus.map, info->avail_vcpus.size);
-    for (i = 0, sum = 0; i < va_hvm->length; i++)
-        sum += ((uint8_t *) va_hvm)[i];
-    va_hvm->checksum -= sum;
-    munmap(va_map, XC_PAGE_SIZE);
+        va_hvm = (struct hvm_info_table *)(va_map + HVM_INFO_OFFSET);
+        va_hvm->apic_mode = libxl_defbool_val(info->u.hvm.apic);
+        va_hvm->nr_vcpus = info->max_vcpus;
+        memset(va_hvm->vcpu_online, 0, sizeof(va_hvm->vcpu_online));
+        memcpy(va_hvm->vcpu_online, info->avail_vcpus.map, info->avail_vcpus.size);
+        for (i = 0, sum = 0; i < va_hvm->length; i++)
+            sum += ((uint8_t *) va_hvm)[i];
+        va_hvm->checksum -= sum;
+        munmap(va_map, XC_PAGE_SIZE);
+    }
 
     xc_hvm_param_get(handle, domid, HVM_PARAM_STORE_PFN, &str_mfn);
     xc_hvm_param_get(handle, domid, HVM_PARAM_CONSOLE_PFN, &cons_mfn);
@@ -861,7 +863,7 @@ static int libxl__domain_firmware(libxl__gc *gc,
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     const char *firmware;
-    int e, rc = ERROR_FAIL;
+    int e, rc;
     int datalen = 0;
     void *data;
 
@@ -876,18 +878,34 @@ static int libxl__domain_firmware(libxl__gc *gc,
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
             firmware = "hvmloader";
             break;
+        case LIBXL_DEVICE_MODEL_VERSION_NONE:
+            if (info->kernel == NULL) {
+                LOG(ERROR, "no device model requested without a kernel");
+                rc = ERROR_FAIL;
+                goto out;
+            }
+            break;
         default:
             LOG(ERROR, "invalid device model version %d",
                 info->device_model_version);
-            return ERROR_FAIL;
-            break;
+            rc = ERROR_FAIL;
+            goto out;
         }
     }
 
-    rc = xc_dom_kernel_file(dom, libxl__abs_path(gc, firmware,
+    if (info->kernel != NULL &&
+        info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE) {
+        /* Try to load a kernel instead of the firmware. */
+        rc = xc_dom_kernel_file(dom, info->kernel);
+        if (rc == 0 && info->ramdisk != NULL)
+            rc = xc_dom_ramdisk_file(dom, info->ramdisk);
+    } else {
+        rc = xc_dom_kernel_file(dom, libxl__abs_path(gc, firmware,
                                                  libxl__xenfirmwaredir_path()));
+    }
+
     if (rc != 0) {
-        LOGE(ERROR, "xc_dom_kernel_file failed");
+        LOGE(ERROR, "xc_dom_{kernel_file/ramdisk_file} failed");
         goto out;
     }
 
@@ -898,6 +916,7 @@ static int libxl__domain_firmware(libxl__gc *gc,
         if (e) {
             LOGEV(ERROR, e, "failed to read SMBIOS firmware file %s",
                 info->u.hvm.smbios_firmware);
+            rc = ERROR_FAIL;
             goto out;
         }
         libxl__ptr_add(gc, data);
@@ -915,6 +934,7 @@ static int libxl__domain_firmware(libxl__gc *gc,
         if (e) {
             LOGEV(ERROR, e, "failed to read ACPI firmware file %s",
                 info->u.hvm.acpi_firmware);
+            rc = ERROR_FAIL;
             goto out;
         }
         libxl__ptr_add(gc, data);
@@ -927,6 +947,7 @@ static int libxl__domain_firmware(libxl__gc *gc,
 
     return 0;
 out:
+    assert(rc != 0);
     return rc;
 }
 
@@ -939,10 +960,13 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     uint64_t mmio_start, lowmem_end, highmem_end, mem_size;
     libxl_domain_build_info *const info = &d_config->b_info;
     struct xc_dom_image *dom = NULL;
+    bool device_model =
+        info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE ?
+        true : false;
 
     xc_dom_loginit(ctx->xch);
 
-    dom = xc_dom_allocate(ctx->xch, NULL, NULL);
+    dom = xc_dom_allocate(ctx->xch, info->cmdline, NULL);
     if (!dom) {
         LOGE(ERROR, "xc_dom_allocate failed");
         goto out;
@@ -974,8 +998,12 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 
     if (dom->target_pages == 0)
         dom->target_pages = mem_size >> XC_PAGE_SHIFT;
-    if (dom->mmio_size == 0)
+    if (dom->mmio_size == 0 && device_model)
         dom->mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
+    else if (dom->mmio_size == 0 && !device_model)
+        dom->mmio_size = GB(4) -
+                    ((X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES)
+                    << XC_PAGE_SHIFT);
     lowmem_end = mem_size;
     highmem_end = 0;
     mmio_start = (1ull << 32) - dom->mmio_size;
@@ -987,8 +1015,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     dom->lowmem_end = lowmem_end;
     dom->highmem_end = highmem_end;
     dom->mmio_start = mmio_start;
-    dom->vga_hole_size = LIBXL_VGA_HOLE_SIZE;
-    dom->device_model = true;
+    dom->vga_hole_size = device_model ? LIBXL_VGA_HOLE_SIZE : 0;
+    dom->device_model = device_model;
 
     rc = libxl__domain_device_construct_rdm(gc, d_config,
                                             info->u.hvm.rdm_mem_boundary_memkb*1024,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 294d442..e155987 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -112,6 +112,12 @@
 #define TAP_DEVICE_SUFFIX "-emu"
 #define DOMID_XS_PATH "domid"
 
+/* Size macros. */
+#define __AC(X,Y)   (X##Y)
+#define _AC(X,Y)    __AC(X,Y)
+#define MB(_mb)     (_AC(_mb, ULL) << 20)
+#define GB(_gb)     (_AC(_gb, ULL) << 30)
+
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
 #define ROUNDUP(_val, _order)                                           \
@@ -1173,7 +1179,8 @@ _hidden int libxl__domain_build_info_setdefault(libxl__gc *gc,
 _hidden int libxl__device_disk_setdefault(libxl__gc *gc,
                                           libxl_device_disk *disk);
 _hidden int libxl__device_nic_setdefault(libxl__gc *gc, libxl_device_nic *nic,
-                                         uint32_t domid);
+                                         uint32_t domid,
+                                         libxl_domain_build_info *info);
 _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm);
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index ef346e7..b6e99c4 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -98,6 +98,7 @@ libxl_device_model_version = Enumeration("device_model_version", [
     (0, "UNKNOWN"),
     (1, "QEMU_XEN_TRADITIONAL"), # Historical qemu-xen device model (qemu-dm)
     (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
+    (3, "NONE"),                 # No device model
     ])
 
 libxl_console_type = Enumeration("console_type", [
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9ecd85d..cf68541 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -7,7 +7,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       xc_domain_configuration_t *xc_config)
 {
-    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
+    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
+        d_config->b_info.device_model_version !=
+        LIBXL_DEVICE_MODEL_VERSION_NONE)
         xc_config->emulation_flags = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
                                       XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
                                       XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
@@ -488,6 +490,7 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
     struct e820entry *e820 = NULL;
     uint64_t highmem_size =
                     dom->highmem_end ? dom->highmem_end - (1ull << 32) : 0;
+    uint32_t lowmem_start = dom->device_model ? GUEST_LOW_MEM_START_DEFAULT : 0;
 
     /* Add all rdm entries. */
     for (i = 0; i < d_config->num_rdms; i++)
@@ -508,8 +511,8 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
     e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
 
     /* Low memory */
-    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
-    e820[nr].size = dom->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].addr = lowmem_start;
+    e820[nr].size = dom->lowmem_end - lowmem_start;
     e820[nr].type = E820_RAM;
     nr++;
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ebbb9a5..4464aa5 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2172,6 +2172,8 @@ skip_vfb:
         } else if (!strcmp(buf, "qemu-xen")) {
             b_info->device_model_version
                 = LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+        } else if (!strcmp(buf, "none")) {
+            b_info->device_model_version = LIBXL_DEVICE_MODEL_VERSION_NONE;
         } else {
             fprintf(stderr,
                     "Unknown device_model_version \"%s\" specified\n", buf);
-- 
1.9.5 (Apple Git-50.3)
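
[Note] The libxl_x86.c hunk above changes where the low-memory e820 entry
starts: with a device model it keeps starting at GUEST_LOW_MEM_START_DEFAULT,
without one it starts at 0. A minimal standalone sketch of that arithmetic
follows. The helper name, the struct, and the 0x100000 value used for
GUEST_LOW_MEM_START_DEFAULT are assumptions made for illustration, not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant is defined in libxl. */
#define GUEST_LOW_MEM_START_DEFAULT 0x100000u

/* Hypothetical stand-in for the addr/size pair of an e820 entry. */
struct lowmem_entry {
    uint64_t addr;
    uint64_t size;
};

/* Hypothetical helper mirroring the patched "Low memory" setup. */
static struct lowmem_entry low_memory_entry(int has_device_model,
                                            uint64_t lowmem_end)
{
    uint32_t lowmem_start = has_device_model ? GUEST_LOW_MEM_START_DEFAULT : 0;
    struct lowmem_entry e;

    /* The entry only makes sense if low memory ends after it starts. */
    assert(lowmem_end >= lowmem_start);

    e.addr = lowmem_start;
    e.size = lowmem_end - lowmem_start;
    return e;
}
```

Under these assumptions a guest without a device model gets a RAM entry
covering [0, lowmem_end), while a guest with one keeps the region below
GUEST_LOW_MEM_START_DEFAULT out of that entry.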


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v6 29/29] libxl: add support for migrating HVM guests without a device model
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (27 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 28/29] libxl: allow the creation of HVM domains without a device model Roger Pau Monne
@ 2015-09-04 12:09 ` Roger Pau Monne
  2015-09-10 16:00   ` Wei Liu
  2015-09-10 16:30   ` Andrew Cooper
  2015-09-11 13:04 ` [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Ian Campbell
  29 siblings, 2 replies; 99+ messages in thread
From: Roger Pau Monne @ 2015-09-04 12:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Roger Pau Monne

Only some minor libxl changes are needed in order to be able to migrate HVM
guests without a device model, no hypervisor changes are needed.

This change adds support for LIBXL_DEVICE_MODEL_VERSION_NONE into the
migration code.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_dom.c              | 3 +++
 tools/libxl/libxl_dom_suspend.c      | 4 ++++
 tools/libxl/libxl_sr_stream_format.h | 1 +
 tools/libxl/libxl_stream_write.c     | 9 ++++++++-
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index b82b938..1566456 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1308,6 +1308,9 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
         domain_suspend_switch_qemu_xen_logdirty(domid, enable, shs);
         break;
+    case LIBXL_DEVICE_MODEL_VERSION_NONE:
+        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
+        break;
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, abandoning suspend");
diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
index 4cc01ad..7205e60 100644
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -43,6 +43,8 @@ int libxl__domain_suspend_device_model(libxl__gc *gc,
         if (ret)
             unlink(filename);
         break;
+    case LIBXL_DEVICE_MODEL_VERSION_NONE:
+        break;
     default:
         return ERROR_INVAL;
     }
@@ -394,6 +396,8 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
         if (libxl__qmp_resume(gc, domid))
             return ERROR_FAIL;
         break;
+    case LIBXL_DEVICE_MODEL_VERSION_NONE:
+        break;
     default:
         return ERROR_INVAL;
     }
diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
index 54da360..960211c 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -46,6 +46,7 @@ typedef struct libxl__sr_emulator_hdr
 #define EMULATOR_UNKNOWN             0x00000000U
 #define EMULATOR_QEMU_TRADITIONAL    0x00000001U
 #define EMULATOR_QEMU_UPSTREAM       0x00000002U
+#define EMULATOR_NONE                0x00000003U
 
 #endif /* LIBXL__SR_STREAM_FORMAT_H */
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 52a60d7..aa0a5e5 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -232,6 +232,10 @@ void libxl__stream_write_start(libxl__egc *egc,
             stream->emu_sub_hdr.id = EMULATOR_QEMU_UPSTREAM;
             break;
 
+        case LIBXL_DEVICE_MODEL_VERSION_NONE:
+            stream->emu_sub_hdr.id = EMULATOR_NONE;
+            break;
+
         default:
             rc = ERROR_FAIL;
             LOG(ERROR, "Unknown emulator for HVM domain");
@@ -387,8 +391,11 @@ static void emulator_xenstore_record_done(libxl__egc *egc,
                                           libxl__stream_write_state *stream)
 {
     libxl__domain_suspend_state *dss = stream->dss;
+    STATE_AO_GC(stream->ao);
 
-    if (dss->type == LIBXL_DOMAIN_TYPE_HVM)
+    if (dss->type == LIBXL_DOMAIN_TYPE_HVM &&
+        libxl__device_model_version_running(gc, dss->domid) !=
+        LIBXL_DEVICE_MODEL_VERSION_NONE)
         write_emulator_context_record(egc, stream);
     else {
         if (stream->in_checkpoint)
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
@ 2015-09-04 12:25   ` Wei Liu
  2015-09-04 13:51     ` Roger Pau Monné
  2015-09-09 14:27   ` Wei Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 99+ messages in thread
From: Wei Liu @ 2015-09-04 12:25 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
> Introduce a bitmap in x86 xen_arch_domainconfig that allows enabling or
> disabling specific devices emulated inside of Xen for HVM guests.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Changes since v4:
>  - Add a check to make sure the emulation bitmap is sane (undefined bits are
>    all 0s).
>  - Add Andrew Cooper Reviewed-by.
> 
> Changes since v3:
>  - Return EOPNOTSUPP instead of ENOPERM if an invalid emulation mask is
>    used.
>  - Fix error messages (prefix them with d%d and use %#x instead of 0x%x).
>  - Clearly state in the public header that emulation_flags should only be
>    used with HVM guests.
>  - Add a XEN_X86 prefix to the emulation flags defines.
>  - Properly parenthesize the has_* macros.
> ---
>  tools/libxl/libxl_x86.c           |  8 ++++++--
>  xen/arch/x86/domain.c             | 23 +++++++++++++++++++++++
>  xen/include/asm-x86/domain.h      | 13 +++++++++++++
>  xen/include/public/arch-x86/xen.h | 21 ++++++++++++++++++++-
>  4 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 9276126..9ecd85d 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -7,8 +7,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>                                        libxl_domain_config *d_config,
>                                        xc_domain_configuration_t *xc_config)
>  {
> -    /* No specific configuration right now */
> -
> +    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
> +        xc_config->emulation_flags = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
> +                                      XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
> +                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
> +                                      XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
> +                                      XEN_X86_EMU_IOMMU);
>      return 0;
>  }
>  
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..fe9504f 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>                 d->domain_id);
>      }
>  
> +    if ( is_hvm_domain(d) )
> +    {
> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
> +                                   XEN_X86_EMU_IOMMU);

This is repetitive. Could you consolidate all these to

  #define XEN_X86_EMU_ALL ...

?

Or am I talking nonsense?

Wei.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:25   ` Wei Liu
@ 2015-09-04 13:51     ` Roger Pau Monné
  2015-09-04 13:55       ` Jan Beulich
  2015-09-04 13:56       ` Wei Liu
  0 siblings, 2 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-04 13:51 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	Jan Beulich, xen-devel

El 04/09/15 a les 14.25, Wei Liu ha escrit:
> On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 045f6ff..fe9504f 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>                 d->domain_id);
>>      }
>>  
>> +    if ( is_hvm_domain(d) )
>> +    {
>> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
>> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>> +                                   XEN_X86_EMU_IOMMU);
> 
> This is repetitive. Could you consolidate all these to
> 
>   #define XEN_X86_EMU_ALL ...
> 
> ?

That sounds fine, I would place it in the public header where all the
XEN_X86_EMU_* are defined. I will wait for Andrew's opinion, since he
already acked this patch.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 13:51     ` Roger Pau Monné
@ 2015-09-04 13:55       ` Jan Beulich
  2015-09-04 22:41         ` Andrew Cooper
  2015-09-23 11:43         ` Roger Pau Monné
  2015-09-04 13:56       ` Wei Liu
  1 sibling, 2 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-04 13:55 UTC (permalink / raw)
  To: Roger Pau Monné, Wei Liu
  Cc: Andrew Cooper, xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 04.09.15 at 15:51, <roger.pau@citrix.com> wrote:
> El 04/09/15 a les 14.25, Wei Liu ha escrit:
>> On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>> index 045f6ff..fe9504f 100644
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>>                 d->domain_id);
>>>      }
>>>  
>>> +    if ( is_hvm_domain(d) )
>>> +    {
>>> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
>>> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>>> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>>> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>>> +                                   XEN_X86_EMU_IOMMU);
>> 
>> This is repetitive. Could you consolidate all these to
>> 
>>   #define XEN_X86_EMU_ALL ...
>> 
>> ?
> 
> That sounds fine, I would place it in the public header where all the
> XEN_X86_EMU_* are defined. I will wait for Andrew's opinion, since he
> already acked this patch.

This doesn't belong in the public ABI, so if at all it should be added
there inside #ifdef __XEN__. Alternatively (and perhaps preferably)
this would go into another (internal) header.

Jan
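
[Note] A minimal sketch of the consolidation discussed above: a single
XEN_X86_EMU_ALL mask built from the individual flags. The bit positions
below are assumptions for illustration, and emulation_flags_known() is a
hypothetical helper. Following Jan's remark, in the Xen tree such a define
would sit in an internal header, or inside #ifdef __XEN__ if it were kept
in the public one.

```c
#include <stdint.h>

/* Bit positions are assumed for this sketch; see the public header. */
#define XEN_X86_EMU_LAPIC   (1u << 0)
#define XEN_X86_EMU_HPET    (1u << 1)
#define XEN_X86_EMU_PMTIMER (1u << 2)
#define XEN_X86_EMU_RTC     (1u << 3)
#define XEN_X86_EMU_IOAPIC  (1u << 4)
#define XEN_X86_EMU_PIC     (1u << 5)
#define XEN_X86_EMU_PMU     (1u << 6)
#define XEN_X86_EMU_VGA     (1u << 7)
#define XEN_X86_EMU_IOMMU   (1u << 8)

/* One definition replaces the list repeated in libxl and domain.c. */
#define XEN_X86_EMU_ALL (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |     \
                         XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |    \
                         XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |     \
                         XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |        \
                         XEN_X86_EMU_IOMMU)

/* Hypothetical helper: nonzero when no undefined bit is set. */
static int emulation_flags_known(uint32_t flags)
{
    return (flags & ~XEN_X86_EMU_ALL) == 0;
}
```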

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 13:51     ` Roger Pau Monné
  2015-09-04 13:55       ` Jan Beulich
@ 2015-09-04 13:56       ` Wei Liu
  1 sibling, 0 replies; 99+ messages in thread
From: Wei Liu @ 2015-09-04 13:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 03:51:17PM +0200, Roger Pau Monné wrote:
> El 04/09/15 a les 14.25, Wei Liu ha escrit:
> > On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
> >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> >> index 045f6ff..fe9504f 100644
> >> --- a/xen/arch/x86/domain.c
> >> +++ b/xen/arch/x86/domain.c
> >> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
> >>                 d->domain_id);
> >>      }
> >>  
> >> +    if ( is_hvm_domain(d) )
> >> +    {
> >> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
> >> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
> >> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
> >> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
> >> +                                   XEN_X86_EMU_IOMMU);
> > 
> > This is repetitive. Could you consolidate all these to
> > 
> >   #define XEN_X86_EMU_ALL ...
> > 
> > ?
> 
> That sounds fine, I would place it in the public header where all the
> XEN_X86_EMU_* are defined. I will wait for Andrew's opinion, since he
> already acked this patch.

Of course he has the final authority here. I don't want to block this patch
just for such a cosmetic comment. :-)

Wei.

> 
> Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 13:55       ` Jan Beulich
@ 2015-09-04 22:41         ` Andrew Cooper
  2015-09-23 11:43         ` Roger Pau Monné
  1 sibling, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2015-09-04 22:41 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné, Wei Liu
  Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

On 04/09/15 14:55, Jan Beulich wrote:
>>>> On 04.09.15 at 15:51, <roger.pau@citrix.com> wrote:
>> El 04/09/15 a les 14.25, Wei Liu ha escrit:
>>> On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
>>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>>> index 045f6ff..fe9504f 100644
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>>>                  d->domain_id);
>>>>       }
>>>>   
>>>> +    if ( is_hvm_domain(d) )
>>>> +    {
>>>> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
>>>> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>>>> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>>>> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>>>> +                                   XEN_X86_EMU_IOMMU);
>>> This is repetitive. Could you consolidate all these to
>>>
>>>    #define XEN_X86_EMU_ALL ...
>>>
>>> ?
>> That sounds fine, I would place it in the public header where all the
>> XEN_X86_EMU_* are defined. I will wait for Andrew's opinion, since he
>> already acked this patch.
> This doesn't belong in the public ABI, so if at all it should be added
> there inside #ifdef __XEN__. Alternatively (and perhaps preferably)
> this would go into another (internal) header.

Agreed.

Having an ALL mask will be useful for checking invalid values, but this 
logic is going to get a bit more complicated as soon as we see about 
introducing situations which permit the use of the vLAPIC, and I don't 
see the ALL mask being useful anywhere else.

I am not fussed either way.

~Andrew

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
  2015-09-04 12:25   ` Wei Liu
@ 2015-09-09 14:27   ` Wei Liu
  2015-09-16  9:50   ` Jan Beulich
  2015-09-16 10:10   ` Jan Beulich
  3 siblings, 0 replies; 99+ messages in thread
From: Wei Liu @ 2015-09-09 14:27 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
> Introduce a bitmap in x86 xen_arch_domainconfig that allows enabling or
> disabling specific devices emulated inside of Xen for HVM guests.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 29/29] libxl: add support for migrating HVM guests without a device model
  2015-09-04 12:09 ` [PATCH v6 29/29] libxl: add support for migrating HVM guests " Roger Pau Monne
@ 2015-09-10 16:00   ` Wei Liu
  2015-09-10 16:30   ` Andrew Cooper
  1 sibling, 0 replies; 99+ messages in thread
From: Wei Liu @ 2015-09-10 16:00 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Ian Jackson, Ian Campbell, Wei Liu

On Fri, Sep 04, 2015 at 02:09:08PM +0200, Roger Pau Monne wrote:
> Only some minor libxl changes are needed in order to be able to migrate HVM
> guests without a device model, no hypervisor changes are needed.
> 
> This change adds support for LIBXL_DEVICE_MODEL_VERSION_NONE into the
> migration code.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
[...]
> diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
> index 54da360..960211c 100644
> --- a/tools/libxl/libxl_sr_stream_format.h
> +++ b/tools/libxl/libxl_sr_stream_format.h
> @@ -46,6 +46,7 @@ typedef struct libxl__sr_emulator_hdr
>  #define EMULATOR_UNKNOWN             0x00000000U
>  #define EMULATOR_QEMU_TRADITIONAL    0x00000001U
>  #define EMULATOR_QEMU_UPSTREAM       0x00000002U
> +#define EMULATOR_NONE                0x00000003U
>  

You also need to patch migration stream spec under docs.

Wei.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests
  2015-09-04 12:09 ` [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests Roger Pau Monne
@ 2015-09-10 16:00   ` Wei Liu
  2015-09-21 15:53   ` Jan Beulich
  1 sibling, 0 replies; 99+ messages in thread
From: Wei Liu @ 2015-09-10 16:00 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 02:09:05PM +0200, Roger Pau Monne wrote:
> This structure contains the physical address of the command line, as well as
> the physical address of the list of loaded modules. The physical address of
> this structure is passed to the guest at boot time in the %ebx register.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

This patch, though it touches libxc, is in fact ABI.

I don't think I have any comments on the ABI itself here.

The code looks good enough to me FWIW.

Wei.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 29/29] libxl: add support for migrating HVM guests without a device model
  2015-09-04 12:09 ` [PATCH v6 29/29] libxl: add support for migrating HVM guests " Roger Pau Monne
  2015-09-10 16:00   ` Wei Liu
@ 2015-09-10 16:30   ` Andrew Cooper
  1 sibling, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2015-09-10 16:30 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Ian Jackson, Wei Liu, Ian Campbell

On 04/09/15 13:09, Roger Pau Monne wrote:
> Only some minor libxl changes are needed in order to be able to migrate HVM
> guests without a device model, no hypervisor changes are needed.
>
> This change adds support for LIBXL_DEVICE_MODEL_VERSION_NONE into the
> migration code.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

IMO, this is conceptually wrong.

No device model should be signified by not sending an emulator record.  
Sending an emulator record saying "No emulator" is pointless in a single 
emulator setup, and wrong for a multi-emulator setup.

Furthermore, for situations like vGPU with no-dm guests, there will 
actually be an emulator record sent for demu (The component XenServer 
currently uses to make vGPU function).


On the save side, if the domain has no emulator, no EMULATOR_* records 
should be sent.

On the receive side the domain configuration has already been sent so 
the domain is known to not have an emulator.  The arrival of an 
EMULATOR_* record should be a fatal error for restore.

Support for migration of no-dm guests does not require any extension to 
the migration ABI.  Having said this, the libxl spec does not have a 
"LAYOUT" section like the libxc spec, and might benefit from having one.

~Andrew
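
[Note] The save/restore behaviour described above can be sketched as two
small decisions. This is only an illustration under stated assumptions:
the enum, both function names, and the -1 error value are hypothetical and
do not correspond to existing libxl symbols.

```c
#include <stdbool.h>

/* Hypothetical mirror of the device model versions discussed here. */
enum dm_version {
    DM_UNKNOWN,
    DM_QEMU_TRADITIONAL,
    DM_QEMU_XEN,
    DM_NONE
};

/*
 * Save side: a domain without an emulator sends no EMULATOR_* records,
 * so the stream format needs no "no emulator" identifier.
 */
static bool should_send_emulator_records(bool is_hvm, enum dm_version dm)
{
    return is_hvm && dm != DM_NONE;
}

/*
 * Restore side: the domain configuration has already arrived, so an
 * EMULATOR_* record for a domain known to have no emulator is fatal.
 * Returns 0 to continue, -1 to abort the restore.
 */
static int on_emulator_record(enum dm_version configured_dm)
{
    return configured_dm == DM_NONE ? -1 : 0;
}
```

With this shape, a multi-emulator setup can still send one record per
emulator, and the no-dm case is simply the empty set of records.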

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 00/29] Introduce HVM without dm and new boot ABI
  2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
                   ` (28 preceding siblings ...)
  2015-09-04 12:09 ` [PATCH v6 29/29] libxl: add support for migrating HVM guests " Roger Pau Monne
@ 2015-09-11 13:04 ` Ian Campbell
  29 siblings, 0 replies; 99+ messages in thread
From: Ian Campbell @ 2015-09-11 13:04 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel

On Fri, 2015-09-04 at 14:08 +0200, Roger Pau Monne wrote:
> This series is split in the following order:
> 
>  - Patches from 1 to 10 switch HVM domain construction to use the xc_dom_*
>    family of functions, like they are used to build PV domains. This batch
>    of patches can go in regardless of the status of the rest of the series
>    IMHO, and in fact would help me quite a lot with the rebasing.

In principle I agree; however, they are a pretty major refactoring which
would potentially make cherry-picking of subsequent patches for 4.6 more
difficult, so I'd like to hold off for a few weeks until we are getting
towards the final 4.6 release candidates.

Sorry.

Ian.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
  2015-09-04 12:25   ` Wei Liu
  2015-09-09 14:27   ` Wei Liu
@ 2015-09-16  9:50   ` Jan Beulich
  2015-09-23 12:35     ` Roger Pau Monné
  2015-09-16 10:10   ` Jan Beulich
  3 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-16  9:50 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -7,8 +7,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>                                        libxl_domain_config *d_config,
>                                        xc_domain_configuration_t *xc_config)
>  {
> -    /* No specific configuration right now */
> -
> +    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
> +        xc_config->emulation_flags = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
> +                                      XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
> +                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
> +                                      XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
> +                                      XEN_X86_EMU_IOMMU);

This calls for the elsewhere discussed XEN_X86_EMU_ALL to even be
exposed to the tool stack.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>                 d->domain_id);
>      }
>  
> +    if ( is_hvm_domain(d) )
> +    {
> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |

const

> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
> +                                   XEN_X86_EMU_IOMMU);
> +        if ( (config->emulation_flags & ~emulation_mask) != 0 )

Missing blank line between declaration and statements.

> +        {
> +            printk(XENLOG_G_ERR "d%d: Invalid emulation bitmap: %#x.\n",

Generally we have no full stops at the end of log messages.

> +                   d->domain_id, config->emulation_flags);
> +            return -EINVAL;
> +        }
> +        if ( config->emulation_flags != emulation_mask )
> +        {
> +            printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
> +                   "current selection of emulators: %#x.\n", d->domain_id,
> +                   config->emulation_flags);
> +            return -EOPNOTSUPP;
> +        }
> +        d->arch.emulation_flags = config->emulation_flags;
> +    }

Isn't there an "else" missing here, validating that the flags are zero?

Jan
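
[Note] A standalone sketch of the check with the review comments above
folded in: a const mask, a blank line after the declaration, no full stop
in the log messages, and an else branch rejecting non-zero flags for
non-HVM domains. EMU_ALL, the function name, the use of fprintf() in place
of printk(), and the -EINVAL returned from the else branch are assumptions
made for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed mask of all defined emulation bits (nine flags in this series). */
#define EMU_ALL 0x1ffu

/* Hypothetical userspace stand-in for the arch_domain_create() check. */
static int check_emulation_flags(int is_hvm, int domid, uint32_t flags)
{
    if ( is_hvm )
    {
        const uint32_t emulation_mask = EMU_ALL;

        /* Undefined bits must all be zero. */
        if ( (flags & ~emulation_mask) != 0 )
        {
            fprintf(stderr, "d%d: Invalid emulation bitmap: %#x\n",
                    domid, (unsigned int)flags);
            return -EINVAL;
        }
        /* At this point in the series only "everything on" is accepted. */
        if ( flags != emulation_mask )
        {
            fprintf(stderr, "d%d: Xen does not allow HVM creation with the "
                    "current selection of emulators: %#x\n",
                    domid, (unsigned int)flags);
            return -EOPNOTSUPP;
        }
    }
    else if ( flags != 0 )
    {
        /* Emulation flags are only meaningful for HVM guests. */
        fprintf(stderr, "d%d: emulation flags set for a non-HVM guest: %#x\n",
                domid, (unsigned int)flags);
        return -EINVAL;
    }

    return 0;
}
```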

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic
  2015-09-04 12:08 ` [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic Roger Pau Monne
@ 2015-09-16 10:05   ` Jan Beulich
  2015-09-23 15:45     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-16 10:05 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Suravee Suthikulpanit, Andrew Cooper, Eddie Dong,
	Aravind Gopalakrishnan, Jun Nakajima, xen-devel, Boris Ostrovsky

>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1035,6 +1035,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
>      struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>      bool_t debug_state = v->domain->debugger_attached;
>      bool_t vcpu_guestmode = 0;
> +    struct vlapic *vlapic = vcpu_vlapic(v);
>  
>      if ( nestedhvm_enabled(v->domain) && nestedhvm_vcpu_in_guestmode(v) )
>          vcpu_guestmode = 1;
> @@ -1058,14 +1059,14 @@ static void noreturn svm_do_resume(struct vcpu *v)
>          hvm_asid_flush_vcpu(v);
>      }
>  
> -    if ( !vcpu_guestmode )
> +    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) )
>      {
>          vintr_t intr;
>  
>          /* Reflect the vlapic's TPR in the hardware vtpr */
>          intr = vmcb_get_vintr(vmcb);
>          intr.fields.tpr =
> -            (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
> +            (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0xFF) >> 4;
>          vmcb_set_vintr(vmcb, intr);
>      }
>  
> @@ -2294,6 +2295,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>      int inst_len, rc;
>      vintr_t intr;
>      bool_t vcpu_guestmode = 0;
> +    struct vlapic *vlapic = vcpu_vlapic(v);
>  
>      hvm_invalidate_regs_fields(regs);
>  
> @@ -2311,11 +2313,11 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>       * NB. We need to preserve the low bits of the TPR to make checked builds
>       * of Windows work, even though they don't actually do anything.
>       */
> -    if ( !vcpu_guestmode ) {
> +    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) ) {
>          intr = vmcb_get_vintr(vmcb);
> -        vlapic_set_reg(vcpu_vlapic(v), APIC_TASKPRI,
> +        vlapic_set_reg(vlapic, APIC_TASKPRI,
>                     ((intr.fields.tpr & 0x0F) << 4) |
> -                   (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0x0F));
> +                   (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0x0F));
>      }
>  
>      exit_reason = vmcb->exitcode;
> @@ -2697,14 +2699,14 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>      }
>  
>    out:
> -    if ( vcpu_guestmode )
> +    if ( vcpu_guestmode || vlapic_hw_disabled(vlapic) )
>          /* Don't clobber TPR of the nested guest. */

The comment is now stale.

Also - aren't all the changes to this file (and perhaps others further
down) bug fixes in their own right?

> @@ -1042,8 +1045,7 @@ void vlapic_tdt_msr_set(struct vlapic *vlapic, uint64_t value)
>      uint64_t guest_tsc;
>      struct vcpu *v = vlapic_vcpu(vlapic);
>  
> -    /* may need to exclude some other conditions like vlapic->hw.disabled */
> -    if ( !vlapic_lvtt_tdt(vlapic) )
> +    if ( !vlapic_lvtt_tdt(vlapic) || vlapic_hw_disabled(vlapic) )
>      {
>          HVM_DBG_LOG(DBG_LEVEL_VLAPIC_TIMER, "ignore tsc deadline msr write");

The newly added condition needlessly triggers the HVM_DBG_LOG().
Please separate it.

> @@ -1328,7 +1339,10 @@ static int lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
>      uint16_t vcpuid;
>      struct vcpu *v;
>      struct vlapic *s;
> -    
> +
> +    if ( !has_vlapic(d) )
> +        return 0;
> +
>      /* Which vlapic to load? */
>      vcpuid = hvm_load_instance(h); 
>      if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> @@ -1360,7 +1374,10 @@ static int lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
>      uint16_t vcpuid;
>      struct vcpu *v;
>      struct vlapic *s;
> -    
> +
> +    if ( !has_vlapic(d) )
> +        return 0;

I agree that the save side should return zero in that case, but aren't
load attempts invalid (and hence warrant an error return)?

> @@ -1399,7 +1416,7 @@ int vlapic_init(struct vcpu *v)
>  
>      HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "%d", v->vcpu_id);
>  
> -    if ( is_pvh_vcpu(v) )
> +    if ( is_pvh_vcpu(v) || !has_vlapic(v->domain) )

Isn't the latter alone sufficient?

> @@ -1452,6 +1469,9 @@ void vlapic_destroy(struct vcpu *v)
>  {
>      struct vlapic *vlapic = vcpu_vlapic(v);
>  
> +    if ( !has_vlapic(vlapic_domain(vlapic)) )

I think v->domain would be better here.

> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -482,6 +482,9 @@ found:
>  
>  void msixtbl_init(struct domain *d)
>  {
> +    if ( !has_vlapic(d) )
> +        return;
> +
>      INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>      spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);

Don't you also need to add guards to msixtbl_pt_{,un}register()?

> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1002,7 +1002,7 @@ static int construct_vmcs(struct vcpu *v)
>          ~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
>            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
>  
> -    if ( is_pvh_domain(d) )
> +    if ( is_pvh_domain(d) || !has_vlapic(d) )

See above.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
                     ` (2 preceding siblings ...)
  2015-09-16  9:50   ` Jan Beulich
@ 2015-09-16 10:10   ` Jan Beulich
  2015-09-23 12:42     ` Roger Pau Monné
  3 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-16 10:10 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -387,8 +387,21 @@ struct arch_domain
>      bool_t mem_access_emulate_enabled;
>  
>      struct monitor_write_data *event_write_data;
> +
> +    /* Emulated devices enabled bitmap. */
> +    uint32_t emulation_flags;
>  } __cacheline_aligned;
>  
> +#define has_vlapic(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_LAPIC)
> +#define has_vhpet(d)        ((d)->arch.emulation_flags & XEN_X86_EMU_HPET)
> +#define has_vpmtimer(d)     ((d)->arch.emulation_flags & XEN_X86_EMU_PMTIMER)
> +#define has_vrtc(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_RTC)
> +#define has_vioapic(d)      ((d)->arch.emulation_flags & XEN_X86_EMU_IOAPIC)
> +#define has_vpic(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PIC)
> +#define has_vpmu(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PMU)
> +#define has_vvga(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_VGA)
> +#define has_viommu(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU)

And btw, now that I saw a few uses of these - do they really all need
to be has_v*() instead of just has_*()? Together with the macros taking
a domain pointer it's quite obvious that talk is about virtual devices...
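
As a rough sketch of what the shorter naming could look like (all names and
bit positions below are hypothetical and not taken from the patch), with the
result additionally normalised to 0/1 so that it can safely be stored in a
narrow boolean type:

```c
#include <stdint.h>

/* Illustrative bit assignments only; not the values used by the patch. */
#define MOCK_EMU_LAPIC  (1u << 0)
#define MOCK_EMU_HPET   (1u << 1)

/* Stand-ins for struct arch_domain and struct domain. */
struct mock_arch_domain {
    uint32_t emulation_flags;   /* emulated devices enabled bitmap */
};

struct mock_domain {
    struct mock_arch_domain arch;
};

/* has_*() rather than has_v*(); "!!" turns the masked bit into 0 or 1. */
#define mock_has_lapic(d) (!!((d)->arch.emulation_flags & MOCK_EMU_LAPIC))
#define mock_has_hpet(d)  (!!((d)->arch.emulation_flags & MOCK_EMU_HPET))
```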

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers
  2015-09-04 12:08 ` [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers Roger Pau Monne
@ 2015-09-18 15:53   ` Anthony PERARD
  2015-09-23 10:38     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Anthony PERARD @ 2015-09-18 15:53 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, Ian Campbell, Wei Liu, Ian Jackson, Stefano Stabellini

On Fri, Sep 04, 2015 at 02:08:48PM +0200, Roger Pau Monne wrote:
> Now that we have all the code in place HVM domain building in libxl can be
> switched to use the xc_dom_* family of functions, just like they are used in
> order to build PV guests.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> Changes since v4:
>  - Add Wei Liu Acked-by.
> ---
>  tools/libxl/libxl_arch.h     |   2 +-
>  tools/libxl/libxl_dm.c       |  18 ++--
>  tools/libxl/libxl_dom.c      | 227 +++++++++++++++++++++++++------------------
>  tools/libxl/libxl_internal.h |   4 +-
>  tools/libxl/libxl_vnuma.c    |  12 ++-
>  tools/libxl/libxl_x86.c      |   8 +-
>  6 files changed, 155 insertions(+), 116 deletions(-)


> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 6101e5c..d2cf9e3 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -912,52 +935,62 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>                libxl__domain_build_state *state)
>  {
>      libxl_ctx *ctx = libxl__gc_owner(gc);
> -    struct xc_hvm_build_args args = {};
> -    int ret, rc;
> -    uint64_t mmio_start, lowmem_end, highmem_end;
> +    int rc;
> +    uint64_t mmio_start, lowmem_end, highmem_end, mem_size;
>      libxl_domain_build_info *const info = &d_config->b_info;
> +    struct xc_dom_image *dom = NULL;
> +
> +    xc_dom_loginit(ctx->xch);
> +
> +    dom = xc_dom_allocate(ctx->xch, NULL, NULL);
> +    if (!dom) {
> +        LOGE(ERROR, "xc_dom_allocate failed");

'rc' is uninitialized at this point and is going to be used in out:.
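
A minimal sketch of the usual way to avoid this, assuming nothing about the
final patch: set rc before every jump to the error label. mock_build() and
MOCK_ERROR_FAIL are hypothetical stand-ins, and malloc() stands in for
xc_dom_allocate().

```c
#include <stdlib.h>

#define MOCK_ERROR_FAIL (-3)   /* hypothetical error value */

/* fail_alloc simulates the allocation returning NULL. */
static int mock_build(int fail_alloc)
{
    int rc;
    void *dom = fail_alloc ? NULL : malloc(16);

    if (!dom) {
        rc = MOCK_ERROR_FAIL;   /* assign before jumping to out: */
        goto out;
    }

    /* ... the remaining build steps would go here ... */

    free(dom);
    return 0;

out:
    free(dom);                  /* free(NULL) is a no-op */
    return rc;
}
```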

> +        goto out;
> +    }
> +

[...]

>  
> +    xc_dom_release(dom);
>      return 0;
> +
>  out:
>      assert(rc != 0);
> +    if (dom != NULL) xc_dom_release(dom);
>      return rc;
>  }
>  

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests
  2015-09-04 12:08 ` [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests Roger Pau Monne
@ 2015-09-18 15:53   ` Anthony PERARD
  2015-09-23 10:32     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Anthony PERARD @ 2015-09-18 15:53 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, Ian Campbell, Wei Liu, Ian Jackson, Stefano Stabellini

On Fri, Sep 04, 2015 at 02:08:47PM +0200, Roger Pau Monne wrote:
> This xc_dom_arch will be used in order to build HVM domains. The code is
> based on the existing xc_hvm_populate_memory and xc_hvm_populate_params
> functions.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> Changes since v5:
>  - Set tr limit to 0x67.
>  - Use "goto out" consistently in vcpu_hvm.
>  - Unconditionally call free(full_ctx) before exiting vcpu_hvm.
>  - Add Wei Liu Ack.
> 
> Changes since v4:
>  - Replace a malloc+memset with a calloc.
>  - Remove a != NULL check.
>  - Add Andrew Cooper Reviewed-by.
> 
> Changes since v3:
>  - Make sure c/s b9dbe33 is not reverted on this patch.
>  - Set the initial BSP state using {get/set}hvmcontext.
> ---
>  tools/libxc/include/xc_dom.h |   6 +
>  tools/libxc/xc_dom_x86.c     | 618 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 613 insertions(+), 11 deletions(-)


> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index ae8187f..f36b6f6 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c

> +static int meminit_hvm(struct xc_dom_image *dom)
> +{
> +    unsigned long i, vmemid, nr_pages = dom->total_pages;
> +    unsigned long p2m_size;
> +    unsigned long target_pages = dom->target_pages;
> +    unsigned long cur_pages, cur_pfn;
> +    int rc;
> +    xen_capabilities_info_t caps;
> +    unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
> +        stat_1gb_pages = 0;
> +    unsigned int memflags = 0;
> +    int claim_enabled = dom->claim_enabled;
> +    uint64_t total_pages;
> +    xen_vmemrange_t dummy_vmemrange[2];
> +    unsigned int dummy_vnode_to_pnode[1];
> +    xen_vmemrange_t *vmemranges;
> +    unsigned int *vnode_to_pnode;
> +    unsigned int nr_vmemranges, nr_vnodes;
> +    xc_interface *xch = dom->xch;
> +    uint32_t domid = dom->guest_domid;
> +
> +    if ( nr_pages > target_pages )
> +        memflags |= XENMEMF_populate_on_demand;
> +
> +    if ( dom->nr_vmemranges == 0 )
> +    {
> +        /* Build dummy vnode information
> +         *
> +         * Guest physical address space layout:
> +         * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
> +         *
> +         * Of course if there is no high memory, the second vmemrange
> +         * has no effect on the actual result.
> +         */
> +
> +        dummy_vmemrange[0].start = 0;
> +        dummy_vmemrange[0].end   = dom->lowmem_end;
> +        dummy_vmemrange[0].flags = 0;
> +        dummy_vmemrange[0].nid   = 0;
> +        nr_vmemranges = 1;
> +
> +        if ( dom->highmem_end > (1ULL << 32) )
> +        {
> +            dummy_vmemrange[1].start = 1ULL << 32;
> +            dummy_vmemrange[1].end   = dom->highmem_end;
> +            dummy_vmemrange[1].flags = 0;
> +            dummy_vmemrange[1].nid   = 0;
> +
> +            nr_vmemranges++;
> +        }
> +
> +        dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
> +        nr_vnodes = 1;
> +        vmemranges = dummy_vmemrange;
> +        vnode_to_pnode = dummy_vnode_to_pnode;
> +    }
> +    else
> +    {
> +        if ( nr_pages > target_pages )
> +        {
> +            DOMPRINTF("Cannot enable vNUMA and PoD at the same time");
> +            goto error_out;
> +        }
> +
> +        nr_vmemranges = dom->nr_vmemranges;
> +        nr_vnodes = dom->nr_vnodes;
> +        vmemranges = dom->vmemranges;
> +        vnode_to_pnode = dom->vnode_to_pnode;
> +    }
> +
> +    total_pages = 0;
> +    p2m_size = 0;
> +    for ( i = 0; i < nr_vmemranges; i++ )
> +    {
> +        total_pages += ((vmemranges[i].end - vmemranges[i].start)
> +                        >> PAGE_SHIFT);
> +        p2m_size = p2m_size > (vmemranges[i].end >> PAGE_SHIFT) ?
> +            p2m_size : (vmemranges[i].end >> PAGE_SHIFT);
> +    }
> +
> +    if ( total_pages != nr_pages )
> +    {
> +        DOMPRINTF("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",

nr_pages is unsigned long, so it would need to be printed with %lx.
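
For illustration, a small standalone sketch with matching conversion
specifiers for both argument types. format_mismatch() is a hypothetical
helper; snprintf() merely stands in for DOMPRINTF().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * total_pages is uint64_t and takes PRIx64; nr_pages is unsigned long
 * and takes %lx. The two only coincide on some ABIs.
 */
static int format_mismatch(char *buf, size_t len,
                           uint64_t total_pages, unsigned long nr_pages)
{
    return snprintf(buf, len,
                    "vNUMA memory pages mismatch (0x%"PRIx64" != 0x%lx)",
                    total_pages, nr_pages);
}
```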

> +               total_pages, nr_pages);
> +        goto error_out;
> +    }
> +

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC
  2015-09-04 12:08 ` [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC Roger Pau Monne
@ 2015-09-21 14:34   ` Jan Beulich
  2015-09-25 15:01     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-21 14:34 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
> @@ -425,6 +431,9 @@ void vpic_reset(struct domain *d)
>  
>  void vpic_init(struct domain *d)
>  {
> +    if ( !has_vpic(d) )
> +        return;

vpic_reset() above this function, as well as functions further down
in the source file, aren't static, yet you aren't adding guards to them.
I think here and in other similar patches you should state in the commit
message why each function left unguarded doesn't need a guard, wherever
that isn't obvious (e.g. because it is used only for handling intercepts
which aren't getting enabled).

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-04 12:08 ` [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu Roger Pau Monne
@ 2015-09-21 14:36   ` Jan Beulich
  2015-09-21 14:48     ` Boris Ostrovsky
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-21 14:36 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Boris Ostrovsky, xen-devel

>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:

Hmm - this seems questionable to me: Isn't the vPMU an optional
feature anyway? I.e. doesn't need separate handling here? Boris?

Jan

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Changes since v4:
>  - Add Andrew Cooper Acked-by.
> ---
>  xen/arch/x86/cpu/vpmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index 8af3df1..d5bb77d 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -439,6 +439,9 @@ void vpmu_initialise(struct vcpu *v)
>      int ret;
>      bool_t is_priv_vpmu = is_hardware_domain(v->domain);
>  
> +    if ( !has_vpmu(v->domain) )
> +        return;
> +
>      BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>      BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>      BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
> -- 
> 1.9.5 (Apple Git-50.3)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote
  2015-09-04 12:09 ` [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote Roger Pau Monne
@ 2015-09-21 14:47   ` Jan Beulich
  2015-09-28 10:35     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-21 14:47 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Ian Jackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -200,9 +200,18 @@
>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>  
>  /*
> + * Physical entry point into the kernel.
> + *
> + * 32bit entry point into the kernel. Xen will use this entry point
> + * in order to launch the guest kernel in 32bit protected mode
> + * with paging disabled.
> + */
> +#define XEN_ELFNOTE_PHYS32_ENTRY 18

The comment reads as if this was the case for all kinds of guests,
yet I suppose it doesn't apply to PV ones. This should be made
explicit if so.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-21 14:36   ` Jan Beulich
@ 2015-09-21 14:48     ` Boris Ostrovsky
  2015-09-25 15:07       ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Boris Ostrovsky @ 2015-09-21 14:48 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

On 09/21/2015 10:36 AM, Jan Beulich wrote:
>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
> Hmm - this seems questionable to me: Isn't the vPMU an optional
> feature anyway? I.e. doesn't need separate handling here? Boris?

It is optional system-wide, not per-guest, which is what I think Roger
is trying to do. In fact, I wanted to add the ability to disable VPMU
per guest myself.

However, VPMU has nothing to do with device model so I don't think it 
should be part of this series from that perspective.

-boris

>
> Jan
>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> Changes since v4:
>>   - Add Andrew Cooper Acked-by.
>> ---
>>   xen/arch/x86/cpu/vpmu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
>> index 8af3df1..d5bb77d 100644
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -439,6 +439,9 @@ void vpmu_initialise(struct vcpu *v)
>>       int ret;
>>       bool_t is_priv_vpmu = is_hardware_domain(v->domain);
>>   
>> +    if ( !has_vpmu(v->domain) )
>> +        return;
>> +
>>       BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>>       BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>>       BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>> -- 
>> 1.9.5 (Apple Git-50.3)
>



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-04 12:09 ` [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs Roger Pau Monne
@ 2015-09-21 15:44   ` Jan Beulich
  2015-09-25 15:16     ` Andrew Cooper
  2015-09-28 16:09     ` Roger Pau Monné
  0 siblings, 2 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-21 15:44 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
> Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and
> VCPUOP_is_up hypercalls from HVM guests.
> 
> This patch introduces a new structure (vcpu_hvm_context) that should be used
> in conjuction with the VCPUOP_initialise hypercall in order to initialize
> vCPUs for HVM guests.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

So this bi-modal thing doesn't look too bad, but a concern I have is
with the now different contexts used by XEN_DOMCTL_setvcpucontext
and VCPUOP_initialise.

> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -752,6 +752,30 @@ int arch_set_info_guest(
>      return 0;
>  }
>  
> +int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    struct vcpu_guest_context *ctxt;
> +    struct domain *d = v->domain;
> +    int rc;
> +
> +    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> +        return -ENOMEM;
> +
> +    if ( copy_from_guest(ctxt, arg, 1) )
> +    {
> +        free_vcpu_guest_context(ctxt);
> +        return -EFAULT;
> +    }
> +
> +    domain_lock(d);
> +    rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
> +    domain_unlock(d);
> +
> +    free_vcpu_guest_context(ctxt);
> +
> +    return rc;
> +}

I wonder whether this shouldn't instead be kept in common code,
with arch code calling it as needed (e.g. as default_initialize_vcpu()),
since afaict the code is now duplicate with the x86 side PV handling.

> @@ -1140,6 +1141,201 @@ int arch_set_info_guest(
>  #undef c
>  }
>  
> +/* Called by VCPUOP_initialise for HVM guests. */
> +static int arch_set_info_hvm_guest(struct vcpu *v, vcpu_hvm_context_t *ctx)
> +{
> +    struct cpu_user_regs *uregs = &v->arch.user_regs;
> +    struct segment_register cs, ds, ss, tr;
> +
> +#define SEG(s, r)                                                       \
> +    (struct segment_register){ .base = (r)->s ## _base,                 \
> +            .limit = (r)->s ## _limit, .attr.bytes = (r)->s ## _ar }
> +
> +    switch ( ctx->mode )
> +    {
> +    default:
> +        return -EINVAL;
> +
> +    case VCPU_HVM_MODE_16B:

I think "MODE" is misleading here, not just because of the register
size issue (see further down) but also because you don't seem to
enforce the respective mode to be chosen in cs_ar. I'm also missing
at least some simple consistency checks (like CS.DPL == SS.DPL,
rIP within CS limit, rSP within SS limit); leaving these to the first VM
entry would likely result in much harder to analyze issues in case of
bad input.
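
A rough sketch of the kind of checks meant here, under stated assumptions:
the limits are byte granular, the segments are expand-up, and the vCPU is
not being started in 64-bit mode (where limits are not enforced). mock_seg
and mock_ctx_sane() are hypothetical; the DPL position follows the _ar
layout documented in the patch (bits 5-6).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the base/limit/ar triple of one segment register. */
struct mock_seg {
    uint32_t base;
    uint32_t limit;   /* assumed byte granular */
    uint16_t ar;
};

/* DPL lives in bits 5-6 of the _ar field. */
#define MOCK_AR_DPL(ar) (((ar) >> 5) & 0x3)

static bool mock_ctx_sane(const struct mock_seg *cs,
                          const struct mock_seg *ss,
                          uint64_t ip, uint64_t sp)
{
    if ( MOCK_AR_DPL(cs->ar) != MOCK_AR_DPL(ss->ar) )
        return false;             /* CS.DPL must match SS.DPL */

    if ( ip > cs->limit )
        return false;             /* rIP outside the CS limit */

    if ( sp > ss->limit )
        return false;             /* rSP outside the SS limit */

    return true;
}
```

Rejecting such input with -EINVAL at VCPUOP_initialise time would surface
the problem at the hypercall rather than at the first VM entry.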

> +    {
> +        const struct vcpu_hvm_x86_16 *regs = &ctx->cpu_regs.x86_16;
> +
> +        uregs->rax    = regs->ax;
> +        uregs->rcx    = regs->cx;
> +        uregs->rdx    = regs->dx;
> +        uregs->rbx    = regs->bx;
> +        uregs->rsp    = regs->sp;
> +        uregs->rbp    = regs->bp;
> +        uregs->rsi    = regs->si;
> +        uregs->rdi    = regs->di;
> +        uregs->rip    = regs->ip;
> +        uregs->rflags = regs->flags;
> +
> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
> +
> +        cs = SEG(cs, regs);
> +        ds = SEG(ds, regs);
> +        ss = SEG(ss, regs);
> +        tr = SEG(tr, regs);
> +    }
> +    break;
> +
> +    case VCPU_HVM_MODE_32B:
> +    {
> +        const struct vcpu_hvm_x86_32 *regs = &ctx->cpu_regs.x86_32;
> +
> +        uregs->rax    = regs->eax;
> +        uregs->rcx    = regs->ecx;
> +        uregs->rdx    = regs->edx;
> +        uregs->rbx    = regs->ebx;
> +        uregs->rsp    = regs->esp;
> +        uregs->rbp    = regs->ebp;
> +        uregs->rsi    = regs->esi;
> +        uregs->rdi    = regs->edi;
> +        uregs->rip    = regs->eip;
> +        uregs->rflags = regs->eflags;
> +
> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
> +        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
> +        v->arch.hvm_vcpu.guest_efer  = regs->efer;
> +
> +        cs = SEG(cs, regs);
> +        ds = SEG(ds, regs);
> +        ss = SEG(ss, regs);
> +        tr = SEG(tr, regs);
> +    }
> +    break;
> +
> +    case VCPU_HVM_MODE_64B:
> +    {
> +        const struct vcpu_hvm_x86_64 *regs = &ctx->cpu_regs.x86_64;
> +
> +        uregs->rax    = regs->rax;
> +        uregs->rcx    = regs->rcx;
> +        uregs->rdx    = regs->rdx;
> +        uregs->rbx    = regs->rbx;
> +        uregs->rsp    = regs->rsp;
> +        uregs->rbp    = regs->rbp;
> +        uregs->rsi    = regs->rsi;
> +        uregs->rdi    = regs->rdi;
> +        uregs->rip    = regs->rip;
> +        uregs->rflags = regs->rflags;
> +        uregs->r8     = regs->r8;
> +        uregs->r9     = regs->r9;
> +        uregs->r10    = regs->r10;
> +        uregs->r11    = regs->r11;
> +        uregs->r12    = regs->r12;
> +        uregs->r13    = regs->r13;
> +        uregs->r14    = regs->r14;
> +        uregs->r15    = regs->r15;
> +
> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
> +        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
> +        v->arch.hvm_vcpu.guest_efer  = regs->efer;
> +
> +        cs = SEG(cs, regs);
> +        ds = SEG(ds, regs);
> +        ss = SEG(ss, regs);
> +        tr = SEG(tr, regs);
> +    }
> +    break;
> +
> +    }
> +
> +    if ( !paging_mode_hap(v->domain) )
> +        v->arch.guest_table = pagetable_null();

A comment with the reason for this would be nice. I can't immediately
see why this is here.

> +    hvm_update_guest_cr(v, 0);
> +    hvm_update_guest_cr(v, 4);
> +
> +    if ( (ctx->mode == VCPU_HVM_MODE_32B) ||
> +         (ctx->mode == VCPU_HVM_MODE_64B) )
> +    {
> +        hvm_update_guest_cr(v, 3);
> +        hvm_update_guest_efer(v);
> +    }
> +
> +    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) )
> +    {
> +        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
> +        struct page_info *page = get_page_from_gfn(v->domain,
> +                                 v->arch.hvm_vcpu.guest_cr[3] >> PAGE_SHIFT,
> +                                 NULL, P2M_ALLOC);

P2M_ALLOC but not P2M_UNSHARE?

> +        if ( !page )
> +        {
> +            gdprintk(XENLOG_ERR, "Invalid CR3: %#lx\n",
> +                     v->arch.hvm_vcpu.guest_cr[3]);
> +            domain_crash(v->domain);
> +            return -EINVAL;
> +        }
> +
> +        v->arch.guest_table = pagetable_from_page(page);
> +    }
> +
> +    hvm_set_segment_register(v, x86_seg_cs, &cs);
> +    hvm_set_segment_register(v, x86_seg_ds, &ds);
> +    hvm_set_segment_register(v, x86_seg_ss, &ss);
> +    hvm_set_segment_register(v, x86_seg_tr, &tr);
> +
> +    /* Sync AP's TSC with BSP's. */
> +    v->arch.hvm_vcpu.cache_tsc_offset =
> +        v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
> +    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset,
> +                             v->domain->arch.hvm_domain.sync_tsc);
> +
> +    v->arch.hvm_vcpu.msr_tsc_adjust = 0;

The need for this one I also can't see right away - another comment
perhaps?

> +    paging_update_paging_modes(v);
> +
> +    v->is_initialised = 1;
> +    set_bit(_VPF_down, &v->pause_flags);

No VGCF_online equivalent?

> +    return 0;
> +#undef SEG

I think this should move up (right after the switch() using it).

> +#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
> +#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
> +
> +#include "../xen.h"
> +
> +struct vcpu_hvm_x86_16 {
> +    uint16_t ax;
> +    uint16_t cx;
> +    uint16_t dx;
> +    uint16_t bx;
> +    uint16_t sp;
> +    uint16_t bp;
> +    uint16_t si;
> +    uint16_t di;
> +    uint16_t ip;
> +    uint16_t flags;
> +
> +    uint32_t cr0;
> +    uint32_t cr4;
> +
> +    uint32_t cs_base;
> +    uint32_t ds_base;
> +    uint32_t ss_base;
> +    uint32_t tr_base;
> +    uint32_t cs_limit;
> +    uint32_t ds_limit;
> +    uint32_t ss_limit;
> +    uint32_t tr_limit;
> +    uint16_t cs_ar;
> +    uint16_t ds_ar;
> +    uint16_t ss_ar;
> +    uint16_t tr_ar;
> +};

I doubt this is very useful: While one ought to be able to start a
guest in 16-bit mode, its GPRs still are supposed to be 32 bits wide.
The mode used really would depend on the cs_ar setting. (Having
the structure just for a 16-bit IP would seem insane.)

> +struct vcpu_hvm_x86_32 {
> +    uint32_t eax;
> +    uint32_t ecx;
> +    uint32_t edx;
> +    uint32_t ebx;
> +    uint32_t esp;
> +    uint32_t ebp;
> +    uint32_t esi;
> +    uint32_t edi;
> +    uint32_t eip;
> +    uint16_t eflags;

uint32_t for sure.

> +    uint32_t cr0;
> +    uint32_t cr3;
> +    uint32_t cr4;
> +    uint64_t efer;

What again was the point of having EFER here?

> +    uint32_t cs_base;
> +    uint32_t ds_base;
> +    uint32_t ss_base;

I continue to question why we have DS here, but not ES (and maybe
FS and GS too). I.e. either just CS and SS (which are architecturally
required) or at least all four traditional x86 segment registers. And
you're also clearly not targeting minimal state, or else there likely
wouldn't be a need for e.g. R8-R15 in the 64-bit variant.

> +struct vcpu_hvm_x86_64 {
> +    uint64_t rax;
> +    uint64_t rcx;
> +    uint64_t rdx;
> +    uint64_t rbx;
> +    uint64_t rsp;
> +    uint64_t rbp;
> +    uint64_t rsi;
> +    uint64_t rdi;
> +    uint64_t r8;
> +    uint64_t r9;
> +    uint64_t r10;
> +    uint64_t r11;
> +    uint64_t r12;
> +    uint64_t r13;
> +    uint64_t r14;
> +    uint64_t r15;
> +    uint64_t rip;
> +    uint64_t rflags;
> +
> +    uint64_t cr0;
> +    uint64_t cr3;
> +    uint64_t cr4;
> +    uint64_t efer;
> +
> +    uint32_t cs_base;
> +    uint32_t ds_base;
> +    uint32_t ss_base;
> +    uint32_t tr_base;
> +    uint32_t cs_limit;
> +    uint32_t ds_limit;
> +    uint32_t ss_limit;
> +    uint32_t tr_limit;

I can see the need for the TR ones here, but what's the point of the
CS, SS, and DS ones?

> +/*
> + * The layout of the _ar fields of the segment registers is the
> + * following:
> + *
> + * Bits [0,3]: type (bits 40-43).
> + * Bit      4: s    (descriptor type, bit 44).
> + * Bit  [5,6]: dpl  (descriptor privilege level, bits 45-46).
> + * Bit      7: p    (segment-present, bit 47).
> + * Bit      8: avl  (available for system software, bit 52).
> + * Bit      9: l    (64-bit code segment, bit 53).
> + * Bit     10: db   (meaning depends on the segment, bit 54).
> + * Bit     11: g    (granularity, bit 55)
> + *
> + * A more complete description of the meaning of this fields can be
> + * obtained from the Intel SDM, Volume 3, section 3.4.5.
> + */

Please make explicit what state bits 12-15 are expected to be in
(hopefully they're being checked to be zero rather than getting
ignored).

Please also clarify whether the limit specified for the various
segment registers is the one present in descriptor tables (subject
to scaling by the g bit) or (more likely) the byte granular one. In
any event I suppose certain restrictions apply.
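
To make both points concrete, a small sketch. The macro and function names
are hypothetical; the bit positions follow the _ar layout quoted above, and
the limit conversion is the architectural rule for the g bit (a 20-bit
field counted in 4KiB units when g is set).

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_AR_G         (1u << 11)   /* granularity bit per the layout */
#define MOCK_AR_RESERVED  0xf000u      /* bits 12-15, undocumented so far */

/* Strict variant: reject any _ar value with bits 12-15 set. */
static bool mock_ar_reserved_clear(uint16_t ar)
{
    return (ar & MOCK_AR_RESERVED) == 0;
}

/* Convert a descriptor-table style limit into a byte granular one. */
static uint32_t mock_limit_to_bytes(uint32_t raw_limit, uint16_t ar)
{
    raw_limit &= 0xfffff;              /* descriptor limits are 20 bits */

    if ( ar & MOCK_AR_G )
        return (raw_limit << 12) | 0xfff;

    return raw_limit;
}
```

If the interface takes byte granular limits, no such conversion is needed,
but the documentation should then say so.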

Jan


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests
  2015-09-04 12:09 ` [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests Roger Pau Monne
  2015-09-10 16:00   ` Wei Liu
@ 2015-09-21 15:53   ` Jan Beulich
  2015-09-28 16:51     ` Roger Pau Monné
  1 sibling, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-21 15:53 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:

First of all - I suppose it is intentional for this to not consider the Dom0
side (yet)?

> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -560,7 +560,70 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
>      xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
>                       special_pfn(SPECIALPAGE_SHARING));
>  
> -    if ( dom->device_model )
> +    if ( !dom->device_model )
> +    {
> +        struct xc_dom_seg seg;
> +        struct hvm_start_info *start_info;
> +        char *cmdline;
> +        struct hvm_modlist_entry *modlist;
> +        void *start_page;
> +        size_t cmdline_size = 0;
> +        size_t start_info_size = sizeof(*start_info);
> +
> +        if ( dom->cmdline )
> +        {
> +            cmdline_size = ROUNDUP(strlen(dom->cmdline) + 1, 8);
> +            start_info_size += cmdline_size;
> +
> +        }
> +        if ( dom->ramdisk_blob )
> +            start_info_size += sizeof(*modlist); /* Limited to one module. */
> +
> +        rc = xc_dom_alloc_segment(dom, &seg, "HVMlite start info", 0,
> +                                  start_info_size);
> +        if ( rc != 0 )
> +        {
> +            DOMPRINTF("Unable to reserve memory for the start info");
> +            goto out;
> +        }
> +
> +        start_page = xc_map_foreign_range(xch, domid, start_info_size,
> +                                          PROT_READ | PROT_WRITE,
> +                                          seg.pfn);
> +        if ( start_page == NULL )
> +        {
> +            DOMPRINTF("Unable to map HVM start info page");
> +            goto error_out;
> +        }
> +
> +        start_info = start_page;
> +        cmdline = start_page + sizeof(*start_info);
> +        modlist = start_page + sizeof(*start_info) + cmdline_size;
> +
> +        if ( dom->cmdline )
> +        {
> +            strncpy(cmdline, dom->cmdline, MAX_GUEST_CMDLINE);
> +            cmdline[MAX_GUEST_CMDLINE - 1] = '\0';
> +            start_info->cmdline_paddr = (seg.pfn << PAGE_SHIFT) +

Not knowing much about the tools interface used for allocation
above: Does that interface guarantee this shift (and another
one below) to not overflow?

> +                                ((xen_pfn_t)cmdline - (xen_pfn_t)start_info);

xen_pfn_t? Aren't these byte addresses?
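
A sketch of one way to address both remarks, offered as an assumption
rather than as what the patch should do: compute the offset with
pointer-sized integers, do the shift in 64 bits, and refuse results that
don't fit the 32-bit field. mock_field_paddr() and MOCK_PAGE_SHIFT are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12

/*
 * Compute the guest physical address of "field", which lives inside a
 * mapping starting at "base" that backs guest frame seg_pfn.
 * Returns false if the result cannot be stored in 32 bits.
 */
static bool mock_field_paddr(uint64_t seg_pfn, const void *base,
                             const void *field, uint32_t *paddr)
{
    /* Byte offset inside the mapping: uintptr_t, not a frame number type. */
    uint64_t offset = (uint64_t)((uintptr_t)field - (uintptr_t)base);
    uint64_t addr;

    if ( seg_pfn > (UINT64_MAX >> MOCK_PAGE_SHIFT) )
        return false;                      /* the shift would overflow */

    addr = (seg_pfn << MOCK_PAGE_SHIFT) + offset;
    if ( addr < offset || addr > UINT32_MAX )
        return false;                      /* wrapped, or too large */

    *paddr = (uint32_t)addr;
    return true;
}
```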

> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -784,6 +784,25 @@ struct start_info {
>  };
>  typedef struct start_info start_info_t;
>  
> +/*
> + * Start of day structure passed to PVH guests in %ebx.
> + */

This is a single line comment.

> +struct hvm_start_info {
> +#define HVM_START_MAGIC_VALUE 0x336ec578
> +    uint32_t magic;             /* Contains the magic value 0x336ec578       */
> +                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
> +    uint32_t flags;             /* SIF_xxx flags.                            */
> +    uint32_t cmdline_paddr;     /* Physical address of the command line.     */
> +    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
> +    uint32_t modlist_paddr;     /* Physical address of an array of           */
> +                                /* hvm_modlist_entry.                        */
> +};
> +
> +struct hvm_modlist_entry {
> +    uint32_t paddr;             /* Physical address of the module.           */
> +    uint32_t size;              /* Size of the module in bytes.              */
> +};

Iirc this already went back and forth, but - is this meant to be an
x86-specific interface? If not, should we really limit physical
addresses to 32 bits here?

Also this now sits inside a XEN_HAVE_PV_GUEST_ENTRY conditional
- is that intended?

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests
  2015-09-18 15:53   ` Anthony PERARD
@ 2015-09-23 10:32     ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 10:32 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Ian Campbell, Wei Liu, Ian Jackson, Stefano Stabellini

El 18/09/15 a les 17.53, Anthony PERARD ha escrit:
> On Fri, Sep 04, 2015 at 02:08:47PM +0200, Roger Pau Monne wrote:
>> +    total_pages = 0;
>> +    p2m_size = 0;
>> +    for ( i = 0; i < nr_vmemranges; i++ )
>> +    {
>> +        total_pages += ((vmemranges[i].end - vmemranges[i].start)
>> +                        >> PAGE_SHIFT);
>> +        p2m_size = p2m_size > (vmemranges[i].end >> PAGE_SHIFT) ?
>> +            p2m_size : (vmemranges[i].end >> PAGE_SHIFT);
>> +    }
>> +
>> +    if ( total_pages != nr_pages )
>> +    {
>> +        DOMPRINTF("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
> 
> nr_pages is unsigned long, so it would need to be printed with %lx.

Thanks, fixed now. This code is almost a verbatim copy of the code in
xc_hvm_build_x86.c, so I guess you are also seeing this error there.
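
For reference, a small standalone sketch of the specifier pairing. The
function name format_mismatch is made up for the example, and the type of
total_pages is assumed to be uint64_t; only the message text is modelled
on the patch. PRIx64 matches a uint64_t argument, while an unsigned long
argument needs %lx.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the mismatch message into buf. total_pages is a uint64_t and is
 * printed with PRIx64; nr_pages is an unsigned long and is printed with
 * %lx. Returns the value of snprintf().
 */
static int format_mismatch(char *buf, size_t len,
                           uint64_t total_pages, unsigned long nr_pages)
{
    return snprintf(buf, len,
                    "vNUMA memory pages mismatch (0x%" PRIx64 " != 0x%lx)",
                    total_pages, nr_pages);
}
```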

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers
  2015-09-18 15:53   ` Anthony PERARD
@ 2015-09-23 10:38     ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 10:38 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Ian Campbell, Wei Liu, Ian Jackson, Stefano Stabellini

El 18/09/15 a les 17.53, Anthony PERARD ha escrit:
> On Fri, Sep 04, 2015 at 02:08:48PM +0200, Roger Pau Monne wrote:
>> Now that we have all the code in place HVM domain building in libxl can be
>> switched to use the xc_dom_* family of functions, just like they are used in
>> order to build PV guests.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Ian Campbell <ian.campbell@citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>> Changes since v4:
>>  - Add Wei Liu Acked-by.
>> ---
>>  tools/libxl/libxl_arch.h     |   2 +-
>>  tools/libxl/libxl_dm.c       |  18 ++--
>>  tools/libxl/libxl_dom.c      | 227 +++++++++++++++++++++++++------------------
>>  tools/libxl/libxl_internal.h |   4 +-
>>  tools/libxl/libxl_vnuma.c    |  12 ++-
>>  tools/libxl/libxl_x86.c      |   8 +-
>>  6 files changed, 155 insertions(+), 116 deletions(-)
> 
> 
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 6101e5c..d2cf9e3 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -912,52 +935,62 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>                libxl__domain_build_state *state)
>>  {
>>      libxl_ctx *ctx = libxl__gc_owner(gc);
>> -    struct xc_hvm_build_args args = {};
>> -    int ret, rc;
>> -    uint64_t mmio_start, lowmem_end, highmem_end;
>> +    int rc;
>> +    uint64_t mmio_start, lowmem_end, highmem_end, mem_size;
>>      libxl_domain_build_info *const info = &d_config->b_info;
>> +    struct xc_dom_image *dom = NULL;
>> +
>> +    xc_dom_loginit(ctx->xch);
>> +
>> +    dom = xc_dom_allocate(ctx->xch, NULL, NULL);
>> +    if (!dom) {
>> +        LOGE(ERROR, "xc_dom_allocate failed");
> 
> 'rc' is uninitialized at this point and is going to be used in out:.

Yes, I'm sorry, I have no idea how I missed it. Anyway, as you say this
is missing:

rc = ERROR_NOMEM;

before the goto.
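
A stripped-down sketch of the pattern, with stand-in names: build_image,
fake_allocate and the ERROR_NOMEM value used here are placeholders for
the example, not the libxl definitions. The point is only that every path
reaching the common exit label has assigned rc first.

```c
#include <stddef.h>
#include <stdlib.h>

#define ERROR_NOMEM (-5) /* placeholder value for the example */

/* Stand-in for an allocator that can be made to fail. */
static void *fake_allocate(int should_fail)
{
    return should_fail ? NULL : malloc(16);
}

/* Every path that reaches "out" must have assigned rc beforehand. */
static int build_image(int should_fail)
{
    int rc;
    void *dom = fake_allocate(should_fail);

    if (!dom) {
        rc = ERROR_NOMEM; /* the assignment that was missing */
        goto out;
    }

    rc = 0;

out:
    free(dom); /* free(NULL) is a no-op */
    return rc;
}
```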

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-04 13:55       ` Jan Beulich
  2015-09-04 22:41         ` Andrew Cooper
@ 2015-09-23 11:43         ` Roger Pau Monné
  1 sibling, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 11:43 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu
  Cc: Andrew Cooper, xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

El 04/09/15 a les 15.55, Jan Beulich ha escrit:
>>>> On 04.09.15 at 15:51, <roger.pau@citrix.com> wrote:
>> El 04/09/15 a les 14.25, Wei Liu ha escrit:
>>> On Fri, Sep 04, 2015 at 02:08:50PM +0200, Roger Pau Monne wrote:
>>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>>> index 045f6ff..fe9504f 100644
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>>>                 d->domain_id);
>>>>      }
>>>>  
>>>> +    if ( is_hvm_domain(d) )
>>>> +    {
>>>> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
>>>> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>>>> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>>>> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>>>> +                                   XEN_X86_EMU_IOMMU);
>>>
>>> This is repetitive. Could you consolidate all these to
>>>
>>>   #define XEN_X86_EMU_ALL ...
>>>
>>> ?
>>
>> That sounds fine, I would place it in the public header where all the
>> XEN_X86_EMU_* are defined. I will wait for Andrew's opinion, since he
>> already acked this patch.
> 
> This doesn't belong in the public ABI, so if at all it should be added
> there inside #ifdef __XEN__. Alternatively (and perhaps preferably)
> this would go into another (internal) header.

The definitions of XEN_X86_EMU_* and xen_arch_domainconfig are already
protected with:

#if defined(__XEN__) || defined(__XEN_TOOLS__)

I don't mind putting them in a different header, but I think this needs
to live in the public headers, and IMHO public/arch-x86/xen.h is
probably the best place, especially taking into account that
xen_arch_domainconfig is also defined there.

Unless someone objects and provides a more suitable location I'm going
to define XEN_X86_EMU_ALL there also.
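
As a sketch of what such a define could look like: the bit positions
below are placeholders chosen for the example and are not taken from the
patch. In the real header the definitions would sit inside the existing
__XEN__ / __XEN_TOOLS__ guard, which is left out here so the fragment
compiles on its own.

```c
#include <stdint.h>

/* Placeholder bit assignments, for illustration only. */
#define XEN_X86_EMU_LAPIC    (1U << 0)
#define XEN_X86_EMU_HPET     (1U << 1)
#define XEN_X86_EMU_PMTIMER  (1U << 2)
#define XEN_X86_EMU_RTC      (1U << 3)
#define XEN_X86_EMU_IOAPIC   (1U << 4)
#define XEN_X86_EMU_PIC      (1U << 5)
#define XEN_X86_EMU_PMU      (1U << 6)
#define XEN_X86_EMU_VGA      (1U << 7)
#define XEN_X86_EMU_IOMMU    (1U << 8)

/* One name for the full set, usable by both Xen and the tool stack. */
#define XEN_X86_EMU_ALL      (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |   \
                              XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |  \
                              XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |   \
                              XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |      \
                              XEN_X86_EMU_IOMMU)
```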

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-16  9:50   ` Jan Beulich
@ 2015-09-23 12:35     ` Roger Pau Monné
  2015-09-23 13:24       ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 12:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

El 16/09/15 a les 11.50, Jan Beulich ha escrit:
>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> --- a/tools/libxl/libxl_x86.c
>> +++ b/tools/libxl/libxl_x86.c
>> @@ -7,8 +7,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>                                        libxl_domain_config *d_config,
>>                                        xc_domain_configuration_t *xc_config)
>>  {
>> -    /* No specific configuration right now */
>> -
>> +    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
>> +        xc_config->emulation_flags = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
>> +                                      XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>> +                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>> +                                      XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>> +                                      XEN_X86_EMU_IOMMU);
> 
> This calls for the elsewhere discussed XEN_X86_EMU_ALL to even be
> exposed to the tool stack.

Done.

>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -555,6 +555,29 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
>>                 d->domain_id);
>>      }
>>  
>> +    if ( is_hvm_domain(d) )
>> +    {
>> +        uint32_t emulation_mask = (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |
> 
> const

With the introduction of XEN_X86_EMU_ALL this is no longer needed.

> 
>> +                                   XEN_X86_EMU_PMTIMER | XEN_X86_EMU_RTC |
>> +                                   XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |
>> +                                   XEN_X86_EMU_PMU | XEN_X86_EMU_VGA |
>> +                                   XEN_X86_EMU_IOMMU);
>> +        if ( (config->emulation_flags & ~emulation_mask) != 0 )
> 
> Missing blank line between declaration and statements.

emulation_mask is now gone, so no need for the newline any more.

>> +        {
>> +            printk(XENLOG_G_ERR "d%d: Invalid emulation bitmap: %#x.\n",
> 
> Generally we have no full stops at the end of log messages.

Ack, I've removed them here and below, but I've noticed that other
printks in arch_domain_create have full stops (that's probably why I've
added them here).

> 
>> +                   d->domain_id, config->emulation_flags);
>> +            return -EINVAL;
>> +        }
>> +        if ( config->emulation_flags != emulation_mask )
>> +        {
>> +            printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
>> +                   "current selection of emulators: %#x.\n", d->domain_id,
>> +                   config->emulation_flags);
>> +            return -EOPNOTSUPP;
>> +        }
>> +        d->arch.emulation_flags = config->emulation_flags;
>> +    }
> 
> Isn't there an "else" missing here, validating that the flags are zero?

The comment in xen/include/asm-x86/domain.h above the emulation_flags
field already mentions that this field is ignored for PV guests. For
example the x86 Dom0 building code calls arch_domain_create passing in a
NULL xen_arch_domainconfig, so I think it's easier to just ignore this
for PV guests.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-16 10:10   ` Jan Beulich
@ 2015-09-23 12:42     ` Roger Pau Monné
  2015-09-23 12:46       ` Andrew Cooper
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 12:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

El 16/09/15 a les 12.10, Jan Beulich ha escrit:
>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -387,8 +387,21 @@ struct arch_domain
>>      bool_t mem_access_emulate_enabled;
>>  
>>      struct monitor_write_data *event_write_data;
>> +
>> +    /* Emulated devices enabled bitmap. */
>> +    uint32_t emulation_flags;
>>  } __cacheline_aligned;
>>  
>> +#define has_vlapic(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_LAPIC)
>> +#define has_vhpet(d)        ((d)->arch.emulation_flags & XEN_X86_EMU_HPET)
>> +#define has_vpmtimer(d)     ((d)->arch.emulation_flags & XEN_X86_EMU_PMTIMER)
>> +#define has_vrtc(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_RTC)
>> +#define has_vioapic(d)      ((d)->arch.emulation_flags & XEN_X86_EMU_IOAPIC)
>> +#define has_vpic(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PIC)
>> +#define has_vpmu(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PMU)
>> +#define has_vvga(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_VGA)
>> +#define has_viommu(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU)
> 
> And btw, now that I saw a few uses of these - do they really all need
> to be has_v*() instead of just has_*()? Together with the macros taking
> a domain pointer it's quite obvious that talk is about virtual devices...

IMHO, I prefer to have the "v" prefix, because that's how they are named
inside of Xen, so it's more clear that we are actually disabling them,
but I'm not going to fight for it.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-23 12:42     ` Roger Pau Monné
@ 2015-09-23 12:46       ` Andrew Cooper
  0 siblings, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2015-09-23 12:46 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Ian Jackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini



On 23/09/15 13:42, Roger Pau Monné wrote:
> El 16/09/15 a les 12.10, Jan Beulich ha escrit:
>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>> --- a/xen/include/asm-x86/domain.h
>>> +++ b/xen/include/asm-x86/domain.h
>>> @@ -387,8 +387,21 @@ struct arch_domain
>>>       bool_t mem_access_emulate_enabled;
>>>   
>>>       struct monitor_write_data *event_write_data;
>>> +
>>> +    /* Emulated devices enabled bitmap. */
>>> +    uint32_t emulation_flags;
>>>   } __cacheline_aligned;
>>>   
>>> +#define has_vlapic(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_LAPIC)
>>> +#define has_vhpet(d)        ((d)->arch.emulation_flags & XEN_X86_EMU_HPET)
>>> +#define has_vpmtimer(d)     ((d)->arch.emulation_flags & XEN_X86_EMU_PMTIMER)
>>> +#define has_vrtc(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_RTC)
>>> +#define has_vioapic(d)      ((d)->arch.emulation_flags & XEN_X86_EMU_IOAPIC)
>>> +#define has_vpic(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PIC)
>>> +#define has_vpmu(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_PMU)
>>> +#define has_vvga(d)         ((d)->arch.emulation_flags & XEN_X86_EMU_VGA)
>>> +#define has_viommu(d)       ((d)->arch.emulation_flags & XEN_X86_EMU_IOMMU)
>> And btw, now that I saw a few uses of these - do they really all need
>> to be has_v*() instead of just has_*()? Together with the macros taking
>> a domain pointer it's quite obvious that talk is about virtual devices...
> IMHO, I prefer to have the "v" prefix, because that's how they are named
> inside of Xen, so it's more clear that we are actually disabling them,
> but I'm not going to fight for it.

I personally think it is clearer having the v in the name.

~Andrew

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-23 12:35     ` Roger Pau Monné
@ 2015-09-23 13:24       ` Jan Beulich
  2015-09-23 15:02         ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-23 13:24 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

>>> On 23.09.15 at 14:35, <roger.pau@citrix.com> wrote:
> El 16/09/15 a les 11.50, Jan Beulich ha escrit:
>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>> +                   d->domain_id, config->emulation_flags);
>>> +            return -EINVAL;
>>> +        }
>>> +        if ( config->emulation_flags != emulation_mask )
>>> +        {
>>> +            printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
>>> +                   "current selection of emulators: %#x.\n", d->domain_id,
>>> +                   config->emulation_flags);
>>> +            return -EOPNOTSUPP;
>>> +        }
>>> +        d->arch.emulation_flags = config->emulation_flags;
>>> +    }
>> 
>> Isn't there an "else" missing here, validating that the flags are zero?
> 
> The comment in xen/include/asm-x86/domain.h above the emulation_flags
> field already mentions that this field is ignored for PV guests. For
> example the x86 Dom0 building code calls arch_domain_create passing in a
> NULL xen_arch_domainconfig, so I think it's easier to just ignore this
> for PV guests.

Easier now, but perhaps more cumbersome if we ever want to
assign that field some meaning for PV. It's a domctl, so not _that_
difficult to change, but you may have noticed that I generally ask
for unused fields/bits to be validated to be zero, to allow using
them later on. Anyway - not a big issue, I just wanted to point it
out (and I might stumble across the missing else again during
review of future revisions).

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices
  2015-09-23 13:24       ` Jan Beulich
@ 2015-09-23 15:02         ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 15:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

El 23/09/15 a les 15.24, Jan Beulich ha escrit:
>>>> On 23.09.15 at 14:35, <roger.pau@citrix.com> wrote:
>> El 16/09/15 a les 11.50, Jan Beulich ha escrit:
>>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>>> +                   d->domain_id, config->emulation_flags);
>>>> +            return -EINVAL;
>>>> +        }
>>>> +        if ( config->emulation_flags != emulation_mask )
>>>> +        {
>>>> +            printk(XENLOG_G_ERR "d%d: Xen does not allow HVM creation with the "
>>>> +                   "current selection of emulators: %#x.\n", d->domain_id,
>>>> +                   config->emulation_flags);
>>>> +            return -EOPNOTSUPP;
>>>> +        }
>>>> +        d->arch.emulation_flags = config->emulation_flags;
>>>> +    }
>>>
>>> Isn't there an "else" missing here, validating that the flags are zero?
>>
>> The comment in xen/include/asm-x86/domain.h above the emulation_flags
>> field already mentions that this field is ignored for PV guests. For
>> example the x86 Dom0 building code calls arch_domain_create passing in a
>> NULL xen_arch_domainconfig, so I think it's easier to just ignore this
>> for PV guests.
> 
> Easier now, but perhaps more cumbersome if we ever want to
> assign that field some meaning for PV. It's a domctl, so not _that_
> difficult to change, but you may have noticed that I generally ask
> for unused fields/bits to be validated to be zero, to allow using
> them later on. Anyway - not a big issue, I just wanted to point it
> out (and I might stumble across the missing else again during
> review of future revisions).

OK, you convinced me. I've added a check to the start of
arch_domain_create in order to make sure config is not NULL, and fixed
the x86 callers of domain_create in order to make sure a non-null config
is always provided.
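
A standalone sketch of the checks as discussed in this thread, not the
actual hypervisor code. The names check_emulation_flags, struct
domainconfig and EMU_ALL are stand-ins, and returning -EINVAL for a NULL
config is an assumption (an assertion would be another option).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EMU_ALL 0x1ffU /* placeholder for XEN_X86_EMU_ALL */

/* Stand-in for xen_arch_domainconfig. */
struct domainconfig {
    uint32_t emulation_flags;
};

static int check_emulation_flags(bool is_hvm,
                                 const struct domainconfig *config)
{
    /* Assumption: a missing config is rejected instead of ignored. */
    if ( config == NULL )
        return -EINVAL;

    if ( is_hvm )
    {
        /* Bits outside the known set are invalid. */
        if ( (config->emulation_flags & ~EMU_ALL) != 0 )
            return -EINVAL;

        /* Only the full set is accepted at this point in the series. */
        if ( config->emulation_flags != EMU_ALL )
            return -EOPNOTSUPP;
    }
    else if ( config->emulation_flags != 0 )
        /* Unused for PV: insist on zero so the field can be reused. */
        return -EINVAL;

    return 0;
}
```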

Also dropped Andrew Cooper's reviewed-by tag, since the hypervisor side
code has changed substantially.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic
  2015-09-16 10:05   ` Jan Beulich
@ 2015-09-23 15:45     ` Roger Pau Monné
  2015-09-24  7:57       ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-23 15:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Suravee Suthikulpanit, Andrew Cooper, Eddie Dong,
	Aravind Gopalakrishnan, Jun Nakajima, xen-devel, Boris Ostrovsky

El 16/09/15 a les 12.05, Jan Beulich ha escrit:
>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1035,6 +1035,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
>>      struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>>      bool_t debug_state = v->domain->debugger_attached;
>>      bool_t vcpu_guestmode = 0;
>> +    struct vlapic *vlapic = vcpu_vlapic(v);
>>  
>>      if ( nestedhvm_enabled(v->domain) && nestedhvm_vcpu_in_guestmode(v) )
>>          vcpu_guestmode = 1;
>> @@ -1058,14 +1059,14 @@ static void noreturn svm_do_resume(struct vcpu *v)
>>          hvm_asid_flush_vcpu(v);
>>      }
>>  
>> -    if ( !vcpu_guestmode )
>> +    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) )
>>      {
>>          vintr_t intr;
>>  
>>          /* Reflect the vlapic's TPR in the hardware vtpr */
>>          intr = vmcb_get_vintr(vmcb);
>>          intr.fields.tpr =
>> -            (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
>> +            (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0xFF) >> 4;
>>          vmcb_set_vintr(vmcb, intr);
>>      }
>>  
>> @@ -2294,6 +2295,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>      int inst_len, rc;
>>      vintr_t intr;
>>      bool_t vcpu_guestmode = 0;
>> +    struct vlapic *vlapic = vcpu_vlapic(v);
>>  
>>      hvm_invalidate_regs_fields(regs);
>>  
>> @@ -2311,11 +2313,11 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>       * NB. We need to preserve the low bits of the TPR to make checked builds
>>       * of Windows work, even though they don't actually do anything.
>>       */
>> -    if ( !vcpu_guestmode ) {
>> +    if ( !vcpu_guestmode && !vlapic_hw_disabled(vlapic) ) {
>>          intr = vmcb_get_vintr(vmcb);
>> -        vlapic_set_reg(vcpu_vlapic(v), APIC_TASKPRI,
>> +        vlapic_set_reg(vlapic, APIC_TASKPRI,
>>                     ((intr.fields.tpr & 0x0F) << 4) |
>> -                   (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0x0F));
>> +                   (vlapic_get_reg(vlapic, APIC_TASKPRI) & 0x0F));
>>      }
>>  
>>      exit_reason = vmcb->exitcode;
>> @@ -2697,14 +2699,14 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>      }
>>  
>>    out:
>> -    if ( vcpu_guestmode )
>> +    if ( vcpu_guestmode || vlapic_hw_disabled(vlapic) )
>>          /* Don't clobber TPR of the nested guest. */
> 
> The comment is now stale.

Thanks, I've removed it.

> Also - aren't all the changes to this file (and perhaps others further
> down) bug fixes in their own right?

Whether they should be considered bugs or not is hard to tell. There
was no code that executed these paths before with this configuration, and
probably nobody considered running HVM guests without an emulated lapic
a possibility, so I would argue that they are merely omissions.

>> @@ -1042,8 +1045,7 @@ void vlapic_tdt_msr_set(struct vlapic *vlapic, uint64_t value)
>>      uint64_t guest_tsc;
>>      struct vcpu *v = vlapic_vcpu(vlapic);
>>  
>> -    /* may need to exclude some other conditions like vlapic->hw.disabled */
>> -    if ( !vlapic_lvtt_tdt(vlapic) )
>> +    if ( !vlapic_lvtt_tdt(vlapic) || vlapic_hw_disabled(vlapic) )
>>      {
>>          HVM_DBG_LOG(DBG_LEVEL_VLAPIC_TIMER, "ignore tsc deadline msr write");
> 
> The newly added condition needlessly triggers the HVM_DBG_LOG().
> Please separate it.

Done.
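
A toy sketch of the separated checks, assuming nothing about the real
function beyond what is quoted above: the counter log_count stands in for
HVM_DBG_LOG(), and tdt_msr_write_ignored is a made-up name. A disabled
vlapic returns early without logging; only the "not in TSC deadline mode"
case emits the debug message.

```c
#include <stdbool.h>

static unsigned int log_count; /* stands in for HVM_DBG_LOG() output */

/* Returns true if the TSC deadline MSR write should be ignored. */
static bool tdt_msr_write_ignored(bool hw_disabled, bool tdt_mode)
{
    /* Disabled device: ignore the write silently. */
    if ( hw_disabled )
        return true;

    /* Not in TSC deadline mode: ignore the write and log it. */
    if ( !tdt_mode )
    {
        log_count++;
        return true;
    }

    return false;
}
```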

>> @@ -1328,7 +1339,10 @@ static int lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
>>      uint16_t vcpuid;
>>      struct vcpu *v;
>>      struct vlapic *s;
>> -    
>> +
>> +    if ( !has_vlapic(d) )
>> +        return 0;
>> +
>>      /* Which vlapic to load? */
>>      vcpuid = hvm_load_instance(h); 
>>      if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
>> @@ -1360,7 +1374,10 @@ static int lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
>>      uint16_t vcpuid;
>>      struct vcpu *v;
>>      struct vlapic *s;
>> -    
>> +
>> +    if ( !has_vlapic(d) )
>> +        return 0;
> 
> I agree that the save side should return zero in that case, but aren't
> load attempts invalid (and hence warrant an error return)?

Good point, ENODEV seems like the most appropriate error code here IMHO.

>> @@ -1399,7 +1416,7 @@ int vlapic_init(struct vcpu *v)
>>  
>>      HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "%d", v->vcpu_id);
>>  
>> -    if ( is_pvh_vcpu(v) )
>> +    if ( is_pvh_vcpu(v) || !has_vlapic(v->domain) )
> 
> Isn't the latter alone sufficient?

Yes.

>> @@ -1452,6 +1469,9 @@ void vlapic_destroy(struct vcpu *v)
>>  {
>>      struct vlapic *vlapic = vcpu_vlapic(v);
>>  
>> +    if ( !has_vlapic(vlapic_domain(vlapic)) )
> 
> I think v->domain would be better here.

TBH, I don't have a strong opinion, if you think v->domain is clearer
I'm fine with that.

>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -482,6 +482,9 @@ found:
>>  
>>  void msixtbl_init(struct domain *d)
>>  {
>> +    if ( !has_vlapic(d) )
>> +        return;
>> +
>>      INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>>      spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
> 
> Don't you also need to add guards to msixtbl_pt_{,un}register()?

Yes, that's right. XEN_DOMCTL_bind_pt_irq (which calls
msixtbl_pt_register) can be executed against HVM guests, and by
extension against HVMlite guests.

>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1002,7 +1002,7 @@ static int construct_vmcs(struct vcpu *v)
>>          ~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
>>            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
>>  
>> -    if ( is_pvh_domain(d) )
>> +    if ( is_pvh_domain(d) || !has_vlapic(d) )
> 
> See above.

Fixed.

Thanks for the thorough review!

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic
  2015-09-23 15:45     ` Roger Pau Monné
@ 2015-09-24  7:57       ` Jan Beulich
  2015-09-25  9:00         ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-24  7:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Suravee Suthikulpanit, Andrew Cooper, Eddie Dong,
	Aravind Gopalakrishnan, Jun Nakajima, xen-devel, Boris Ostrovsky

>>> On 23.09.15 at 17:45, <roger.pau@citrix.com> wrote:
> El 16/09/15 a les 12.05, Jan Beulich ha escrit:
>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> Also - aren't all the changes to this file (and perhaps others further
>> down) bug fixes in their own right?
> 
> Whether they should be considered bugs or not is hard to tell. There
> was no code that executed these paths before with this configuration, and
> probably nobody considered running HVM guests without an emulated lapic
> a possibility, so I would argue that they are merely omissions.

Whether these were active or latent bugs doesn't really matter.
What I'd prefer is for the code adjustments not directly related to
the feature suppression you work on to be in their own patch, so
that the two steps taken can be viewed as two steps. Particularly
if it later turns out that one or more of those apparent latent bugs
are found to be actively harming some special case, backporting
that adjustment without the feature suppression parts would
become a straightforward option.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic
  2015-09-24  7:57       ` Jan Beulich
@ 2015-09-25  9:00         ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-25  9:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Suravee Suthikulpanit, Andrew Cooper, Eddie Dong,
	Aravind Gopalakrishnan, Jun Nakajima, xen-devel, Boris Ostrovsky

El 24/09/15 a les 9.57, Jan Beulich ha escrit:
>>>> On 23.09.15 at 17:45, <roger.pau@citrix.com> wrote:
>> El 16/09/15 a les 12.05, Jan Beulich ha escrit:
>>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>> Also - aren't all the changes to this file (and perhaps others further
>>> down) bug fixes in their own right?
>>
>> Whether they should be considered bugs or not is hard to tell. There
>> was no code that executed these paths before with this configuration, and
>> probably nobody considered running HVM guests without an emulated lapic
>> a possibility, so I would argue that they are merely omissions.
> 
> Whether these were active or latent bugs doesn't really matter.
> What I'd prefer is for the code adjustments not directly related to
> the feature suppression you work on to be in their own patch, so
> that the two steps taken can be viewed as two steps. Particularly
> if it later turns out that one or more of those apparent latent bugs
> are found to be actively harming some special case, backporting
> that adjustment without the feature suppression parts would
> become a straightforward option.

Ack, I'm going to split some of the changes in this patch to a pre-patch.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC
  2015-09-21 14:34   ` Jan Beulich
@ 2015-09-25 15:01     ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-25 15:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 21/09/15 a les 16.34, Jan Beulich ha escrit:
>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> @@ -425,6 +431,9 @@ void vpic_reset(struct domain *d)
>>  
>>  void vpic_init(struct domain *d)
>>  {
>> +    if ( !has_vpic(d) )
>> +        return;
> 
> vpic_reset() above this function as well as functions further down
> in the source file aren't static, yet you aren't adding guards to them.
> I think here and in other similar patches you should, in the commit
> message, give reasons for any one not obviously being excluded
> (e.g. because used only for handling intercepts which aren't getting
> enabled) from the set needing such.

I've gone through the patches and added appropriate guards or asserts to
the public functions, depending on whether we are expecting them to be
called or not. I've also fixed all the _load functions to return ENODEV
if the device has been disabled (_save functions are just noops and
return 0).
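
The convention can be sketched on a toy device as follows. The struct and
function names are invented for the example and do not correspond to the
real hvm save/load handlers: saving a disabled device is a successful
no-op, while loading state for it is refused with -ENODEV.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy emulated device: present or not, plus one word of state. */
struct toy_dev {
    bool present;
    int state;
};

/* Save: a disabled device has nothing to save, which is not an error. */
static int toy_dev_save(const struct toy_dev *d, int *out)
{
    if ( !d->present )
        return 0;

    *out = d->state;
    return 0;
}

/* Load: a record for a disabled device is invalid input. */
static int toy_dev_load(struct toy_dev *d, int in)
{
    if ( !d->present )
        return -ENODEV;

    d->state = in;
    return 0;
}
```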

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-21 14:48     ` Boris Ostrovsky
@ 2015-09-25 15:07       ` Roger Pau Monné
  2015-09-25 15:13         ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-25 15:07 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 21/09/15 a les 16.48, Boris Ostrovsky ha escrit:
> On 09/21/2015 10:36 AM, Jan Beulich wrote:
>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>> Hmm - this seems questionable to me: Isn't the vPMU an optional
>> feature anyway? I.e. doesn't need separate handling here? Boris?
> 
> It is optional system-wise, not per-guest, which is what I think Roger
> is trying to do. I in fact wanted to add ability to disable VPMU per
> guest myself.
> 
> However, VPMU has nothing to do with device model so I don't think it
> should be part of this series from that perspective.

vpmu is enabled globally on the xen command line, there's no way to
disable it on a per-guest basis, so AFAICS it will be enabled for
HVMlite guests, which is wrong because we don't have a vlapic.

IMHO, there's no need to add a flag to specifically disable the vpmu, we
can just decide whether to enable it based on vlapic presence and the
global vpmu flag.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-25 15:07       ` Roger Pau Monné
@ 2015-09-25 15:13         ` Jan Beulich
  2015-09-25 15:22           ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-25 15:13 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Boris Ostrovsky, xen-devel

>>> On 25.09.15 at 17:07, <roger.pau@citrix.com> wrote:
> El 21/09/15 a les 16.48, Boris Ostrovsky ha escrit:
>> On 09/21/2015 10:36 AM, Jan Beulich wrote:
>>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>> Hmm - this seems questionable to me: Isn't the vPMU an optional
>>> feature anyway? I.e. doesn't need separate handling here? Boris?
>> 
>> It is optional system-wise, not per-guest, which is what I think Roger
>> is trying to do. I in fact wanted to add ability to disable VPMU per
>> guest myself.
>> 
>> However, VPMU has nothing to do with device model so I don't think it
>> should be part of this series from that perspective.
> 
> vpmu is enabled globally on the xen command line, there's no way to
> disable it on a per-guest basis, so AFAICS it will be enabled for
> HVMlite guests which is wrong because we don't have a vlapic.

Hmm, Boris specifically enabled vPMU to be available to all three
kinds of guests, and PVH in its current shape doesn't have a
vLAPIC either. So making such a connection seems wrong to me.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-21 15:44   ` Jan Beulich
@ 2015-09-25 15:16     ` Andrew Cooper
  2015-09-25 15:52       ` Jan Beulich
  2015-09-28 16:09     ` Roger Pau Monné
  1 sibling, 1 reply; 99+ messages in thread
From: Andrew Cooper @ 2015-09-25 15:16 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: xen-devel, Stefano Stabellini, Ian Campbell


On 21/09/15 16:44, Jan Beulich wrote:
>> +    uint32_t cr0;
>> +    uint32_t cr3;
>> +    uint32_t cr4;
>> +    uint64_t efer;
> What again was the point of having EFER here?
>

EFER.NX must match what the BSP chose to set up in the pagetables 
pointed to in %cr3, or a triple fault will occur.
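
To illustrate the dependency, here is a minimal sketch (not Xen code; the
function name and the flat array of entries are assumptions made for the
example). With EFER.NXE clear, bit 63 of a present page-table entry is
reserved, so an AP walking tables that the BSP built with NX set would
fault on the first access through such an entry:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFER_NXE (UINT64_C(1) << 11) /* EFER.NXE: no-execute enable */
#define PTE_P    (UINT64_C(1) << 0)  /* entry is present */
#define PTE_NX   (UINT64_C(1) << 63) /* no-execute bit in PAE/long mode */

/*
 * Hypothetical helper: return true if 'efer' can safely be combined with
 * the given page-table entries. If NXE is clear, any present entry with
 * bit 63 set would trigger a reserved-bit page fault.
 */
static bool efer_matches_pagetables(uint64_t efer, const uint64_t *entries,
                                    size_t nr)
{
    size_t i;

    if ( efer & EFER_NXE )
        return true;

    for ( i = 0; i < nr; i++ )
        if ( (entries[i] & PTE_P) && (entries[i] & PTE_NX) )
            return false;

    return true;
}
```

This only matters when the AP reuses page tables built with NX in use;
private tables without NX entries pass the check with either EFER value.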

~Andrew

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-25 15:13         ` Jan Beulich
@ 2015-09-25 15:22           ` Roger Pau Monné
  2015-09-25 15:41             ` Boris Ostrovsky
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-25 15:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Boris Ostrovsky, xen-devel

On 25/09/15 at 17.13, Jan Beulich wrote:
>>>> On 25.09.15 at 17:07, <roger.pau@citrix.com> wrote:
>> On 21/09/15 at 16.48, Boris Ostrovsky wrote:
>>> On 09/21/2015 10:36 AM, Jan Beulich wrote:
>>>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>>> Hmm - this seems questionable to me: Isn't the vPMU an optional
>>>> feature anyway? I.e. doesn't need separate handling here? Boris?
>>>
>>> It is optional system-wise, not per-guest, which is what I think Roger
>>> is trying to do. I in fact wanted to add ability to disable VPMU per
>>> guest myself.
>>>
>>> However, VPMU has nothing to do with device model so I don't think it
>>> should be part of this series from that perspective.
>>
>> vpmu is enabled globally on the xen command line, there's no way to
>> disable it on a per-guest basis, so AFAICS it will be enabled for
>> HVMlite guests which is wrong because we don't have a vlapic.
> 
> Hmm, Boris specifically enabled vPMU to be available to all three
> kinds of guests, and PVH in its current shape doesn't have a
> vLAPIC either. So making such a connection seems wrong to me.

IMHO, I would prefer vPMU support in HVMlite guests to be tied to
vlapic presence, so we can do it the HVM way instead of having to do
hypercalls.

One thing that could work is changing the is_hvm_* calls in vpmu.c to
has_vlapic. This way HVMlite guests could still use the PV vPMU
interface, but once a lapic is enabled we could use the native way.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu
  2015-09-25 15:22           ` Roger Pau Monné
@ 2015-09-25 15:41             ` Boris Ostrovsky
  0 siblings, 0 replies; 99+ messages in thread
From: Boris Ostrovsky @ 2015-09-25 15:41 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 09/25/2015 11:22 AM, Roger Pau Monné wrote:
> On 25/09/15 at 17.13, Jan Beulich wrote:
>>>>> On 25.09.15 at 17:07, <roger.pau@citrix.com> wrote:
>>> On 21/09/15 at 16.48, Boris Ostrovsky wrote:
>>>> On 09/21/2015 10:36 AM, Jan Beulich wrote:
>>>>>>>> On 04.09.15 at 14:08, <roger.pau@citrix.com> wrote:
>>>>> Hmm - this seems questionable to me: Isn't the vPMU an optional
>>>>> feature anyway? I.e. doesn't need separate handling here? Boris?
>>>> It is optional system-wise, not per-guest, which is what I think Roger
>>>> is trying to do. I in fact wanted to add ability to disable VPMU per
>>>> guest myself.
>>>>
>>>> However, VPMU has nothing to do with device model so I don't think it
>>>> should be part of this series from that perspective.
>>> vpmu is enabled globally on the xen command line, there's no way to
>>> disable it on a per-guest basis, so AFAICS it will be enabled for
>>> HVMlite guests which is wrong because we don't have a vlapic.
>> Hmm, Boris specifically enabled vPMU to be available to all three
>> kinds of guests, and PVH in its current shape doesn't have a
>> vLAPIC either. So making such a connection seems wrong to me.
> IMHO, I would prefer the vPMU support in HVMlite guests to be tied to a
> vlapic presence, so we can do it the HVM way instead of having to do
> hypercalls.
>
> One thing that could work is changing the is_hvm_* calls in vpmu.c to
> has_vlapic. This way HVMlite guests could still use the PV vPMU
> interface, but once a lapic is enabled we could use the native way.

Yes, I think that would work.

-boris

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-25 15:16     ` Andrew Cooper
@ 2015-09-25 15:52       ` Jan Beulich
  0 siblings, 0 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-25 15:52 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne
  Cc: xen-devel, Stefano Stabellini, Ian Campbell

>>> On 25.09.15 at 17:16, <andrew.cooper3@citrix.com> wrote:
> On 21/09/15 16:44, Jan Beulich wrote:
>>> +    uint32_t cr0;
>>> +    uint32_t cr3;
>>> +    uint32_t cr4;
>>> +    uint64_t efer;
>> What again was the point of having EFER here?
> 
> EFER.NX must match what the BSP chose to set up in the pagetables 
> pointed to in %cr3, or a triple fault will occur.

That is assuming the AP is to start up on shared page tables, which
is not a given. Hence a comment seems warranted to state that this
is there to allow such a usage by the guest.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote
  2015-09-21 14:47   ` Jan Beulich
@ 2015-09-28 10:35     ` Roger Pau Monné
  2015-09-28 10:56       ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-28 10:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

On 21/09/15 at 16.47, Jan Beulich wrote:
>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
>> --- a/xen/include/public/elfnote.h
>> +++ b/xen/include/public/elfnote.h
>> @@ -200,9 +200,18 @@
>>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>>  
>>  /*
>> + * Physical entry point into the kernel.
>> + *
>> + * 32bit entry point into the kernel. Xen will use this entry point
>> + * in order to launch the guest kernel in 32bit protected mode
>> + * with paging disabled.
>> + */
>> +#define XEN_ELFNOTE_PHYS32_ENTRY 18
> 
> The comment reads as if this was the case for all kinds of guests,
> yet I suppose it doesn't apply to PV ones. This should be made
> explicit if so.

Yes, what about the following:

32bit entry point into the kernel. Xen will use this entry point
in order to launch the guest kernel in 32bit protected mode with paging
disabled inside of an HVM container.

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote
  2015-09-28 10:35     ` Roger Pau Monné
@ 2015-09-28 10:56       ` Jan Beulich
  2015-09-28 10:59         ` Andrew Cooper
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-28 10:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Jackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

>>> On 28.09.15 at 12:35, <roger.pau@citrix.com> wrote:
> On 21/09/15 at 16.47, Jan Beulich wrote:
>>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
>>> --- a/xen/include/public/elfnote.h
>>> +++ b/xen/include/public/elfnote.h
>>> @@ -200,9 +200,18 @@
>>>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>>>  
>>>  /*
>>> + * Physical entry point into the kernel.
>>> + *
>>> + * 32bit entry point into the kernel. Xen will use this entry point
>>> + * in order to launch the guest kernel in 32bit protected mode
>>> + * with paging disabled.
>>> + */
>>> +#define XEN_ELFNOTE_PHYS32_ENTRY 18
>> 
>> The comment reads as if this was the case for all kinds of guests,
>> yet I suppose it doesn't apply to PV ones. This should be made
>> explicit if so.
> 
> Yes, what about the following:
> 
> 32bit entry point into the kernel. Xen will use this entry point
> in order to launch the guest kernel in 32bit protected mode with paging
> disabled inside of an HVM container.

Depends: If the note's presence means this and only this entry point
will be used, then okay. If, however, normal PV and/or HVM operation
of such a guest is still intended to be possible, then I think this is still
too vague. Perhaps

32bit entry point into the kernel. When requested to launch the
guest kernel in a HVM container, Xen will use this entry point to
launch the guest in 32bit protected mode with paging disabled.
Ignored otherwise.

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote
  2015-09-28 10:56       ` Jan Beulich
@ 2015-09-28 10:59         ` Andrew Cooper
  0 siblings, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2015-09-28 10:59 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, Stefano Stabellini, Ian Jackson, Wei Liu, Ian Campbell

On 28/09/15 11:56, Jan Beulich wrote:
>>>> On 28.09.15 at 12:35, <roger.pau@citrix.com> wrote:
>> On 21/09/15 at 16.47, Jan Beulich wrote:
>>>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
>>>> --- a/xen/include/public/elfnote.h
>>>> +++ b/xen/include/public/elfnote.h
>>>> @@ -200,9 +200,18 @@
>>>>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>>>>  
>>>>  /*
>>>> + * Physical entry point into the kernel.
>>>> + *
>>>> + * 32bit entry point into the kernel. Xen will use this entry point
>>>> + * in order to launch the guest kernel in 32bit protected mode
>>>> + * with paging disabled.
>>>> + */
>>>> +#define XEN_ELFNOTE_PHYS32_ENTRY 18
>>> The comment reads as if this was the case for all kinds of guests,
>>> yet I suppose it doesn't apply to PV ones. This should be made
>>> explicit if so.
>> Yes, what about the following:
>>
>> 32bit entry point into the kernel. Xen will use this entry point
>> in order to launch the guest kernel in 32bit protected mode with paging
>> disabled inside of an HVM container.
> Depends: If the note's presence means this and only this entry point
> will be used, then okay. If, however, normal PV and/or HVM operation
> of such a guest is still intended to be possible, then I think this is still
> too vague. Perhaps
>
> 32bit entry point into the kernel. When requested to launch the
> guest kernel in a HVM container, Xen will use this entry point to
> launch the guest in 32bit protected mode with paging disabled.
> Ignored otherwise.

A multi-mode binary seems very likely, certainly for the near future.

As such, this param is an indication of supporting DMLite.  The guest
kernel knows it was started in DMLite mode if this is the entry point used.

If another entry point is used, the guest was not started in DMLite mode.

~Andrew

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU
  2015-09-04 12:08 ` [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU Roger Pau Monne
@ 2015-09-28 13:58   ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 99+ messages in thread
From: Aravind Gopalakrishnan @ 2015-09-28 13:58 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Suravee Suthikulpanit

On 9/4/2015 7:08 AM, Roger Pau Monne wrote:
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
> Changes since v4:
>   - Add Andrew Cooper Acked-by.
> ---
>   xen/drivers/passthrough/amd/iommu_guest.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
> index e74f469..b4e75ac 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -887,7 +887,8 @@ int guest_iommu_init(struct domain* d)
>       struct guest_iommu *iommu;
>       struct hvm_iommu *hd  = domain_hvm_iommu(d);
>   
> -    if ( !is_hvm_domain(d) || !iommu_enabled || !iommuv2_enabled )
> +    if ( !is_hvm_domain(d) || !iommu_enabled || !iommuv2_enabled ||
> +         !has_viommu(d) )
>           return 0;
>   
>       iommu = xzalloc(struct guest_iommu);


Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

Thanks,
-Aravind.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-21 15:44   ` Jan Beulich
  2015-09-25 15:16     ` Andrew Cooper
@ 2015-09-28 16:09     ` Roger Pau Monné
  2015-09-29  7:09       ` Jan Beulich
  1 sibling, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-28 16:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

On 21/09/15 at 17.44, Jan Beulich wrote:
>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
>> Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and
>> VCPUOP_is_up hypercalls from HVM guests.
>>
>> This patch introduces a new structure (vcpu_hvm_context) that should be used
>> in conjunction with the VCPUOP_initialise hypercall in order to initialize
>> vCPUs for HVM guests.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> So this bi-modal thing doesn't look too bad, but a concern I have is
> with the now different contexts used by XEN_DOMCTL_setvcpucontext
> and VCPUOP_initialise.

Yes, that's far from ideal. I was going to say that
XEN_DOMCTL_{set/get}vcpucontext should return EOPNOTSUPP when executed
against HVM guests, but that's going to break current toolstack code
that relies on this in order to perform suspend/resume of HVM guests
(and gdbsx would probably also be affected).

If you feel that would be a better solution I could fix current tools
code in order to use XEN_DOMCTL_{get/set}hvmcontext instead of
XEN_DOMCTL_{set/get}vcpucontext when dealing with HVM guests.

>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -752,6 +752,30 @@ int arch_set_info_guest(
>>      return 0;
>>  }
>>  
>> +int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
>> +{
>> +    struct vcpu_guest_context *ctxt;
>> +    struct domain *d = v->domain;
>> +    int rc;
>> +
>> +    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
>> +        return -ENOMEM;
>> +
>> +    if ( copy_from_guest(ctxt, arg, 1) )
>> +    {
>> +        free_vcpu_guest_context(ctxt);
>> +        return -EFAULT;
>> +    }
>> +
>> +    domain_lock(d);
>> +    rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
>> +    domain_unlock(d);
>> +
>> +    free_vcpu_guest_context(ctxt);
>> +
>> +    return rc;
>> +}
> 
> I wonder whether this shouldn't instead be kept in common code,
> with arch code calling it as needed (e.g. as default_initialize_vcpu()),
> since afaict the code is now duplicate with the x86 side PV handling.

Ack.

>> @@ -1140,6 +1141,201 @@ int arch_set_info_guest(
>>  #undef c
>>  }
>>  
>> +/* Called by VCPUOP_initialise for HVM guests. */
>> +static int arch_set_info_hvm_guest(struct vcpu *v, vcpu_hvm_context_t *ctx)
>> +{
>> +    struct cpu_user_regs *uregs = &v->arch.user_regs;
>> +    struct segment_register cs, ds, ss, tr;
>> +
>> +#define SEG(s, r)                                                       \
>> +    (struct segment_register){ .base = (r)->s ## _base,                 \
>> +            .limit = (r)->s ## _limit, .attr.bytes = (r)->s ## _ar }
>> +
>> +    switch ( ctx->mode )
>> +    {
>> +    default:
>> +        return -EINVAL;
>> +
>> +    case VCPU_HVM_MODE_16B:
> 
> I think "MODE" is misleading here, not just because of the register
> size issue (see further down) but also because you don't seem to
> enforce the respective mode to be chosen in cs_ar. I'm also missing
> at least some simple consistency checks (like CS.DPL == SS.DPL,
> rIP within CS limit, rSP within SS limit); leaving these to the first VM
> entry would likely result in much harder to analyze issues in case of
> bad input.

See the comment about the 16bit registers in vcpu_hvm_context_t.
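
For reference, the kind of early consistency checks suggested above could
be sketched as follows. This is only an illustration: struct seg_state and
vcpu_state_plausible() are made-up names, the limit is assumed to be byte
granular, and the sketch ignores expand-down segments and long mode, where
segment limits are not enforced:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of one segment register. */
struct seg_state {
    uint32_t limit; /* byte-granular limit (assumption) */
    uint16_t ar;    /* attributes, DPL in bits 5-6 as in the patch comment */
};

#define AR_DPL(ar) (((ar) >> 5) & 0x3)

/* Reject obviously inconsistent state before the first VM entry. */
static bool vcpu_state_plausible(const struct seg_state *cs,
                                 const struct seg_state *ss,
                                 uint64_t ip, uint64_t sp)
{
    if ( AR_DPL(cs->ar) != AR_DPL(ss->ar) )
        return false;            /* CS.DPL must equal SS.DPL */
    if ( ip > cs->limit )
        return false;            /* rIP outside the CS limit */
    if ( sp > ss->limit )
        return false;            /* rSP outside the SS limit */
    return true;
}
```

Returning -EINVAL from the hypercall when such a check fails would point
at the bad input directly, rather than at a failed VM entry later on.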

>> +    {
>> +        const struct vcpu_hvm_x86_16 *regs = &ctx->cpu_regs.x86_16;
>> +
>> +        uregs->rax    = regs->ax;
>> +        uregs->rcx    = regs->cx;
>> +        uregs->rdx    = regs->dx;
>> +        uregs->rbx    = regs->bx;
>> +        uregs->rsp    = regs->sp;
>> +        uregs->rbp    = regs->bp;
>> +        uregs->rsi    = regs->si;
>> +        uregs->rdi    = regs->di;
>> +        uregs->rip    = regs->ip;
>> +        uregs->rflags = regs->flags;
>> +
>> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
>> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
>> +
>> +        cs = SEG(cs, regs);
>> +        ds = SEG(ds, regs);
>> +        ss = SEG(ss, regs);
>> +        tr = SEG(tr, regs);
>> +    }
>> +    break;
>> +
>> +    case VCPU_HVM_MODE_32B:
>> +    {
>> +        const struct vcpu_hvm_x86_32 *regs = &ctx->cpu_regs.x86_32;
>> +
>> +        uregs->rax    = regs->eax;
>> +        uregs->rcx    = regs->ecx;
>> +        uregs->rdx    = regs->edx;
>> +        uregs->rbx    = regs->ebx;
>> +        uregs->rsp    = regs->esp;
>> +        uregs->rbp    = regs->ebp;
>> +        uregs->rsi    = regs->esi;
>> +        uregs->rdi    = regs->edi;
>> +        uregs->rip    = regs->eip;
>> +        uregs->rflags = regs->eflags;
>> +
>> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
>> +        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
>> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
>> +        v->arch.hvm_vcpu.guest_efer  = regs->efer;
>> +
>> +        cs = SEG(cs, regs);
>> +        ds = SEG(ds, regs);
>> +        ss = SEG(ss, regs);
>> +        tr = SEG(tr, regs);
>> +    }
>> +    break;
>> +
>> +    case VCPU_HVM_MODE_64B:
>> +    {
>> +        const struct vcpu_hvm_x86_64 *regs = &ctx->cpu_regs.x86_64;
>> +
>> +        uregs->rax    = regs->rax;
>> +        uregs->rcx    = regs->rcx;
>> +        uregs->rdx    = regs->rdx;
>> +        uregs->rbx    = regs->rbx;
>> +        uregs->rsp    = regs->rsp;
>> +        uregs->rbp    = regs->rbp;
>> +        uregs->rsi    = regs->rsi;
>> +        uregs->rdi    = regs->rdi;
>> +        uregs->rip    = regs->rip;
>> +        uregs->rflags = regs->rflags;
>> +        uregs->r8     = regs->r8;
>> +        uregs->r9     = regs->r9;
>> +        uregs->r10    = regs->r10;
>> +        uregs->r11    = regs->r11;
>> +        uregs->r12    = regs->r12;
>> +        uregs->r13    = regs->r13;
>> +        uregs->r14    = regs->r14;
>> +        uregs->r15    = regs->r15;
>> +
>> +        v->arch.hvm_vcpu.guest_cr[0] = regs->cr0;
>> +        v->arch.hvm_vcpu.guest_cr[3] = regs->cr3;
>> +        v->arch.hvm_vcpu.guest_cr[4] = regs->cr4;
>> +        v->arch.hvm_vcpu.guest_efer  = regs->efer;
>> +
>> +        cs = SEG(cs, regs);
>> +        ds = SEG(ds, regs);
>> +        ss = SEG(ss, regs);
>> +        tr = SEG(tr, regs);
>> +    }
>> +    break;
>> +
>> +    }
>> +
>> +    if ( !paging_mode_hap(v->domain) )
>> +        v->arch.guest_table = pagetable_null();
> 
> A comment with the reason for this would be nice. I can't immediately
> see why this is here.

guest_table contains uninitialized data at this point, which makes
pagetable_is_null return false. Other places where guest_table is set
check if previous guest_table is null or not, and if it's not null Xen
will try to free it, causing a bug. hvm_vcpu_reset_state does something
similar when setting the initial vCPU state.

I will add a comment to point this out.

>> +    hvm_update_guest_cr(v, 0);
>> +    hvm_update_guest_cr(v, 4);
>> +
>> +    if ( (ctx->mode == VCPU_HVM_MODE_32B) ||
>> +         (ctx->mode == VCPU_HVM_MODE_64B) )
>> +    {
>> +        hvm_update_guest_cr(v, 3);
>> +        hvm_update_guest_efer(v);
>> +    }
>> +
>> +    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) )
>> +    {
>> +        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
>> +        struct page_info *page = get_page_from_gfn(v->domain,
>> +                                 v->arch.hvm_vcpu.guest_cr[3] >> PAGE_SHIFT,
>> +                                 NULL, P2M_ALLOC);
> 
> P2M_ALLOC but not P2M_UNSHARE?

This is a copy of what's done in hvm_set_cr3 when shadow mode is enabled.

>> +        if ( !page )
>> +        {
>> +            gdprintk(XENLOG_ERR, "Invalid CR3: %#lx\n",
>> +                     v->arch.hvm_vcpu.guest_cr[3]);
>> +            domain_crash(v->domain);
>> +            return -EINVAL;
>> +        }
>> +
>> +        v->arch.guest_table = pagetable_from_page(page);
>> +    }
>> +
>> +    hvm_set_segment_register(v, x86_seg_cs, &cs);
>> +    hvm_set_segment_register(v, x86_seg_ds, &ds);
>> +    hvm_set_segment_register(v, x86_seg_ss, &ss);
>> +    hvm_set_segment_register(v, x86_seg_tr, &tr);
>> +
>> +    /* Sync AP's TSC with BSP's. */
>> +    v->arch.hvm_vcpu.cache_tsc_offset =
>> +        v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
>> +    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset,
>> +                             v->domain->arch.hvm_domain.sync_tsc);
>> +
>> +    v->arch.hvm_vcpu.msr_tsc_adjust = 0;
> 
> The need for this one I also can't see right away - another comment
> perhaps?

AFAICT we need to initialize this to a sane value, like it's done in
hvm_vcpu_reset_state.

>> +    paging_update_paging_modes(v);
>> +
>> +    v->is_initialised = 1;
>> +    set_bit(_VPF_down, &v->pause_flags);
> 
> No VGCF_online equivalent?

No, this interface doesn't allow setting up the vCPU state and starting
it at the same time. Users will have to call VCPUOP_up after
VCPUOP_initialise in order to start the vCPU. vcpu_hvm_context doesn't
even have the "flags" field any more, where this used to be set.

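The resulting two-step bring-up can be modelled with a toy state machine.
This is not Xen code: struct toy_vcpu and both functions are invented for
the example, and the -EINVAL returned for an uninitialised vCPU is an
assumption; only the -EEXIST case is taken from the patch:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of the two flags the patch manipulates. */
struct toy_vcpu {
    bool is_initialised;
    bool down;            /* models _VPF_down in pause_flags */
};

/* Models VCPUOP_initialise: load the state, leave the vCPU down. */
static int toy_vcpu_initialise(struct toy_vcpu *v)
{
    if ( v->is_initialised )
        return -EEXIST;

    v->is_initialised = true;
    v->down = true;
    return 0;
}

/* Models VCPUOP_up: only valid once the state has been loaded. */
static int toy_vcpu_up(struct toy_vcpu *v)
{
    if ( !v->is_initialised )
        return -EINVAL;   /* assumed error code */

    v->down = false;
    return 0;
}
```

A guest would therefore always issue the two operations in that order for
each AP it wants to start.
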
>> +    return 0;
>> +#undef SEG
> 
> I think this should move up (right after the switch() using it).

Ok, I don't mind.

>> +#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
>> +#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
>> +
>> +#include "../xen.h"
>> +
>> +struct vcpu_hvm_x86_16 {
>> +    uint16_t ax;
>> +    uint16_t cx;
>> +    uint16_t dx;
>> +    uint16_t bx;
>> +    uint16_t sp;
>> +    uint16_t bp;
>> +    uint16_t si;
>> +    uint16_t di;
>> +    uint16_t ip;
>> +    uint16_t flags;
>> +
>> +    uint32_t cr0;
>> +    uint32_t cr4;
>> +
>> +    uint32_t cs_base;
>> +    uint32_t ds_base;
>> +    uint32_t ss_base;
>> +    uint32_t tr_base;
>> +    uint32_t cs_limit;
>> +    uint32_t ds_limit;
>> +    uint32_t ss_limit;
>> +    uint32_t tr_limit;
>> +    uint16_t cs_ar;
>> +    uint16_t ds_ar;
>> +    uint16_t ss_ar;
>> +    uint16_t tr_ar;
>> +};
> 
> I doubt this is very useful: While one ought to be able to start a
> guest in 16-bit mode, its GPRs still are supposed to be 32 bits wide.
> The mode used really would depend on the cs_ar setting. (Having
> the structure just for a 16-bit IP would seem insane.)

So, would you prefer to go just with the 64-bit structure and the mode
is simply set by the value of the control registers / segment selectors?

>> +struct vcpu_hvm_x86_32 {
>> +    uint32_t eax;
>> +    uint32_t ecx;
>> +    uint32_t edx;
>> +    uint32_t ebx;
>> +    uint32_t esp;
>> +    uint32_t ebp;
>> +    uint32_t esi;
>> +    uint32_t edi;
>> +    uint32_t eip;
>> +    uint16_t eflags;
> 
> uint32_t for sure.

Right.

> 
>> +    uint32_t cr0;
>> +    uint32_t cr3;
>> +    uint32_t cr4;
>> +    uint64_t efer;
> 
> What again was the point of having EFER here?

For the NX bit.

>> +    uint32_t cs_base;
>> +    uint32_t ds_base;
>> +    uint32_t ss_base;
> 
> I continue to question why we have DS here, but not ES (and maybe
> FS and GS too). I.e. either just CS and SS (which are architecturally
> required) or at least all four traditional x86 segment registers. And
> you're also clearly not targeting minimal state, or else there likely
> wouldn't be a need for e.g. R8-R15 in the 64-bit variant.

I'm fine with removing r8-r15. Regarding the segment selectors, I don't
have a problem with allowing only CS and SS to be set, or all of them
including FS and GS. But I would like to get consensus on this: we
have already gone back and forth several times on what this
structure should look like, and TBH, I was hoping that this was the last
time.

Andrew, Jan, which would you prefer: removing DS, or also adding ES, FS
and GS?

>> +struct vcpu_hvm_x86_64 {
>> +    uint64_t rax;
>> +    uint64_t rcx;
>> +    uint64_t rdx;
>> +    uint64_t rbx;
>> +    uint64_t rsp;
>> +    uint64_t rbp;
>> +    uint64_t rsi;
>> +    uint64_t rdi;
>> +    uint64_t r8;
>> +    uint64_t r9;
>> +    uint64_t r10;
>> +    uint64_t r11;
>> +    uint64_t r12;
>> +    uint64_t r13;
>> +    uint64_t r14;
>> +    uint64_t r15;
>> +    uint64_t rip;
>> +    uint64_t rflags;
>> +
>> +    uint64_t cr0;
>> +    uint64_t cr3;
>> +    uint64_t cr4;
>> +    uint64_t efer;
>> +
>> +    uint32_t cs_base;
>> +    uint32_t ds_base;
>> +    uint32_t ss_base;
>> +    uint32_t tr_base;
>> +    uint32_t cs_limit;
>> +    uint32_t ds_limit;
>> +    uint32_t ss_limit;
>> +    uint32_t tr_limit;
> 
> I can see the need for the TR ones here, but what's the point of the
> CS, SS, and DS ones?

In case the guest wants to start in 64bit compatibility mode?

>> +/*
>> + * The layout of the _ar fields of the segment registers is the
>> + * following:
>> + *
>> + * Bits [0,3]: type (bits 40-43).
>> + * Bit      4: s    (descriptor type, bit 44).
>> + * Bit  [5,6]: dpl  (descriptor privilege level, bits 45-46).
>> + * Bit      7: p    (segment-present, bit 47).
>> + * Bit      8: avl  (available for system software, bit 52).
>> + * Bit      9: l    (64-bit code segment, bit 53).
>> + * Bit     10: db   (meaning depends on the segment, bit 54).
>> + * Bit     11: g    (granularity, bit 55)
>> + *
>> + * A more complete description of the meaning of these fields can be
>> + * obtained from the Intel SDM, Volume 3, section 3.4.5.
>> + */
> 
> Please make explicit what state bits 12-15 are expected to be in
> (hopefully they're being checked to be zero rather than getting
> ignored).

I will add a comment regarding bits 12-15. IMHO, that checking
should be done in hvm_set_segment_register, and is not part of this patch.
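
As an illustration of what such a check could look like, here is a small
decoder for the _ar layout quoted above that rejects values with bits
12-15 set. The struct and function names are made up for this sketch and
do not exist in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an _ar field, following the layout in the comment. */
struct seg_attr {
    unsigned int type, s, dpl, p, avl, l, db, g;
};

#define AR_RESERVED_MASK 0xf000u /* bits 12-15 */

/* Return false if any of bits 12-15 is set, otherwise decode 'ar'. */
static bool decode_seg_ar(uint16_t ar, struct seg_attr *out)
{
    if ( ar & AR_RESERVED_MASK )
        return false;

    out->type = ar & 0xf;        /* bits 0-3 */
    out->s    = (ar >> 4) & 1;   /* bit 4 */
    out->dpl  = (ar >> 5) & 3;   /* bits 5-6 */
    out->p    = (ar >> 7) & 1;   /* bit 7 */
    out->avl  = (ar >> 8) & 1;   /* bit 8 */
    out->l    = (ar >> 9) & 1;   /* bit 9 */
    out->db   = (ar >> 10) & 1;  /* bit 10 */
    out->g    = (ar >> 11) & 1;  /* bit 11 */
    return true;
}
```

With this layout 0xc9b decodes as a present, DPL 0, 32-bit code segment
with page granularity, and 0x29b as a 64-bit code segment (l set, db
clear).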

> Please also clarify whether the limit specified for the various
> segment registers is the one present in descriptor tables (subject
> to scaling by the g bit) or (more likely) the byte granular one. In
> any event I suppose certain restrictions apply.

The limit is not subject to the g bit; the limit set here is the one that
ends up in the descriptor tables (or, in this case, in the cached
part of the selectors). IMHO that's implicit, and it's what's done on
bare metal.
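
For comparison, the two interpretations being distinguished differ as
sketched below: a 20-bit descriptor-table limit is expanded by the g bit
into a byte-granular one. seg_limit_bytes() is a hypothetical helper, and
which of the two forms the interface expects is exactly what the comment
should spell out:

```c
#include <stdint.h>

/*
 * Expand a 20-bit descriptor-table limit into a byte-granular limit.
 * With G set the limit counts 4KiB pages and the low 12 bits read as 1.
 */
static uint32_t seg_limit_bytes(uint32_t raw_limit, int g)
{
    raw_limit &= 0xfffff;

    return g ? (raw_limit << 12) | 0xfff : raw_limit;
}
```

So a descriptor limit of 0xfffff with g set corresponds to a byte limit of
0xffffffff.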

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests
  2015-09-21 15:53   ` Jan Beulich
@ 2015-09-28 16:51     ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-28 16:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Ian Jackson, xen-devel

On 21/09/15 at 17.53, Jan Beulich wrote:
>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
> 
> First of all - I suppose it is intentional for this to not consider the Dom0
> side (yet)?

Yes, let's leave Dom0 for a later patch series please, this is already
big enough.

>> --- a/tools/libxc/xc_dom_x86.c
>> +++ b/tools/libxc/xc_dom_x86.c
>> @@ -560,7 +560,70 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
>>      xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
>>                       special_pfn(SPECIALPAGE_SHARING));
>>  
>> -    if ( dom->device_model )
>> +    if ( !dom->device_model )
>> +    {
>> +        struct xc_dom_seg seg;
>> +        struct hvm_start_info *start_info;
>> +        char *cmdline;
>> +        struct hvm_modlist_entry *modlist;
>> +        void *start_page;
>> +        size_t cmdline_size = 0;
>> +        size_t start_info_size = sizeof(*start_info);
>> +
>> +        if ( dom->cmdline )
>> +        {
>> +            cmdline_size = ROUNDUP(strlen(dom->cmdline) + 1, 8);
>> +            start_info_size += cmdline_size;
>> +
>> +        }
>> +        if ( dom->ramdisk_blob )
>> +            start_info_size += sizeof(*modlist); /* Limited to one module. */
>> +
>> +        rc = xc_dom_alloc_segment(dom, &seg, "HVMlite start info", 0,
>> +                                  start_info_size);
>> +        if ( rc != 0 )
>> +        {
>> +            DOMPRINTF("Unable to reserve memory for the start info");
>> +            goto out;
>> +        }
>> +
>> +        start_page = xc_map_foreign_range(xch, domid, start_info_size,
>> +                                          PROT_READ | PROT_WRITE,
>> +                                          seg.pfn);
>> +        if ( start_page == NULL )
>> +        {
>> +            DOMPRINTF("Unable to map HVM start info page");
>> +            goto error_out;
>> +        }
>> +
>> +        start_info = start_page;
>> +        cmdline = start_page + sizeof(*start_info);
>> +        modlist = start_page + sizeof(*start_info) + cmdline_size;
>> +
>> +        if ( dom->cmdline )
>> +        {
>> +            strncpy(cmdline, dom->cmdline, MAX_GUEST_CMDLINE);
>> +            cmdline[MAX_GUEST_CMDLINE - 1] = '\0';
>> +            start_info->cmdline_paddr = (seg.pfn << PAGE_SHIFT) +
> 
> Not knowing much about the tools interface used for allocation
> above: Does that interface guarantee this shift (and another
> one below) to not overflow?

I can add a check to make sure the pfns are always below the 4GB boundary.
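
Something along these lines, as a sketch only (start_info_fits_32bit() is
a hypothetical helper and PAGE_SHIFT is assumed to be 12). It checks that
the whole start info region, not just its first page, stays below 4GB, so
that none of the uint32_t physical address fields can be truncated:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KiB pages */

/* Return true if [pfn << PAGE_SHIFT, + size) lies entirely below 4GB. */
static bool start_info_fits_32bit(uint64_t pfn, uint64_t size)
{
    uint64_t start;

    if ( pfn > (UINT64_MAX >> PAGE_SHIFT) )
        return false;                 /* the shift itself would overflow */

    start = pfn << PAGE_SHIFT;
    if ( size > UINT64_MAX - start )
        return false;                 /* start + size would wrap */

    return start + size <= (UINT64_C(1) << 32);
}
```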

>> +                                ((xen_pfn_t)cmdline - (xen_pfn_t)start_info);
> 
> xen_pfn_t? Aren't these byte addresses?

Right, these are not pfns.
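
A possible shape for the fix is to compute the offset as a byte distance
between the two mapped pointers and add it to the guest physical base.
field_paddr() is a made-up helper for this sketch, and it assumes the
offset is small enough to fit in 32 bits:

```c
#include <stdint.h>

/*
 * Guest physical address of 'field', given the guest physical address of
 * the mapping ('base_paddr') and the local pointer it is mapped at.
 */
static uint32_t field_paddr(uint32_t base_paddr, const void *base,
                            const void *field)
{
    return base_paddr + (uint32_t)((const char *)field - (const char *)base);
}
```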

>> --- a/xen/include/public/xen.h
>> +++ b/xen/include/public/xen.h
>> @@ -784,6 +784,25 @@ struct start_info {
>>  };
>>  typedef struct start_info start_info_t;
>>  
>> +/*
>> + * Start of day structure passed to PVH guests in %ebx.
>> + */
> 
> This is a single line comment.

Ack.

>> +struct hvm_start_info {
>> +#define HVM_START_MAGIC_VALUE 0x336ec578
>> +    uint32_t magic;             /* Contains the magic value 0x336ec578       */
>> +                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
>> +    uint32_t flags;             /* SIF_xxx flags.                            */
>> +    uint32_t cmdline_paddr;     /* Physical address of the command line.     */
>> +    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
>> +    uint32_t modlist_paddr;     /* Physical address of an array of           */
>> +                                /* hvm_modlist_entry.                        */
>> +};
>> +
>> +struct hvm_modlist_entry {
>> +    uint32_t paddr;             /* Physical address of the module.           */
>> +    uint32_t size;              /* Size of the module in bytes.              */
>> +};
> 
> Iirc this already went back and forth, but - is this meant to be an
> x86-specific interface? If not, should we really limit physical
> addresses to 32 bits here?

That's what I thought initially (x86 only), and nobody on the ARM side
expressed interest in using it there. Ian, Stefano, do you foresee this
being used for ARM also?
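
For reference, a rough sketch of how a guest could consume the structure as
it stands in this version of the patch. Every address field is a uint32_t,
so everything it points to has to live below 4GiB. The function name and the
idea of passing guest memory as a flat buffer starting at physical address 0
are assumptions made only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HVM_START_MAGIC_VALUE 0x336ec578

/* Local copies of the structures proposed in the patch. */
struct hvm_start_info {
    uint32_t magic, flags, cmdline_paddr, nr_modules, modlist_paddr;
};

struct hvm_modlist_entry {
    uint32_t paddr, size;
};

/*
 * Hypothetical consumer: `mem` stands in for guest physical memory starting
 * at address 0. Sums the sizes of all modules listed in the start info
 * found at `info_paddr`. Returns 0 on success, -1 on a bad magic value or
 * an out-of-range address.
 */
static int hvm_modules_total_size(const uint8_t *mem, size_t mem_size,
                                  uint32_t info_paddr, uint64_t *total)
{
    struct hvm_start_info si;
    struct hvm_modlist_entry mod;
    uint64_t off;
    uint32_t i;

    assert(mem != NULL && total != NULL);

    if ( (uint64_t)info_paddr + sizeof(si) > mem_size )
        return -1;
    /* memcpy avoids relying on the alignment of the buffer. */
    memcpy(&si, mem + info_paddr, sizeof(si));

    if ( si.magic != HVM_START_MAGIC_VALUE )
        return -1;

    *total = 0;
    for ( i = 0; i < si.nr_modules; i++ )
    {
        off = (uint64_t)si.modlist_paddr + (uint64_t)i * sizeof(mod);
        if ( off + sizeof(mod) > mem_size )
            return -1;
        memcpy(&mod, mem + off, sizeof(mod));
        *total += mod.size;
    }

    return 0;
}
```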

> Also this now sits inside a XEN_HAVE_PV_GUEST_ENTRY conditional
> - is that intended?

That's hard to tell, I would consider a guest/kernel using this
structure a PV guest (PVH), but I'm not sure if we can make a clear cut
here.

Roger.

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-28 16:09     ` Roger Pau Monné
@ 2015-09-29  7:09       ` Jan Beulich
  2015-09-29  8:53         ` Tim Deegan
  2015-09-29 10:00         ` Andrew Cooper
  0 siblings, 2 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-29  7:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 28.09.15 at 18:09, <roger.pau@citrix.com> wrote:
> On 21/09/15 at 17.44, Jan Beulich wrote:
>>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
>>> Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and
>>> VCPUOP_is_up hypercalls from HVM guests.
>>>
>>> This patch introduces a new structure (vcpu_hvm_context) that should be used
>>> in conjunction with the VCPUOP_initialise hypercall in order to initialize
>>> vCPUs for HVM guests.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> 
>> So this bi-modal thing doesn't look too bad, but a concern I have is
>> with the now different contexts used by XEN_DOMCTL_setvcpucontext
>> and VCPUOP_initialise.
> 
> Yes, that's far from ideal. I was going to say that
> XEN_DOMCTL_{set/get}vcpucontext should return EOPNOTSUPP when executed
> against HVM guests, but that's going to break current toolstack code
> that relies on this in order to perform suspend/resume of HVM guests
> (and gdbsx would probably also be affected).
> 
> If you feel that would be a better solution I could fix current tools
> code in order to use XEN_DOMCTL_{get/set}hvmcontext instead of
> XEN_DOMCTL_{set/get}vcpucontext when dealing with HVM guests.

That might be a follow-up thing, subject to the tools maintainers
agreeing.

>>> +    if ( !paging_mode_hap(v->domain) )
>>> +        v->arch.guest_table = pagetable_null();
>> 
>> A comment with the reason for this would be nice. I can't immediately
>> see why this is here.
> 
> guest_table contains uninitialized data at this point, which makes
> pagetable_is_null return false. Other places where guest_table is set
> check if previous guest_table is null or not, and if it's not null Xen
> will try to free it, causing a bug. hvm_vcpu_reset_state does something
> similar when setting the initial vCPU state.

Truly uninitialized is not possible, as struct vcpu starts out zeroed.
Hence if anything the field may hold left over data, in which case
the store would better go where the reference becomes dangling.

>>> +    hvm_update_guest_cr(v, 0);
>>> +    hvm_update_guest_cr(v, 4);
>>> +
>>> +    if ( (ctx->mode == VCPU_HVM_MODE_32B) ||
>>> +         (ctx->mode == VCPU_HVM_MODE_64B) )
>>> +    {
>>> +        hvm_update_guest_cr(v, 3);
>>> +        hvm_update_guest_efer(v);
>>> +    }
>>> +
>>> +    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) )
>>> +    {
>>> +        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
>>> +        struct page_info *page = get_page_from_gfn(v->domain,
>>> +                                 v->arch.hvm_vcpu.guest_cr[3] >> PAGE_SHIFT,
>>> +                                 NULL, P2M_ALLOC);
>> 
>> P2M_ALLOC but not P2M_UNSHARE?
> 
> This is a copy of what's done in hvm_set_cr3 when shadow mode is enabled.

As said on IRC - sadly we have no mem-sharing maintainer to ask. I
wonder whether the past or current x86/mm maintainer would know
- Tim, George?

>>> +    /* Sync AP's TSC with BSP's. */
>>> +    v->arch.hvm_vcpu.cache_tsc_offset =
>>> +        v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
>>> +    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset,
>>> +                             v->domain->arch.hvm_domain.sync_tsc);
>>> +
>>> +    v->arch.hvm_vcpu.msr_tsc_adjust = 0;
>> 
>> The need for this one I also can't see right away - another comment
>> perhaps?
> 
> AFAICT we need to initialize this to a sane value, like it's done in
> hvm_vcpu_reset_state.

As said above - struct vcpu starts out zeroed, so unless stale data
is being left here, "initialize" would at least not be the right term.

>>> +#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
>>> +#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
>>> +
>>> +#include "../xen.h"
>>> +
>>> +struct vcpu_hvm_x86_16 {
>>> +    uint16_t ax;
>>> +    uint16_t cx;
>>> +    uint16_t dx;
>>> +    uint16_t bx;
>>> +    uint16_t sp;
>>> +    uint16_t bp;
>>> +    uint16_t si;
>>> +    uint16_t di;
>>> +    uint16_t ip;
>>> +    uint16_t flags;
>>> +
>>> +    uint32_t cr0;
>>> +    uint32_t cr4;
>>> +
>>> +    uint32_t cs_base;
>>> +    uint32_t ds_base;
>>> +    uint32_t ss_base;
>>> +    uint32_t tr_base;
>>> +    uint32_t cs_limit;
>>> +    uint32_t ds_limit;
>>> +    uint32_t ss_limit;
>>> +    uint32_t tr_limit;
>>> +    uint16_t cs_ar;
>>> +    uint16_t ds_ar;
>>> +    uint16_t ss_ar;
>>> +    uint16_t tr_ar;
>>> +};
>> 
>> I doubt this is very useful: While one ought to be able to start a
>> guest in 16-bit mode, its GPRs still are supposed to be 32 bits wide.
>> The mode used really would depend on the cs_ar setting. (Having
>> the structure just for a 16-bit IP would seem insane.)
> 
> So, would you prefer to go just with the 64-bit structure and the mode
> is simply set by the value of the control registers / segment selectors?

No, you certainly want a 32- and a 64-bit layout, for the benefit of
32- and 64-bit guests.

>>> +    uint32_t cs_base;
>>> +    uint32_t ds_base;
>>> +    uint32_t ss_base;
>> 
>> I continue to question why we have DS here, but not ES (and maybe
>> FS and GS too). I.e. either just CS and SS (which are architecturally
>> required) or at least all four traditional x86 segment registers. And
>> you're also clearly not targeting minimal state, or else there likely
>> wouldn't be a need for e.g. R8-R15 in the 64-bit variant.
> 
> I'm fine with removing r8-15. Regarding the segment selectors, I don't
> have a problem with only allowing CS and SS to be set, or all of them
> including FS and GS. But I would like to get a consensus on this, we
> have already gone back and forth several times regarding what this
> structure should look like, and TBH, I was hoping that this was the last
> time.

Was there back and forth? I only recall always having asked for
consistency here, just like spelled out above.

> Andrew, Jan, what would you prefer, either DS is removed or ES, FS and
> GS are also added?

I voiced my opinion. Andrew?

>>> +struct vcpu_hvm_x86_64 {
>>> +    uint64_t rax;
>>> +    uint64_t rcx;
>>> +    uint64_t rdx;
>>> +    uint64_t rbx;
>>> +    uint64_t rsp;
>>> +    uint64_t rbp;
>>> +    uint64_t rsi;
>>> +    uint64_t rdi;
>>> +    uint64_t r8;
>>> +    uint64_t r9;
>>> +    uint64_t r10;
>>> +    uint64_t r11;
>>> +    uint64_t r12;
>>> +    uint64_t r13;
>>> +    uint64_t r14;
>>> +    uint64_t r15;
>>> +    uint64_t rip;
>>> +    uint64_t rflags;
>>> +
>>> +    uint64_t cr0;
>>> +    uint64_t cr3;
>>> +    uint64_t cr4;
>>> +    uint64_t efer;
>>> +
>>> +    uint32_t cs_base;
>>> +    uint32_t ds_base;
>>> +    uint32_t ss_base;
>>> +    uint32_t tr_base;
>>> +    uint32_t cs_limit;
>>> +    uint32_t ds_limit;
>>> +    uint32_t ss_limit;
>>> +    uint32_t tr_limit;
>> 
>> I can see the need for the TR ones here, but what's the point of the
>> CS, SS, and DS ones?
> 
> In case the guest wants to start in 64bit compatibility mode?

Hmm, yes. But couldn't that be done using the 32-bit mode layout?

>>> +/*
>>> + * The layout of the _ar fields of the segment registers is the
>>> + * following:
>>> + *
>>> + * Bits [0,3]: type (bits 40-43).
>>> + * Bit      4: s    (descriptor type, bit 44).
>>> + * Bit  [5,6]: dpl  (descriptor privilege level, bits 45-46).
>>> + * Bit      7: p    (segment-present, bit 47).
>>> + * Bit      8: avl  (available for system software, bit 52).
>>> + * Bit      9: l    (64-bit code segment, bit 53).
>>> + * Bit     10: db   (meaning depends on the segment, bit 54).
>>> + * Bit     11: g    (granularity, bit 55)
>>> + *
>>> + * A more complete description of the meaning of these fields can be
>>> + * obtained from the Intel SDM, Volume 3, section 3.4.5.
>>> + */
>> 
>> Please make explicit what state bits 12-15 are expected to be in
>> (hopefully they're being checked to be zero rather than getting
>> ignored).
> 
> I will add a comment regarding the 12-15 bits. IMHO, that checking
> should be done in hvm_set_segment_register, and is not part of this patch.

Well - is there an existing path like the one being added here where
these values come from an untrusted source (which isn't to say that
when coming from a trusted source checking wouldn't be worthwhile)?
I wasn't able to spot any.
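
A minimal sketch of the kind of check being asked for, based solely on the
_ar layout quoted above. struct seg_ar and decode_seg_ar are made-up names,
and returning -1 stands in for whatever error the hypercall would hand back
to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_AR_RESERVED_MASK 0xf000u /* bits 12-15, expected to be zero */

/* Decoded view of a 16-bit _ar value (hypothetical helper type). */
struct seg_ar {
    uint8_t type;   /* bits 0-3 */
    bool s;         /* bit 4: descriptor type */
    uint8_t dpl;    /* bits 5-6: descriptor privilege level */
    bool p;         /* bit 7: segment present */
    bool avl;       /* bit 8: available for system software */
    bool l;         /* bit 9: 64-bit code segment */
    bool db;        /* bit 10: meaning depends on the segment */
    bool g;         /* bit 11: granularity */
};

/* Returns 0 on success, -1 if any of the reserved bits 12-15 is set. */
static int decode_seg_ar(uint16_t ar, struct seg_ar *out)
{
    assert(out != NULL);

    if ( ar & SEG_AR_RESERVED_MASK )
        return -1;

    out->type = ar & 0xf;
    out->s    = (ar >> 4) & 1;
    out->dpl  = (ar >> 5) & 3;
    out->p    = (ar >> 7) & 1;
    out->avl  = (ar >> 8) & 1;
    out->l    = (ar >> 9) & 1;
    out->db   = (ar >> 10) & 1;
    out->g    = (ar >> 11) & 1;

    return 0;
}
```

Rejecting the value up front is what turns a later failure into an error
the calling vCPU can see.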

>> Please also clarify whether the limit specified for the various
>> segment registers is the one present in descriptor tables (subject
>> to scaling by the g bit) or (more likely) the byte granular one. In
>> any event I suppose certain restrictions apply.
> 
> Limit is not subject to the g bit, the limit set here is the one that
> ends up present in the descriptor tables (or in this case in the cached
> part of the selectors). IMHO that's implicit, and it's what's done on
> bare metal.

See - you're mixing things up: In descriptor tables, the (20-bit) limit is
always subject to scaling due to G=1. And since we're talking about
documentation of an interface here, "implicit" is not an option. (I
wonder, btw, whether VMENTRY checks would fail when a limit value
isn't in line with the G bit. I didn't check the manual...)

With both of the above points my main goal is to convert an
unrecoverable error (guest death) to a recoverable one (the vCPU
issuing the hypercall receiving an error indication).
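
To illustrate the distinction, a minimal sketch of both directions: scaling
a 20-bit descriptor-table limit by the G bit, and checking that a
byte-granular limit is representable with a given G bit. The rule assumed
here is that G=1 requires the low 12 bits to be all ones and G=0 requires
bits 20-31 to be clear. Both function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand a 20-bit descriptor-table limit into a byte-granular limit. */
static uint32_t expand_limit(uint32_t raw20, bool g)
{
    assert(raw20 <= 0xfffff); /* only 20 limit bits exist in a descriptor */

    /* With G=1 the limit counts 4KiB units and the low 12 bits are ones. */
    return g ? (raw20 << 12) | 0xfff : raw20;
}

/* Can this byte-granular limit be expressed with the given G bit? */
static bool limit_matches_g(uint32_t limit, bool g)
{
    if ( g )
        return (limit & 0xfff) == 0xfff;

    return (limit >> 20) == 0;
}
```

A hypercall handler could use a check like limit_matches_g() to return an
error to the caller instead of letting the new vCPU fail later.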

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29  7:09       ` Jan Beulich
@ 2015-09-29  8:53         ` Tim Deegan
  2015-09-29 10:00         ` Andrew Cooper
  1 sibling, 0 replies; 99+ messages in thread
From: Tim Deegan @ 2015-09-29  8:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Stefano Stabellini,
	xen-devel, Roger Pau Monné

At 01:09 -0600 on 29 Sep (1443488980), Jan Beulich wrote:
> >>> On 28.09.15 at 18:09, <roger.pau@citrix.com> wrote:
> > On 21/09/15 at 17.44, Jan Beulich wrote:
> >>>>> On 04.09.15 at 14:09, <roger.pau@citrix.com> wrote:
> >>> +    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) )
> >>> +    {
> >>> +        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
> >>> +        struct page_info *page = get_page_from_gfn(v->domain,
> >>> +                                 v->arch.hvm_vcpu.guest_cr[3] >> PAGE_SHIFT,
> >>> +                                 NULL, P2M_ALLOC);
> >>
> >> P2M_ALLOC but not P2M_UNSHARE?
> >
> > This is a copy of what's done in hvm_set_cr3 when shadow mode is enabled.
> 
> As said on IRC - sadly we have no mem-sharing maintainer to ask. I
> wonder whether the past or current x86/mm maintainer would know
> - Tim, George?

Memory sharing requires HAP, so it makes no difference here.
Seems like it ought to be P2M_UNSHARE, though, for consistency.

Tim.

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29  7:09       ` Jan Beulich
  2015-09-29  8:53         ` Tim Deegan
@ 2015-09-29 10:00         ` Andrew Cooper
  2015-09-29 10:07           ` Jan Beulich
  1 sibling, 1 reply; 99+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:00 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 29/09/15 08:09, Jan Beulich wrote:
>
>>>> +    uint32_t cs_base;
>>>> +    uint32_t ds_base;
>>>> +    uint32_t ss_base;
>>> I continue to question why we have DS here, but not ES (and maybe
>>> FS and GS too). I.e. either just CS and SS (which are architecturally
>>> required) or at least all four traditional x86 segment registers. And
>>> you're also clearly not targeting minimal state, or else there likely
>>> wouldn't be a need for e.g. R8-R15 in the 64-bit variant.
>> I'm fine with removing r8-15. Regarding the segment selectors, I don't
>> have a problem with only allowing CS and SS to be set, or all of them
>> including FS and GS. But I would like to get a consensus on this, we
>> have already gone back and forth several times regarding what this
>> structure should look like, and TBH, I was hoping that this was the last
>> time.
> Was there back and forth? I only recall always having asked for
> consistency here, just like spelled out above.
>
>> Andrew, Jan, what would you prefer, either DS is removed or ES, FS and
>> GS are also added?
> I voiced my opinion. Andrew?

DS clearly needs initialising to provide a sane environment in the newly
running vcpu.  Expecting %cs or %ss overrides until a new GDT is loaded
is unreasonable IMO.

Therefore, we are back to the question of whether to provide all segment
registers, or specify a flat layout without specific selector values.  I
would prefer the former to the latter.

~Andrew

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:00         ` Andrew Cooper
@ 2015-09-29 10:07           ` Jan Beulich
  2015-09-29 10:25             ` Andrew Cooper
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-29 10:07 UTC (permalink / raw)
  To: Andrew Cooper, roger.pau
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
> On 29/09/15 08:09, Jan Beulich wrote:
>>
>>>>> +    uint32_t cs_base;
>>>>> +    uint32_t ds_base;
>>>>> +    uint32_t ss_base;
>>>> I continue to question why we have DS here, but not ES (and maybe
>>>> FS and GS too). I.e. either just CS and SS (which are architecturally
>>>> required) or at least all four traditional x86 segment registers. And
>>>> you're also clearly not targeting minimal state, or else there likely
>>>> wouldn't be a need for e.g. R8-R15 in the 64-bit variant.
>>> I'm fine with removing r8-15. Regarding the segment selectors, I don't
>>> have a problem with only allowing CS and SS to be set, or all of them
>>> including FS and GS. But I would like to get a consensus on this, we
>>> have already gone back and forth several times regarding what this
>>> structure should look like, and TBH, I was hoping that this was the last
>>> time.
>> Was there back and forth? I only recall always having asked for
>> consistency here, just like spelled out above.
>>
>>> Andrew, Jan, what would you prefer, either DS is removed or ES, FS and
>>> GS are also added?
>> I voiced my opinion. Andrew?
> 
> DS clearly needs initialising to provide a sane environment in the newly
> running vcpu.  Expecting %cs or %ss overrides until a new GDT is loaded
> is unreasonable IMO.

I don't view this as unreasonable: You want to load a GDT first thing
anyway. And I see nothing wrong with that one instruction carrying
an override (and even then only on 32-bit, as a nul DS is fine on
64-bit).

> Therefore, we are back to the question of whether to provide all segment
> registers, or specify a flat layout without specific selector values.  I
> would prefer the former to the latter.

If we don't go the CS+SS only route, then yes, I'd too prefer
completing the set (I would probably agree with not adding FS
and GS, and even recommend against it in the 64-bit variant,
but I do insist on ES in that case).

Jan

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:07           ` Jan Beulich
@ 2015-09-29 10:25             ` Andrew Cooper
  2015-09-29 10:33               ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:25 UTC (permalink / raw)
  To: Jan Beulich, roger.pau
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 29/09/15 11:07, Jan Beulich wrote:
>>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
>> On 29/09/15 08:09, Jan Beulich wrote:
>>>>>> +    uint32_t cs_base;
>>>>>> +    uint32_t ds_base;
>>>>>> +    uint32_t ss_base;
>>>>> I continue to question why we have DS here, but not ES (and maybe
>>>>> FS and GS too). I.e. either just CS and SS (which are architecturally
>>>>> required) or at least all four traditional x86 segment registers. And
>>>>> you're also clearly not targeting minimal state, or else there likely
>>>>> wouldn't be a need for e.g. R8-R15 in the 64-bit variant.
>>>> I'm fine with removing r8-15. Regarding the segment selectors, I don't
>>>> have a problem with only allowing CS and SS to be set, or all of them
>>>> including FS and GS. But I would like to get a consensus on this, we
>>>> have already gone back and forth several times regarding what this
>>>> structure should look like, and TBH, I was hoping that this was the last
>>>> time.
>>> Was there back and forth? I only recall always having asked for
>>> consistency here, just like spelled out above.
>>>
>>>> Andrew, Jan, what would you prefer, either DS is removed or ES, FS and
>>>> GS are also added?
>>> I voiced my opinion. Andrew?
>> DS clearly needs initialising to provide a sane environment in the newly
>> running vcpu.  Expecting %cs or %ss overrides until a new GDT is loaded
>> is unreasonable IMO.
> I don't view this as unreasonable: You want to load a GDT first thing
> anyway.

True.  The main question is whether the GDT will be at a fixed linear
address.  I expect this to be the case in any non-contrived situation.

> And I see nothing wrong with that one instruction carrying
> an override (and even then only on 32-bit, as a nul DS is fine on
> 64-bit).

Ah yes.

>
>> Therefore, we are back to the question of whether to provide all segment
>> registers, or specify a flat layout without specific selector values.  I
>> would prefer the former to the latter.
> If we don't go the CS+SS only route, then yes, I'd too prefer
> completing the set (I would probably agree with not adding FS
> and GS, and even recommend against it in the 64-bit variant,
> but I do insist on ES in that case).

I would still err on the CS/SS/DS/ES side given a straight choice.  It
offers more flexibility for rarer usecases.

~Andrew

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:25             ` Andrew Cooper
@ 2015-09-29 10:33               ` Jan Beulich
  2015-09-29 10:37                 ` Andrew Cooper
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-29 10:33 UTC (permalink / raw)
  To: Andrew Cooper, roger.pau
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

>>> On 29.09.15 at 12:25, <andrew.cooper3@citrix.com> wrote:
> On 29/09/15 11:07, Jan Beulich wrote:
>>>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>> Therefore, we are back to the question of whether to provide all segment
>>> registers, or specify a flat layout without specific selector values.  I
>>> would prefer the former to the latter.
>> If we don't go the CS+SS only route, then yes, I'd too prefer
>> completing the set (I would probably agree with not adding FS
>> and GS, and even recommend against it in the 64-bit variant,
>> but I do insist on ES in that case).
> 
> I would still err on the CS/SS/DS/ES side given a straight choice.  It
> offers more flexibility for rarer usecases.

Okay, all four of them then for 32-bit, and just CS and SS for 64-bit?

Jan

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:33               ` Jan Beulich
@ 2015-09-29 10:37                 ` Andrew Cooper
  2015-09-29 10:48                   ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:37 UTC (permalink / raw)
  To: Jan Beulich, roger.pau
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 29/09/15 11:33, Jan Beulich wrote:
>>>> On 29.09.15 at 12:25, <andrew.cooper3@citrix.com> wrote:
>> On 29/09/15 11:07, Jan Beulich wrote:
>>>>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>> Therefore, we are back to the question of whether to provide all segment
>>>> registers, or specify a flat layout without specific selector values.  I
>>>> would prefer the former to the latter.
>>> If we don't go the CS+SS only route, then yes, I'd too prefer
>>> completing the set (I would probably agree with not adding FS
>>> and GS, and even recommend against it in the 64-bit variant,
>>> but I do insist on ES in that case).
>> I would still err on the CS/SS/DS/ES side given a straight choice.  It
>> offers more flexibility for rarer usecases.
> Okay, all four of them then for 32-bit, and just CS and SS for 64-bit?

Is SS needed for 64bit?  It is expected to be NUL just like DS and ES.

~Andrew

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:37                 ` Andrew Cooper
@ 2015-09-29 10:48                   ` Jan Beulich
  2015-09-29 14:01                     ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-29 10:48 UTC (permalink / raw)
  To: Andrew Cooper, roger.pau
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

>>> On 29.09.15 at 12:37, <andrew.cooper3@citrix.com> wrote:
> On 29/09/15 11:33, Jan Beulich wrote:
>>>>> On 29.09.15 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>> On 29/09/15 11:07, Jan Beulich wrote:
>>>>>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>>> Therefore, we are back to the question of whether to provide all segment
>>>>> registers, or specify a flat layout without specific selector values.  I
>>>>> would prefer the former to the latter.
>>>> If we don't go the CS+SS only route, then yes, I'd too prefer
>>>> completing the set (I would probably agree with not adding FS
>>>> and GS, and even recommend against it in the 64-bit variant,
>>>> but I do insist on ES in that case).
>>> I would still err on the CS/SS/DS/ES side given a straight choice.  It
>>> offers more flexibility for rarer usecases.
>> Okay, all four of them then for 32-bit, and just CS and SS for 64-bit?
> 
> Is SS needed for 64bit?  It is expected to be NUL just like DS and ES.

Indeed, we should be able to get away without it. And for CS all
we'd need would be a selector.

Jan

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 10:48                   ` Jan Beulich
@ 2015-09-29 14:01                     ` Roger Pau Monné
  2015-09-29 15:29                       ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-29 14:01 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 29/09/15 at 12.48, Jan Beulich wrote:
>>>> On 29.09.15 at 12:37, <andrew.cooper3@citrix.com> wrote:
>> On 29/09/15 11:33, Jan Beulich wrote:
>>>>>> On 29.09.15 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>>> On 29/09/15 11:07, Jan Beulich wrote:
>>>>>>>> On 29.09.15 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>>>> Therefore, we are back to the question of whether to provide all segment
>>>>>> registers, or specify a flat layout without specific selector values.  I
>>>>>> would prefer the former to the latter.
>>>>> If we don't go the CS+SS only route, then yes, I'd too prefer
>>>>> completing the set (I would probably agree with not adding FS
>>>>> and GS, and even recommend against it in the 64-bit variant,
>>>>> but I do insist on ES in that case).
>>>> I would still err on the CS/SS/DS/ES side given a straight choice.  It
>>>> offers more flexibility for rarer usecases.
>>> Okay, all four of them then for 32-bit, and just CS and SS for 64-bit?
>>
>> Is SS needed for 64bit?  It is expected to be NUL just like DS and ES.
> 
> Indeed, we should be able to get away without it. And for CS all
> we'd need would be a selector.

Ok thanks, so we seem to have a consensus. Before posting a new 
revision, does the following vcpu_hvm_context look fine to both of you:

struct vcpu_hvm_x86_32 {
    uint32_t eax;
    uint32_t ecx;
    uint32_t edx;
    uint32_t ebx;
    uint32_t esp;
    uint32_t ebp;
    uint32_t esi;
    uint32_t edi;
    uint32_t eip;
    uint32_t eflags;

    uint32_t cr0;
    uint32_t cr3;
    uint32_t cr4;

    /*
     * EFER should only be used to set the NXE bit (if required)
     * when starting a vCPU in 32bit mode with paging enabled.
     */
    uint64_t efer;

    uint32_t cs_base;
    uint32_t ds_base;
    uint32_t ss_base;
    uint32_t es_base;
    uint32_t tr_base;
    uint32_t cs_limit;
    uint32_t ds_limit;
    uint32_t ss_limit;
    uint32_t es_limit;
    uint32_t tr_limit;
    uint16_t cs_ar;
    uint16_t ds_ar;
    uint16_t ss_ar;
    uint16_t es_ar;
    uint16_t tr_ar;
};

struct vcpu_hvm_x86_64 {
    uint64_t rax;
    uint64_t rcx;
    uint64_t rdx;
    uint64_t rbx;
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rsi;
    uint64_t rdi;
    uint64_t rip;
    uint64_t rflags;

    uint64_t cr0;
    uint64_t cr3;
    uint64_t cr4;
    uint64_t efer;

    /*
     * Setting the CS attributes field is allowed in order to
     * start in compatibility mode.
     */
    uint16_t cs_ar;
};

struct vcpu_hvm_context {
#define VCPU_HVM_MODE_32B 0  /* 32bit fields of the structure will be used. */
#define VCPU_HVM_MODE_64B 1  /* 64bit fields of the structure will be used. */
    uint32_t mode;

    /* CPU registers. */
    union {
        struct vcpu_hvm_x86_32 x86_32;
        struct vcpu_hvm_x86_64 x86_64;
    } cpu_regs;
};
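
As a usage illustration, a sketch of how a guest might fill the 32-bit
variant to start an AP in flat protected mode without paging. The structure
is a trimmed local copy of the layout above (the 64-bit union member is
omitted), fill_flat_32bit_ap is a made-up name, and the _ar values are
derived from the _ar layout discussed earlier in the thread rather than
taken from the patch. It also assumes the limits are byte-granular, which
is still being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed local copy of the proposed 32-bit layout, for illustration. */
struct vcpu_hvm_x86_32 {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi, eip, eflags;
    uint32_t cr0, cr3, cr4;
    uint64_t efer;
    uint32_t cs_base, ds_base, ss_base, es_base, tr_base;
    uint32_t cs_limit, ds_limit, ss_limit, es_limit, tr_limit;
    uint16_t cs_ar, ds_ar, ss_ar, es_ar, tr_ar;
};

struct vcpu_hvm_context {
#define VCPU_HVM_MODE_32B 0
    uint32_t mode;
    union {
        struct vcpu_hvm_x86_32 x86_32; /* x86_64 member omitted here */
    } cpu_regs;
};

#define SKETCH_CR0_PE 0x1u /* CR0 bit 0: protected mode enable */

/* Hypothetical helper: flat 32-bit protected mode, paging disabled. */
static void fill_flat_32bit_ap(struct vcpu_hvm_context *ctx,
                               uint32_t entry, uint32_t stack)
{
    struct vcpu_hvm_x86_32 *r;

    assert(ctx != NULL);

    memset(ctx, 0, sizeof(*ctx)); /* all bases and unused registers are 0 */
    ctx->mode = VCPU_HVM_MODE_32B;
    r = &ctx->cpu_regs.x86_32;

    r->eip = entry;
    r->esp = stack;
    r->cr0 = SKETCH_CR0_PE;

    /* Flat 4GiB segments, assuming byte-granular limits. */
    r->cs_limit = r->ds_limit = r->ss_limit = r->es_limit = 0xffffffffu;

    r->cs_ar = 0xc9b;                        /* type 0xb, s, p, db, g */
    r->ds_ar = r->ss_ar = r->es_ar = 0xc93;  /* type 0x3, s, p, db, g */

    r->tr_limit = 0x67;
    r->tr_ar = 0x8b;                         /* type 0xb, p */
}
```

The filled context would then be handed to VCPUOP_initialise, followed by
VCPUOP_up; the hypercall plumbing is left out of the sketch.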

Thanks, Roger.

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 14:01                     ` Roger Pau Monné
@ 2015-09-29 15:29                       ` Jan Beulich
  2015-09-29 16:01                         ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-29 15:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 29.09.15 at 16:01, <roger.pau@citrix.com> wrote:
> Ok thanks, so we seem to have a consensus. Before posting a new 
> revision, does the following vcpu_hvm_context look fine to both of you:
> 
> struct vcpu_hvm_x86_32 {
>     uint32_t eax;
>     uint32_t ecx;
>     uint32_t edx;
>     uint32_t ebx;
>     uint32_t esp;
>     uint32_t ebp;
>     uint32_t esi;
>     uint32_t edi;
>     uint32_t eip;
>     uint32_t eflags;
> 
>     uint32_t cr0;
>     uint32_t cr3;
>     uint32_t cr4;
> 
>     /*
>      * EFER should only be used to set the NXE bit (if required)
>      * when starting a vCPU in 32bit mode with paging enabled.
>      */
>     uint64_t efer;
> 
>     uint32_t cs_base;
>     uint32_t ds_base;
>     uint32_t ss_base;
>     uint32_t es_base;
>     uint32_t tr_base;
>     uint32_t cs_limit;
>     uint32_t ds_limit;
>     uint32_t ss_limit;
>     uint32_t es_limit;
>     uint32_t tr_limit;
>     uint16_t cs_ar;
>     uint16_t ds_ar;
>     uint16_t ss_ar;
>     uint16_t es_ar;
>     uint16_t tr_ar;
> };
> 
> struct vcpu_hvm_x86_64 {
>     uint64_t rax;
>     uint64_t rcx;
>     uint64_t rdx;
>     uint64_t rbx;
>     uint64_t rsp;
>     uint64_t rbp;
>     uint64_t rsi;
>     uint64_t rdi;
>     uint64_t rip;
>     uint64_t rflags;
> 
>     uint64_t cr0;
>     uint64_t cr3;
>     uint64_t cr4;
>     uint64_t efer;
> 
>     /*
>      * Setting the CS attributes field is allowed in order to
>      * start in compatibility mode.
>      */

Hmm, as said before it would seem to me that this would better
(or also) be allowed to work by specifying a suitable 32-bit register
state. Remember that in compatibility mode base and limit matter
again, and I think you also can't start with a nul SS.

>     uint16_t cs_ar;
> };

No tr_* here?

Also, what are the selector values going to be? Do you intend to
pass zero there?

Jan

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 15:29                       ` Jan Beulich
@ 2015-09-29 16:01                         ` Roger Pau Monné
  2015-09-29 16:20                           ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-29 16:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 29/09/15 at 17.29, Jan Beulich wrote:
>>>> On 29.09.15 at 16:01, <roger.pau@citrix.com> wrote:
>> Ok thanks, so we seem to have a consensus. Before posting a new 
>> revision, does the following vcpu_hvm_context look fine to both of you:
>>
>> struct vcpu_hvm_x86_32 {
>>     uint32_t eax;
>>     uint32_t ecx;
>>     uint32_t edx;
>>     uint32_t ebx;
>>     uint32_t esp;
>>     uint32_t ebp;
>>     uint32_t esi;
>>     uint32_t edi;
>>     uint32_t eip;
>>     uint32_t eflags;
>>
>>     uint32_t cr0;
>>     uint32_t cr3;
>>     uint32_t cr4;
>>
>>     /*
>>      * EFER should only be used to set the NXE bit (if required)
>>      * when starting a vCPU in 32bit mode with paging enabled.
>>      */
>>     uint64_t efer;
>>
>>     uint32_t cs_base;
>>     uint32_t ds_base;
>>     uint32_t ss_base;
>>     uint32_t es_base;
>>     uint32_t tr_base;
>>     uint32_t cs_limit;
>>     uint32_t ds_limit;
>>     uint32_t ss_limit;
>>     uint32_t es_limit;
>>     uint32_t tr_limit;
>>     uint16_t cs_ar;
>>     uint16_t ds_ar;
>>     uint16_t ss_ar;
>>     uint16_t es_ar;
>>     uint16_t tr_ar;
>> };
>>
>> struct vcpu_hvm_x86_64 {
>>     uint64_t rax;
>>     uint64_t rcx;
>>     uint64_t rdx;
>>     uint64_t rbx;
>>     uint64_t rsp;
>>     uint64_t rbp;
>>     uint64_t rsi;
>>     uint64_t rdi;
>>     uint64_t rip;
>>     uint64_t rflags;
>>
>>     uint64_t cr0;
>>     uint64_t cr3;
>>     uint64_t cr4;
>>     uint64_t efer;
>>
>>     /*
>>      * Setting the CS attributes field is allowed in order to
>>      * start in compatibility mode.
>>      */
> 
> Hmm, as said before it would seem to me that this would better
> (or also) be allowed to work by specifying a suitable 32-bit register
> state. Remember that in compatibility mode base and limit matter
> again, and I think you also can't start with a nul SS.

Yes, I should add back all the registers here, so it looks like:

     uint32_t cs_base;
     uint32_t ds_base;
     uint32_t ss_base;
     uint32_t es_base;
     uint32_t cs_limit;
     uint32_t ds_limit;
     uint32_t ss_limit;
     uint32_t es_limit;
     uint16_t cs_ar;
     uint16_t ds_ar;
     uint16_t ss_ar;
     uint16_t es_ar;

>>     uint16_t cs_ar;
>> };
> 
> No tr_* here?

Is it necessary? For long or compatibility mode, TR is always going to be:

tr_base = 0;
tr_limit = 0x67;
tr_ar = 0x8b;

The attributes field is always going to be 0x8b for compat or long mode.
The base and the limit might be different, but the guest should change
those by itself.
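
For reference, a minimal sketch of how the 0x8b value decomposes. It
assumes the low byte of an "ar" field follows the usual x86 descriptor
access-byte layout (type in bits 0-3, S in bit 4, DPL in bits 5-6, P in
bit 7). The helper names are made up for illustration and are not part
of the proposed interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: decode the low byte of a segment "ar" field,
 * assuming the standard descriptor access-byte layout.
 */
static inline unsigned int ar_type(uint16_t ar)    { return ar & 0xf; }
static inline unsigned int ar_s(uint16_t ar)       { return (ar >> 4) & 1; }
static inline unsigned int ar_dpl(uint16_t ar)     { return (ar >> 5) & 3; }
static inline unsigned int ar_present(uint16_t ar) { return (ar >> 7) & 1; }

/* Checks the decomposition of the TR attributes quoted above. */
static inline void tr_ar_selftest(void)
{
    const uint16_t tr_ar = 0x8b;

    assert(ar_type(tr_ar) == 0xb);   /* busy TSS */
    assert(ar_s(tr_ar) == 0);        /* system descriptor, not code/data */
    assert(ar_dpl(tr_ar) == 0);      /* privilege level 0 */
    assert(ar_present(tr_ar) == 1);  /* segment present */
}
```

So 0x8b reads as "present, DPL 0, system descriptor, busy TSS", which is
the same encoding in protected, compatibility and long mode.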

> Also, what are the selector values going to be? Do you intend to
> pass zero there?

Do you mean the visible part of the selectors? With the current
implementation the value of the "sel" field in the segment_register
struct is left uninitialized, so it's undefined. I could set them to 0,
but what's the point in doing it?

Roger.


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 16:01                         ` Roger Pau Monné
@ 2015-09-29 16:20                           ` Jan Beulich
  2015-09-29 16:49                             ` Roger Pau Monné
  0 siblings, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-29 16:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 29.09.15 at 18:01, <roger.pau@citrix.com> wrote:
> On 29/09/15 at 17.29, Jan Beulich wrote:
>>>>> On 29.09.15 at 16:01, <roger.pau@citrix.com> wrote:
>>> Ok thanks, so we seem to have a consensus. Before posting a new 
>>> revision, does the following vcpu_hvm_context look fine to both of you:
>>>
>>> struct vcpu_hvm_x86_32 {
>>>     uint32_t eax;
>>>     uint32_t ecx;
>>>     uint32_t edx;
>>>     uint32_t ebx;
>>>     uint32_t esp;
>>>     uint32_t ebp;
>>>     uint32_t esi;
>>>     uint32_t edi;
>>>     uint32_t eip;
>>>     uint32_t eflags;
>>>
>>>     uint32_t cr0;
>>>     uint32_t cr3;
>>>     uint32_t cr4;
>>>
>>>     /*
>>>      * EFER should only be used to set the NXE bit (if required)
>>>      * when starting a vCPU in 32bit mode with paging enabled.
>>>      */
>>>     uint64_t efer;
>>>
>>>     uint32_t cs_base;
>>>     uint32_t ds_base;
>>>     uint32_t ss_base;
>>>     uint32_t es_base;
>>>     uint32_t tr_base;
>>>     uint32_t cs_limit;
>>>     uint32_t ds_limit;
>>>     uint32_t ss_limit;
>>>     uint32_t es_limit;
>>>     uint32_t tr_limit;
>>>     uint16_t cs_ar;
>>>     uint16_t ds_ar;
>>>     uint16_t ss_ar;
>>>     uint16_t es_ar;
>>>     uint16_t tr_ar;
>>> };
>>>
>>> struct vcpu_hvm_x86_64 {
>>>     uint64_t rax;
>>>     uint64_t rcx;
>>>     uint64_t rdx;
>>>     uint64_t rbx;
>>>     uint64_t rsp;
>>>     uint64_t rbp;
>>>     uint64_t rsi;
>>>     uint64_t rdi;
>>>     uint64_t rip;
>>>     uint64_t rflags;
>>>
>>>     uint64_t cr0;
>>>     uint64_t cr3;
>>>     uint64_t cr4;
>>>     uint64_t efer;
>>>
>>>     /*
>>>      * Setting the CS attributes field is allowed in order to
>>>      * start in compatibility mode.
>>>      */
>> 
>> Hmm, as said before it would seem to me that this would better
>> (or also) be allowed to work by specifying a suitable 32-bit register
>> state. Remember that in compatibility mode base and limit matter
>> again, and I think you also can't start with a null SS.
> 
> Yes, I should add back all the registers here, so it looks like:
> 
>      uint32_t cs_base;
>      uint32_t ds_base;
>      uint32_t ss_base;
>      uint32_t es_base;
>      uint32_t cs_limit;
>      uint32_t ds_limit;
>      uint32_t ss_limit;
>      uint32_t es_limit;
>      uint16_t cs_ar;
>      uint16_t ds_ar;
>      uint16_t ss_ar;
>      uint16_t es_ar;

Or specify that compat mode entry is via using 32-bit register state.

>>>     uint16_t cs_ar;
>>> };
>> 
>> No tr_* here?
> 
> Is it necessary? for long or compatibility mode tr is always going to be:
> 
> tr_base = 0;
> tr_limit = 0x67;
> tr_ar = 0x8b;
> 
> The attributes field is always going to be 0x8b for compat or long mode,
> the base and the limit might be different, but the guest should change
> that by itself.

Please explain why you have it in the 32-bit register state, but not
in the 64-bit one. I said before that we should aim at being as
consistent as possible. (And btw I do not think that tr_base being
zero makes any sense.)

>> Also, what are the selector values going to be? Do you intend to
>> pass zero there?
> 
> Do you mean the visible part of the selectors? With the current
> implementation the value of the "sel" field in the segment_register
> struct is left uninitialized, so it's undefined. I could set them to 0,
> but what's the point in doing it?

We should never leave things uninitialized; leaving things
unspecified may be okay, but I don't see why we shouldn't
spell out what we intend to do, unless we foresee it to change
later on.

Jan


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 16:20                           ` Jan Beulich
@ 2015-09-29 16:49                             ` Roger Pau Monné
  2015-09-29 16:58                               ` Roger Pau Monné
  2015-09-30 10:03                               ` Jan Beulich
  0 siblings, 2 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-29 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 29/09/15 at 18.20, Jan Beulich wrote:
>>>> On 29.09.15 at 18:01, <roger.pau@citrix.com> wrote:
>> On 29/09/15 at 17.29, Jan Beulich wrote:
>>>>>> On 29.09.15 at 16:01, <roger.pau@citrix.com> wrote:
>>>> Ok thanks, so we seem to have a consensus. Before posting a new 
>>>> revision, does the following vcpu_hvm_context look fine to both of you:
>>>>
>>>> struct vcpu_hvm_x86_32 {
>>>>     uint32_t eax;
>>>>     uint32_t ecx;
>>>>     uint32_t edx;
>>>>     uint32_t ebx;
>>>>     uint32_t esp;
>>>>     uint32_t ebp;
>>>>     uint32_t esi;
>>>>     uint32_t edi;
>>>>     uint32_t eip;
>>>>     uint32_t eflags;
>>>>
>>>>     uint32_t cr0;
>>>>     uint32_t cr3;
>>>>     uint32_t cr4;
>>>>
>>>>     /*
>>>>      * EFER should only be used to set the NXE bit (if required)
>>>>      * when starting a vCPU in 32bit mode with paging enabled.
>>>>      */
>>>>     uint64_t efer;
>>>>
>>>>     uint32_t cs_base;
>>>>     uint32_t ds_base;
>>>>     uint32_t ss_base;
>>>>     uint32_t es_base;
>>>>     uint32_t tr_base;
>>>>     uint32_t cs_limit;
>>>>     uint32_t ds_limit;
>>>>     uint32_t ss_limit;
>>>>     uint32_t es_limit;
>>>>     uint32_t tr_limit;
>>>>     uint16_t cs_ar;
>>>>     uint16_t ds_ar;
>>>>     uint16_t ss_ar;
>>>>     uint16_t es_ar;
>>>>     uint16_t tr_ar;
>>>> };
>>>>
>>>> struct vcpu_hvm_x86_64 {
>>>>     uint64_t rax;
>>>>     uint64_t rcx;
>>>>     uint64_t rdx;
>>>>     uint64_t rbx;
>>>>     uint64_t rsp;
>>>>     uint64_t rbp;
>>>>     uint64_t rsi;
>>>>     uint64_t rdi;
>>>>     uint64_t rip;
>>>>     uint64_t rflags;
>>>>
>>>>     uint64_t cr0;
>>>>     uint64_t cr3;
>>>>     uint64_t cr4;
>>>>     uint64_t efer;
>>>>
>>>>     /*
>>>>      * Setting the CS attributes field is allowed in order to
>>>>      * start in compatibility mode.
>>>>      */
>>>
>>> Hmm, as said before it would seem to me that this would better
>>> (or also) be allowed to work by specifying a suitable 32-bit register
>>> state. Remember that in compatibility mode base and limit matter
>>> again, and I think you also can't start with a null SS.
>>
>> Yes, I should add back all the registers here, so it looks like:
>>
>>      uint32_t cs_base;
>>      uint32_t ds_base;
>>      uint32_t ss_base;
>>      uint32_t es_base;
>>      uint32_t cs_limit;
>>      uint32_t ds_limit;
>>      uint32_t ss_limit;
>>      uint32_t es_limit;
>>      uint16_t cs_ar;
>>      uint16_t ds_ar;
>>      uint16_t ss_ar;
>>      uint16_t es_ar;
> 
> Or specify that compat mode entry is via using 32-bit register state.

Yes, that should also work. If that's the preference then I guess we 
can remove all the selectors stuff from the 64bit structure.

>>>>     uint16_t cs_ar;
>>>> };
>>>
>>> No tr_* here?
>>
>> Is it necessary? for long or compatibility mode tr is always going to be:
>>
>> tr_base = 0;
>> tr_limit = 0x67;
>> tr_ar = 0x8b;
>>
>> The attributes field is always going to be 0x8b for compat or long mode,
>> the base and the limit might be different, but the guest should change
>> that by itself.
> 
> Please explain why you have it in the 32-bit register state, but not
> in the 64-bit one.

Because the 32bit register state can be used to start a CPU in real 
mode, and the TSS type in real mode is different from the one used in 
protected mode and long mode.

> I said before that we should aim at being as
> consistent as possible. (And btw I do not think that tr_base being
> zero makes any sense.)

We don't actually have a TSS anywhere, does it really matter if base is 
set to 0?

In fact allowing the user to set tr_base and tr_limit is kind of 
pointless, it's impossible to load a TSS from this interface. So AFAICT 
what really matters is tr_ar only.

>>> Also, what are the selector values going to be? Do you intend to
>>> pass zero there?
>>
>> Do you mean the visible part of the selectors? With the current
>> implementation the value of the "sel" field in the segment_register
>> struct is left uninitialized, so it's undefined. I could set them to 0,
>> but what's the point in doing it?
> 
> We should never leave things uninitialized; leaving things
> unspecified may be okay, but I don't see why we shouldn't
> spell out what we intend to do, unless we foresee it to change
> later on.

Ack, I don't see a problem in setting them to 0, but anyway there's no 
GDT loaded, so the selector value itself is pointless (the cached part 
is what really matters).

Roger.


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 16:49                             ` Roger Pau Monné
@ 2015-09-29 16:58                               ` Roger Pau Monné
  2015-09-30 10:03                               ` Jan Beulich
  1 sibling, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-29 16:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 29/09/15 at 18.49, Roger Pau Monné wrote:
> On 29/09/15 at 18.20, Jan Beulich wrote:
>> I said before that we should aim at being as
>> consistent as possible. (And btw I do not think that tr_base being
>> zero makes any sense.)
> 
> We don't actually have a TSS anywhere, does it really matter if base is 
> set to 0?
> 
> In fact allowing the user to set tr_base and tr_limit is kind of 
> pointless, it's impossible to load a TSS from this interface. So AFAICT 
> what really matters is tr_ar only.

Please ignore the excerpt above, it's wrong. A user can indeed place a
valid TSS in memory and load it on the vcpu using this interface.

Roger.


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-29 16:49                             ` Roger Pau Monné
  2015-09-29 16:58                               ` Roger Pau Monné
@ 2015-09-30 10:03                               ` Jan Beulich
  2015-09-30 11:37                                 ` Roger Pau Monné
  1 sibling, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-30 10:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 29.09.15 at 18:49, <roger.pau@citrix.com> wrote:
> On 29/09/15 at 18.20, Jan Beulich wrote:
>>>>> On 29.09.15 at 18:01, <roger.pau@citrix.com> wrote:
>>> Yes, I should add back all the registers here, so it looks like:
>>>
>>>      uint32_t cs_base;
>>>      uint32_t ds_base;
>>>      uint32_t ss_base;
>>>      uint32_t es_base;
>>>      uint32_t cs_limit;
>>>      uint32_t ds_limit;
>>>      uint32_t ss_limit;
>>>      uint32_t es_limit;
>>>      uint16_t cs_ar;
>>>      uint16_t ds_ar;
>>>      uint16_t ss_ar;
>>>      uint16_t es_ar;
>> 
>> Or specify that compat mode entry is via using 32-bit register state.
> 
> Yes, that should also work. If that's the preference then I guess we 
> can remove all the selectors stuff from the 64bit structure.

That's my preference (with the EFER comment then accordingly
updated). Andrew?

Jan


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 10:03                               ` Jan Beulich
@ 2015-09-30 11:37                                 ` Roger Pau Monné
  2015-09-30 11:49                                   ` Andrew Cooper
  2015-09-30 11:54                                   ` Jan Beulich
  0 siblings, 2 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-30 11:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 30/09/15 at 12.03, Jan Beulich wrote:
>>>> On 29.09.15 at 18:49, <roger.pau@citrix.com> wrote:
>> On 29/09/15 at 18.20, Jan Beulich wrote:
>>>>>> On 29.09.15 at 18:01, <roger.pau@citrix.com> wrote:
>>>> Yes, I should add back all the registers here, so it looks like:
>>>>
>>>>      uint32_t cs_base;
>>>>      uint32_t ds_base;
>>>>      uint32_t ss_base;
>>>>      uint32_t es_base;
>>>>      uint32_t cs_limit;
>>>>      uint32_t ds_limit;
>>>>      uint32_t ss_limit;
>>>>      uint32_t es_limit;
>>>>      uint16_t cs_ar;
>>>>      uint16_t ds_ar;
>>>>      uint16_t ss_ar;
>>>>      uint16_t es_ar;
>>>
>>> Or specify that compat mode entry is via using 32-bit register state.
>>
>> Yes, that should also work. If that's the preference then I guess we 
>> can remove all the selectors stuff from the 64bit structure.
> 
> That's my preference (with the EFER comment then accordingly
> updated). Andrew?

This is what I currently have prototyped according to the comments; it 
should allow starting the vCPU in all possible modes AFAICT.

struct vcpu_hvm_x86_32 {
    uint32_t eax;
    uint32_t ecx;
    uint32_t edx;
    uint32_t ebx;
    uint32_t esp;
    uint32_t ebp;
    uint32_t esi;
    uint32_t edi;
    uint32_t eip;
    uint32_t eflags;

    uint32_t cr0;
    uint32_t cr3;
    uint32_t cr4;

    /*
     * EFER should only be used to set the NXE bit (if required)
     * when starting a vCPU in 32bit mode with paging enabled or
     * to set the LME/LMA bits in order to start the vCPU in
     * compatibility mode.
     */
    uint64_t efer;

    uint32_t cs_base;
    uint32_t ds_base;
    uint32_t ss_base;
    uint32_t es_base;
    uint32_t tr_base;
    uint32_t cs_limit;
    uint32_t ds_limit;
    uint32_t ss_limit;
    uint32_t es_limit;
    uint32_t tr_limit;
    uint16_t cs_ar;
    uint16_t ds_ar;
    uint16_t ss_ar;
    uint16_t es_ar;
    uint16_t tr_ar;
};

struct vcpu_hvm_x86_64 {
    uint64_t rax;
    uint64_t rcx;
    uint64_t rdx;
    uint64_t rbx;
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rsi;
    uint64_t rdi;
    uint64_t rip;
    uint64_t rflags;

    uint64_t cr0;
    uint64_t cr3;
    uint64_t cr4;
    uint64_t efer;

    /*
     * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
     * directly in long mode, so the type of the cached part
     * of the TR register is set to describe a 64-bit TSS (Busy).
     * The cached part of the CS register will also have the L bit
     * set (64-bit code segment).
     *
     * If the user wants to launch the vCPU in compatibility mode
     * the 32-bit structure should be used instead.
     */
};

struct vcpu_hvm_context {
#define VCPU_HVM_MODE_32B 0  /* 32bit fields of the structure will be used. */
#define VCPU_HVM_MODE_64B 1  /* 64bit fields of the structure will be used. */
    uint32_t mode;

    /* CPU registers. */
    union {
        struct vcpu_hvm_x86_32 x86_32;
        struct vcpu_hvm_x86_64 x86_64;
    } cpu_regs;
};
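
As a rough illustration of how a caller might fill this in for a direct
long mode start, here is a minimal sketch. The helper name and its
parameters are hypothetical, the union is reduced to its 64-bit member
to keep the example short, and whether EFER.LMA also has to be set by
the caller is left open here. The CR0/CR4/EFER bit positions are the
architectural ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define X86_CR0_PE   (1ULL << 0)   /* protected mode enable */
#define X86_CR0_PG   (1ULL << 31)  /* paging */
#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define EFER_LME     (1ULL << 8)   /* long mode enable */

#define VCPU_HVM_MODE_64B 1

struct vcpu_hvm_x86_64 {
    uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, rip, rflags;
    uint64_t cr0, cr3, cr4, efer;
};

/* Reduced copy of the context: only the 64-bit member of the union. */
struct vcpu_hvm_context {
    uint32_t mode;
    union {
        struct vcpu_hvm_x86_64 x86_64;
    } cpu_regs;
};

/*
 * Hypothetical helper: entry, stack and pgtable are guest addresses
 * chosen by the caller. Everything not set explicitly is zeroed.
 */
static inline void fill_long_mode_ctx(struct vcpu_hvm_context *ctx,
                                      uint64_t entry, uint64_t stack,
                                      uint64_t pgtable)
{
    assert(ctx != NULL);

    memset(ctx, 0, sizeof(*ctx));
    ctx->mode = VCPU_HVM_MODE_64B;
    ctx->cpu_regs.x86_64.rip = entry;
    ctx->cpu_regs.x86_64.rsp = stack;
    ctx->cpu_regs.x86_64.rflags = 0x2;  /* bit 1 is reserved, always 1 */
    ctx->cpu_regs.x86_64.cr0 = X86_CR0_PE | X86_CR0_PG;
    ctx->cpu_regs.x86_64.cr3 = pgtable;
    ctx->cpu_regs.x86_64.cr4 = X86_CR4_PAE;
    ctx->cpu_regs.x86_64.efer = EFER_LME;
}
```

The filled structure would then be handed to the vCPU bring-up
hypercall; that call is not shown because its final form is still under
discussion in this thread.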

Roger.


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 11:37                                 ` Roger Pau Monné
@ 2015-09-30 11:49                                   ` Andrew Cooper
  2015-09-30 11:54                                   ` Jan Beulich
  1 sibling, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2015-09-30 11:49 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 30/09/15 12:37, Roger Pau Monné wrote:
> On 30/09/15 at 12.03, Jan Beulich wrote:
>>>>> On 29.09.15 at 18:49, <roger.pau@citrix.com> wrote:
>>> On 29/09/15 at 18.20, Jan Beulich wrote:
>>>>>>> On 29.09.15 at 18:01, <roger.pau@citrix.com> wrote:
>>>>> Yes, I should add back all the registers here, so it looks like:
>>>>>
>>>>>      uint32_t cs_base;
>>>>>      uint32_t ds_base;
>>>>>      uint32_t ss_base;
>>>>>      uint32_t es_base;
>>>>>      uint32_t cs_limit;
>>>>>      uint32_t ds_limit;
>>>>>      uint32_t ss_limit;
>>>>>      uint32_t es_limit;
>>>>>      uint16_t cs_ar;
>>>>>      uint16_t ds_ar;
>>>>>      uint16_t ss_ar;
>>>>>      uint16_t es_ar;
>>>> Or specify that compat mode entry is via using 32-bit register state.
>>> Yes, that should also work. If that's the preference then I guess we 
>>> can remove all the selectors stuff from the 64bit structure.
>> That's my preference (with the EFER comment then accordingly
>> updated). Andrew?
> This is what I currently have prototyped according to the comments, it 
> should allow starting the vCPU in all possible modes AFAICT.
>
> struct vcpu_hvm_x86_32 {
>     uint32_t eax;
>     uint32_t ecx;
>     uint32_t edx;
>     uint32_t ebx;
>     uint32_t esp;
>     uint32_t ebp;
>     uint32_t esi;
>     uint32_t edi;
>     uint32_t eip;
>     uint32_t eflags;
>
>     uint32_t cr0;
>     uint32_t cr3;
>     uint32_t cr4;
>
>     /*
>      * EFER should only be used to set the NXE bit (if required)
>      * when starting a vCPU in 32bit mode with paging enabled or
>      * to set the LME/LMA bits in order to start the vCPU in
>      * compatibility mode.
>      */
>     uint64_t efer;
>
>     uint32_t cs_base;
>     uint32_t ds_base;
>     uint32_t ss_base;
>     uint32_t es_base;
>     uint32_t tr_base;
>     uint32_t cs_limit;
>     uint32_t ds_limit;
>     uint32_t ss_limit;
>     uint32_t es_limit;
>     uint32_t tr_limit;
>     uint16_t cs_ar;
>     uint16_t ds_ar;
>     uint16_t ss_ar;
>     uint16_t es_ar;
>     uint16_t tr_ar;
> };
>
> struct vcpu_hvm_x86_64 {
>     uint64_t rax;
>     uint64_t rcx;
>     uint64_t rdx;
>     uint64_t rbx;
>     uint64_t rsp;
>     uint64_t rbp;
>     uint64_t rsi;
>     uint64_t rdi;
>     uint64_t rip;
>     uint64_t rflags;
>
>     uint64_t cr0;
>     uint64_t cr3;
>     uint64_t cr4;
>     uint64_t efer;
>
>     /*
>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>      * directly in long mode, so the type of the cached part
>      * of the TR register is set to describe a 64-bit TSS (Busy).
>      * The cached part of the CS register will also have the L bit
>      * set (64-bit code segment).
>      *
>      * If the user wants to launch the vCPU in compatibility mode
>      * the 32-bit structure should be used instead.
>      */
> };
>
> struct vcpu_hvm_context {
> #define VCPU_HVM_MODE_32B 0  /* 32bit fields of the structure will be used. */
> #define VCPU_HVM_MODE_64B 1  /* 64bit fields of the structure will be used. */
>     uint32_t mode;
>
>     /* CPU registers. */
>     union {
>         struct vcpu_hvm_x86_32 x86_32;
>         struct vcpu_hvm_x86_64 x86_64;
>     } cpu_regs;
> };
>
> Roger.

Looks good to me.

~Andrew


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 11:37                                 ` Roger Pau Monné
  2015-09-30 11:49                                   ` Andrew Cooper
@ 2015-09-30 11:54                                   ` Jan Beulich
  2015-09-30 12:19                                     ` Roger Pau Monné
  1 sibling, 1 reply; 99+ messages in thread
From: Jan Beulich @ 2015-09-30 11:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 30.09.15 at 13:37, <roger.pau@citrix.com> wrote:
> This is what I currently have prototyped according to the comments, it 
> should allow starting the vCPU in all possible modes AFAICT.

Looks okay, one more comment:

> struct vcpu_hvm_x86_32 {
>     uint32_t eax;
>     uint32_t ecx;
>     uint32_t edx;
>     uint32_t ebx;
>     uint32_t esp;
>     uint32_t ebp;
>     uint32_t esi;
>     uint32_t edi;
>     uint32_t eip;
>     uint32_t eflags;
> 
>     uint32_t cr0;
>     uint32_t cr3;
>     uint32_t cr4;
> 
>     /*
>      * EFER should only be used to set the NXE bit (if required)
>      * when starting a vCPU in 32bit mode with paging enabled or
>      * to set the LME/LMA bits in order to start the vCPU in
>      * compatibility mode.
>      */
>     uint64_t efer;
> 
>     uint32_t cs_base;
>     uint32_t ds_base;
>     uint32_t ss_base;
>     uint32_t es_base;
>     uint32_t tr_base;
>     uint32_t cs_limit;
>     uint32_t ds_limit;
>     uint32_t ss_limit;
>     uint32_t es_limit;
>     uint32_t tr_limit;
>     uint16_t cs_ar;
>     uint16_t ds_ar;
>     uint16_t ss_ar;
>     uint16_t es_ar;
>     uint16_t tr_ar;
> };
> 
> struct vcpu_hvm_x86_64 {
>     uint64_t rax;
>     uint64_t rcx;
>     uint64_t rdx;
>     uint64_t rbx;
>     uint64_t rsp;
>     uint64_t rbp;
>     uint64_t rsi;
>     uint64_t rdi;
>     uint64_t rip;
>     uint64_t rflags;
> 
>     uint64_t cr0;
>     uint64_t cr3;
>     uint64_t cr4;
>     uint64_t efer;
> 
>     /*
>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>      * directly in long mode, so the type of the cached part
>      * of the TR register is set to describe a 64-bit TSS (Busy).
>      * The cached part of the CS register will also have the L bit
>      * set (64-bit code segment).

I'd leave out mentioning TR here (or else it'll be odd not to mention
e.g. LDTR too). Perhaps just "..., so the cached parts of the segment
registers get set to match that environment"?

Jan


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 11:54                                   ` Jan Beulich
@ 2015-09-30 12:19                                     ` Roger Pau Monné
  2015-09-30 12:35                                       ` Jan Beulich
  0 siblings, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-30 12:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 30/09/15 at 13.54, Jan Beulich wrote:
>>>> On 30.09.15 at 13:37, <roger.pau@citrix.com> wrote:
>> This is what I currently have prototyped according to the comments, it 
>> should allow starting the vCPU in all possible modes AFAICT.
> 
> Looks okay, one more comment:
> 
>> struct vcpu_hvm_x86_32 {
>>     uint32_t eax;
>>     uint32_t ecx;
>>     uint32_t edx;
>>     uint32_t ebx;
>>     uint32_t esp;
>>     uint32_t ebp;
>>     uint32_t esi;
>>     uint32_t edi;
>>     uint32_t eip;
>>     uint32_t eflags;
>>
>>     uint32_t cr0;
>>     uint32_t cr3;
>>     uint32_t cr4;
>>
>>     /*
>>      * EFER should only be used to set the NXE bit (if required)
>>      * when starting a vCPU in 32bit mode with paging enabled or
>>      * to set the LME/LMA bits in order to start the vCPU in
>>      * compatibility mode.
>>      */
>>     uint64_t efer;
>>
>>     uint32_t cs_base;
>>     uint32_t ds_base;
>>     uint32_t ss_base;
>>     uint32_t es_base;
>>     uint32_t tr_base;
>>     uint32_t cs_limit;
>>     uint32_t ds_limit;
>>     uint32_t ss_limit;
>>     uint32_t es_limit;
>>     uint32_t tr_limit;
>>     uint16_t cs_ar;
>>     uint16_t ds_ar;
>>     uint16_t ss_ar;
>>     uint16_t es_ar;
>>     uint16_t tr_ar;
>> };
>>
>> struct vcpu_hvm_x86_64 {
>>     uint64_t rax;
>>     uint64_t rcx;
>>     uint64_t rdx;
>>     uint64_t rbx;
>>     uint64_t rsp;
>>     uint64_t rbp;
>>     uint64_t rsi;
>>     uint64_t rdi;
>>     uint64_t rip;
>>     uint64_t rflags;
>>
>>     uint64_t cr0;
>>     uint64_t cr3;
>>     uint64_t cr4;
>>     uint64_t efer;
>>
>>     /*
>>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>>      * directly in long mode, so the type of the cached part
>>      * of the TR register is set to describe a 64-bit TSS (Busy).
>>      * The cached part of the CS register will also have the L bit
>>      * set (64-bit code segment).
> 
> I'd leave out mentioning TR here (or else it'll be odd not to mention
> e.g. LDTR too). Perhaps just "..., so the cached parts of the segment
> registers get set to match that environment"?

That sounds fine. I'm going to update the patch and the FreeBSD part in
order to test it. Since we also spoke about adding sanity checks, I
wonder whether I should add those checks now, or leave them for a later
patch. IMHO those checks are only useful for developers.

For VCPU_HVM_MODE_32B:
 - rIP within CS limit.
 - Check that CS.DPL == SS.DPL.
 - rSP within SS limit.

TBH I don't think we should enforce the last two checks; starting with
an invalid stack should be fine as long as the user knows it. Maybe
print a warning/debug message in this case?
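
A minimal sketch of the first two 32-bit checks, kept as separate
predicates so that the caller can decide which one is fatal and which
one only triggers a warning. It assumes that cs_limit is already
expressed in bytes and that the DPL sits in bits 5-6 of the "ar" field,
as in a descriptor access byte. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the proposed vcpu_hvm_x86_32 fields used by the checks. */
struct regs32_subset {
    uint32_t eip;
    uint32_t cs_limit;
    uint16_t cs_ar;
    uint16_t ss_ar;
};

/* Assumption: DPL lives in bits 5-6, as in a descriptor access byte. */
static inline unsigned int seg_dpl(uint16_t ar)
{
    return (ar >> 5) & 3;
}

/* The first instruction fetch must fall inside the code segment. */
static inline bool eip_within_cs_limit(const struct regs32_subset *r)
{
    assert(r != NULL);
    return r->eip <= r->cs_limit;
}

/* CS and SS are expected to agree on the privilege level. */
static inline bool cs_ss_dpl_match(const struct regs32_subset *r)
{
    assert(r != NULL);
    return seg_dpl(r->cs_ar) == seg_dpl(r->ss_ar);
}
```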

For VCPU_HVM_MODE_64B:
 - Check that cr0 has paging enabled.
 - Check that cr4 has pae enabled.
 - Check that efer has the LMA/LME bits set.

Those should always be enforced for long mode.

Roger.


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 12:19                                     ` Roger Pau Monné
@ 2015-09-30 12:35                                       ` Jan Beulich
  2015-09-30 12:50                                         ` Andrew Cooper
  2015-09-30 14:23                                         ` Roger Pau Monné
  0 siblings, 2 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-30 12:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 30.09.15 at 14:19, <roger.pau@citrix.com> wrote:
> On 30/09/15 at 13.54, Jan Beulich wrote:
>>>>> On 30.09.15 at 13:37, <roger.pau@citrix.com> wrote:
>>>     /*
>>>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>>>      * directly in long mode, so the type of the cached part
>>>      * of the TR register is set to describe a 64-bit TSS (Busy).
>>>      * The cached part of the CS register will also have the L bit
>>>      * set (64-bit code segment).
>> 
>> I'd leave out mentioning TR here (or else it'll be odd not to mention
>> e.g. LDTR too). Perhaps just "..., so the cached parts of the segment
>> registers get set to match that environment"?
> 
> That sounds fine. I'm going to update the patch and the FreeBSD part in
> order to test it. Since we also spoke about adding sanity checks, I
> wonder whether I should add those checks now, or leave them for a later
> patch. IMHO those checks are only useful for developers.

Fundamentally, checks done here should include everything that
would otherwise lead to the domain getting killed due to failed
VMENTRY. I.e. both sets below may need extending.

> For VCPU_HVM_MODE_32B:
>  - rIP within CS limit.
>  - Check that CS.DPL == SS.DPL.
>  - rSP within SS limit.
> 
> TBH I don't think we should enforce the last two checks, starting with
> an invalid stack should be fine as long as the user knows it. Maybe
> print a warning/debug message in this case?

I wouldn't check ESP at all. As to the two DPLs, I don't think you
could launch a guest with these disagreeing.

> For VCPU_HVM_MODE_64B:
>  - Check that cr0 has paging enabled.
>  - Check that cr4 has pae enabled.
>  - Check that efer has the LMA/LME bits set.
> 
> Those should be always enforced for long mode.

Agreed, plus RIP being canonical.
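
For illustration, a canonical check can be as small as the sketch
below. It assumes 48-bit virtual addresses (4-level paging), where bits
63:47 must be all zeroes or all ones; the function name is made up for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48-bit virtual addresses: an address is canonical when
 * bits 63:47 (17 bits) are either all clear or all set.
 */
static inline bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Boundary cases around the non-canonical hole. */
static inline void canonical_selftest(void)
{
    assert(is_canonical_48(0x00007fffffffffffULL));
    assert(!is_canonical_48(0x0000800000000000ULL));
    assert(!is_canonical_48(0xffff7fffffffffffULL));
    assert(is_canonical_48(0xffff800000000000ULL));
}
```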

Jan


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 12:35                                       ` Jan Beulich
@ 2015-09-30 12:50                                         ` Andrew Cooper
  2015-09-30 15:33                                           ` Roger Pau Monné
  2015-09-30 14:23                                         ` Roger Pau Monné
  1 sibling, 1 reply; 99+ messages in thread
From: Andrew Cooper @ 2015-09-30 12:50 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 30/09/15 13:35, Jan Beulich wrote:
>>>> On 30.09.15 at 14:19, <roger.pau@citrix.com> wrote:
>> On 30/09/15 at 13.54, Jan Beulich wrote:
>>>>>> On 30.09.15 at 13:37, <roger.pau@citrix.com> wrote:
>>>>     /*
>>>>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>>>>      * directly in long mode, so the type of the cached part
>>>>      * of the TR register is set to describe a 64-bit TSS (Busy).
>>>>      * The cached part of the CS register will also have the L bit
>>>>      * set (64-bit code segment).
>>> I'd leave out mentioning TR here (or else it'll be odd not to mention
>>> e.g. LDTR too). Perhaps just "..., so the cached parts of the segment
>>> registers get set to match that environment"?
>> That sounds fine. I'm going to update the patch and the FreeBSD part in
>> order to test it. Since we also spoke about adding sanity checks, I
>> wonder whether I should add those checks now, or leave them for a later
>> patch. IMHO those checks are only useful for developers.
> Fundamentally, checks done here should include everything that
> would otherwise lead to the domain getting killed due to failed
> VMENTRY. I.e. both sets below may need extending.
>
>> For VCPU_HVM_MODE_32B:
>>  - rIP within CS limit.
>>  - Check that CS.DPL == SS.DPL.
>>  - rSP within SS limit.
>>
>> TBH I don't think we should enforce the last two checks, starting with
>> an invalid stack should be fine as long as the user knows it. Maybe
>> print a warning/debug message in this case?
> I wouldn't check ESP at all. As to the two DPLs, I don't think you
> could launch a guest with these disagreeing.
>
>> For VCPU_HVM_MODE_64B:
>>  - Check that cr0 has paging enabled.
>>  - Check that cr4 has pae enabled.
>>  - Check that efer has the LMA/LME bits set.
>>
>> Those should be always enforced for long mode.
> Agreed, plus RIP being canonical.

LMA is a read-only bit with inconsistent semantics between Intel and AMD.

In particular, on Intel, LMA is not visible until LME has been set,
which means that the typical setup of

mov $MSR_EFER, %ecx
rdmsr
bts $_EFER_LME, %eax
wrmsr

causes Xen to observe LME but not LMA when intercepting the wrmsr.

I think all you need to check is CR0.PG, CR4.PAE and EFER.LME (and guest
X86_FEATURE_LM).  LMA will then leak through in subsequent rdmsr's.
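
Roughly, as a sketch only: the function name is hypothetical, and the
guest's X86_FEATURE_LM status is passed in as a plain flag because how
it gets looked up is outside the scope of this example. The bit
positions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG   (1ULL << 31)  /* paging */
#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define EFER_LME     (1ULL << 8)   /* long mode enable */
#define EFER_LMA     (1ULL << 10)  /* long mode active (read-only) */

/*
 * Long mode start is acceptable when paging, PAE and LME are all set
 * and the guest is allowed to use long mode. EFER.LMA is deliberately
 * not looked at.
 */
static inline bool long_mode_state_ok(uint64_t cr0, uint64_t cr4,
                                      uint64_t efer, bool guest_has_lm)
{
    return guest_has_lm &&
           (cr0 & X86_CR0_PG) &&
           (cr4 & X86_CR4_PAE) &&
           (efer & EFER_LME);
}

/* LME without LMA passes; LMA without LME does not. */
static inline void long_mode_selftest(void)
{
    assert(long_mode_state_ok(X86_CR0_PG, X86_CR4_PAE, EFER_LME, true));
    assert(!long_mode_state_ok(X86_CR0_PG, X86_CR4_PAE, EFER_LMA, true));
}
```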

~Andrew


* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 12:35                                       ` Jan Beulich
  2015-09-30 12:50                                         ` Andrew Cooper
@ 2015-09-30 14:23                                         ` Roger Pau Monné
  2015-09-30 15:41                                           ` Jan Beulich
  1 sibling, 1 reply; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-30 14:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

On 30/09/15 at 14:35, Jan Beulich wrote:
> On 30.09.15 at 14:19, <roger.pau@citrix.com> wrote:
>> For VCPU_HVM_MODE_32B:
>>  - rIP within CS limit.
>>  - Check that CS.DPL == SS.DPL.
>>  - rSP within SS limit.
>>
>> TBH I don't think we should enforce the last two checks, starting with
>> an invalid stack should be fine as long as the user knows it. Maybe
>> print a warning/debug message in this case?
> 
> I wouldn't check ESP at all. As for the two DPLs, I don't think you
> could launch a guest with these disagreeing.

I think we should basically check that the DPL of the CS segment is
equal to or greater than the DPL of all the other segments (ds, ss and es).

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 12:50                                         ` Andrew Cooper
@ 2015-09-30 15:33                                           ` Roger Pau Monné
  0 siblings, 0 replies; 99+ messages in thread
From: Roger Pau Monné @ 2015-09-30 15:33 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: George Dunlap, xen-devel, Stefano Stabellini, Ian Campbell, Tim Deegan

On 30/09/15 at 14:50, Andrew Cooper wrote:
> On 30/09/15 13:35, Jan Beulich wrote:
>>>>> On 30.09.15 at 14:19, <roger.pau@citrix.com> wrote:
>>> On 30/09/15 at 13:54, Jan Beulich wrote:
>>>>>>> On 30.09.15 at 13:37, <roger.pau@citrix.com> wrote:
>>>>>     /*
>>>>>      * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
>>>>>      * directly in long mode, so the type of the cached part
>>>>>      * of the TR register is set to describe a 64-bit TSS (Busy).
>>>>>      * The cached part of the CS register will also have the L bit
>>>>>      * set (64-bit code segment).
>>>> I'd leave out mentioning TR here (or else it'll be odd not to mention
>>>> e.g. LDTR too). Perhaps just "..., so the cached parts of the segment
>>>> registers get set to match that environment"?
>>> That sounds fine. I'm going to update the patch and the FreeBSD part in
>>> order to test it. Since we also spoke about adding sanity checks, I
>>> wonder whether I should add those checks now, or leave them for a later
>>> patch. IMHO those checks are only useful for developers.
>> Fundamentally, checks done here should include everything that
>> would otherwise lead to the domain getting killed due to failed
>> VMENTRY. I.e. both sets below may need extending.
>>
>>> For VCPU_HVM_MODE_32B:
>>>  - rIP within CS limit.
>>>  - Check that CS.DPL == SS.DPL.
>>>  - rSP within SS limit.
>>>
>>> TBH I don't think we should enforce the last two checks, starting with
>>> an invalid stack should be fine as long as the user knows it. Maybe
>>> print a warning/debug message in this case?
>> I wouldn't check ESP at all. As for the two DPLs, I don't think you
>> could launch a guest with these disagreeing.
>>
>>> For VCPU_HVM_MODE_64B:
>>>  - Check that cr0 has paging enabled.
>>>  - Check that cr4 has pae enabled.
>>>  - Check that efer has the LMA/LME bits set.
>>>
>>> Those should be always enforced for long mode.
>> Agreed, plus RIP being canonical.
> 
> LMA is a read-only bit with inconsistent semantics between Intel and AMD.
> 
> In particular, on Intel, LMA is not visible until LME has been set,
> which means that the typical setup of
> 
> mov $MSR_EFER, %ecx
> rdmsr
> bts $_EFER_LME, %eax
> wrmsr
> 
> causes Xen to observe LME but not LMA when intercepting the wrmsr.
> 
> I think all you need to check is CR0.PG, CR4.PAE and EFER.LME (and guest
> X86_FEATURE_LM).  LMA will then leak through in subsequent rdmsr instructions.

Although you don't need to set LMA in real hardware, AFAICT you need to
set it when using this interface, because hvm_update_guest_efer doesn't
set LMA if LME is enabled, which causes the vmentry to fail.
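
One way to avoid depending on the caller to supply LMA would be to
derive it from the other bits, since architecturally LMA is active when
both EFER.LME and CR0.PG are set. A minimal sketch, where
efer_with_derived_lma() is a hypothetical helper and not an existing
Xen function:

```c
#include <stdint.h>

/* Architectural bit positions. */
#define X86_CR0_PG  (UINT64_C(1) << 31)
#define EFER_LME    (UINT64_C(1) << 8)
#define EFER_LMA    (UINT64_C(1) << 10)

/*
 * Hypothetical helper: return efer with LMA recomputed from LME and
 * CR0.PG, ignoring whatever LMA value the caller passed in.
 */
static uint64_t efer_with_derived_lma(uint64_t efer, uint64_t cr0)
{
    if ((efer & EFER_LME) && (cr0 & X86_CR0_PG))
        return efer | EFER_LMA;

    return efer & ~EFER_LMA;
}
```

Whether such a derivation belongs in the hypercall handler or in
hvm_update_guest_efer itself is a separate design question that this
sketch does not try to answer.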

Roger.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
  2015-09-30 14:23                                         ` Roger Pau Monné
@ 2015-09-30 15:41                                           ` Jan Beulich
  0 siblings, 0 replies; 99+ messages in thread
From: Jan Beulich @ 2015-09-30 15:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Campbell, George Dunlap, Andrew Cooper, Tim Deegan,
	Stefano Stabellini, xen-devel

>>> On 30.09.15 at 16:23, <roger.pau@citrix.com> wrote:
> On 30/09/15 at 14:35, Jan Beulich wrote:
>> On 30.09.15 at 14:19, <roger.pau@citrix.com> wrote:
>>> For VCPU_HVM_MODE_32B:
>>>  - rIP within CS limit.
>>>  - Check that CS.DPL == SS.DPL.
>>>  - rSP within SS limit.
>>>
>>> TBH I don't think we should enforce the last two checks, starting with
>>> an invalid stack should be fine as long as the user knows it. Maybe
>>> print a warning/debug message in this case?
>> 
>> I wouldn't check ESP at all. As for the two DPLs, I don't think you
>> could launch a guest with these disagreeing.
> 
> I think we should basically check that the DPL of the CS segment is
> equal to or greater than the DPL of all the other segments (ds, ss and es).

Except that for SS it needs to be equal.
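
For illustration, the two VCPU_HVM_MODE_32B checks that nobody objected
to (rIP within the CS limit, and CS.DPL equal to SS.DPL) might look
roughly like the sketch below. struct seg_cache and mode32_state_valid()
are hypothetical names, the limit is assumed to be already expanded to
a byte-granular inclusive value, and the ds/es rule is left out because
its exact form is still being discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the cached part of a segment register. */
struct seg_cache {
    uint32_t limit;   /* last valid offset, byte granular (assumption) */
    uint8_t dpl;      /* descriptor privilege level, 0-3 */
};

/* No ESP check, following the suggestion not to check it at all. */
static bool mode32_state_valid(uint32_t eip,
                               const struct seg_cache *cs,
                               const struct seg_cache *ss)
{
    return eip <= cs->limit && cs->dpl == ss->dpl;
}
```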

Jan

^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2015-09-30 15:41 UTC | newest]

Thread overview: 99+ messages (download: mbox.gz / follow: Atom feed)
2015-09-04 12:08 [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 01/29] libxc: split x86 HVM setup_guest into smaller logical functions Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 02/29] libxc: unify xc_dom_p2m_{host/guest} Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 03/29] libxc: introduce the notion of a container type Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 04/29] libxc: introduce a domain loader for HVM guest firmware Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 05/29] libxc: make arch_setup_meminit a xc_dom_arch hook Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 06/29] libxc: make arch_setup_boot{init/late} xc_dom_arch hooks Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 07/29] libxc: rework BSP initialization Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 08/29] libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests Roger Pau Monne
2015-09-18 15:53   ` Anthony PERARD
2015-09-23 10:32     ` Roger Pau Monné
2015-09-04 12:08 ` [PATCH v6 09/29] libxl: switch HVM domain building to use xc_dom_* helpers Roger Pau Monne
2015-09-18 15:53   ` Anthony PERARD
2015-09-23 10:38     ` Roger Pau Monné
2015-09-04 12:08 ` [PATCH v6 10/29] libxc: remove dead HVM building code Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 11/29] xen/x86: add bitmap of enabled emulated devices Roger Pau Monne
2015-09-04 12:25   ` Wei Liu
2015-09-04 13:51     ` Roger Pau Monné
2015-09-04 13:55       ` Jan Beulich
2015-09-04 22:41         ` Andrew Cooper
2015-09-23 11:43         ` Roger Pau Monné
2015-09-04 13:56       ` Wei Liu
2015-09-09 14:27   ` Wei Liu
2015-09-16  9:50   ` Jan Beulich
2015-09-23 12:35     ` Roger Pau Monné
2015-09-23 13:24       ` Jan Beulich
2015-09-23 15:02         ` Roger Pau Monné
2015-09-16 10:10   ` Jan Beulich
2015-09-23 12:42     ` Roger Pau Monné
2015-09-23 12:46       ` Andrew Cooper
2015-09-04 12:08 ` [PATCH v6 12/29] xen/x86: allow disabling the emulated local apic Roger Pau Monne
2015-09-16 10:05   ` Jan Beulich
2015-09-23 15:45     ` Roger Pau Monné
2015-09-24  7:57       ` Jan Beulich
2015-09-25  9:00         ` Roger Pau Monné
2015-09-04 12:08 ` [PATCH v6 13/29] xen/x86: allow disabling the emulated HPET Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 14/29] xen/x86: allow disabling the pmtimer Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 15/29] xen/x86: allow disabling the emulated RTC Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 16/29] xen/x86: allow disabling the emulated IO APIC Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 17/29] xen/x86: allow disabling the emulated PIC Roger Pau Monne
2015-09-21 14:34   ` Jan Beulich
2015-09-25 15:01     ` Roger Pau Monné
2015-09-04 12:08 ` [PATCH v6 18/29] xen/x86: allow disabling the emulated pmu Roger Pau Monne
2015-09-21 14:36   ` Jan Beulich
2015-09-21 14:48     ` Boris Ostrovsky
2015-09-25 15:07       ` Roger Pau Monné
2015-09-25 15:13         ` Jan Beulich
2015-09-25 15:22           ` Roger Pau Monné
2015-09-25 15:41             ` Boris Ostrovsky
2015-09-04 12:08 ` [PATCH v6 19/29] xen/x86: allow disabling the emulated VGA Roger Pau Monne
2015-09-04 12:08 ` [PATCH v6 20/29] xen/x86: allow disabling the emulated IOMMU Roger Pau Monne
2015-09-28 13:58   ` Aravind Gopalakrishnan
2015-09-04 12:09 ` [PATCH v6 21/29] xen/x86: allow disabling all emulated devices inside of Xen Roger Pau Monne
2015-09-04 12:09 ` [PATCH v6 22/29] elfnotes: intorduce a new PHYS_ENTRY elfnote Roger Pau Monne
2015-09-21 14:47   ` Jan Beulich
2015-09-28 10:35     ` Roger Pau Monné
2015-09-28 10:56       ` Jan Beulich
2015-09-28 10:59         ` Andrew Cooper
2015-09-04 12:09 ` [PATCH v6 23/29] libxc: allow creating domains without emulated devices Roger Pau Monne
2015-09-04 12:09 ` [PATCH v6 24/29] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs Roger Pau Monne
2015-09-21 15:44   ` Jan Beulich
2015-09-25 15:16     ` Andrew Cooper
2015-09-25 15:52       ` Jan Beulich
2015-09-28 16:09     ` Roger Pau Monné
2015-09-29  7:09       ` Jan Beulich
2015-09-29  8:53         ` Tim Deegan
2015-09-29 10:00         ` Andrew Cooper
2015-09-29 10:07           ` Jan Beulich
2015-09-29 10:25             ` Andrew Cooper
2015-09-29 10:33               ` Jan Beulich
2015-09-29 10:37                 ` Andrew Cooper
2015-09-29 10:48                   ` Jan Beulich
2015-09-29 14:01                     ` Roger Pau Monné
2015-09-29 15:29                       ` Jan Beulich
2015-09-29 16:01                         ` Roger Pau Monné
2015-09-29 16:20                           ` Jan Beulich
2015-09-29 16:49                             ` Roger Pau Monné
2015-09-29 16:58                               ` Roger Pau Monné
2015-09-30 10:03                               ` Jan Beulich
2015-09-30 11:37                                 ` Roger Pau Monné
2015-09-30 11:49                                   ` Andrew Cooper
2015-09-30 11:54                                   ` Jan Beulich
2015-09-30 12:19                                     ` Roger Pau Monné
2015-09-30 12:35                                       ` Jan Beulich
2015-09-30 12:50                                         ` Andrew Cooper
2015-09-30 15:33                                           ` Roger Pau Monné
2015-09-30 14:23                                         ` Roger Pau Monné
2015-09-30 15:41                                           ` Jan Beulich
2015-09-04 12:09 ` [PATCH v6 25/29] xenconsole: try to attach to PV console if HVM fails Roger Pau Monne
2015-09-04 12:09 ` [PATCH v6 26/29] libxc/xen: introduce a start info structure for HVMlite guests Roger Pau Monne
2015-09-10 16:00   ` Wei Liu
2015-09-21 15:53   ` Jan Beulich
2015-09-28 16:51     ` Roger Pau Monné
2015-09-04 12:09 ` [PATCH v6 27/29] libxc: switch xc_dom_elfloader to be used with HVMlite domains Roger Pau Monne
2015-09-04 12:09 ` [PATCH v6 28/29] libxl: allow the creation of HVM domains without a device model Roger Pau Monne
2015-09-04 12:09 ` [PATCH v6 29/29] libxl: add support for migrating HVM guests " Roger Pau Monne
2015-09-10 16:00   ` Wei Liu
2015-09-10 16:30   ` Andrew Cooper
2015-09-11 13:04 ` [PATCH v6 00/29] Introduce HVM without dm and new boot ABI Ian Campbell
