* [PATCH 00/18][V6]: PVH xen: version 6 patches...
@ 2013-05-25  1:25 Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 01/18] PVH xen: turn gdt_frames/gdt_ents into union Mukesh Rathor
                   ` (18 more replies)
  0 siblings, 19 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

I have version 6 of my patches for 64-bit PVH guests for Xen.  This is Phase I.
These patches are built on top of git
c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3

V6:
The biggest change in V6 is the dropping of dom0 PVH. It will take some time
to investigate and redo the dom0 construction to use unmodified PV code. The
V6 patches allow a PV dom0 to create PVH domUs. Please ack individual
patches if there are no issues, so I know they have been looked at.


Repeating from before:

Phase I:
   - Establish a baseline of something working. These patches allow for
     dom0 to be booted in PVH mode, and after that guests to be started
     in PV, PVH, and HVM modes. I also tested booting dom0 in PV mode,
     and starting PV, PVH, and HVM guests.

     Also, the disk must be specified as phy: in the vm.cfg file:
         > losetup /dev/loop1 guest.img
         > vm.cfg file: disk = ['phy:/dev/loop1,xvda,w']

     I've not tested anything else.
     Note, HAP and iommu are required for PVH.

As a result of V3, there were two new action items on the Linux side before
it will boot as PVH: 1) MSI-X fixup, and 2) load KERNEL_CS right after the
GDT switch.

As a result of V5 a new fixme:
  - MMIO ranges above the highest covered e820 address must be mapped for dom0.

Following fixme's exist in the code:
  - Add support for more memory types in arch/x86/hvm/mtrr.c.
  - arch/x86/time.c: support more TSC modes.
  - check_guest_io_breakpoint(): check/add support for IO breakpoints.
  - Implement arch_get_info_guest() for PVH.
  - vmxit_msr_read(): during the AMD port, go through
    hvm_msr_read_intercept() again.
  - Verify that breakpoint matching on emulated instructions works the
    same as for HVM for PVH guests. See instruction_done() and
    check_guest_io_breakpoint().

Following remain to be done for PVH:
   - AMD port.
   - Make posted interrupts available to PVH dom0 (this will be a big win).
   - 32bit support in both linux and xen. Xen changes are tagged "32bitfixme".
   - Add support for monitoring guest behavior. See hvm_memory_event* functions
     in hvm.c
   - Change xl to support modes other than "phy:".
   - Hotplug support.
   - Migration of PVH guests.

Thanks for all the help,
Mukesh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 01/18] PVH xen: turn gdt_frames/gdt_ents into union
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-31  9:13   ` Jan Beulich
  2013-05-25  1:25 ` [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range Mukesh Rathor
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

Changes in V2:
  - Add __XEN_INTERFACE_VERSION__

  Changes in V3:
    - Rename union to 'gdt' and rename field names.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 tools/libxc/xc_domain_restore.c   |    8 ++++----
 tools/libxc/xc_domain_save.c      |    6 +++---
 xen/arch/x86/domain.c             |   12 ++++++------
 xen/arch/x86/domctl.c             |   12 ++++++------
 xen/include/public/arch-x86/xen.h |   14 ++++++++++++++
 5 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index a15f86a..5530631 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -2020,15 +2020,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
             munmap(start_info, PAGE_SIZE);
         }
         /* Uncanonicalise each GDT frame number. */
-        if ( GET_FIELD(ctxt, gdt_ents) > 8192 )
+        if ( GET_FIELD(ctxt, gdt.pv.num_ents) > 8192 )
         {
             ERROR("GDT entry count out of range");
             goto out;
         }
 
-        for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt_ents); j++ )
+        for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt.pv.num_ents); j++ )
         {
-            pfn = GET_FIELD(ctxt, gdt_frames[j]);
+            pfn = GET_FIELD(ctxt, gdt.pv.frames[j]);
             if ( (pfn >= dinfo->p2m_size) ||
                  (pfn_type[pfn] != XEN_DOMCTL_PFINFO_NOTAB) )
             {
@@ -2036,7 +2036,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       j, (unsigned long)pfn);
                 goto out;
             }
-            SET_FIELD(ctxt, gdt_frames[j], ctx->p2m[pfn]);
+            SET_FIELD(ctxt, gdt.pv.frames[j], ctx->p2m[pfn]);
         }
         /* Uncanonicalise the page table base pointer. */
         pfn = UNFOLD_CR3(GET_FIELD(ctxt, ctrlreg[3]));
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index ff76626..97cf64a 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1900,15 +1900,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
         }
 
         /* Canonicalise each GDT frame number. */
-        for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt_ents); j++ )
+        for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt.pv.num_ents); j++ )
         {
-            mfn = GET_FIELD(&ctxt, gdt_frames[j]);
+            mfn = GET_FIELD(&ctxt, gdt.pv.frames[j]);
             if ( !MFN_IS_IN_PSEUDOPHYS_MAP(mfn) )
             {
                 ERROR("GDT frame is not in range of pseudophys map");
                 goto out;
             }
-            SET_FIELD(&ctxt, gdt_frames[j], mfn_to_pfn(mfn));
+            SET_FIELD(&ctxt, gdt.pv.frames[j], mfn_to_pfn(mfn));
         }
 
         /* Canonicalise the page table base pointer. */
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 161d1b3..cd95dc6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -780,8 +780,8 @@ int arch_set_info_guest(
         }
 
         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
-            fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt_frames[i]);
-        fail |= v->arch.pv_vcpu.gdt_ents != c(gdt_ents);
+            fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt.pv.frames[i]);
+        fail |= v->arch.pv_vcpu.gdt_ents != c(gdt.pv.num_ents);
 
         fail |= v->arch.pv_vcpu.ldt_base != c(ldt_base);
         fail |= v->arch.pv_vcpu.ldt_ents != c(ldt_ents);
@@ -830,17 +830,17 @@ int arch_set_info_guest(
         d->vm_assist = c(vm_assist);
 
     if ( !compat )
-        rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
+        rc = (int)set_gdt(v, c.nat->gdt.pv.frames, c.nat->gdt.pv.num_ents);
     else
     {
         unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)];
-        unsigned int n = (c.cmp->gdt_ents + 511) / 512;
+        unsigned int n = (c.cmp->gdt.pv.num_ents + 511) / 512;
 
         if ( n > ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames) )
             return -EINVAL;
         for ( i = 0; i < n; ++i )
-            gdt_frames[i] = c.cmp->gdt_frames[i];
-        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents);
+            gdt_frames[i] = c.cmp->gdt.pv.frames[i];
+        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt.pv.num_ents);
     }
     if ( rc != 0 )
         return rc;
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index c2a04c4..f87d6ab 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1300,12 +1300,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
         c(ldt_base = v->arch.pv_vcpu.ldt_base);
         c(ldt_ents = v->arch.pv_vcpu.ldt_ents);
         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
-            c(gdt_frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
-        BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt_frames) !=
-                     ARRAY_SIZE(c.cmp->gdt_frames));
-        for ( ; i < ARRAY_SIZE(c.nat->gdt_frames); ++i )
-            c(gdt_frames[i] = 0);
-        c(gdt_ents = v->arch.pv_vcpu.gdt_ents);
+            c(gdt.pv.frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
+        BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt.pv.frames) !=
+                     ARRAY_SIZE(c.cmp->gdt.pv.frames));
+        for ( ; i < ARRAY_SIZE(c.nat->gdt.pv.frames); ++i )
+            c(gdt.pv.frames[i] = 0);
+        c(gdt.pv.num_ents = v->arch.pv_vcpu.gdt_ents);
         c(kernel_ss = v->arch.pv_vcpu.kernel_ss);
         c(kernel_sp = v->arch.pv_vcpu.kernel_sp);
         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.ctrlreg); ++i )
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index b7f6a51..25c8519 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -170,7 +170,21 @@ struct vcpu_guest_context {
     struct cpu_user_regs user_regs;         /* User-level CPU registers     */
     struct trap_info trap_ctxt[256];        /* Virtual IDT                  */
     unsigned long ldt_base, ldt_ents;       /* LDT (linear address, # ents) */
+#if __XEN_INTERFACE_VERSION__ < 0x00040400
     unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+#else
+    union {
+        struct {
+            /* GDT (machine frames, # ents) */
+            unsigned long frames[16], num_ents;
+        } pv;
+        struct {
+            /* PVH: GDTR addr and size */
+            uint64_t addr;
+            uint16_t limit;
+        } pvh;
+    } gdt;
+#endif
     unsigned long kernel_ss, kernel_sp;     /* Virtual TSS (only SS1/SP1)   */
     /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
     unsigned long ctrlreg[8];               /* CR0-CR7 (control registers)  */
-- 
1.7.2.3


* [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 01/18] PVH xen: turn gdt_frames/gdt_ents into union Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-31  9:28   ` Jan Beulich
  2013-05-25  1:25 ` [PATCH 03/18] PVH xen: create domctl_memory_mapping() function Mukesh Rathor
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

In this patch we add a new function, xenmem_add_to_physmap_range(), and
change the parameters of xenmem_add_to_physmap_once() so it can be called
from xenmem_add_to_physmap_range(). There is no PVH-specific change here.
Note that foreign_domid is useful only for XENMAPSPACE_gmfn_foreign, which
is introduced later in the patch series; in this patch, calling with
XENMAPSPACE_gmfn_foreign results in xenmem_add_to_physmap_once()
returning -EINVAL.

Changes in V2:
  - Do not break up the parameters to xenmem_add_to_physmap_once(), but
    pass in struct xen_add_to_physmap.

Changes in V3:
  - add xsm hook
  - redo xenmem_add_to_physmap_range() a bit as the struct
    xen_add_to_physmap_range got enhanced.

Changes in V6:
  - rcu_lock_target_domain_by_id() got removed, use rcu_lock_domain_by_any_id.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5123860..43f0769 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4519,7 +4519,8 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
 
 static int xenmem_add_to_physmap_once(
     struct domain *d,
-    const struct xen_add_to_physmap *xatp)
+    const struct xen_add_to_physmap *xatp,
+    domid_t foreign_domid)
 {
     struct page_info *page = NULL;
     unsigned long gfn = 0; /* gcc ... */
@@ -4646,7 +4647,7 @@ static int xenmem_add_to_physmap(struct domain *d,
         start_xatp = *xatp;
         while ( xatp->size > 0 )
         {
-            rc = xenmem_add_to_physmap_once(d, xatp);
+            rc = xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
             if ( rc < 0 )
                 return rc;
 
@@ -4672,7 +4673,37 @@ static int xenmem_add_to_physmap(struct domain *d,
         return rc;
     }
 
-    return xenmem_add_to_physmap_once(d, xatp);
+    return xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
+}
+
+static int xenmem_add_to_physmap_range(struct domain *d,
+                                       struct xen_add_to_physmap_range *xatpr)
+{
+    int rc;
+
+    /* Process entries in reverse order to allow continuations */
+    while ( xatpr->size > 0 )
+    {
+        struct xen_add_to_physmap xatp;
+
+        if ( copy_from_guest_offset(&xatp.idx, xatpr->idxs, xatpr->size-1, 1)
+             || copy_from_guest_offset(&xatp.gpfn, xatpr->gpfns, xatpr->size-1,
+                                       1) )
+            return -EFAULT;
+
+        xatp.space = xatpr->space;
+        rc = xenmem_add_to_physmap_once(d, &xatp, xatpr->foreign_domid);
+
+        if ( copy_to_guest_offset(xatpr->errs, xatpr->size-1, &rc, 1) )
+            return -EFAULT;
+
+        xatpr->size--;
+
+        /* Check for continuation if it's not the last iteration */
+        if ( xatpr->size > 0 && hypercall_preempt_check() )
+            return -EAGAIN;
+    }
+    return 0;
 }
 
 long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4689,6 +4720,10 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&xatp, arg, 1) )
             return -EFAULT;
 
+        /* This one is only supported for add_to_physmap_range */
+        if ( xatp.space == XENMAPSPACE_gmfn_foreign )
+            return -EINVAL;
+
         d = rcu_lock_domain_by_any_id(xatp.domid);
         if ( d == NULL )
             return -ESRCH;
@@ -4716,6 +4751,39 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         return rc;
     }
 
+    case XENMEM_add_to_physmap_range:
+    {
+        struct xen_add_to_physmap_range xatpr;
+        struct domain *d;
+
+        if ( copy_from_guest(&xatpr, arg, 1) )
+            return -EFAULT;
+
+        /* This mapspace is redundant for this hypercall */
+        if ( xatpr.space == XENMAPSPACE_gmfn_range )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(xatpr.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
+        {
+            rcu_unlock_domain(d);
+            return -EPERM;
+        }
+
+        rc = xenmem_add_to_physmap_range(d, &xatpr);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -EAGAIN )
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "ih", op, arg);
+
+        return rc;
+    }
+
     case XENMEM_set_memory_map:
     {
         struct xen_foreign_memory_map fmap;
-- 
1.7.2.3


* [PATCH 03/18] PVH xen: create domctl_memory_mapping() function
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 01/18] PVH xen: turn gdt_frames/gdt_ents into union Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-31  9:46   ` Jan Beulich
  2013-05-25  1:25 ` [PATCH 04/18] PVH xen: add params to read_segment_register Mukesh Rathor
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

In this patch, the XEN_DOMCTL_memory_mapping code is moved into a function
so it can be shared later for PVH. There is no change in its
functionality.

Changes in V2:
  - Remove the PHYSDEVOP_map_iomem sub-hypercall and the code supporting
    it, as the IO region is mapped transparently now.

Changes in V3:
  - change loop control variable to ulong from int.
  - move priv checks to the function.

Changes in V5:
  - Move the iomem_access_permitted check from the function to the case
    statement, as current doesn't point to dom0 during construct_dom0.

Changes in V6:
  - Move iomem_access_permitted back into domctl_memory_mapping(), as it
    should come after the sanity and wrap checks of the mfns/gfns.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domctl.c    |  130 +++++++++++++++++++++++++---------------------
 xen/include/xen/domain.h |    2 +
 2 files changed, 72 insertions(+), 60 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index f87d6ab..ce32245 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,6 +46,75 @@ static int gdbsx_guest_mem_io(
     return (iop->remain ? -EFAULT : 0);
 }
 
+long domctl_memory_mapping(struct domain *d, unsigned long gfn,
+                           unsigned long mfn, unsigned long nr_mfns,
+                           bool_t add_map)
+{
+    unsigned long i;
+    long ret;
+
+    if ( (mfn + nr_mfns - 1) < mfn || /* wrap? */
+         ((mfn | (mfn + nr_mfns - 1)) >> (paddr_bits - PAGE_SHIFT)) ||
+         (gfn + nr_mfns - 1) < gfn ) /* wrap? */
+        return -EINVAL;
+
+    /* caller construct_dom0() runs on idle vcpu */
+    if ( !is_idle_vcpu(current) &&
+         !iomem_access_permitted(current->domain, mfn, mfn + nr_mfns - 1) )
+        return -EPERM;
+
+    ret = xsm_iomem_permission(XSM_HOOK, d, mfn, mfn + nr_mfns - 1, add_map);
+    if ( ret )
+        return ret;
+
+    if ( add_map )
+    {
+        printk(XENLOG_G_INFO
+               "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
+               d->domain_id, gfn, mfn, nr_mfns);
+
+        ret = iomem_permit_access(d, mfn, mfn + nr_mfns - 1);
+        if ( !ret && paging_mode_translate(d) )
+        {
+            for ( i = 0; !ret && i < nr_mfns; i++ )
+                if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
+                    ret = -EIO;
+            if ( ret )
+            {
+                printk(XENLOG_G_WARNING
+                       "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
+                       d->domain_id, gfn + i, mfn + i);
+                while ( i-- )
+                    clear_mmio_p2m_entry(d, gfn + i);
+                if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
+                     is_hardware_domain(current->domain) )
+                    printk(XENLOG_ERR
+                           "memory_map: failed to deny dom%d access to [%lx,%lx]\n",
+                           d->domain_id, mfn, mfn + nr_mfns - 1);
+            }
+        }
+    }
+    else
+    {
+        printk(XENLOG_G_INFO
+               "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
+               d->domain_id, gfn, mfn, nr_mfns);
+
+        if ( paging_mode_translate(d) )
+            for ( i = 0; i < nr_mfns; i++ )
+                add_map |= !clear_mmio_p2m_entry(d, gfn + i);
+        ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
+        if ( !ret && add_map )
+            ret = -EIO;
+        if ( ret && is_hardware_domain(current->domain) )
+            printk(XENLOG_ERR
+                   "memory_map: error %ld %s dom%d access to [%lx,%lx]\n",
+                   ret, add_map ? "removing" : "denying", d->domain_id,
+                   mfn, mfn + nr_mfns - 1);
+    }
+    return ret;
+}
+
 long arch_do_domctl(
     struct xen_domctl *domctl, struct domain *d,
     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
@@ -625,67 +694,8 @@ long arch_do_domctl(
         unsigned long mfn = domctl->u.memory_mapping.first_mfn;
         unsigned long nr_mfns = domctl->u.memory_mapping.nr_mfns;
         int add = domctl->u.memory_mapping.add_mapping;
-        unsigned long i;
-
-        ret = -EINVAL;
-        if ( (mfn + nr_mfns - 1) < mfn || /* wrap? */
-             ((mfn | (mfn + nr_mfns - 1)) >> (paddr_bits - PAGE_SHIFT)) ||
-             (gfn + nr_mfns - 1) < gfn ) /* wrap? */
-            break;
 
-        ret = -EPERM;
-        if ( !iomem_access_permitted(current->domain, mfn, mfn + nr_mfns - 1) )
-            break;
-
-        ret = xsm_iomem_mapping(XSM_HOOK, d, mfn, mfn + nr_mfns - 1, add);
-        if ( ret )
-            break;
-
-        if ( add )
-        {
-            printk(XENLOG_G_INFO
-                   "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-                   d->domain_id, gfn, mfn, nr_mfns);
-
-            ret = iomem_permit_access(d, mfn, mfn + nr_mfns - 1);
-            if ( !ret && paging_mode_translate(d) )
-            {
-                for ( i = 0; !ret && i < nr_mfns; i++ )
-                    if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
-                        ret = -EIO;
-                if ( ret )
-                {
-                    printk(XENLOG_G_WARNING
-                           "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
-                           d->domain_id, gfn + i, mfn + i);
-                    while ( i-- )
-                        clear_mmio_p2m_entry(d, gfn + i);
-                    if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
-                         is_hardware_domain(current->domain) )
-                        printk(XENLOG_ERR
-                               "memory_map: failed to deny dom%d access to [%lx,%lx]\n",
-                               d->domain_id, mfn, mfn + nr_mfns - 1);
-                }
-            }
-        }
-        else
-        {
-            printk(XENLOG_G_INFO
-                   "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-                   d->domain_id, gfn, mfn, nr_mfns);
-
-            if ( paging_mode_translate(d) )
-                for ( i = 0; i < nr_mfns; i++ )
-                    add |= !clear_mmio_p2m_entry(d, gfn + i);
-            ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
-            if ( !ret && add )
-                ret = -EIO;
-            if ( ret && is_hardware_domain(current->domain) )
-                printk(XENLOG_ERR
-                       "memory_map: error %ld %s dom%d access to [%lx,%lx]\n",
-                       ret, add ? "removing" : "denying", d->domain_id,
-                       mfn, mfn + nr_mfns - 1);
-        }
+        ret = domctl_memory_mapping(d, gfn, mfn, nr_mfns, add);
     }
     break;
 
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index a057069..9d6287c 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -89,4 +89,6 @@ extern unsigned int xen_processor_pmbits;
 
 extern bool_t opt_dom0_vcpus_pin;
 
+extern long domctl_memory_mapping(struct domain *d, unsigned long gfn,
+                    unsigned long mfn, unsigned long nr_mfns, bool_t add_map);
 #endif /* __XEN_DOMAIN_H__ */
-- 
1.7.2.3


* [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (2 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 03/18] PVH xen: create domctl_memory_mapping() function Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-31 10:00   ` Jan Beulich
  2013-05-25  1:25 ` [PATCH 05/18] PVH xen: vmx related preparatory changes for PVH Mukesh Rathor
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

In this patch, the read_segment_register macro is changed to take vcpu and
regs parameters so it can check whether the guest is PVH (a change made in
upcoming patches). No functionality change. Also, make
emulate_privileged_op() public for later use while changing this file.

Changes in V2:  None
Changes in V3:
   - Replace read_sreg with read_segment_register

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c        |    8 ++++----
 xen/arch/x86/traps.c         |   28 +++++++++++++---------------
 xen/arch/x86/x86_64/traps.c  |   16 ++++++++--------
 xen/include/asm-x86/system.h |    2 +-
 xen/include/asm-x86/traps.h  |    1 +
 5 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index cd95dc6..a5f2885 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1208,10 +1208,10 @@ static void save_segments(struct vcpu *v)
     struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned int dirty_segment_mask = 0;
 
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(v, regs, ds);
+    regs->es = read_segment_register(v, regs, es);
+    regs->fs = read_segment_register(v, regs, fs);
+    regs->gs = read_segment_register(v, regs, gs);
 
     if ( regs->ds )
         dirty_segment_mask |= DIRTY_DS;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 087bbeb..9d04735 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1831,8 +1831,6 @@ static inline uint64_t guest_misc_enable(uint64_t val)
     }                                                                       \
     (eip) += sizeof(_x); _x; })
 
-#define read_sreg(regs, sr) read_segment_register(sr)
-
 static int is_cpufreq_controller(struct domain *d)
 {
     return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
@@ -1841,7 +1839,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 #include "x86_64/mmconfig.h"
 
-static int emulate_privileged_op(struct cpu_user_regs *regs)
+int emulate_privileged_op(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
     unsigned long *reg, eip = regs->eip;
@@ -1877,7 +1875,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         goto fail;
 
     /* emulating only opcodes not allowing SS to be default */
-    data_sel = read_sreg(regs, ds);
+    data_sel = read_segment_register(v, regs, ds);
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1895,17 +1893,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             data_sel = regs->cs;
             continue;
         case 0x3e: /* DS override */
-            data_sel = read_sreg(regs, ds);
+            data_sel = read_segment_register(v, regs, ds);
             continue;
         case 0x26: /* ES override */
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             continue;
         case 0x64: /* FS override */
-            data_sel = read_sreg(regs, fs);
+            data_sel = read_segment_register(v, regs, fs);
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
-            data_sel = read_sreg(regs, gs);
+            data_sel = read_segment_register(v, regs, gs);
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
@@ -1952,7 +1950,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
         if ( !(opcode & 2) )
         {
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             lm_ovr = lm_seg_none;
         }
 
@@ -2696,22 +2694,22 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             ASSERT(opnd_sel);
             continue;
         case 0x3e: /* DS override */
-            opnd_sel = read_sreg(regs, ds);
+            opnd_sel = read_segment_register(v, regs, ds);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x26: /* ES override */
-            opnd_sel = read_sreg(regs, es);
+            opnd_sel = read_segment_register(v, regs, es);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x64: /* FS override */
-            opnd_sel = read_sreg(regs, fs);
+            opnd_sel = read_segment_register(v, regs, fs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x65: /* GS override */
-            opnd_sel = read_sreg(regs, gs);
+            opnd_sel = read_segment_register(v, regs, gs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
@@ -2764,7 +2762,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             switch ( modrm & 7 )
                             {
                             default:
-                                opnd_sel = read_sreg(regs, ds);
+                                opnd_sel = read_segment_register(v, regs, ds);
                                 break;
                             case 4: case 5:
                                 opnd_sel = regs->ss;
@@ -2792,7 +2790,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             break;
                         }
                         if ( !opnd_sel )
-                            opnd_sel = read_sreg(regs, ds);
+                            opnd_sel = read_segment_register(v, regs, ds);
                         switch ( modrm & 7 )
                         {
                         case 0: case 2: case 4:
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index eec919a..d2f7209 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -122,10 +122,10 @@ void show_registers(struct cpu_user_regs *regs)
         fault_crs[0] = read_cr0();
         fault_crs[3] = read_cr3();
         fault_crs[4] = read_cr4();
-        fault_regs.ds = read_segment_register(ds);
-        fault_regs.es = read_segment_register(es);
-        fault_regs.fs = read_segment_register(fs);
-        fault_regs.gs = read_segment_register(gs);
+        fault_regs.ds = read_segment_register(v, regs, ds);
+        fault_regs.es = read_segment_register(v, regs, es);
+        fault_regs.fs = read_segment_register(v, regs, fs);
+        fault_regs.gs = read_segment_register(v, regs, gs);
     }
 
     print_xen_info();
@@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs *regs)
     crs[2] = read_cr2();
     crs[3] = read_cr3();
     crs[4] = read_cr4();
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(current, regs, ds);
+    regs->es = read_segment_register(current, regs, es);
+    regs->fs = read_segment_register(current, regs, fs);
+    regs->gs = read_segment_register(current, regs, gs);
 
     printk("CPU:    %d\n", cpu);
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 6ab7d56..9bb22cb 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,7 +4,7 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(name)                             \
+#define read_segment_register(vcpu, regs, name)                 \
 ({  u16 __sel;                                                  \
     asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
     __sel;                                                      \
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 82cbcee..202e3be 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -49,4 +49,5 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
 				unsigned int trap_nr);
 
+int emulate_privileged_op(struct cpu_user_regs *regs);
 #endif /* ASM_TRAP_H */
-- 
1.7.2.3


* [PATCH 05/18] PVH xen: vmx related preparatory changes for PVH
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (3 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 04/18] PVH xen: add params to read_segment_register Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This is another preparatory patch for PVH. In this patch, the following
functions are made non-static:
    vmx_fpu_enter(), get_instruction_length(), update_guest_eip(),
    vmx_dr_access(), and pv_cpuid().

There is no functionality change.

Changes in V2:
  - prepend vmx_ to get_instruction_length and update_guest_eip.
  - Do not export/use vmr().

Changes in V3:
  - Do not change emulate_forced_invalid_op() in this patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   72 +++++++++++++++---------------------
 xen/arch/x86/hvm/vmx/vvmx.c        |    2 +-
 xen/arch/x86/traps.c               |    2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |    1 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |   16 +++++++-
 xen/include/asm-x86/processor.h    |    1 +
 6 files changed, 49 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d0de44a..25a265e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -577,7 +577,7 @@ static int vmx_load_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
     return 0;
 }
 
-static void vmx_fpu_enter(struct vcpu *v)
+void vmx_fpu_enter(struct vcpu *v)
 {
     vcpu_restore_fpu_lazy(v);
     v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device);
@@ -1594,24 +1594,12 @@ const struct hvm_function_table * __init start_vmx(void)
     return &vmx_function_table;
 }
 
-/*
- * Not all cases receive valid value in the VM-exit instruction length field.
- * Callers must know what they're doing!
- */
-static int get_instruction_length(void)
-{
-    int len;
-    len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */
-    BUG_ON((len < 1) || (len > 15));
-    return len;
-}
-
-void update_guest_eip(void)
+void vmx_update_guest_eip(void)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     unsigned long x;
 
-    regs->eip += get_instruction_length(); /* Safe: callers audited */
+    regs->eip += vmx_get_instruction_length(); /* Safe: callers audited */
     regs->eflags &= ~X86_EFLAGS_RF;
 
     x = __vmread(GUEST_INTERRUPTIBILITY_INFO);
@@ -1684,8 +1672,8 @@ static void vmx_do_cpuid(struct cpu_user_regs *regs)
     regs->edx = edx;
 }
 
-static void vmx_dr_access(unsigned long exit_qualification,
-                          struct cpu_user_regs *regs)
+void vmx_dr_access(unsigned long exit_qualification,
+                   struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
 
@@ -2298,7 +2286,7 @@ static int vmx_handle_eoi_write(void)
     if ( (((exit_qualification >> 12) & 0xf) == 1) &&
          ((exit_qualification & 0xfff) == APIC_EOI) )
     {
-        update_guest_eip(); /* Safe: APIC data write */
+        vmx_update_guest_eip(); /* Safe: APIC data write */
         vlapic_EOI_set(vcpu_vlapic(current));
         HVMTRACE_0D(VLAPIC);
         return 1;
@@ -2511,7 +2499,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             HVMTRACE_1D(TRAP, vector);
             if ( v->domain->debugger_attached )
             {
-                update_guest_eip(); /* Safe: INT3 */            
+                vmx_update_guest_eip(); /* Safe: INT3 */
                 current->arch.gdbsx_vcpu_event = TRAP_int3;
                 domain_pause_for_debugger();
                 break;
@@ -2619,7 +2607,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
          */
         inst_len = ((source != 3) ||        /* CALL, IRET, or JMP? */
                     (idtv_info & (1u<<10))) /* IntrType > 3? */
-            ? get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0;
+            ? vmx_get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0;
         if ( (source == 3) && (idtv_info & INTR_INFO_DELIVER_CODE_MASK) )
             ecode = __vmread(IDT_VECTORING_ERROR_CODE);
         regs->eip += inst_len;
@@ -2627,15 +2615,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
     case EXIT_REASON_CPUID:
-        update_guest_eip(); /* Safe: CPUID */
+        vmx_update_guest_eip(); /* Safe: CPUID */
         vmx_do_cpuid(regs);
         break;
     case EXIT_REASON_HLT:
-        update_guest_eip(); /* Safe: HLT */
+        vmx_update_guest_eip(); /* Safe: HLT */
         hvm_hlt(regs->eflags);
         break;
     case EXIT_REASON_INVLPG:
-        update_guest_eip(); /* Safe: INVLPG */
+        vmx_update_guest_eip(); /* Safe: INVLPG */
         exit_qualification = __vmread(EXIT_QUALIFICATION);
         vmx_invlpg_intercept(exit_qualification);
         break;
@@ -2643,7 +2631,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         regs->ecx = hvm_msr_tsc_aux(v);
         /* fall through */
     case EXIT_REASON_RDTSC:
-        update_guest_eip(); /* Safe: RDTSC, RDTSCP */
+        vmx_update_guest_eip(); /* Safe: RDTSC, RDTSCP */
         hvm_rdtsc_intercept(regs);
         break;
     case EXIT_REASON_VMCALL:
@@ -2653,7 +2641,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         rc = hvm_do_hypercall(regs);
         if ( rc != HVM_HCALL_preempted )
         {
-            update_guest_eip(); /* Safe: VMCALL */
+            vmx_update_guest_eip(); /* Safe: VMCALL */
             if ( rc == HVM_HCALL_invalidate )
                 send_invalidate_req();
         }
@@ -2663,7 +2651,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     {
         exit_qualification = __vmread(EXIT_QUALIFICATION);
         if ( vmx_cr_access(exit_qualification) == X86EMUL_OKAY )
-            update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */
+            vmx_update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */
         break;
     }
     case EXIT_REASON_DR_ACCESS:
@@ -2677,7 +2665,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         {
             regs->eax = (uint32_t)msr_content;
             regs->edx = (uint32_t)(msr_content >> 32);
-            update_guest_eip(); /* Safe: RDMSR */
+            vmx_update_guest_eip(); /* Safe: RDMSR */
         }
         break;
     }
@@ -2686,63 +2674,63 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         uint64_t msr_content;
         msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax;
         if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY )
-            update_guest_eip(); /* Safe: WRMSR */
+            vmx_update_guest_eip(); /* Safe: WRMSR */
         break;
     }
 
     case EXIT_REASON_VMXOFF:
         if ( nvmx_handle_vmxoff(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMXON:
         if ( nvmx_handle_vmxon(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMCLEAR:
         if ( nvmx_handle_vmclear(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
  
     case EXIT_REASON_VMPTRLD:
         if ( nvmx_handle_vmptrld(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMPTRST:
         if ( nvmx_handle_vmptrst(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMREAD:
         if ( nvmx_handle_vmread(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
  
     case EXIT_REASON_VMWRITE:
         if ( nvmx_handle_vmwrite(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMLAUNCH:
         if ( nvmx_handle_vmlaunch(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMRESUME:
         if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_INVEPT:
         if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_INVVPID:
         if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_MWAIT_INSTRUCTION:
@@ -2790,14 +2778,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             int bytes = (exit_qualification & 0x07) + 1;
             int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE;
             if ( handle_pio(port, bytes, dir) )
-                update_guest_eip(); /* Safe: IN, OUT */
+                vmx_update_guest_eip(); /* Safe: IN, OUT */
         }
         break;
 
     case EXIT_REASON_INVD:
     case EXIT_REASON_WBINVD:
     {
-        update_guest_eip(); /* Safe: INVD, WBINVD */
+        vmx_update_guest_eip(); /* Safe: INVD, WBINVD */
         vmx_wbinvd_intercept();
         break;
     }
@@ -2830,7 +2818,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     {
         u64 new_bv = (((u64)regs->edx) << 32) | regs->eax;
         if ( hvm_handle_xsetbv(new_bv) == 0 )
-            update_guest_eip(); /* Safe: XSETBV */
+            vmx_update_guest_eip(); /* Safe: XSETBV */
         break;
     }
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index bb7688f..225de9f 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2136,7 +2136,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
             tsc += __get_vvmcs(nvcpu->nv_vvmcx, TSC_OFFSET);
             regs->eax = (uint32_t)tsc;
             regs->edx = (uint32_t)(tsc >> 32);
-            update_guest_eip();
+            vmx_update_guest_eip();
 
             return 1;
         }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 9d04735..ce7c18a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -736,7 +736,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
     return 1;
 }
 
-static void pv_cpuid(struct cpu_user_regs *regs)
+void pv_cpuid(struct cpu_user_regs *regs)
 {
     uint32_t a, b, c, d;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f30e5ac..c9d7118 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -475,6 +475,7 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 int vmx_check_msr_bitmap(unsigned long *msr_bitmap, u32 msr, int access_type);
+void vmx_fpu_enter(struct vcpu *v);
 void virtual_vmcs_enter(void *vvmcs);
 void virtual_vmcs_exit(void *vvmcs);
 u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c33b9f9..ad341dc 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -446,6 +446,18 @@ static inline int __vmxon(u64 addr)
     return rc;
 }
 
+/*
+ * Not all cases receive valid value in the VM-exit instruction length field.
+ * Callers must know what they're doing!
+ */
+static inline int vmx_get_instruction_length(void)
+{
+    int len;
+    len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */
+    BUG_ON((len < 1) || (len > 15));
+    return len;
+}
+
 void vmx_get_segment_register(struct vcpu *, enum x86_segment,
                               struct segment_register *);
 void vmx_inject_extint(int trap);
@@ -457,7 +469,9 @@ void ept_p2m_uninit(struct p2m_domain *p2m);
 void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
-void update_guest_eip(void);
+void vmx_update_guest_eip(void);
+void vmx_dr_access(unsigned long exit_qualification,
+                   struct cpu_user_regs *regs);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5cdacc7..8c70324 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -566,6 +566,7 @@ void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(int cpu);
 
+void pv_cpuid(struct cpu_user_regs *regs);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */
-- 
1.7.2.3


* [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (4 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 05/18] PVH xen: vmx related preparatory changes for PVH Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-06-05 15:33   ` Konrad Rzeszutek Wilk
  2013-05-25  1:25 ` [PATCH 07/18] PVH xen: Introduce PVH guest type Mukesh Rathor
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch moves the e820 fields out of the pv_domain struct, as they are
also used by PVH.

Changes in V6:
  - Don't base on guest type the initialization and cleanup.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c        |   10 ++++------
 xen/arch/x86/mm.c            |   26 +++++++++++++-------------
 xen/include/asm-x86/domain.h |   10 +++++-----
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a5f2885..e53a937 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         if ( (rc = iommu_domain_init(d)) != 0 )
             goto fail;
     }
+    spin_lock_init(&d->arch.e820_lock);
 
     if ( is_hvm_domain(d) )
     {
@@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         }
     }
     else
-    {
         /* 64-bit PV guest by default. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
-        spin_lock_init(&d->arch.pv_domain.e820_lock);
-    }
-
     /* initialize default tsc behavior in case tools don't */
     tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
     spin_lock_init(&d->arch.vtsc_lock);
@@ -592,8 +589,9 @@ void arch_domain_destroy(struct domain *d)
 {
     if ( is_hvm_domain(d) )
         hvm_domain_destroy(d);
-    else
-        xfree(d->arch.pv_domain.e820);
+
+    if ( d->arch.e820 )
+        xfree(d->arch.e820);
 
     free_domain_pirqs(d);
     if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 43f0769..bd1402e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4827,11 +4827,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         }
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
-        xfree(d->arch.pv_domain.e820);
-        d->arch.pv_domain.e820 = e820;
-        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
+        xfree(d->arch.e820);
+        d->arch.e820 = e820;
+        d->arch.nr_e820 = fmap.map.nr_entries;
+        spin_unlock(&d->arch.e820_lock);
 
         rcu_unlock_domain(d);
         return rc;
@@ -4845,26 +4845,26 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&map, arg, 1) )
             return -EFAULT;
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
 
         /* Backwards compatibility. */
-        if ( (d->arch.pv_domain.nr_e820 == 0) ||
-             (d->arch.pv_domain.e820 == NULL) )
+        if ( (d->arch.nr_e820 == 0) ||
+             (d->arch.e820 == NULL) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -ENOSYS;
         }
 
-        map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
-        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
+        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
+        if ( copy_to_guest(map.buffer, d->arch.e820,
                            map.nr_entries) ||
              __copy_to_guest(arg, &map, 1) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -EFAULT;
         }
 
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_unlock(&d->arch.e820_lock);
         return 0;
     }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..c3f9f8e 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -234,11 +234,6 @@ struct pv_domain
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
-
-    /* Pseudophysical e820 map (XENMEM_memory_map).  */
-    spinlock_t e820_lock;
-    struct e820entry *e820;
-    unsigned int nr_e820;
 };
 
 struct arch_domain
@@ -313,6 +308,11 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Pseudophysical e820 map (XENMEM_memory_map).  */
+    spinlock_t e820_lock;
+    struct e820entry *e820;
+    unsigned int nr_e820;
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
-- 
1.7.2.3


* [PATCH 07/18] PVH xen: Introduce PVH guest type
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (5 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 08/18] PVH xen: tools changes to create PVH domain Mukesh Rathor
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch introduces the concept of a PVH guest. It also makes other basic
changes: macros to check for a PVH vcpu/domain, new macros to test whether a
domain/vcpu is PV/PVH/HVM, and copy macros modified to include PVH. Lastly,
PVH uses HVM-style event delivery.

Changes in V2:
  - make is_pvh/is_hvm enum instead of adding is_pvh as a new flag.
  - fix indentation and spacing in guest_kernel_mode macro.
  - add debug only BUG() in GUEST_KERNEL_RPL macro as it should no longer
    be called in any PVH paths.

Changes in V3:
  - Rename enum fields, and add is_pv to it.
  - Get rid if is_hvm_or_pvh_* macros.

Changes in V4:
  - Move e820 fields out of pv_domain struct.

Changes in V5:
  - Move e820 changes above in V4, to a separate patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/debug.c               |    2 +-
 xen/arch/x86/domain.c              |    7 +++++++
 xen/common/domain.c                |    2 +-
 xen/include/asm-x86/desc.h         |    5 +++++
 xen/include/asm-x86/domain.h       |    2 +-
 xen/include/asm-x86/event.h        |    2 +-
 xen/include/asm-x86/guest_access.h |   12 ++++++------
 xen/include/asm-x86/x86_64/regs.h  |    9 +++++----
 xen/include/public/domctl.h        |    3 +++
 xen/include/xen/sched.h            |   21 ++++++++++++++++++---
 10 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index e67473e..167421d 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp,
 
         pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len);
 
-        mfn = (dp->is_hvm
+        mfn = (!is_pv_domain(dp)
                ? dbg_hvm_va2mfn(addr, dp, toaddr, &gfn)
                : dbg_pv_va2mfn(addr, dp, pgd3));
         if ( mfn == INVALID_MFN ) 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e53a937..31a8a50 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -644,6 +644,13 @@ int arch_set_info_guest(
     unsigned int i;
     int rc = 0, compat;
 
+    /* This removed when all patches are checked in and PVH is done */
+    if ( is_pvh_vcpu(v) )
+    {
+        printk("PVH: You don't have the correct xen version for PVH\n");
+        return -EINVAL;
+    }
+
     /* The context is a compat-mode one if the target domain is compat-mode;
      * we expect the tools to DTRT even in compat-mode callers. */
     compat = is_pv_32on64_domain(d);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index fac3470..6ece3fe 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -236,7 +236,7 @@ struct domain *domain_create(
         goto fail;
 
     if ( domcr_flags & DOMCRF_hvm )
-        d->is_hvm = 1;
+        d->guest_type = is_hvm;
 
     if ( domid == 0 )
     {
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 354b889..4dca0a3 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -38,7 +38,12 @@
 
 #ifndef __ASSEMBLY__
 
+#ifndef NDEBUG
+#define GUEST_KERNEL_RPL(d) (is_pvh_domain(d) ? ({ BUG(); 0; }) :    \
+                                                is_pv_32bit_domain(d) ? 1 : 3)
+#else
 #define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
+#endif
 
 /* Fix up the RPL of a guest segment selector. */
 #define __fixup_guest_selector(d, sel)                             \
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index c3f9f8e..b95314a 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@
 #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) (!is_pv_domain(d) && \
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
 
diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index 06057c7..7ed5812 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return (!is_pv_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..675dda1 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -14,27 +14,27 @@
 
 /* Raw access functions: no type checking. */
 #define raw_copy_to_guest(dst, src, len)        \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      copy_to_user((dst), (src), (len)))
 #define raw_copy_from_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      copy_from_user((dst), (src), (len)))
 #define raw_clear_guest(dst,  len)              \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 #define __raw_copy_to_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      __copy_to_user((dst), (src), (len)))
 #define __raw_copy_from_guest(dst, src, len)    \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      __copy_from_user((dst), (src), (len)))
 #define __raw_clear_guest(dst,  len)            \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 
diff --git a/xen/include/asm-x86/x86_64/regs.h b/xen/include/asm-x86/x86_64/regs.h
index 3cdc702..bb475cf 100644
--- a/xen/include/asm-x86/x86_64/regs.h
+++ b/xen/include/asm-x86/x86_64/regs.h
@@ -10,10 +10,11 @@
 #define ring_2(r)    (((r)->cs & 3) == 2)
 #define ring_3(r)    (((r)->cs & 3) == 3)
 
-#define guest_kernel_mode(v, r)                                 \
-    (!is_pv_32bit_vcpu(v) ?                                     \
-     (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :        \
-     (ring_1(r)))
+#define guest_kernel_mode(v, r)                                   \
+    (is_pvh_vcpu(v) ? ({ ASSERT(v == current); ring_0(r); }) :    \
+     (!is_pv_32bit_vcpu(v) ?                                      \
+      (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :         \
+      (ring_1(r))))
 
 #define permit_softint(dpl, v, r) \
     ((dpl) >= (guest_kernel_mode(v, r) ? 1 : 3))
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 4c5b2bb..6b1aa11 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo {
  /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest   (1U<<_XEN_DOMINF_pvh_guest)
  /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ae6a3b8..3c89452 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -238,6 +238,14 @@ struct mem_event_per_domain
     struct mem_event_domain access;
 };
 
+/*
+ * PVH is a PV guest running in an HVM container. While is_hvm_* are false
+ * for it, it uses many of the HVM data structs.
+ */
+enum guest_type {
+    is_pv, is_pvh, is_hvm
+};
+
 struct domain
 {
     domid_t          domain_id;
@@ -285,8 +293,8 @@ struct domain
     struct rangeset *iomem_caps;
     struct rangeset *irq_caps;
 
-    /* Is this an HVM guest? */
-    bool_t           is_hvm;
+    enum guest_type guest_type;
+
 #ifdef HAS_PASSTHROUGH
     /* Does this guest need iommu mappings? */
     bool_t           need_iommu;
@@ -464,6 +472,9 @@ struct domain *domain_create(
  /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */
 #define _DOMCRF_oos_off         4
 #define DOMCRF_oos_off          (1U<<_DOMCRF_oos_off)
+ /* DOMCRF_pvh: Create PV domain in HVM container */
+#define _DOMCRF_pvh            5
+#define DOMCRF_pvh             (1U<<_DOMCRF_pvh)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
@@ -732,8 +743,12 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
 
-#define is_hvm_domain(d) ((d)->is_hvm)
+#define is_pv_domain(d) ((d)->guest_type == is_pv)
+#define is_pv_vcpu(v)   (is_pv_domain(v->domain))
+#define is_hvm_domain(d) ((d)->guest_type == is_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
+#define is_pvh_domain(d) ((d)->guest_type == is_pvh)
+#define is_pvh_vcpu(v)   (is_pvh_domain(v->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
                            cpumask_weight((v)->cpu_affinity) == 1)
 #ifdef HAS_PASSTHROUGH
-- 
1.7.2.3


* [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (6 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 07/18] PVH xen: Introduce PVH guest type Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-06-12 14:58   ` Ian Campbell
  2013-05-25  1:25 ` [PATCH 09/18] PVH xen: domain creation code changes Mukesh Rathor
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch contains tools changes for PVH. For now, only one mode is
supported/tested:
    dom0> losetup /dev/loop1 guest.img
    dom0> In vm.cfg file: disk = ['phy:/dev/loop1,xvda,w']

Changes in V2: None
Changes in V3:
  - Document pvh boolean flag in xl.cfg.pod.5
  - Rename ci_pvh and bi_pvh to pvh, and domcr_is_pvh to pvh_enabled.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 docs/man/xl.cfg.pod.5             |    3 +++
 tools/debugger/gdbsx/xg/xg_main.c |    4 +++-
 tools/libxc/xc_dom.h              |    1 +
 tools/libxc/xc_dom_x86.c          |    7 ++++---
 tools/libxl/libxl_create.c        |    2 ++
 tools/libxl/libxl_dom.c           |   18 +++++++++++++++++-
 tools/libxl/libxl_types.idl       |    2 ++
 tools/libxl/libxl_x86.c           |    4 +++-
 tools/libxl/xl_cmdimpl.c          |   11 +++++++++++
 tools/xenstore/xenstored_domain.c |   12 +++++++-----
 10 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index e0c3bb2..d44ce01 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -621,6 +621,9 @@ if your particular guest kernel does not require this behaviour then
 it is safe to allow this to be enabled but you may wish to disable it
 anyway.
 
+=item B<pvh=BOOLEAN>
+Selects whether to run this guest in an HVM container. Default is 0.
+
 =back
 
 =head2 Fully-virtualised (HVM) Guest Specific Options
diff --git a/tools/debugger/gdbsx/xg/xg_main.c b/tools/debugger/gdbsx/xg/xg_main.c
index 64c7484..5736b86 100644
--- a/tools/debugger/gdbsx/xg/xg_main.c
+++ b/tools/debugger/gdbsx/xg/xg_main.c
@@ -81,6 +81,7 @@ int xgtrc_on = 0;
 struct xen_domctl domctl;         /* just use a global domctl */
 
 static int     _hvm_guest;        /* hvm guest? 32bit HVMs have 64bit context */
+static int     _pvh_guest;        /* PV guest in HVM container */
 static domid_t _dom_id;           /* guest domid */
 static int     _max_vcpu_id;      /* thus max_vcpu_id+1 VCPUs */
 static int     _dom0_fd;          /* fd of /dev/privcmd */
@@ -309,6 +310,7 @@ xg_attach(int domid, int guest_bitness)
 
     _max_vcpu_id = domctl.u.getdomaininfo.max_vcpu_id;
     _hvm_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_hvm_guest);
+    _pvh_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_pvh_guest);
     return _max_vcpu_id;
 }
 
@@ -369,7 +371,7 @@ _change_TF(vcpuid_t which_vcpu, int guest_bitness, int setit)
     int sz = sizeof(anyc);
 
     /* first try the MTF for hvm guest. otherwise do manually */
-    if (_hvm_guest) {
+    if (_hvm_guest || _pvh_guest) {
         domctl.u.debug_op.vcpu = which_vcpu;
         domctl.u.debug_op.op = setit ? XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON :
                                        XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF;
diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index ac36600..8b43d2b 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -130,6 +130,7 @@ struct xc_dom_image {
     domid_t console_domid;
     domid_t xenstore_domid;
     xen_pfn_t shared_info_mfn;
+    int pvh_enabled;
 
     xc_interface *xch;
     domid_t guest_domid;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index f1be43b..24f6759 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -389,7 +389,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
         pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
         l1tab[l1off] =
             pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
-        if ( (addr >= dom->pgtables_seg.vstart) && 
+        if ( (!dom->pvh_enabled)                &&
+             (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
         if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
@@ -706,7 +707,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
     rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
     if ( rc )
         return rc;
-    if ( xc_dom_feature_translated(dom) )
+    if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled )
     {
         dom->shadow_enabled = 1;
         rc = x86_shadow(dom->xch, dom->guest_domid);
@@ -832,7 +833,7 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
         }
 
         /* Map grant table frames into guest physmap. */
-        for ( i = 0; ; i++ )
+        for ( i = 0; !dom->pvh_enabled; i++ )
         {
             rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
                                           XENMAPSPACE_grant_table,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index cb9c822..83e2d5b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -421,6 +421,8 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
         flags |= XEN_DOMCTL_CDF_hvm_guest;
         flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
         flags |= libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+    } else if ( libxl_defbool_val(info->pvh) ) {
+        flags |= XEN_DOMCTL_CDF_hap;
     }
     *domid = -1;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index b38d0a7..cefbf76 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom;
     int ret;
     int flags = 0;
+    int is_pvh = libxl_defbool_val(info->pvh);
 
     xc_dom_loginit(ctx->xch);
 
+    if (is_pvh) {
+        char *pv_feats = "writable_descriptor_tables|auto_translated_physmap"
+                         "|supervisor_mode_kernel|hvm_callback_vector";
+
+        if (info->u.pv.features && info->u.pv.features[0] != '\0')
+        {
+            LOG(ERROR, "Didn't expect info->u.pv.features to contain a string\n");
+            LOG(ERROR, "String: %s\n", info->u.pv.features);
+            return ERROR_FAIL;
+        }
+        info->u.pv.features = strdup(pv_feats);
+    }
+
     dom = xc_dom_allocate(ctx->xch, state->pv_cmdline, info->u.pv.features);
     if (!dom) {
         LOGE(ERROR, "xc_dom_allocate failed");
@@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     }
 
     dom->flags = flags;
+    dom->pvh_enabled = is_pvh;
     dom->console_evtchn = state->console_port;
     dom->console_domid = state->console_domid;
     dom->xenstore_evtchn = state->store_port;
@@ -400,7 +415,8 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
         LOGE(ERROR, "xc_dom_boot_image failed");
         goto out;
     }
-    if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) {
+    /* PVH sets up its own grant table during boot via HVM mechanisms */
+    if ( !is_pvh && (ret = xc_dom_gnttab_init(dom)) != 0 ) {
         LOGE(ERROR, "xc_dom_gnttab_init failed");
         goto out;
     }
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 8262cba..43e6d95 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -245,6 +245,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
     ("platformdata", libxl_key_value_list),
     ("poolid",       uint32),
     ("run_hotplug_scripts",libxl_defbool),
+    ("pvh",          libxl_defbool),
     ], dir=DIR_IN)
 
 MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
@@ -346,6 +347,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ])),
                  ("invalid", Struct(None, [])),
                  ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
+    ("pvh",       libxl_defbool),
     ], dir=DIR_IN
 )
 
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index a17f6ae..424bc68 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -290,7 +290,9 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     if (rtc_timeoffset)
         xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset);
 
-    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl_defbool_val(d_config->b_info.pvh)) {
+
         unsigned long shadow;
         shadow = (d_config->b_info.shadow_memkb + 1023) / 1024;
         xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index e13a64e..e032668 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -610,8 +610,18 @@ static void parse_config_data(const char *config_source,
         !strncmp(buf, "hvm", strlen(buf)))
         c_info->type = LIBXL_DOMAIN_TYPE_HVM;
 
+    libxl_defbool_setdefault(&c_info->pvh, false);
+    libxl_defbool_setdefault(&c_info->hap, false);
+    xlu_cfg_get_defbool(config, "pvh", &c_info->pvh, 0);
     xlu_cfg_get_defbool(config, "hap", &c_info->hap, 0);
 
+    if (libxl_defbool_val(c_info->pvh) &&
+        !libxl_defbool_val(c_info->hap)) {
+
+        fprintf(stderr, "hap is required for PVH domain\n");
+        exit(1);
+    }
+
     if (xlu_cfg_replace_string (config, "name", &c_info->name, 0)) {
         fprintf(stderr, "Domain name must be specified.\n");
         exit(1);
@@ -918,6 +928,7 @@ static void parse_config_data(const char *config_source,
 
         b_info->u.pv.cmdline = cmdline;
         xlu_cfg_replace_string (config, "ramdisk", &b_info->u.pv.ramdisk, 0);
+        libxl_defbool_set(&b_info->pvh, libxl_defbool_val(c_info->pvh));
         break;
     }
     default:
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index bf83d58..10c23a1 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -168,13 +168,15 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
 static void *map_interface(domid_t domid, unsigned long mfn)
 {
 	if (*xcg_handle != NULL) {
-		/* this is the preferred method */
-		return xc_gnttab_map_grant_ref(*xcg_handle, domid,
+                void *addr;
+                /* this is the preferred method */
+                addr = xc_gnttab_map_grant_ref(*xcg_handle, domid,
 			GNTTAB_RESERVED_XENSTORE, PROT_READ|PROT_WRITE);
-	} else {
-		return xc_map_foreign_range(*xc_handle, domid,
-			getpagesize(), PROT_READ|PROT_WRITE, mfn);
+                if (addr)
+                        return addr;
 	}
+	return xc_map_foreign_range(*xc_handle, domid,
+		        getpagesize(), PROT_READ|PROT_WRITE, mfn);
 }
 
 static void unmap_interface(void *interface)
-- 
1.7.2.3
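For reference, the B<pvh> option added to xl.cfg above would be used from a guest
config file roughly as follows (a sketch only; the kernel path and disk device
are placeholders, and the explicit hap=1 matches the check added in xl_cmdimpl.c):

```
# minimal PV guest booted inside an HVM container (PVH)
pvh    = 1
hap    = 1                          # required for PVH
kernel = "/path/to/vmlinuz"         # placeholder
memory = 512
disk   = ['phy:/dev/loop1,xvda,w']  # phy: disk, per the cover letter
```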

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 09/18] PVH xen: domain creation code changes
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (7 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 08/18] PVH xen: tools changes to create PVH domain Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 10/18] PVH xen: create PVH vmcs, and also initialization Mukesh Rathor
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch contains changes to arch/x86/domain.c to allow for a PVH
domain.
Changes in V2:
  - changes to read_segment_register() moved to this patch.
  - The other comment was to create NULL stubs for pvh_set_vcpu_info
    and pvh_read_descriptor, which are implemented in a later patch. Since
    PVH creation is disabled until all patches are checked in, this is not
    strictly needed, but it helps break the series into smaller patches.

Changes in V3:
  - Fix read_segment_register() macro to make sure args are evaluated once,
    and use # instead of STR for name in the macro.

Changes in V4:
  - Remove pvh substruct in the hvm substruct, as the vcpu_info_mfn has been
    moved out of pv_vcpu struct.
  - rename hvm_pvh_* functions to hvm_*.

Changes in V5:
  - remove pvh_read_descriptor().
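
The single-evaluation concern fixed in V3 can be illustrated with a standalone
sketch (hypothetical READ_FIELD macros, not the Xen code itself; the fixed
version uses a GNU statement expression, as read_segment_register() does):

```c
#include <assert.h>

struct demo_regs { int a; };

/* Hypothetical buggy pattern: the macro argument is expanded twice,
 * so a side-effecting argument such as p++ takes effect twice.
 * (The comma operator sequences the two reads, keeping the demo
 * well defined.) */
#define READ_FIELD_BAD(regs) ((void)(regs)->a, (regs)->a)

/* Fixed pattern, mirroring the V3 read_segment_register() change:
 * cache the argument in a local inside a GNU statement expression
 * so it is evaluated exactly once, however often it is used. */
#define READ_FIELD_OK(regs)                \
({  struct demo_regs *_r = (regs);         \
    (void)_r->a;                           \
    _r->a;                                 \
})
```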

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c         |   61 +++++++++++++++++++++++++++-------------
 xen/arch/x86/mm.c             |    3 ++
 xen/arch/x86/mm/hap/hap.c     |    4 ++-
 xen/include/asm-x86/hvm/hvm.h |    8 +++++
 xen/include/asm-x86/system.h  |   18 +++++++++---
 5 files changed, 69 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 31a8a50..9953f80 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v)
 
     vmce_init_vcpu(v);
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
         goto done;
@@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v)
 
     vcpu_destroy_fpu(v);
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
         xfree(v->arch.pv_vcpu.trap_ctxt);
@@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     int rc = -ENOMEM;
 
     d->arch.hvm_domain.hap_enabled =
-        is_hvm_domain(d) &&
+        !is_pv_domain(d) &&
         hvm_funcs.hap_supported &&
         (domcr_flags & DOMCRF_hap);
     d->arch.hvm_domain.mem_sharing_enabled = 0;
@@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     mapcache_domain_init(d);
 
     HYPERVISOR_COMPAT_VIRT_START(d) =
-        is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START;
+        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
 
     if ( (rc = paging_domain_init(d, domcr_flags)) != 0 )
         goto fail;
@@ -555,7 +555,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     }
     spin_lock_init(&d->arch.e820_lock);
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d)) != 0 )
         {
@@ -658,7 +658,7 @@ int arch_set_info_guest(
 #define c(fld) (compat ? (c.cmp->fld) : (c.nat->fld))
     flags = c(flags);
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         if ( !compat )
         {
@@ -711,7 +711,7 @@ int arch_set_info_guest(
     v->fpu_initialised = !!(flags & VGCF_I387_VALID);
 
     v->arch.flags &= ~TF_kernel_mode;
-    if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ )
+    if ( (flags & VGCF_in_kernel) || !is_pv_vcpu(v)/*???*/ )
         v->arch.flags |= TF_kernel_mode;
 
     v->arch.vgc_flags = flags;
@@ -722,7 +722,7 @@ int arch_set_info_guest(
     if ( !compat )
     {
         memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -738,10 +738,13 @@ int arch_set_info_guest(
 
     v->arch.user_regs.eflags |= 2;
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
     {
         hvm_set_info_guest(v);
-        goto out;
+        if ( is_hvm_vcpu(v) || v->is_initialised )
+            goto out;
+        else
+            goto pvh_skip_pv_stuff;
     }
 
     init_int80_direct_trap(v);
@@ -750,7 +753,10 @@ int arch_set_info_guest(
     v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
     v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
 
-    /* Ensure real hardware interrupts are enabled. */
+    /*
+     * Ensure real hardware interrupts are enabled. Note: PVH may not have
+     * IDT set on all vcpus so we don't enable IF for it yet.
+     */
     v->arch.user_regs.eflags |= X86_EFLAGS_IF;
 
     if ( !v->is_initialised )
@@ -852,6 +858,7 @@ int arch_set_info_guest(
 
     set_bit(_VPF_in_reset, &v->pause_flags);
 
+pvh_skip_pv_stuff:
     if ( !compat )
         cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]);
     else
@@ -860,7 +867,7 @@ int arch_set_info_guest(
 
     if ( !cr3_page )
         rc = -EINVAL;
-    else if ( paging_mode_refcounts(d) )
+    else if ( paging_mode_refcounts(d) || is_pvh_vcpu(v) )
         /* nothing */;
     else if ( cr3_page == v->arch.old_guest_table )
     {
@@ -886,8 +893,15 @@ int arch_set_info_guest(
         /* handled below */;
     else if ( !compat )
     {
+        /* PVH 32bitfixme */
+        if ( is_pvh_vcpu(v) )
+        {
+            v->arch.cr3 = page_to_mfn(cr3_page);
+            v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
+        }
+
         v->arch.guest_table = pagetable_from_page(cr3_page);
-        if ( c.nat->ctrlreg[1] )
+        if ( c.nat->ctrlreg[1] && !is_pvh_vcpu(v) )
         {
             cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]);
             cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);
@@ -942,6 +956,13 @@ int arch_set_info_guest(
 
     update_cr3(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        /* guest is bringing up non-boot SMP vcpu */
+        if ( (rc=hvm_set_vcpu_info(v, c.nat)) != 0 )
+            return rc;
+    }
+
  out:
     if ( flags & VGCF_online )
         clear_bit(_VPF_down, &v->pause_flags);
@@ -1309,7 +1330,7 @@ static void update_runstate_area(struct vcpu *v)
 
 static inline int need_full_gdt(struct vcpu *v)
 {
-    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
+    return (is_pv_vcpu(v) && !is_idle_vcpu(v));
 }
 
 static void __context_switch(void)
@@ -1443,7 +1464,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
-        if ( !is_hvm_vcpu(next) )
+        if ( is_pv_vcpu(next) )
         {
             load_LDT(next);
             load_segments(next);
@@ -1566,12 +1587,12 @@ unsigned long hypercall_create_continuation(
         regs->eax  = op;
 
         /* Ensure the hypercall trap instruction is re-executed. */
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip -= 2;  /* re-execute 'syscall' / 'int $xx' */
         else
             current->arch.hvm_vcpu.hcall_preempted = 1;
 
-        if ( !is_hvm_vcpu(current) ?
+        if ( is_pv_vcpu(current) ?
              !is_pv_32on64_vcpu(current) :
              (hvm_guest_x86_mode(current) == 8) )
         {
@@ -1839,7 +1860,7 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
         {
             for_each_vcpu ( d, v )
             {
@@ -1912,7 +1933,7 @@ int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
         hvm_domain_relinquish_resources(d);
 
     return 0;
@@ -1996,7 +2017,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
     if ( already_pending )
         return;
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
         hvm_assert_evtchn_irq(v);
     else
         vcpu_kick(v);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index bd1402e..b190ad9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4330,6 +4330,9 @@ void destroy_gdt(struct vcpu *v)
     int i;
     unsigned long pfn;
 
+    if ( is_pvh_vcpu(v) )
+        return;
+
     v->arch.pv_vcpu.gdt_ents = 0;
     pl1e = gdt_ldt_ptes(v->domain, v);
     for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index bff05d9..5aa0852 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -639,7 +639,9 @@ static void hap_update_cr3(struct vcpu *v, int do_locking)
 const struct paging_mode *
 hap_paging_get_mode(struct vcpu *v)
 {
-    return !hvm_paging_enabled(v)   ? &hap_paging_real_mode :
+    /* PVH 32bitfixme */
+    return is_pvh_vcpu(v) ? &hap_paging_long_mode :
+        !hvm_paging_enabled(v)   ? &hap_paging_real_mode :
         hvm_long_mode_enabled(v) ? &hap_paging_long_mode :
         hvm_pae_enabled(v)       ? &hap_paging_pae_mode  :
                                    &hap_paging_protected_mode;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 8408420..7e21ee1 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -192,6 +192,8 @@ struct hvm_function_table {
                                 paddr_t *L1_gpa, unsigned int *page_order,
                                 uint8_t *p2m_acc, bool_t access_r,
                                 bool_t access_w, bool_t access_x);
+    /* PVH functions */
+    int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *ctxtp);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -325,6 +327,12 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
     return hvm_funcs.get_shadow_gs_base(v);
 }
 
+static inline int hvm_set_vcpu_info(struct vcpu *v,
+                                        struct vcpu_guest_context *ctxtp)
+{
+    return hvm_funcs.pvh_set_vcpu_info(v, ctxtp);
+}
+
 #define is_viridian_domain(_d)                                             \
  (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN]))
 
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 9bb22cb..955983b 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,10 +4,20 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(vcpu, regs, name)                 \
-({  u16 __sel;                                                  \
-    asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
-    __sel;                                                      \
+/*
+ * We need vcpu because during context switch, going from pure PV to PVH,
+ * in save_segments(), current has been updated to next, and no longer pointing
+ * to the pure PV. Note: for PVH, we update regs->selectors on each vmexit.
+ */
+#define read_segment_register(vcpu, regs, name)                   \
+({  u16 __sel;                                                    \
+    struct cpu_user_regs *_regs = (regs);                         \
+                                                                  \
+    if ( is_pvh_vcpu(vcpu) )                                      \
+        __sel = _regs->name;                                      \
+    else                                                          \
+        asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
+    __sel;                                                        \
 })
 
 #define wbinvd() \
-- 
1.7.2.3


* [PATCH 10/18] PVH xen: create PVH vmcs, and also initialization
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (8 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 09/18] PVH xen: domain creation code changes Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 11/18] PVH xen: create read_descriptor_sel() Mukesh Rathor
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch mainly contains code to create a VMCS for PVH guest, and HVM
specific vcpu/domain creation code.

Changes in V2:
  - Avoid calling hvm_do_resume() at the call site rather than returning
    early inside it.
  - Return early from vmx_do_resume() for PVH, before the Intel debugger
    handling.

Changes in V3:
  - Cleanup pvh_construct_vmcs().
  - Fix formatting in a few places; add XENLOG_G_ERR to error prints.
  - Do not load the CS selector for PVH here, but try to do that in Linux.

Changes in V4:
  - Remove VM_ENTRY_LOAD_DEBUG_CTLS clearing.
  - Add a 32-bit kernel changes marker.
  - Verify pit_init call for PVH.

Changes in V5:
  - Formatting fixes; remove the unnecessary variable guest_pat.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c      |   94 ++++++++++++-
 xen/arch/x86/hvm/vmx/vmcs.c |  312 ++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/vmx/vmx.c  |   40 ++++++
 3 files changed, 410 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bcf9609..a525080 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -510,6 +510,30 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int pvh_dom_initialise(struct domain *d)
+{
+    int rc;
+
+    if ( !d->arch.hvm_domain.hap_enabled )
+        return -EINVAL;
+
+    spin_lock_init(&d->arch.hvm_domain.irq_lock);
+
+    hvm_init_cacheattr_region_list(d);
+
+    if ( (rc = paging_enable(d, PG_refcounts|PG_translate|PG_external)) != 0 )
+        goto fail1;
+
+    if ( (rc = hvm_funcs.domain_initialise(d)) != 0 )
+        goto fail1;
+
+    return 0;
+
+fail1:
+    hvm_destroy_cacheattr_region_list(d);
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -520,6 +544,8 @@ int hvm_domain_initialise(struct domain *d)
                  "on a non-VT/AMDV platform.\n");
         return -EINVAL;
     }
+    if ( is_pvh_domain(d) )
+        return pvh_dom_initialise(d);
 
     spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -584,6 +610,11 @@ int hvm_domain_initialise(struct domain *d)
 
 void hvm_domain_relinquish_resources(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+    {
+        pit_deinit(d);
+        return;
+    }
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
@@ -609,10 +640,14 @@ void hvm_domain_relinquish_resources(struct domain *d)
 void hvm_domain_destroy(struct domain *d)
 {
     hvm_funcs.domain_destroy(d);
+    hvm_destroy_cacheattr_region_list(d);
+
+    if ( is_pvh_domain(d) )
+        return;
+
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
-    hvm_destroy_cacheattr_region_list(d);
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1066,14 +1101,46 @@ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void)
 }
 __initcall(__hvm_register_CPU_XSAVE_save_and_restore);
 
+static int pvh_vcpu_initialise(struct vcpu *v)
+{
+    int rc;
+
+    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 )
+        return rc;
+
+    softirq_tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+                         (void(*)(unsigned long))hvm_assert_evtchn_irq,
+                         (unsigned long)v);
+
+    v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme */
+    v->arch.user_regs.eflags = 2;
+    v->arch.hvm_vcpu.inject_trap.vector = -1;
+
+    if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
+    {
+        hvm_funcs.vcpu_destroy(v);
+        return rc;
+    }
+    if ( v->vcpu_id == 0 )
+        pit_init(v, cpu_khz);
+
+    return 0;
+}
+
 int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    domid_t dm_domid;
 
     hvm_asid_flush_vcpu(v);
 
+    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
+    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
+
+    if ( is_pvh_vcpu(v) )
+        return pvh_vcpu_initialise(v);
+
     if ( (rc = vlapic_init(v)) != 0 )
         goto fail1;
 
@@ -1084,6 +1151,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) 
         goto fail3;
 
+    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+
     /* Create ioreq event channel. */
     rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
     if ( rc < 0 )
@@ -1106,9 +1175,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
     spin_unlock(&d->arch.hvm_domain.ioreq.lock);
 
-    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
-
     v->arch.hvm_vcpu.inject_trap.vector = -1;
 
     rc = setup_compat_arg_xlat(v);
@@ -1163,7 +1229,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
     tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
     hvm_vcpu_cacheattr_destroy(v);
-    vlapic_destroy(v);
+
+    if ( !is_pvh_vcpu(v) )
+        vlapic_destroy(v);
+
     hvm_funcs.vcpu_destroy(v);
 
     /* Event channel is already freed by evtchn_destroy(). */
@@ -4528,8 +4597,11 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
     return 1;
 }
 
+/* PVH fixme: add support for monitoring guest behaviour in the functions below */
 void hvm_memory_event_cr0(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR0],
                            MEM_EVENT_REASON_CR0,
@@ -4538,6 +4610,8 @@ void hvm_memory_event_cr0(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr3(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR3],
                            MEM_EVENT_REASON_CR3,
@@ -4546,6 +4620,8 @@ void hvm_memory_event_cr3(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr4(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR4],
                            MEM_EVENT_REASON_CR4,
@@ -4554,6 +4630,8 @@ void hvm_memory_event_cr4(unsigned long value, unsigned long old)
 
 void hvm_memory_event_msr(unsigned long msr, unsigned long value)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_MSR],
                            MEM_EVENT_REASON_MSR,
@@ -4566,6 +4644,8 @@ int hvm_memory_event_int3(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
                                     .params[HVM_PARAM_MEMORY_EVENT_INT3],
                                   MEM_EVENT_REASON_INT3,
@@ -4578,6 +4658,8 @@ int hvm_memory_event_single_step(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
             .params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP],
             MEM_EVENT_REASON_SINGLESTEP,
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ef0ee7f..3844104 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -634,7 +634,7 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( is_hvm_vcpu(current) )
+        if ( !is_pv_vcpu(current) )
             vmx_load_vmcs(current);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -825,16 +825,285 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
-static int construct_vmcs(struct vcpu *v)
+static void vmx_set_common_host_vmcs_fields(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     uint16_t sysenter_cs;
     unsigned long sysenter_eip;
+
+    /* Host data selectors. */
+    __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_FS_SELECTOR, 0);
+    __vmwrite(HOST_GS_SELECTOR, 0);
+    __vmwrite(HOST_FS_BASE, 0);
+    __vmwrite(HOST_GS_BASE, 0);
+
+    /* Host control registers. */
+    v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
+    __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
+    __vmwrite(HOST_CR4,
+              mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0));
+
+    /* Host CS:RIP. */
+    __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
+    __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler);
+
+    /* Host SYSENTER CS:RIP. */
+    rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs);
+    __vmwrite(HOST_SYSENTER_CS, sysenter_cs);
+    rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip);
+    __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
+}
+
+static int pvh_check_requirements(struct vcpu *v)
+{
+    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+
+    if ( !paging_mode_hap(v->domain) )
+    {
+        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
+        return -EINVAL;
+    }
+    if ( !cpu_has_vmx_pat )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_msr_bitmap )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_vpid )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU doesn't have VPID support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_secondary_exec_control )
+    {
+        printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n");
+        return -ENOSYS;
+    }
+
+    if ( v->domain->arch.vtsc )
+    {
+        printk(XENLOG_G_INFO
+                "At present PVH only supports the default timer mode\n");
+        return -ENOSYS;
+    }
+
+    required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR;
+    if ( (tmpval & required) != required )
+    {
+        printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n",
+                required);
+        return -ENOSYS;
+    }
+
+    return 0;
+}
+
+static int pvh_construct_vmcs(struct vcpu *v)
+{
+    int rc, msr_type;
+    unsigned long *msr_bitmap;
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct ept_data *ept = &p2m->ept;
+    u32 vmexit_ctl = vmx_vmexit_control;
+    u32 vmentry_ctl = vmx_vmentry_control;
+    u64 host_pat, tmpval = -1;
+
+    if ( (rc = pvh_check_requirements(v)) )
+        return rc;
+
+    msr_bitmap = alloc_xenheap_page();
+    if ( msr_bitmap == NULL )
+        return -ENOMEM;
+
+    /* 1. Pin-Based Controls */
+    __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
+
+    v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
+
+    /* 2. Primary Processor-based controls */
+    /*
+     * If rdtsc exiting is turned on and it goes through emulate_privileged_op,
+     * then pv_vcpu.ctrlreg must be added to the pvh struct.
+     */
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_USE_TSC_OFFSETING;
+
+    v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
+                                      CPU_BASED_CR3_LOAD_EXITING |
+                                      CPU_BASED_CR3_STORE_EXITING);
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_MSR_BITMAP;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+
+    /* 3. Secondary Processor-based controls. Intel SDM: resvd bits are 0 */
+    v->arch.hvm_vmx.secondary_exec_control = SECONDARY_EXEC_ENABLE_EPT;
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_VPID;
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    __vmwrite(IO_BITMAP_A, virt_to_maddr((char *)hvm_io_bitmap + 0));
+    __vmwrite(IO_BITMAP_B, virt_to_maddr((char *)hvm_io_bitmap + PAGE_SIZE));
+
+    /* MSR bitmap for intercepts */
+    memset(msr_bitmap, ~0, PAGE_SIZE);
+    v->arch.hvm_vmx.msr_bitmap = msr_bitmap;
+    __vmwrite(MSR_BITMAP, virt_to_maddr(msr_bitmap));
+
+    msr_type = MSR_TYPE_R | MSR_TYPE_W;
+    /* Disable intercepts for MSRs that have corresponding VMCS fields. */
+    vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type);
+
+    /*
+     * We don't disable intercepts for MSRs: MSR_STAR, MSR_LSTAR, MSR_CSTAR,
+     * and MSR_SYSCALL_MASK because we need to specify save/restore area to
+     * save/restore at every VM exit and entry. Instead, let the intercept
+     * functions save them into vmx_msr_state fields. See comment in
+     * vmx_restore_host_msrs(). See also vmx_restore_guest_msrs().
+     */
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, 0);
+
+    __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
+
+    /*
+     * Note: we run with the default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which
+     * means that upon vmentry the CPU loads DR7 and DEBUGCTL from the VMCS
+     * rather than keeping the host values. A value of 0 would cause the
+     * VMCS values to be ignored.
+     */
+    vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
+    vmentry_ctl &= ~VM_ENTRY_SMM;
+    vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
+    /* PVH 32bitfixme */
+    vmentry_ctl |= VM_ENTRY_IA32E_MODE;       /* GUEST_EFER.LME/LMA ignored */
+
+    __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);
+
+    vmx_set_common_host_vmcs_fields(v);
+
+    __vmwrite(VM_ENTRY_INTR_INFO, 0);
+    __vmwrite(CR3_TARGET_COUNT, 0);
+    __vmwrite(GUEST_ACTIVITY_STATE, 0);
+
+    /* These are mostly irrelevant as we load the descriptors directly. */
+    __vmwrite(GUEST_CS_SELECTOR, 0);
+    __vmwrite(GUEST_DS_SELECTOR, 0);
+    __vmwrite(GUEST_SS_SELECTOR, 0);
+    __vmwrite(GUEST_ES_SELECTOR, 0);
+    __vmwrite(GUEST_FS_SELECTOR, 0);
+    __vmwrite(GUEST_GS_SELECTOR, 0);
+
+    __vmwrite(GUEST_CS_BASE, 0);
+    __vmwrite(GUEST_CS_LIMIT, ~0u);
+    /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme */
+    __vmwrite(GUEST_CS_AR_BYTES, 0xa09b);
+
+    __vmwrite(GUEST_DS_BASE, 0);
+    __vmwrite(GUEST_DS_LIMIT, ~0u);
+    __vmwrite(GUEST_DS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_SS_BASE, 0);
+    __vmwrite(GUEST_SS_LIMIT, ~0u);
+    __vmwrite(GUEST_SS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_ES_BASE, 0);
+    __vmwrite(GUEST_ES_LIMIT, ~0u);
+    __vmwrite(GUEST_ES_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_FS_BASE, 0);
+    __vmwrite(GUEST_FS_LIMIT, ~0u);
+    __vmwrite(GUEST_FS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_GS_BASE, 0);
+    __vmwrite(GUEST_GS_LIMIT, ~0u);
+    __vmwrite(GUEST_GS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_GDTR_BASE, 0);
+    __vmwrite(GUEST_GDTR_LIMIT, 0);
+
+    __vmwrite(GUEST_LDTR_BASE, 0);
+    __vmwrite(GUEST_LDTR_LIMIT, 0);
+    __vmwrite(GUEST_LDTR_AR_BYTES, 0x82); /* LDT */
+    __vmwrite(GUEST_LDTR_SELECTOR, 0);
+
+    /* Guest TSS. */
+    __vmwrite(GUEST_TR_BASE, 0);
+    __vmwrite(GUEST_TR_LIMIT, 0xff);
+    __vmwrite(GUEST_TR_AR_BYTES, 0x8b); /* 32-bit TSS (busy) */
+
+    __vmwrite(GUEST_INTERRUPTIBILITY_INFO, 0);
+    __vmwrite(GUEST_DR7, 0);
+    __vmwrite(VMCS_LINK_POINTER, ~0UL);
+
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0);
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, 0);
+
+    v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK | (1U << TRAP_debug) |
+                                   (1U << TRAP_int3) | (1U << TRAP_no_device);
+    __vmwrite(EXCEPTION_BITMAP, v->arch.hvm_vmx.exception_bitmap);
+
+    /* Set WP bit so rdonly pages are not written from CPL 0 */
+    tmpval = X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP;
+    __vmwrite(GUEST_CR0, tmpval);
+    __vmwrite(CR0_READ_SHADOW, tmpval);
+    v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval;
+
+    tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+    __vmwrite(GUEST_CR4, tmpval);
+    __vmwrite(CR4_READ_SHADOW, tmpval);
+    v->arch.hvm_vcpu.guest_cr[4] = tmpval;
+
+    __vmwrite(CR0_GUEST_HOST_MASK, ~0UL);
+    __vmwrite(CR4_GUEST_HOST_MASK, ~0UL);
+
+    v->arch.hvm_vmx.vmx_realmode = 0;
+
+    ept->asr  = pagetable_get_pfn(p2m_get_pagetable(p2m));
+    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+    rdmsrl(MSR_IA32_CR_PAT, host_pat);
+    __vmwrite(HOST_PAT, host_pat);
+    __vmwrite(GUEST_PAT, MSR_IA32_CR_PAT_RESET);
+
+    /* the paging mode is updated for PVH by arch_set_info_guest() */
+
+    return 0;
+}
+
+static int construct_vmcs(struct vcpu *v)
+{
+    struct domain *d = v->domain;
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
 
     vmx_vmcs_enter(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        int rc = pvh_construct_vmcs(v);
+        vmx_vmcs_exit(v);
+        return rc;
+    }
+
     /* VMCS controls. */
     __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
 
@@ -932,30 +1201,7 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
 
-    /* Host data selectors. */
-    __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_FS_SELECTOR, 0);
-    __vmwrite(HOST_GS_SELECTOR, 0);
-    __vmwrite(HOST_FS_BASE, 0);
-    __vmwrite(HOST_GS_BASE, 0);
-
-    /* Host control registers. */
-    v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
-    __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
-    __vmwrite(HOST_CR4,
-              mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0));
-
-    /* Host CS:RIP. */
-    __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
-    __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler);
-
-    /* Host SYSENTER CS:RIP. */
-    rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs);
-    __vmwrite(HOST_SYSENTER_CS, sysenter_cs);
-    rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip);
-    __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
+    vmx_set_common_host_vmcs_fields(v);
 
     /* MSR intercepts. */
     __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);
@@ -1275,8 +1521,11 @@ void vmx_do_resume(struct vcpu *v)
 
         vmx_clear_vmcs(v);
         vmx_load_vmcs(v);
-        hvm_migrate_timers(v);
-        hvm_migrate_pirqs(v);
+        if ( !is_pvh_vcpu(v) )
+        {
+            hvm_migrate_timers(v);
+            hvm_migrate_pirqs(v);
+        }
         vmx_set_host_env(v);
         /*
          * Both n1 VMCS and n2 VMCS need to update the host environment after 
@@ -1288,6 +1537,9 @@ void vmx_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
+    if ( is_pvh_vcpu(v) )
+        reset_stack_and_jump(vmx_asm_do_vmentry);
+
     debug_state = v->domain->debugger_attached
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];
@@ -1471,7 +1723,7 @@ static void vmcs_dump(unsigned char ch)
 
     for_each_domain ( d )
     {
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             continue;
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 25a265e..d20be75 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,9 @@ static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
+    if ( is_pvh_domain(d) )
+        return 0;
+
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
         return rc;
 
@@ -90,6 +93,9 @@ static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+        return;
+
     vmx_free_vlapic_mapping(d);
 }
 
@@ -113,6 +119,12 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     vpmu_initialise(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        /* This is for hvm_long_mode_enabled(v). */
+        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
+        return 0;
+    }
     vmx_install_vlapic_mapping(v);
 
     /* %eax == 1 signals full real-mode support to the guest loader. */
@@ -1034,6 +1046,28 @@ static void vmx_update_host_cr3(struct vcpu *v)
     vmx_vmcs_exit(v);
 }
 
+/*
+ * A PVH guest never causes a CR3 write vmexit. This is only called during
+ * guest setup.
+ */
+static void vmx_update_pvh_cr(struct vcpu *v, unsigned int cr)
+{
+    vmx_vmcs_enter(v);
+    switch ( cr )
+    {
+    case 3:
+        __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.guest_cr[3]);
+        hvm_asid_flush_vcpu(v);
+        break;
+
+    default:
+        printk(XENLOG_ERR
+               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
+               v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
+    }
+    vmx_vmcs_exit(v);
+}
+
 void vmx_update_debug_state(struct vcpu *v)
 {
     unsigned long mask;
@@ -1053,6 +1087,12 @@ void vmx_update_debug_state(struct vcpu *v)
 
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
 {
+    if ( is_pvh_vcpu(v) )
+    {
+        vmx_update_pvh_cr(v, cr);
+        return;
+    }
+
     vmx_vmcs_enter(v);
 
     switch ( cr )
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 11/18] PVH xen: create read_descriptor_sel()
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (9 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 10/18] PVH xen: create PVH vmcs, and also initialization Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 12/18] PVH xen: support hypercalls for PVH Mukesh Rathor
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch changes read descriptor functionality to support PVH by
introducing read_descriptor_sel(). Also, we make emulate_forced_invalid_op()
public and suitable for PVH use.

Changes in V5: None. New patch (separating this from prev patch 10 which was
               getting large).

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/traps.c            |   87 +++++++++++++++++++++++++++++++++------
 xen/include/asm-x86/processor.h |    1 +
 2 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index ce7c18a..75faf04 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -913,7 +913,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     return EXCRET_fault_fixed;
 }
 
-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 {
     char sig[5], instr[2];
     unsigned long eip, rc;
@@ -921,7 +921,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip = regs->eip;
 
     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    if ( (rc = raw_copy_from_guest(sig, (char *)eip, sizeof(sig))) != 0 )
     {
         propagate_page_fault(eip + sizeof(sig) - rc, 0);
         return EXCRET_fault_fixed;
@@ -931,7 +931,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip += sizeof(sig);
 
     /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    if ( ( rc = raw_copy_from_guest(instr, (char *)eip, sizeof(instr))) != 0 )
     {
         propagate_page_fault(eip + sizeof(instr) - rc, 0);
         return EXCRET_fault_fixed;
@@ -1076,6 +1076,12 @@ void propagate_page_fault(unsigned long addr, u16 error_code)
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
 
+    if ( is_pvh_vcpu(v) )
+    {
+        hvm_inject_page_fault(error_code, addr);
+        return;
+    }
+
     v->arch.pv_vcpu.ctrlreg[2] = addr;
     arch_set_cr2(v, addr);
 
@@ -1514,6 +1520,49 @@ static int read_descriptor(unsigned int sel,
     return 1;
 }
 
+static int read_descriptor_sel(unsigned int sel,
+                               enum x86_segment which_sel,
+                               struct vcpu *v,
+                               const struct cpu_user_regs *regs,
+                               unsigned long *base,
+                               unsigned long *limit,
+                               unsigned int *ar,
+                               unsigned int vm86attr)
+{
+    struct segment_register seg;
+    unsigned int long_mode = 0;
+
+    if ( !is_pvh_vcpu(v) )
+        return read_descriptor(sel, v, regs, base, limit, ar, vm86attr);
+
+    hvm_get_segment_register(v, x86_seg_cs, &seg);
+    long_mode = seg.attr.fields.l;
+
+    if ( which_sel != x86_seg_cs )
+        hvm_get_segment_register(v, which_sel, &seg);
+
+    /* ar is returned packed as in segment_attributes_t. Fix it up */
+    *ar = (unsigned int)seg.attr.bytes;
+    *ar = (*ar & 0xff ) | ((*ar & 0xf00) << 4);
+    *ar = *ar << 8;
+
+    if ( long_mode )
+    {
+        *limit = ~0UL;
+
+        if ( which_sel < x86_seg_fs )
+        {
+            *base = 0UL;
+            return 1;
+        }
+    }
+    else
+        *limit = (unsigned long)seg.limit;
+
+    *base = seg.base;
+    return 1;
+}
+
 static int read_gate_descriptor(unsigned int gate_sel,
                                 const struct vcpu *v,
                                 unsigned int *sel,
@@ -1841,6 +1890,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 int emulate_privileged_op(struct cpu_user_regs *regs)
 {
+    enum x86_segment which_sel;
     struct vcpu *v = current;
     unsigned long *reg, eip = regs->eip;
     u8 opcode, modrm_reg = 0, modrm_rm = 0, rep_prefix = 0, lock = 0, rex = 0;
@@ -1863,9 +1913,10 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
 
-    if ( !read_descriptor(regs->cs, v, regs,
-                          &code_base, &code_limit, &ar,
-                          _SEGMENT_CODE|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P) )
+    if ( !read_descriptor_sel(regs->cs, x86_seg_cs, v, regs,
+                              &code_base, &code_limit, &ar,
+                              _SEGMENT_CODE|_SEGMENT_S|
+                              _SEGMENT_DPL|_SEGMENT_P) )
         goto fail;
     op_default = op_bytes = (ar & (_SEGMENT_L|_SEGMENT_DB)) ? 4 : 2;
     ad_default = ad_bytes = (ar & _SEGMENT_L) ? 8 : op_default;
@@ -1876,6 +1927,7 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
 
     /* emulating only opcodes not allowing SS to be default */
     data_sel = read_segment_register(v, regs, ds);
+    which_sel = x86_seg_ds;
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1891,23 +1943,29 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
             continue;
         case 0x2e: /* CS override */
             data_sel = regs->cs;
+            which_sel = x86_seg_cs;
             continue;
         case 0x3e: /* DS override */
             data_sel = read_segment_register(v, regs, ds);
+            which_sel = x86_seg_ds;
             continue;
         case 0x26: /* ES override */
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             continue;
         case 0x64: /* FS override */
             data_sel = read_segment_register(v, regs, fs);
+            which_sel = x86_seg_fs;
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
             data_sel = read_segment_register(v, regs, gs);
+            which_sel = x86_seg_gs;
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
             data_sel = regs->ss;
+            which_sel = x86_seg_ss;
             continue;
         case 0xf0: /* LOCK */
             lock = 1;
@@ -1951,15 +2009,16 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
         if ( !(opcode & 2) )
         {
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             lm_ovr = lm_seg_none;
         }
 
         if ( !(ar & _SEGMENT_L) )
         {
-            if ( !read_descriptor(data_sel, v, regs,
-                                  &data_base, &data_limit, &ar,
-                                  _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
-                                  _SEGMENT_P) )
+            if ( !read_descriptor_sel(data_sel, which_sel, v, regs,
+                                      &data_base, &data_limit, &ar,
+                                      _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
+                                      _SEGMENT_P) )
                 goto fail;
             if ( !(ar & _SEGMENT_S) ||
                  !(ar & _SEGMENT_P) ||
@@ -1989,9 +2048,9 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
                 }
             }
             else
-                read_descriptor(data_sel, v, regs,
-                                &data_base, &data_limit, &ar,
-                                0);
+                read_descriptor_sel(data_sel, which_sel, v, regs,
+                                    &data_base, &data_limit, &ar,
+                                    0);
             data_limit = ~0UL;
             ar = _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P;
         }
@@ -2646,6 +2705,8 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
     unsigned long off, eip, opnd_off, base, limit;
     int jump;
 
+    ASSERT(!is_pvh_vcpu(v));
+
     /* Check whether this fault is due to the use of a call gate. */
     if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
          (((ar >> 13) & 3) < (regs->cs & 3)) ||
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 8c70324..ab15ff0 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -567,6 +567,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(int cpu);
 
 void pv_cpuid(struct cpu_user_regs *regs);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */
-- 
1.7.2.3


* [PATCH 12/18] PVH xen: support hypercalls for PVH
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (10 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 11/18] PVH xen: create read_descriptor_sel() Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-06-05 15:27   ` Konrad Rzeszutek Wilk
  2013-05-25  1:25 ` [PATCH 13/18] PVH xen: introduce vmx_pvh.c Mukesh Rathor
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch replaces the old patch that created pvh.c. Instead, we modify
hvm.c to add support for PVH as well.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c |   58 +++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a525080..74004bc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3242,6 +3242,8 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         case PHYSDEVOP_get_free_pirq:
             return do_physdev_op(cmd, arg);
         default:
+            if ( is_pvh_vcpu(current) && is_hardware_domain(current->domain) )
+                return do_physdev_op(cmd, arg);
             return -ENOSYS;
     }
 }
@@ -3249,7 +3251,7 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 static long hvm_vcpu_op(
     int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
-    long rc;
+    long rc = -ENOSYS;
 
     switch ( cmd )
     {
@@ -3262,6 +3264,14 @@ static long hvm_vcpu_op(
     case VCPUOP_register_vcpu_info:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
+
+    case VCPUOP_is_up:
+    case VCPUOP_up:
+    case VCPUOP_initialise:
+        if ( is_pvh_vcpu(current) )
+            rc = do_vcpu_op(cmd, vcpuid, arg);
+        break;
+
     default:
         rc = -ENOSYS;
         break;
@@ -3381,12 +3391,31 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(tmem_op)
 };
 
+/* PVH 32bitfixme */
+static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
+    HYPERCALL(platform_op),
+    HYPERCALL(memory_op),
+    HYPERCALL(xen_version),
+    HYPERCALL(console_io),
+    [ __HYPERVISOR_grant_table_op ]  = (hvm_hypercall_t *)hvm_grant_table_op,
+    [ __HYPERVISOR_vcpu_op ]         = (hvm_hypercall_t *)hvm_vcpu_op,
+    HYPERCALL(mmuext_op),
+    HYPERCALL(xsm_op),
+    HYPERCALL(sched_op),
+    HYPERCALL(event_channel_op),
+    [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
+    HYPERCALL(hvm_op),
+    HYPERCALL(sysctl),
+    HYPERCALL(domctl)
+};
+
 int hvm_do_hypercall(struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
     struct segment_register sreg;
     int mode = hvm_guest_x86_mode(curr);
     uint32_t eax = regs->eax;
+    hvm_hypercall_t **hcall_table;
 
     switch ( mode )
     {
@@ -3407,7 +3436,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
     if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
         return viridian_hypercall(regs);
 
-    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
+    if ( (eax >= NR_hypercalls) ||
+         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
+         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )
     {
         regs->eax = -ENOSYS;
         return HVM_HCALL_completed;
@@ -3421,17 +3452,24 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
                     eax, regs->rdi, regs->rsi, regs->rdx,
                     regs->r10, regs->r8, regs->r9);
 
+        if ( is_pvh_vcpu(curr) )
+            hcall_table = (hvm_hypercall_t **)pvh_hypercall64_table;
+        else
+            hcall_table = (hvm_hypercall_t **)hvm_hypercall64_table;
+
         curr->arch.hvm_vcpu.hcall_64bit = 1;
-        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
-                                               regs->rsi,
-                                               regs->rdx,
-                                               regs->r10,
-                                               regs->r8,
-                                               regs->r9); 
+        regs->rax = hcall_table[eax](regs->rdi,
+                                     regs->rsi,
+                                     regs->rdx,
+                                     regs->r10,
+                                     regs->r8,
+                                     regs->r9);
         curr->arch.hvm_vcpu.hcall_64bit = 0;
     }
     else
     {
+        ASSERT(!is_pvh_vcpu(curr));   /* PVH 32bitfixme */
+
         HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax,
                     (uint32_t)regs->ebx, (uint32_t)regs->ecx,
                     (uint32_t)regs->edx, (uint32_t)regs->esi,
@@ -3855,7 +3893,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ESRCH;
 
         rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             goto param_fail;
 
         rc = xsm_hvm_param(XSM_TARGET, d, op);
@@ -4027,7 +4065,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 break;
             }
 
-            if ( rc == 0 ) 
+            if ( rc == 0 && !is_pvh_domain(d) )
             {
                 d->arch.hvm_domain.params[a.index] = a.value;
 
-- 
1.7.2.3


* [PATCH 13/18] PVH xen: introduce vmx_pvh.c
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (11 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 12/18] PVH xen: support hypercalls for PVH Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 14/18] PVH xen: some misc changes like mtrr, intr, msi Mukesh Rathor
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

The heart of this patch is the VMX exit handler for PVH guests. It is nicely
isolated in a separate module, as preferred by most of us. A call to it is
added to vmx_vmexit_handler().

Changes in V2:
  - Move non VMX generic code to arch/x86/hvm/pvh.c
  - Remove get_gpr_ptr() and use existing decode_register() instead.
  - Defer call to pvh vmx exit handler until interrupts are enabled. So the
    caller vmx_pvh_vmexit_handler() handles the NMI/EXT-INT/TRIPLE_FAULT now.
  - Fix the CPUID (wrongly) clearing bit 24. No need to do this now, set
    the correct feature bits in CR4 during vmcs creation.
  - Fix few hard tabs.

Changes in V3:
  - Lot of cleanup and rework in PVH vm exit handler.
  - add parameter to emulate_forced_invalid_op().

Changes in V5:
  - Move pvh.c and emulate_forced_invalid_op related changes to another patch.
  - Formatting.
  - Remove vmx_pvh_read_descriptor().
  - Use SS DPL instead of CS.RPL for CPL.
  - Remove pvh_user_cpuid() and call pv_cpuid for user mode also.

Changes in V6:
  - Replace domain_crash_synchronous() with domain_crash().

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vmx/Makefile     |    1 +
 xen/arch/x86/hvm/vmx/vmx.c        |    7 +
 xen/arch/x86/hvm/vmx/vmx_pvh.c    |  500 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |    2 +
 4 files changed, 510 insertions(+), 0 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmx/vmx_pvh.c

diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..8b71dae 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -5,3 +5,4 @@ obj-y += vmcs.o
 obj-y += vmx.o
 obj-y += vpmu_core2.o
 obj-y += vvmx.o
+obj-y += vmx_pvh.o
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d20be75..7205acc 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1595,6 +1595,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .deliver_posted_intr  = vmx_deliver_posted_intr,
     .sync_pir_to_irr      = vmx_sync_pir_to_irr,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
+    .pvh_set_vcpu_info    = vmx_pvh_set_vcpu_info,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2447,6 +2448,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
         return vmx_failed_vmentry(exit_reason, regs);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        vmx_pvh_vmexit_handler(regs);
+        return;
+    }
+
     if ( v->arch.hvm_vmx.vmx_realmode )
     {
         /* Put RFLAGS back the way the guest wants it */
diff --git a/xen/arch/x86/hvm/vmx/vmx_pvh.c b/xen/arch/x86/hvm/vmx/vmx_pvh.c
new file mode 100644
index 0000000..00371d0
--- /dev/null
+++ b/xen/arch/x86/hvm/vmx/vmx_pvh.c
@@ -0,0 +1,500 @@
+/*
+ * Copyright (C) 2013, Mukesh Rathor, Oracle Corp.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
+#include <asm/p2m.h>
+#include <asm/traps.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <public/sched.h>
+
+#ifndef NDEBUG
+int pvhdbg = 0;
+#define dbgp1(...) do { if ( pvhdbg == 1 ) printk(__VA_ARGS__); } while ( 0 )
+#else
+#define dbgp1(...) ((void)0)
+#endif
+
+
+static void read_vmcs_selectors(struct cpu_user_regs *regs)
+{
+    regs->cs = __vmread(GUEST_CS_SELECTOR);
+    regs->ss = __vmread(GUEST_SS_SELECTOR);
+    regs->ds = __vmread(GUEST_DS_SELECTOR);
+    regs->es = __vmread(GUEST_ES_SELECTOR);
+    regs->gs = __vmread(GUEST_GS_SELECTOR);
+    regs->fs = __vmread(GUEST_FS_SELECTOR);
+}
+
+/* Returns 0 if the MSR was read successfully. */
+static int vmxit_msr_read(struct cpu_user_regs *regs)
+{
+    u64 msr_content = 0;
+
+    switch ( regs->ecx )
+    {
+    case MSR_IA32_MISC_ENABLE:
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
+                       MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
+        break;
+
+    default:
+        /* pvh fixme: see hvm_msr_read_intercept() */
+        rdmsrl(regs->ecx, msr_content);
+        break;
+    }
+    regs->eax = (uint32_t)msr_content;
+    regs->edx = (uint32_t)(msr_content >> 32);
+    vmx_update_guest_eip();
+
+    dbgp1("msr read c:%lx a:%lx d:%lx RIP:%lx RSP:%lx\n", regs->ecx, regs->eax,
+          regs->edx, regs->rip, regs->rsp);
+
+    return 0;
+}
+
+/* Returns 0 if the MSR was written successfully. */
+static int vmxit_msr_write(struct cpu_user_regs *regs)
+{
+    uint64_t msr_content = (uint32_t)regs->eax | ((uint64_t)regs->edx << 32);
+
+    dbgp1("PVH: msr write:0x%lx. eax:0x%lx edx:0x%lx\n", regs->ecx,
+          regs->eax, regs->edx);
+
+    if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY )
+    {
+        vmx_update_guest_eip();
+        return 0;
+    }
+    return 1;
+}
+
+static int vmxit_debug(struct cpu_user_regs *regs)
+{
+    struct vcpu *vp = current;
+    unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
+
+    write_debugreg(6, exit_qualification | 0xffff0ff0);
+
+    /* gdbsx or another debugger */
+    if ( vp->domain->domain_id != 0 &&    /* never pause dom0 */
+         guest_kernel_mode(vp, regs) && vp->domain->debugger_attached )
+        domain_pause_for_debugger();
+    else
+        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the MTF vmexit */
+static int vmxit_mtf(struct cpu_user_regs *regs)
+{
+    struct vcpu *vp = current;
+    int rc = -EINVAL, ss = vp->arch.hvm_vcpu.single_step;
+
+    vp->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, vp->arch.hvm_vmx.exec_control);
+    vp->arch.hvm_vcpu.single_step = 0;
+
+    if ( vp->domain->debugger_attached && ss )
+    {
+        domain_pause_for_debugger();
+        rc = 0;
+    }
+    return rc;
+}
+
+static int vmxit_int3(struct cpu_user_regs *regs)
+{
+    int ilen = vmx_get_instruction_length();
+    struct vcpu *vp = current;
+    struct hvm_trap trap_info = {
+        .vector = TRAP_int3,
+        .type = X86_EVENTTYPE_SW_EXCEPTION,
+        .error_code = HVM_DELIVER_NO_ERROR_CODE,
+        .insn_len = ilen
+    };
+
+    /* gdbsx or another debugger. Never pause dom0 */
+    if ( vp->domain->domain_id != 0 && guest_kernel_mode(vp, regs) )
+    {
+        regs->eip += ilen;
+        dbgp1("[%d]PVH: domain pause for debugger\n", smp_processor_id());
+        current->arch.gdbsx_vcpu_event = TRAP_int3;
+        domain_pause_for_debugger();
+        return 0;
+    }
+    hvm_inject_trap(&trap_info);
+
+    return 0;
+}
+
+static int vmxit_invalid_op(struct cpu_user_regs *regs)
+{
+    if ( guest_kernel_mode(current, regs) || !emulate_forced_invalid_op(regs) )
+        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the exception/NMI */
+static int vmxit_exception(struct cpu_user_regs *regs)
+{
+    int vector = (__vmread(VM_EXIT_INTR_INFO)) & INTR_INFO_VECTOR_MASK;
+    int rc = -ENOSYS;
+
+    dbgp1(" EXCPT: vec:%d cs:%lx r.IP:%lx\n", vector,
+          __vmread(GUEST_CS_SELECTOR), regs->eip);
+
+    switch ( vector )
+    {
+    case TRAP_debug:
+        rc = vmxit_debug(regs);
+        break;
+
+    case TRAP_int3:
+        rc = vmxit_int3(regs);
+        break;
+
+    case TRAP_invalid_op:
+        rc = vmxit_invalid_op(regs);
+        break;
+
+    case TRAP_no_device:
+        hvm_funcs.fpu_dirty_intercept();
+        rc = 0;
+        break;
+
+    default:
+        gdprintk(XENLOG_G_WARNING,
+                 "PVH: Unhandled trap:%d. IP:%lx\n", vector, regs->eip);
+    }
+    return rc;
+}
+
+static int vmxit_vmcall(struct cpu_user_regs *regs)
+{
+    if ( hvm_do_hypercall(regs) != HVM_HCALL_preempted )
+        vmx_update_guest_eip();
+    return 0;
+}
+
+/* Returns: rc == 0: success */
+static int access_cr0(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    struct vcpu *vp = current;
+
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        unsigned long new_cr0 = *regp;
+        unsigned long old_cr0 = __vmread(GUEST_CR0);
+
+        dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n", regs->rip, *regp);
+        if ( (u32)new_cr0 != new_cr0 )
+        {
+            gdprintk(XENLOG_G_WARNING,
+                     "Guest setting upper 32 bits in CR0: %lx", new_cr0);
+            return -EPERM;
+        }
+
+        new_cr0 &= ~HVM_CR0_GUEST_RESERVED_BITS;
+        /* ET is reserved and should always be 1. */
+        new_cr0 |= X86_CR0_ET;
+
+        /* A PVH guest is not expected to switch to real mode. */
+        if ( (new_cr0 & (X86_CR0_PE | X86_CR0_PG)) !=
+             (X86_CR0_PG | X86_CR0_PE) )
+        {
+            gdprintk(XENLOG_G_WARNING,
+                     "PVH attempting to turn off PE/PG. CR0:%lx\n", new_cr0);
+            return -EPERM;
+        }
+        /* TS going from 1 to 0 */
+        if ( (old_cr0 & X86_CR0_TS) && ((new_cr0 & X86_CR0_TS) == 0) )
+            vmx_fpu_enter(vp);
+
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = new_cr0;
+        __vmwrite(GUEST_CR0, new_cr0);
+        __vmwrite(CR0_READ_SHADOW, new_cr0);
+    }
+    else
+        *regp = __vmread(GUEST_CR0);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success */
+static int access_cr4(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        u64 old_cr4 = __vmread(GUEST_CR4);
+        u64 new = *regp;
+
+        if ( (old_cr4 ^ new) & (X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE) )
+            vpid_sync_all();
+
+        __vmwrite(CR4_READ_SHADOW, new);
+
+        new &= ~X86_CR4_PAE;     /* PVH always runs with hap enabled */
+        new |= X86_CR4_VMXE | X86_CR4_MCE;
+        __vmwrite(GUEST_CR4, new);
+    }
+    else
+        *regp = __vmread(CR4_READ_SHADOW);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success, else -errno */
+static int vmxit_cr_access(struct cpu_user_regs *regs)
+{
+    unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
+    uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification);
+    int cr, rc = -EINVAL;
+
+    switch ( acc_typ )
+    {
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR:
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR:
+    {
+        uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification);
+        uint64_t *regp = decode_register(gpr, regs, 0);
+        cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification);
+
+        if ( regp == NULL )
+            break;
+
+        switch ( cr )
+        {
+        case 0:
+            rc = access_cr0(regs, acc_typ, regp);
+            break;
+
+        case 3:
+            gdprintk(XENLOG_G_ERR, "PVH: unexpected cr3 vmexit. rip:%lx\n",
+                     regs->rip);
+            domain_crash(current->domain);
+            break;
+
+        case 4:
+            rc = access_cr4(regs, acc_typ, regp);
+            break;
+        }
+        if ( rc == 0 )
+            vmx_update_guest_eip();
+        break;
+    }
+
+    case VMX_CONTROL_REG_ACCESS_TYPE_CLTS:
+    {
+        struct vcpu *vp = current;
+        unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] & ~X86_CR0_TS;
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = cr0;
+
+        vmx_fpu_enter(vp);
+        __vmwrite(GUEST_CR0, cr0);
+        __vmwrite(CR0_READ_SHADOW, cr0);
+        vmx_update_guest_eip();
+        rc = 0;
+    }
+    }
+    return rc;
+}
+
+/*
+ * NOTE: A PVH guest sets IOPL natively by setting bits in the eflags, and not
+ *       via hypercalls used by a PV.
+ */
+static int vmxit_io_instr(struct cpu_user_regs *regs)
+{
+    struct segment_register seg;
+    int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
+    int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
+
+    if ( curr_lvl == 0 )
+    {
+        hvm_get_segment_register(current, x86_seg_ss, &seg);
+        curr_lvl = seg.attr.fields.dpl;
+    }
+    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
+        return 0;
+
+    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+    return 0;
+}
+
+static int pvh_ept_handle_violation(unsigned long qualification,
+                                    paddr_t gpa, struct cpu_user_regs *regs)
+{
+    unsigned long gla, gfn = gpa >> PAGE_SHIFT;
+    p2m_type_t p2mt;
+    mfn_t mfn = get_gfn_query_unlocked(current->domain, gfn, &p2mt);
+
+    gdprintk(XENLOG_G_ERR, "EPT violation %#lx (%c%c%c/%c%c%c), "
+             "gpa %#"PRIpaddr", mfn %#lx, type %i. IP:0x%lx RSP:0x%lx\n",
+             qualification,
+             (qualification & EPT_READ_VIOLATION) ? 'r' : '-',
+             (qualification & EPT_WRITE_VIOLATION) ? 'w' : '-',
+             (qualification & EPT_EXEC_VIOLATION) ? 'x' : '-',
+             (qualification & EPT_EFFECTIVE_READ) ? 'r' : '-',
+             (qualification & EPT_EFFECTIVE_WRITE) ? 'w' : '-',
+             (qualification & EPT_EFFECTIVE_EXEC) ? 'x' : '-',
+             gpa, mfn_x(mfn), p2mt, regs->rip, regs->rsp);
+
+    ept_walk_table(current->domain, gfn);
+
+    if ( qualification & EPT_GLA_VALID )
+    {
+        gla = __vmread(GUEST_LINEAR_ADDRESS);
+        gdprintk(XENLOG_G_ERR, " --- GLA %#lx\n", gla);
+    }
+    hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    return 0;
+}
+
+/*
+ * Main VM exit handler for PVH. Called from vmx_vmexit_handler().
+ * Note: vmx_asm_vmexit_handler updates rip/rsp/eflags in regs{} struct.
+ */
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
+{
+    unsigned long exit_qualification;
+    unsigned int exit_reason = __vmread(VM_EXIT_REASON);
+    int rc=0, ccpu = smp_processor_id();
+    struct vcpu *v = current;
+
+    dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%lx RSP:%lx EFLAGS:%lx CR0:%lx\n",
+          ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
+          __vmread(GUEST_CR0));
+
+    /* guest_kernel_mode() needs cs. read_segment_register needs others */
+    read_vmcs_selectors(regs);
+
+    switch ( (uint16_t)exit_reason )
+    {
+    /* NMI and machine_check are handled by the caller; we handle the rest here. */
+    case EXIT_REASON_EXCEPTION_NMI:      /* 0 */
+        rc = vmxit_exception(regs);
+        break;
+
+    case EXIT_REASON_EXTERNAL_INTERRUPT: /* 1 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_PENDING_VIRT_INTR:  /* 7 */
+        /* Disable the interrupt window. */
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+        __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+        break;
+
+    case EXIT_REASON_CPUID:              /* 10 */
+        pv_cpuid(regs);
+        vmx_update_guest_eip();
+        break;
+
+    case EXIT_REASON_HLT:                /* 12 */
+        vmx_update_guest_eip();
+        hvm_hlt(regs->eflags);
+        break;
+
+    case EXIT_REASON_VMCALL:             /* 18 */
+        rc = vmxit_vmcall(regs);
+        break;
+
+    case EXIT_REASON_CR_ACCESS:          /* 28 */
+        rc = vmxit_cr_access(regs);
+        break;
+
+    case EXIT_REASON_DR_ACCESS:          /* 29 */
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        vmx_dr_access(exit_qualification, regs);
+        break;
+
+    case EXIT_REASON_IO_INSTRUCTION:     /* 30 */
+        vmxit_io_instr(regs);
+        break;
+
+    case EXIT_REASON_MSR_READ:           /* 31 */
+        rc = vmxit_msr_read(regs);
+        break;
+
+    case EXIT_REASON_MSR_WRITE:          /* 32 */
+        rc = vmxit_msr_write(regs);
+        break;
+
+    case EXIT_REASON_MONITOR_TRAP_FLAG:  /* 37 */
+        rc = vmxit_mtf(regs);
+        break;
+
+    case EXIT_REASON_MCE_DURING_VMENTRY: /* 41 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_EPT_VIOLATION:      /* 48 */
+    {
+        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        rc = pvh_ept_handle_violation(exit_qualification, gpa, regs);
+        break;
+    }
+
+    default:
+        rc = 1;
+        gdprintk(XENLOG_G_ERR,
+                 "PVH: Unexpected exit reason:0x%x\n", exit_reason);
+    }
+
+    if ( rc )
+    {
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        gdprintk(XENLOG_G_WARNING,
+                 "PVH: [%d] exit_reas:%d 0x%x qual:%ld 0x%lx cr0:0x%016lx\n",
+                 ccpu, exit_reason, exit_reason, exit_qualification,
+                 exit_qualification, __vmread(GUEST_CR0));
+        gdprintk(XENLOG_G_WARNING, "PVH: RIP:%lx RSP:%lx EFLAGS:%lx CR3:%lx\n",
+                 regs->rip, regs->rsp, regs->rflags, __vmread(GUEST_CR3));
+        domain_crash(v->domain);
+    }
+}
+
+/*
+ * Sets info for a non-boot SMP vcpu. VCPU 0 context is set by the library.
+ * In case of linux, the call comes from cpu_initialize_context().
+ */
+int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp)
+{
+    if ( v->vcpu_id == 0 )
+        return 0;
+
+    vmx_vmcs_enter(v);
+    __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr);
+    __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit);
+    __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_user);
+
+    __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs);
+    __vmwrite(GUEST_DS_SELECTOR, ctxtp->user_regs.ds);
+    __vmwrite(GUEST_ES_SELECTOR, ctxtp->user_regs.es);
+    __vmwrite(GUEST_SS_SELECTOR, ctxtp->user_regs.ss);
+    __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs);
+
+    if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) )
+    {
+        vmx_vmcs_exit(v);
+        return -EINVAL;
+    }
+    vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_kernel);
+
+    vmx_vmcs_exit(v);
+    return 0;
+}
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index ad341dc..3ed6cff 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -472,6 +472,8 @@ void setup_ept_dump(void);
 void vmx_update_guest_eip(void);
 void vmx_dr_access(unsigned long exit_qualification,
                    struct cpu_user_regs *regs);
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs);
+int  vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread
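
The eax/edx marshalling used by vmxit_msr_read() and vmxit_msr_write() in the
handler above follows the RDMSR/WRMSR convention: the 64-bit MSR value is
split across edx:eax. A minimal standalone sketch of that split (helper names
here are illustrative, not Xen's):

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit MSR value across edx:eax, as RDMSR returns it and as
 * vmxit_msr_read() stores it into the guest register state. */
static void msr_to_regs(uint64_t msr, uint32_t *eax, uint32_t *edx)
{
    *eax = (uint32_t)msr;           /* low 32 bits */
    *edx = (uint32_t)(msr >> 32);   /* high 32 bits */
}

/* Reassemble the 64-bit value from edx:eax, as vmxit_msr_write() does
 * before calling hvm_msr_write_intercept(). */
static uint64_t regs_to_msr(uint32_t eax, uint32_t edx)
{
    return (uint64_t)eax | ((uint64_t)edx << 32);
}
```

The two helpers are exact inverses, so a read followed by a write round-trips
the MSR value unchanged.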

* [PATCH 14/18] PVH xen: some misc changes like mtrr, intr, msi...
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (12 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 13/18] PVH xen: introduce vmx_pvh.c Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 15/18] PVH xen: hcall page initialize, create PVH guest type, etc Mukesh Rathor
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

Changes in irq.c because PVH does not use vlapic emulation. In mtrr.c we add
an assert and set the MTRR types for PVH.
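
The MTRR-type choice this patch adds to epte_get_entry_emt() can be sketched
as below. This is a minimal illustration, not Xen code; the type encodings
match the Intel SDM definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* MTRR memory type encodings as defined by the Intel SDM. */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6

/* Until more memory types are supported for PVH, the effective memory
 * type is write-back for ordinary RAM and uncachable for direct MMIO. */
static unsigned char pvh_epte_emt(bool direct_mmio)
{
    if (direct_mmio)
        return MTRR_TYPE_UNCACHABLE;
    return MTRR_TYPE_WRBACK;
}
```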

Changes in V2:
   - Some cleanup of redundant code.
   - time.c: Honor no rdtsc exiting for PVH by setting vtsc to 0 in time.c

Changes in V3:
   - Dont check for pvh in making mmio rangesets readonly.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/irq.c      |    3 +++
 xen/arch/x86/hvm/mtrr.c     |   11 +++++++++++
 xen/arch/x86/hvm/vmx/intr.c |    7 ++++---
 xen/arch/x86/time.c         |    9 +++++++++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 9eae5de..92fb245 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
          && vcpu_info(v, evtchn_upcall_pending) )
         return hvm_intack_vector(plat->irq.callback_via.vector);
 
+    if ( is_pvh_vcpu(v) )
+        return hvm_intack_none;
+
     if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
         return hvm_intack_pic(0);
 
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index ef51a8d..0623652 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -578,6 +578,9 @@ int32_t hvm_set_mem_pinned_cacheattr(
 {
     struct hvm_mem_pinned_cacheattr_range *range;
 
+    /* PVH note: The guest writes to MSR_IA32_CR_PAT natively */
+    ASSERT(!is_pvh_domain(d));
+
     if ( !((type == PAT_TYPE_UNCACHABLE) ||
            (type == PAT_TYPE_WRCOMB) ||
            (type == PAT_TYPE_WRTHROUGH) ||
@@ -693,6 +696,14 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
          ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
         return MTRR_TYPE_WRBACK;
 
+    /* PVH fixme: Add support for more memory types */
+    if ( is_pvh_domain(d) )
+    {
+        if (direct_mmio)
+            return MTRR_TYPE_UNCACHABLE;
+        return MTRR_TYPE_WRBACK;
+    }
+
     if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
         return MTRR_TYPE_WRBACK;
 
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index e376f3c..b94f9d5 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -219,15 +219,16 @@ void vmx_intr_assist(void)
         return;
     }
 
-    /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
+    if ( !is_pvh_vcpu(v) )
+        /* Crank the handle on interrupt state. */
+        pt_vector = pt_update_irq(v);
 
     do {
         intack = hvm_vcpu_has_pending_irq(v);
         if ( likely(intack.source == hvm_intsrc_none) )
             goto out;
 
-        if ( unlikely(nvmx_intr_intercept(v, intack)) )
+        if ( !is_pvh_vcpu(v) && unlikely(nvmx_intr_intercept(v, intack)) )
             goto out;
 
         intblk = hvm_interrupt_blocked(v, intack);
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6e94847..484eb07 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1933,6 +1933,15 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc = 0;
         return;
     }
+    if ( is_pvh_domain(d) && tsc_mode != TSC_MODE_NEVER_EMULATE )
+    {
+        /* PVH fixme: support more tsc modes */
+        dprintk(XENLOG_WARNING,
+                "PVH currently does not support tsc emulation. Setting it "
+                "to no emulation\n");
+        d->arch.vtsc = 0;
+        return;
+    }
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/18] PVH xen: hcall page initialize, create PVH guest type, etc...
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (13 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 14/18] PVH xen: some misc changes like mtrr, intr, msi Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 16/18] PVH xen: Miscellaneous changes Mukesh Rathor
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

Create the hypercall page for PVH the same way as for HVM. Set the PVH guest
type when a PV domain is created with HAP, and make some other changes in
traps.c to support PVH.
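
The guest-type selection this patch adds to domain_create() can be sketched as
follows; the flag values and helper name are illustrative stand-ins, not Xen's
actual DOMCRF_* constants:

```c
#include <assert.h>

/* Illustrative flag values; the real DOMCRF_* constants are defined in
 * xen/include/xen/sched.h. */
#define DOMCRF_hvm 0x1
#define DOMCRF_hap 0x2
#define DOMCRF_pvh 0x4

enum guest_type { guest_pv, guest_pvh, guest_hvm };

/* A domain is PVH only when the PVH flag is set together with HAP;
 * PVH without HAP is rejected, mirroring the check in domain_create(). */
static int pick_guest_type(unsigned int flags, enum guest_type *out)
{
    if (flags & DOMCRF_hvm)
        *out = guest_hvm;
    else if (flags & DOMCRF_pvh) {
        if (!(flags & DOMCRF_hap))
            return -1;               /* PVH guest must have HAP on */
        *out = guest_pvh;
    } else
        *out = guest_pv;
    return 0;
}
```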

Changes in V2:
  - Fix emulate_forced_invalid_op() to use proper copy function, and inject PF
    in case it fails.
  - remove extraneous PVH check in STI/CLI ops in emulate_privileged_op().
  - Make assert a debug ASSERT in show_registers().
  - debug.c: keep get_gfn() locked and move put_gfn closer to it.

Changes in V3:
  - Mostly formatting.

Changes in V5:
  - emulation of forced invalid op redone, and in a separate patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/traps.c        |   19 ++++++++++++++++++-
 xen/arch/x86/x86_64/traps.c |    8 +++++---
 xen/common/domain.c         |    9 +++++++++
 xen/common/domctl.c         |    5 +++++
 xen/common/kernel.c         |    6 +++++-
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 75faf04..89ca10b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -459,6 +459,10 @@ static void instruction_done(
     struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch)
 {
     regs->eip = eip;
+
+    if ( is_pvh_vcpu(current) )
+        return;
+
     regs->eflags &= ~X86_EFLAGS_RF;
     if ( bpmatch || (regs->eflags & X86_EFLAGS_TF) )
     {
@@ -475,6 +479,9 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
     unsigned int width, i, match = 0;
     unsigned long start;
 
+    if ( is_pvh_vcpu(v) )
+        return 0;          /* PVH fixme: support io breakpoint */
+
     if ( !(v->arch.debugreg[5]) ||
          !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
         return 0;
@@ -1628,6 +1635,13 @@ static int guest_io_okay(
     int user_mode = !(v->arch.flags & TF_kernel_mode);
 #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
 
+    /*
+     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
+     * and so don't need to check again here.
+     */
+    if ( is_pvh_vcpu(v) )
+        return 1;
+
     if ( !vm86_mode(regs) &&
          (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
         return 1;
@@ -1873,7 +1887,7 @@ static inline uint64_t guest_misc_enable(uint64_t val)
         _ptr = (unsigned int)_ptr;                                          \
     if ( (limit) < sizeof(_x) - 1 || (eip) > (limit) - (sizeof(_x) - 1) )   \
         goto fail;                                                          \
-    if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
+    if ( (_rc = raw_copy_from_guest(&_x, (type *)_ptr, sizeof(_x))) != 0 )  \
     {                                                                       \
         propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
         goto skip;                                                          \
@@ -3323,6 +3337,9 @@ void do_device_not_available(struct cpu_user_regs *regs)
 
     BUG_ON(!guest_mode(regs));
 
+    /* PVH should not get here. (ctrlreg is not implemented) */
+    ASSERT(!is_pvh_vcpu(curr));
+
     vcpu_restore_fpu_lazy(curr);
 
     if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index d2f7209..0df1e1c 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -146,8 +146,8 @@ void vcpu_show_registers(const struct vcpu *v)
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
 
-    /* No need to handle HVM for now. */
-    if ( is_hvm_vcpu(v) )
+    /* No need to handle HVM and PVH for now. */
+    if ( !is_pv_vcpu(v) )
         return;
 
     crs[0] = v->arch.pv_vcpu.ctrlreg[0];
@@ -440,6 +440,8 @@ static long register_guest_callback(struct callback_register *reg)
     long ret = 0;
     struct vcpu *v = current;
 
+    ASSERT(!is_pvh_vcpu(v));
+
     if ( !is_canonical_address(reg->address) )
         return -EINVAL;
 
@@ -620,7 +622,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 6ece3fe..b4be781 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -237,6 +237,15 @@ struct domain *domain_create(
 
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = is_hvm;
+    else if ( domcr_flags & DOMCRF_pvh )
+    {
+        if ( !(domcr_flags & DOMCRF_hap) )
+        {
+            printk(XENLOG_INFO "PVH guest must have HAP on\n");
+            goto fail;
+        }
+        d->guest_type = is_pvh;
+    }
 
     if ( domid == 0 )
     {
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 9bd8f80..f9c361d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     if ( is_hvm_domain(d) )
         info->flags |= XEN_DOMINF_hvm_guest;
+    else if ( is_pvh_domain(d) )
+        info->flags |= XEN_DOMINF_pvh_guest;
 
     xsm_security_domaininfo(d, info);
 
@@ -443,6 +445,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         domcr_flags = 0;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
+            domcr_flags |= DOMCRF_pvh;     /* PV with HAP is a PVH guest */
+
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
             domcr_flags |= DOMCRF_hap;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 72fb905..3bba758 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -289,7 +289,11 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( current->domain == dom0 )
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_X86
-            if ( !is_hvm_vcpu(current) )
+            if ( is_pvh_vcpu(current) )
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_supervisor_mode_kernel) |
+                             (1U << XENFEAT_hvm_callback_vector);
+            else if ( !is_hvm_vcpu(current) )
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/18] PVH xen: Miscellaneous changes
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (14 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 15/18] PVH xen: hcall page initialize, create PVH guest type, etc Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-06-05 15:39   ` Konrad Rzeszutek Wilk
  2013-05-25  1:25 ` [PATCH 17/18] PVH xen: Introduce p2m_map_foreign Mukesh Rathor
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch contains miscellaneous changes, such as restricting iobitmap
hypercalls for PVH and disallowing 32-bit PVH guests.
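
The guard this patch adds to do_physdev_op() amounts to the following check;
the op numbers below are placeholders, the real PHYSDEVOP_* values live in
xen/include/public/physdev.h:

```c
#include <assert.h>
#include <errno.h>

/* Placeholder op numbers for illustration only. */
#define PHYSDEVOP_set_iopl     6
#define PHYSDEVOP_set_iobitmap 7

/* A PVH guest sets IOPL natively via EFLAGS and has no PV-style I/O
 * bitmap, so both hypercalls are rejected with -EINVAL. */
static long pvh_physdev_check(int cmd, int is_pvh)
{
    if (is_pvh &&
        (cmd == PHYSDEVOP_set_iopl || cmd == PHYSDEVOP_set_iobitmap))
        return -EINVAL;
    return 0;                        /* proceed with normal handling */
}
```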

Changes in V6:
  - clear out vcpu_guest_context struct in arch_get_info_guest.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c      |    7 +++++++
 xen/arch/x86/domain_page.c |   10 +++++-----
 xen/arch/x86/domctl.c      |    6 ++++++
 xen/arch/x86/mm.c          |    2 +-
 xen/arch/x86/physdev.c     |   13 +++++++++++++
 xen/common/grant_table.c   |    4 ++--
 6 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9953f80..8cff7c9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,6 +339,13 @@ int switch_compat(struct domain *d)
 
     if ( d == NULL )
         return -EINVAL;
+
+    if ( is_pvh_domain(d) )
+    {
+        gdprintk(XENLOG_G_ERR,
+                 "Xen does not currently support 32bit PVH guests\n");
+        return -EINVAL;
+    }
     if ( !may_switch_mode(d) )
         return -EACCES;
     if ( is_pv_32on64_domain(d) )
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index efda6af..7685416 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -34,7 +34,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
-    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
+    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -71,7 +71,7 @@ void *map_domain_page(unsigned long mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || is_hvm_vcpu(v) )
+    if ( !v || !is_pv_vcpu(v) )
         return mfn_to_virt(mfn);
 
     dcache = &v->domain->arch.pv_domain.mapcache;
@@ -175,7 +175,7 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && !is_hvm_vcpu(v));
+    ASSERT(v && is_pv_vcpu(v));
 
     dcache = &v->domain->arch.pv_domain.mapcache;
     ASSERT(dcache->inuse);
@@ -242,7 +242,7 @@ int mapcache_domain_init(struct domain *d)
     struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
     unsigned int bitmap_pages;
 
-    if ( is_hvm_domain(d) || is_idle_domain(d) )
+    if ( !is_pv_domain(d) || is_idle_domain(d) )
         return 0;
 
 #ifdef NDEBUG
@@ -273,7 +273,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( is_hvm_vcpu(v) || !dcache->inuse )
+    if ( !is_pv_vcpu(v) || !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ce32245..8b44061 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1305,6 +1305,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
             c.nat->gs_base_kernel = hvm_get_shadow_gs_base(v);
         }
     }
+    else if ( is_pvh_vcpu(v) )
+    {
+        /* pvh fixme: punt it to phase II */
+        printk(XENLOG_WARNING "PVH: fixme: arch_get_info_guest()\n");
+        memset(c.nat, 0, sizeof(*c.nat));
+    }
     else
     {
         c(ldt_base = v->arch.pv_vcpu.ldt_base);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b190ad9..e992b4f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2805,7 +2805,7 @@ static struct domain *get_pg_owner(domid_t domid)
         goto out;
     }
 
-    if ( unlikely(paging_mode_translate(curr)) )
+    if ( !is_pvh_domain(curr) && unlikely(paging_mode_translate(curr)) )
     {
         MEM_LOG("Cannot mix foreign mappings with translated domains");
         goto out;
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3733c7a..2fc7ae6 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -475,6 +475,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iopl: {
         struct physdev_set_iopl set_iopl;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
         ret = -EFAULT;
         if ( copy_from_guest(&set_iopl, arg, 1) != 0 )
             break;
@@ -488,6 +495,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iobitmap: {
         struct physdev_set_iobitmap set_iobitmap;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EINVAL;
+            break;
+        }
         ret = -EFAULT;
         if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 )
             break;
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 3f97328..a2073d2 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
@@ -932,7 +932,7 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/18] PVH xen: Introduce p2m_map_foreign
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (15 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 16/18] PVH xen: Miscellaneous changes Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-05-25  1:25 ` [PATCH 18/18] PVH xen: Add and remove foreign pages Mukesh Rathor
  2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch introduces a new p2m type, p2m_map_foreign, for pages that dom0
maps from foreign domains it is creating.
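
The effect on p2m_type_to_flags() can be sketched as below; the flag bit
values are illustrative stand-ins for Xen's page-table constants:

```c
#include <assert.h>

/* Illustrative stand-ins for Xen's page-table flag bits. */
#define P2M_BASE_FLAGS 0x001UL
#define _PAGE_RW       0x002UL
#define _PAGE_NX_BIT   0x800UL

enum p2m_type { p2m_ram_rw, p2m_grant_map_rw, p2m_map_foreign };

/* A foreign mapping gets the same read/write, no-execute treatment as a
 * writable grant mapping; p2m_map_foreign simply joins that case. */
static unsigned long type_to_flags(enum p2m_type t)
{
    switch (t) {
    case p2m_ram_rw:
        return P2M_BASE_FLAGS | _PAGE_RW;
    case p2m_grant_map_rw:
    case p2m_map_foreign:            /* new case added by this patch */
        return P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
    }
    return 0;
}
```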

Changes in V2:
   - Make guest_physmap_add_entry() same for PVH in terms of overwriting old
     entry.
   - In set_foreign_p2m_entry() do locked get_gfn and not unlocked.
   - Replace ASSERT with return -EINVAL in do_physdev_op.
   - Remove unnecessary check for PVH in do_physdev_op().

Changes in V3:
   - remove changes unrelated to this patch.

Changes in V5:
   - remove mmio check for highest gfn tracking.
   - set_foreign_p2m_entry looks same as set_mmio_p2m_entry, so make a common
     function for both.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm/p2m-ept.c |    1 +
 xen/arch/x86/mm/p2m-pt.c  |    1 +
 xen/arch/x86/mm/p2m.c     |   29 +++++++++++++++++++++--------
 xen/include/asm-x86/p2m.h |    4 ++++
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 595c6e7..67c200c 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -75,6 +75,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
             entry->w = 0;
             break;
         case p2m_grant_map_rw:
+        case p2m_map_foreign:
             entry->r = entry->w = 1;
             entry->x = 0;
             break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 302b621..021a6af 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -89,6 +89,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
     case p2m_ram_rw:
         return flags | P2M_BASE_FLAGS | _PAGE_RW;
     case p2m_grant_map_rw:
+    case p2m_map_foreign:
         return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index f5ddd20..1646fac 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -523,7 +523,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
         for ( i = 0; i < (1UL << page_order); i++ )
         {
             mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL);
-            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
+            if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )
                 set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
             ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
         }
@@ -754,10 +754,9 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
-
-
-int
-set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+static int
+set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                    p2m_type_t gfn_p2mt)
 {
     int rc = 0;
     p2m_access_t a;
@@ -782,16 +781,30 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
-    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
+    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
     if ( 0 == rc )
         gdprintk(XENLOG_ERR,
-            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
+            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
             mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
     return rc;
 }
 
+/* Returns: True for success. 0 for failure. */
+int 
+set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign);
+}
+
+int
+set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
+}
+
 int
 clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..6fc71a1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -70,6 +70,7 @@ typedef enum {
     p2m_ram_paging_in = 11,       /* Memory that is being paged in */
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
+    p2m_map_foreign  = 14,        /* ram pages from foreign domain */
 } p2m_type_t;
 
 /*
@@ -180,6 +181,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES)
 #define p2m_is_shared(_t)   (p2m_to_mask(_t) & P2M_SHARED_TYPES)
 #define p2m_is_broken(_t)   (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
+#define p2m_is_foreign(_t)  (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign))
 
 /* Per-p2m-table state */
 struct p2m_domain {
@@ -510,6 +512,8 @@ p2m_type_t p2m_change_type(struct domain *d, unsigned long gfn,
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);
 
+/* Set foreign mfn in the current guest's p2m table. */
+int set_foreign_p2m_entry(struct domain *domp, unsigned long gfn, mfn_t mfn);
 
 /* 
  * Populate-on-demand
-- 
1.7.2.3


* [PATCH 18/18] PVH xen: Add and remove foreign pages
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (16 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 17/18] PVH xen: Introduce p2m_map_foreign Mukesh Rathor
@ 2013-05-25  1:25 ` Mukesh Rathor
  2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
  18 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-05-25  1:25 UTC (permalink / raw)
  To: Xen-devel

This patch adds a new function, xenmem_add_foreign_to_pmap(), to map
pages from a foreign guest into the current dom0 for domU creation.
Also, allow XENMEM_remove_from_physmap to remove p2m_map_foreign
pages. Note, on this path we must release the refcount that was taken
during the map phase.

Changes in V2:
  - Move the XENMEM_remove_from_physmap changes here instead of prev patch
  - Move grant changes from this to one of the next patches.
  - In xenmem_add_foreign_to_pmap(), do locked get_gfn
  - Fail the mappings for qemu mapping pages for memory not there.

Changes in V3:
  - remove mmio pages.
  - remove unrelated changes.
  - cleanup both add and remove.

Changes in V5:
  - add a comment in public/memory.h

Changes in V6:
  - replace ASSERTs with returning error code.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c           |   77 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c         |   44 ++++++++++++++++++++++---
 xen/include/public/memory.h |    2 +-
 3 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e992b4f..34137f9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4520,6 +4520,78 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
     return 0;
 }
 
+/*
+ * Add frames from foreign domain to current domain's physmap. Similar to
+ * XENMAPSPACE_gmfn but the frame is foreign being mapped into current,
+ * and is not removed from foreign domain.
+ * Usage: libxl on pvh dom0 creating a guest and doing privcmd_ioctl_mmap.
+ * Side Effect: the mfn for fgfn will be refcounted so it is not lost
+ *              while mapped here. The refcnt is released in do_memory_op()
+ *              via XENMEM_remove_from_physmap.
+ * Returns: 0 ==> success
+ */
+static int xenmem_add_foreign_to_pmap(domid_t foreign_domid,
+                                      unsigned long fgfn, unsigned long gpfn)
+{
+    p2m_type_t p2mt, p2mt_prev;
+    int rc = 0;
+    unsigned long prev_mfn, mfn = 0;
+    struct domain *fdom, *currd = current->domain;
+    struct page_info *page = NULL;
+
+    if ( currd->domain_id == foreign_domid || foreign_domid == DOMID_SELF ||
+         !is_pvh_domain(currd) )
+        return -EINVAL;
+
+    if ( !is_hardware_domain(currd) || !(fdom = get_pg_owner(foreign_domid)) )
+        return -EPERM;
+
+    /* following will take a refcnt on the mfn */
+    page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
+    if ( !page || !p2m_is_valid(p2mt) )
+    {
+        if ( page )
+            put_page(page);
+        put_pg_owner(fdom);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Remove previously mapped page if it is present. */
+    prev_mfn = mfn_x(get_gfn(currd, gpfn, &p2mt_prev));
+    if ( mfn_valid(prev_mfn) )
+    {
+        if ( is_xen_heap_mfn(prev_mfn) )
+            /* Xen heap frames are simply unhooked from this phys slot */
+            guest_physmap_remove_page(currd, gpfn, prev_mfn, 0);
+        else
+            /* Normal domain memory is freed, to avoid leaking memory. */
+            guest_remove_page(currd, gpfn);
+    }
+    /*
+     * Create the new mapping. Can't use guest_physmap_add_page() because it
+     * will update the m2p table which will result in  mfn -> gpfn of dom0
+     * and not fgfn of domU.
+     */
+    if ( set_foreign_p2m_entry(currd, gpfn, _mfn(mfn)) == 0 )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "set_foreign_p2m_entry failed. gpfn:%lx mfn:%lx fgfn:%lx\n",
+                 gpfn, mfn, fgfn);
+        put_page(page);
+        rc = -EINVAL;
+    }
+
+    /*
+     * We must do this put_gfn after set_foreign_p2m_entry so another cpu
+     * doesn't populate the gpfn before us.
+     */
+    put_gfn(currd, gpfn);
+
+    put_pg_owner(fdom);
+    return rc;
+}
+
 static int xenmem_add_to_physmap_once(
     struct domain *d,
     const struct xen_add_to_physmap *xatp,
@@ -4582,6 +4654,11 @@ static int xenmem_add_to_physmap_once(
             page = mfn_to_page(mfn);
             break;
         }
+
+        case XENMAPSPACE_gmfn_foreign:
+            return xenmem_add_foreign_to_pmap(foreign_domid, xatp->idx,
+                                              xatp->gpfn);
+
         default:
             break;
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 06a0d0a..47f0cb0 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -675,9 +675,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case XENMEM_remove_from_physmap:
     {
+        unsigned long mfn;
         struct xen_remove_from_physmap xrfp;
         struct page_info *page;
-        struct domain *d;
+        struct domain *d, *foreign_dom = NULL;
+        p2m_type_t p2mt, tp;
 
         if ( copy_from_guest(&xrfp, arg, 1) )
             return -EFAULT;
@@ -695,11 +697,43 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         domain_lock(d);
 
-        page = get_page_from_gfn(d, xrfp.gpfn, NULL, P2M_ALLOC);
-        if ( page )
+        /*
+         * if PVH, the gfn could be mapped to a mfn from foreign domain by the
+         * user space tool during domain creation. We need to check for that,
+         * free it up from the p2m, and release refcnt on it. In such a case,
+         * page would be NULL and the following call would not have refcnt'd
+         * the page. See also xenmem_add_foreign_to_pmap().
+         */
+        page = get_page_from_gfn(d, xrfp.gpfn, &p2mt, P2M_ALLOC);
+
+        if ( page || p2m_is_foreign(p2mt) )
         {
-            guest_physmap_remove_page(d, xrfp.gpfn, page_to_mfn(page), 0);
-            put_page(page);
+            if ( page )
+                mfn = page_to_mfn(page);
+            else
+            {
+                mfn = mfn_x(get_gfn_query(d, xrfp.gpfn, &tp));
+                foreign_dom = page_get_owner(mfn_to_page(mfn));
+
+                if ( !is_pvh_domain(d) || (d == foreign_dom) ||
+                     !p2m_is_foreign(tp) )
+                {
+                    rc = -EINVAL;
+                    domain_unlock(d);
+                    rcu_unlock_domain(d);
+                    break;
+                }
+            }
+
+            guest_physmap_remove_page(d, xrfp.gpfn, mfn, 0);
+            if ( page )
+                put_page(page);
+
+            if ( p2m_is_foreign(p2mt) )
+            {
+                put_page(mfn_to_page(mfn));
+                put_gfn(d, xrfp.gpfn);
+            }
         }
         else
             rc = -ENOENT;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..e319b5d 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -208,7 +208,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
 #define XENMAPSPACE_gmfn_range   3 /* GMFN range, XENMEM_add_to_physmap only. */
 #define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another dom,
                                     * XENMEM_add_to_physmap_range only.
-                                    */
+                                    * (PVH x86 only) */
 /* ` } */
 
 /*
-- 
1.7.2.3


* Re: [PATCH 01/18] PVH xen: turn gdb_frames/gdt_ents into union
  2013-05-25  1:25 ` [PATCH 01/18] PVH xen: turn gdb_frames/gdt_ents into union Mukesh Rathor
@ 2013-05-31  9:13   ` Jan Beulich
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Beulich @ 2013-05-31  9:13 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Changes in V2:
>   - Add __XEN_INTERFACE_VERSION__
> 
>   Changes in V3:
>     - Rename union to 'gdt' and rename field names.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-25  1:25 ` [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range Mukesh Rathor
@ 2013-05-31  9:28   ` Jan Beulich
  2013-05-31  9:38     ` Ian Campbell
  2013-06-05  0:31     ` Mukesh Rathor
  0 siblings, 2 replies; 80+ messages in thread
From: Jan Beulich @ 2013-05-31  9:28 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>  static int xenmem_add_to_physmap_once(
>      struct domain *d,
> -    const struct xen_add_to_physmap *xatp)
> +    const struct xen_add_to_physmap *xatp,
> +    domid_t foreign_domid)

The patch could be a bit smaller afaict if you used the otherwise
unused here domain ID field in xatp for passing the domain ID you
care about here (I hinted at that in the last round already, where
I also asked Stefano why we have three domains here in the first
place).

While I won't object the patch to remain the way it is, I'm also
not sure I'd want to ack it in the less efficient shape.

> +        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
> +        {
> +            rcu_unlock_domain(d);
> +            return -EPERM;
> +        }

I realize there's another such bogus use of the function in the same
file, but you shouldn't propagate that mistake: The function has a
proper return value, and that's what should be used here instead
of forcing it to be -EPERM.

I also vaguely recall having pointed out in a much earlier review
that this functionality is lacking a counterpart in
compat_arch_memory_op().

Jan


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-31  9:28   ` Jan Beulich
@ 2013-05-31  9:38     ` Ian Campbell
  2013-05-31 10:14       ` Jan Beulich
  2013-06-05  0:31     ` Mukesh Rathor
  1 sibling, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-05-31  9:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 2013-05-31 at 10:28 +0100, Jan Beulich wrote:
> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >  static int xenmem_add_to_physmap_once(
> >      struct domain *d,
> > -    const struct xen_add_to_physmap *xatp)
> > +    const struct xen_add_to_physmap *xatp,
> > +    domid_t foreign_domid)
> 
> The patch could be a bit smaller afaict if you used the otherwise
> unused here domain ID field in xatp for passing the domain ID you
> care about here (I hinted at that in the last round already, where
> I also asked Stefano why we have three domains here in the first
> place).

This interface is already used on ARM, I don't think we want PVH to use
a different variation of the same thing so I don't think it is right to
put this on Mukesh.

AFAICT the existing xenmem ops all take an explicit target domain here,
rather than assuming that the target is the caller, this includes the
non-ranged add_to_physmap variant.

So I think the existing XENMEM_add_to_physmap_range interface is
consistent with the rest of that API, which I think is worthwhile even
if it only ever gets used with DOMID_SELF in practice (we have several
such interfaces, I think).

In any case I'm not sure we can rule out a future need to be able to do
this sort of operation on another domain -- at least not to the extent
of worrying about changing it now that it exists.

Ian.


* Re: [PATCH 03/18] PVH xen: create domctl_memory_mapping() function
  2013-05-25  1:25 ` [PATCH 03/18] PVH xen: create domctl_memory_mapping() function Mukesh Rathor
@ 2013-05-31  9:46   ` Jan Beulich
  2013-06-05  0:47     ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-05-31  9:46 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Changes in V5:
>   - Move iomem_access_permitted check to case statment from the function
>     as current doesn't point to dom0 during construct_dom0.
> 
> Changes in V6:
>   - Move iomem_access_permitted back to domctl_memory_mapping() as it should
>     be after the sanity and wrap checks of mfns/gfns.

So you're undoing what you previously did? How can it work then?
Or is the whole patch perhaps no longer needed with Dom0 code
dropped for the time being?

Otherwise, rather than undoing what you did in v5, I'd think you'd
want to keep the sanity/wrap checks in the caller as well, since if
you need the function to be split out just for Dom0 building, you
can assume the inputs to be sane. And if so, the XSM check could
remain in the caller too. That would also make sure there really is
no change in functionality:

> +long domctl_memory_mapping(struct domain *d, unsigned long gfn,
> +                           unsigned long mfn, unsigned long nr_mfns,
> +                           bool_t add_map)
> +{
> +    unsigned long i;
> +    long ret;
> +
> +    if ( (mfn + nr_mfns - 1) < mfn || /* wrap? */
> +         ((mfn | (mfn + nr_mfns - 1)) >> (paddr_bits - PAGE_SHIFT)) ||
> +         (gfn + nr_mfns - 1) < gfn ) /* wrap? */
> +        return -EINVAL;
> +
> +    /* caller construct_dom0() runs on idle vcpu */
> +    if ( !is_idle_vcpu(current) &&

This check is new, so this is not pure code motion.

> +         !iomem_access_permitted(current->domain, mfn, mfn + nr_mfns - 1) )
> +        return -EPERM;
> +
> +    ret = xsm_iomem_permission(XSM_HOOK, d, mfn, mfn + nr_mfns - 1, add_map);

And this was xsm_iomem_mapping() in the original code. What's going
on here?

Jan


* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-05-25  1:25 ` [PATCH 04/18] PVH xen: add params to read_segment_register Mukesh Rathor
@ 2013-05-31 10:00   ` Jan Beulich
  2013-06-06  1:25     ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-05-31 10:00 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> @@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs *regs)
>      crs[2] = read_cr2();
>      crs[3] = read_cr3();
>      crs[4] = read_cr4();
> -    regs->ds = read_segment_register(ds);
> -    regs->es = read_segment_register(es);
> -    regs->fs = read_segment_register(fs);
> -    regs->gs = read_segment_register(gs);
> +    regs->ds = read_segment_register(current, regs, ds);
> +    regs->es = read_segment_register(current, regs, es);
> +    regs->fs = read_segment_register(current, regs, fs);
> +    regs->gs = read_segment_register(current, regs, gs);

In patch 9 you start using the first parameter of
read_segment_register() in ways not compatible with the use of
current here - the double fault handler (and in general all host side
exception handling code, i.e. the change to show_registers() is
questionable too) wants to use the real register value, not what's
in regs->. Even more, with the VMEXIT code storing at best
a known bad value into these fields, is it really valid to use them
at all (i.e. things ought to work much like the if() portion of
show_registers() which you _do not_ modify).

Jan


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-31  9:38     ` Ian Campbell
@ 2013-05-31 10:14       ` Jan Beulich
  2013-05-31 10:40         ` Ian Campbell
  2013-06-05  0:24         ` Mukesh Rathor
  0 siblings, 2 replies; 80+ messages in thread
From: Jan Beulich @ 2013-05-31 10:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

>>> On 31.05.13 at 11:38, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Fri, 2013-05-31 at 10:28 +0100, Jan Beulich wrote:
>> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> >  static int xenmem_add_to_physmap_once(
>> >      struct domain *d,
>> > -    const struct xen_add_to_physmap *xatp)
>> > +    const struct xen_add_to_physmap *xatp,
>> > +    domid_t foreign_domid)
>> 
>> The patch could be a bit smaller afaict if you used the otherwise
>> unused here domain ID field in xatp for passing the domain ID you
>> care about here (I hinted at that in the last round already, where
>> I also asked Stefano why we have three domains here in the first
>> place).
> 
> This interface is already used on ARM, I don't think we want PVH to use
> a different variation of the same thing so I don't think it is right to
> put this on Mukesh.

I expressed this wrong if you understood it this way: I'm not
asking Mukesh to alter the interface - I had asked Stefano why
it was the way it is, and while I'm not happy with the situation,
I appreciate that changing it again is not a very good idea.

All I'm asking is, instead of introducing a new function parameter
here, to use an otherwise unused field of the xatp input
structure. This is based on the fact that the field is being
evaluated once directly in arch_memory_op(), and then no longer
needed.

That would also allow doing the setting just once before loops get
started, rather than having to pass the same value again on each
loop iteration.

The one caveat is that for the continuation creation of
XENMEM_add_to_physmap the field would need to be prevented
from getting written back to guest memory. But I consider it bad
practice anyway to copy back whole structures when only
selected fields (often just one) need updating, and hence
switching that operation to do a couple of individual field writes
would be nice regardless of that change, even if this adds two or
three lines of code.

Jan


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-31 10:14       ` Jan Beulich
@ 2013-05-31 10:40         ` Ian Campbell
  2013-06-05  0:24         ` Mukesh Rathor
  1 sibling, 0 replies; 80+ messages in thread
From: Ian Campbell @ 2013-05-31 10:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 2013-05-31 at 11:14 +0100, Jan Beulich wrote:
> All I'm asking is, instead of introducing a new function parameter
> here, to use an otherwise unused field of the xatp input
> structure. This is based on the fact that the field is being
> evaluated once directly in arch_memory_op(), and then no longer
> needed.

Ah ok, yes, that sounds ok.

Although I'm not convinced the compiler wouldn't do the smart thing
either way. Since the arguments are passed in registers it might even
hurt by hitting memory instead of just using the value held in the
register (realistically it will be a warm cache line, so probably
doesn't matter).

Would need care to make sure that this reuse didn't leak into the
hypercall ABI and end up painting ourselves into a corner WRT using all
three domains in the future, should it come to it.

Ian.


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-31 10:14       ` Jan Beulich
  2013-05-31 10:40         ` Ian Campbell
@ 2013-06-05  0:24         ` Mukesh Rathor
  1 sibling, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-05  0:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Campbell, xen-devel

On Fri, 31 May 2013 11:14:03 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 31.05.13 at 11:38, Ian Campbell <Ian.Campbell@citrix.com>
> >>> wrote:
> > On Fri, 2013-05-31 at 10:28 +0100, Jan Beulich wrote:
> >> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >> >>> wrote:
> >> >  static int xenmem_add_to_physmap_once(
> >> >      struct domain *d,
> >> > -    const struct xen_add_to_physmap *xatp)
> >> > +    const struct xen_add_to_physmap *xatp,
> >> > +    domid_t foreign_domid)
> >> 
> >> The patch could be a bit smaller afaict if you used the otherwise
> >> unused here domain ID field in xatp for passing the domain ID you
> >> care about here (I hinted at that in the last round already, where
> >> I also asked Stefano why we have three domains here in the first
> >> place).
> > 
> > This interface is already used on ARM, I don't think we want PVH to
> > use a different variation of the same thing so I don't think it is
> > right to put this on Mukesh.
> 
> I expressed this wrong if you understood it this way: I'm not
> asking Mukesh to alter the interface - I had asked Stefano why
> it was the way it is, and while I'm not happy with the situation,
> I appreciate that changing it again is not a very good idea.
> 
> All I'm asking is, instead of introducing a new function parameter
> here, to use an otherwise unused field of the xatp input
> structure. This is based on the fact that the field is being
> evaluated once directly in arch_memory_op(), and then no longer
> needed.

Not a good programming practice IMO to alias fields unnecessarily. The 
documented purpose of the domid field in struct is not same as the 
parameter being passed:

struct xen_add_to_physmap {
    /* Which domain to change the mapping for. */
    domid_t domid;

As the function grows, and struct gets passed further down, it would be 
bewildering to someone reading and understanding the code. 

Since we've already discussed the existence of two domid fields, it seems
appropriate to leave it as it is.

> That would also allow doing the setting just once before loops get
> started, rather than having to pass the same value again on each
> loop iteration.
> 
> The one caveat is that for the continuation creation of
> XENMEM_add_to_physmap the field would need to be prevented
> from getting written back to guest memory. But I consider it bad
> practice anyway to copy back whole structures when only
> selected fields (often just one) need updating, and hence
> switching that operation to do a couple of individual field writes
> would be nice regardless of that change, even if this adds two or
> three lines of code.

Right, since this patch only deals with XENMEM_add_to_physmap_range, I'll
not worry about that.

thanks,
Mukesh


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-05-31  9:28   ` Jan Beulich
  2013-05-31  9:38     ` Ian Campbell
@ 2013-06-05  0:31     ` Mukesh Rathor
  2013-06-05  7:32       ` Jan Beulich
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-05  0:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 31 May 2013 10:28:54 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> 
> > +        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
> > +        {
> > +            rcu_unlock_domain(d);
> > +            return -EPERM;
> > +        }
> I realize there's another such bogus use of the function in the same
> file, but you shouldn't propagate that mistake: The function has a
> proper return value, and that's what should be used here instead
> of forcing it to be -EPERM.

Ok, changed. 

> 
> I also vaguely recall having pointed out in a much earlier review
> that this functionality is lacking a counterpart in
> compat_arch_memory_op().

Hmm.. I'm confused how/why a 64bit PVH would go thru
compat_arch_memory_op()? Can you please explain?

thanks,
M-


* Re: [PATCH 03/18] PVH xen: create domctl_memory_mapping() function
  2013-05-31  9:46   ` Jan Beulich
@ 2013-06-05  0:47     ` Mukesh Rathor
  2013-06-05  7:34       ` Jan Beulich
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-05  0:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 31 May 2013 10:46:45 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > Changes in V5:
> >   - Move iomem_access_permitted check to case statment from the
> > function as current doesn't point to dom0 during construct_dom0.
> > 
> > Changes in V6:
> >   - Move iomem_access_permitted back to domctl_memory_mapping() as
> > it should be after the sanity and wrap checks of mfns/gfns.
> 
> So you're undoing what you previously did? How can it work then?
> Or is the whole patch perhaps no longer needed with Dom0 code
> dropped for the time being?
> 
> Otherwise, rather than undoing what you did in v5, I'd think you'd
> want to keep the sanity/wrap checks in the caller as well, since if
> you need the function to be split out just for Dom0 building, you
> can assume the inputs to be sane. And if so, the XSM check could
> remain in the caller too. That would also make sure there really is
> no change in functionality:

Well, that's where I had it originally, but one of the reviewers early on
suggested moving it to the function, in case there's another caller in
the future.

It's only for dom0, so I can drop the check ' !is_idle_vcpu(current) ' and
add it during dom0 construction patch, or drop the whole patch and add it
to construct dom0 patch series, now phase 1.5. Please lmk.

> > +long domctl_memory_mapping(struct domain *d, unsigned long gfn,
> > +                           unsigned long mfn, unsigned long
> > nr_mfns,
> > +                           bool_t add_map)
> > +{
> > +    unsigned long i;
> > +    long ret;
> > +
> > +    if ( (mfn + nr_mfns - 1) < mfn || /* wrap? */
> > +         ((mfn | (mfn + nr_mfns - 1)) >> (paddr_bits -
> > PAGE_SHIFT)) ||
> > +         (gfn + nr_mfns - 1) < gfn ) /* wrap? */
> > +        return -EINVAL;
> > +
> > +    /* caller construct_dom0() runs on idle vcpu */
> > +    if ( !is_idle_vcpu(current) &&
> 
> This check is new, so this is not pure code motion.
> 
> > +         !iomem_access_permitted(current->domain, mfn, mfn +
> > nr_mfns - 1) )
> > +        return -EPERM;
> > +
> > +    ret = xsm_iomem_permission(XSM_HOOK, d, mfn, mfn + nr_mfns -
> > 1, add_map);
> 
> And this was xsm_iomem_mapping() in the original code. What's going
> on here?

Bad merge! That's why I like to do things in small steps and keep patchsets
small. Orig code had xsm_iomem_permission() at some point.

thanks
Mukesh


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-05  0:31     ` Mukesh Rathor
@ 2013-06-05  7:32       ` Jan Beulich
  2013-06-05 20:41         ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-06-05  7:32 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 05.06.13 at 02:31, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> I also vaguely recall having pointed out in a much earlier review
>> that this functionality is lacking a counterpart in
>> compat_arch_memory_op().
> 
> Hmm, I'm confused: how/why would a 64-bit PVH guest go through
> compat_arch_memory_op()? Can you please explain?

IIRC the new hypercall isn't restricted to PVH guests, and hence
needs a compat implementation regardless of 32-bit PVH not
existing yet.
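
For readers unfamiliar with the compat layer: a compat counterpart typically
translates the 32-bit guest's argument layout into the native one and then
reuses the common handler. A made-up illustration of that translation step
(struct names and fields here are hypothetical, not Xen's actual ABI):

```c
#include <assert.h>
#include <stdint.h>

/* Native (64-bit) argument layout. */
struct xatp_native { uint64_t gfn; uint64_t mfn; uint32_t space; };
/* Hypothetical 32-bit guest layout with narrower fields. */
struct xatp_compat { uint32_t gfn; uint32_t mfn; uint32_t space; };

static long do_xatp_native(const struct xatp_native *a)
{
    /* Toy validation standing in for the real handler's work. */
    return (a->gfn == 0 || a->mfn == 0) ? -22 /* -EINVAL */ : 0;
}

/* The compat entry point widens each field, then reuses the
 * native implementation unchanged. */
static long compat_xatp(const struct xatp_compat *c)
{
    struct xatp_native n = {
        .gfn = c->gfn, .mfn = c->mfn, .space = c->space,
    };
    return do_xatp_native(&n);
}
```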

Jan

* Re: [PATCH 03/18] PVH xen: create domctl_memory_mapping() function
  2013-06-05  0:47     ` Mukesh Rathor
@ 2013-06-05  7:34       ` Jan Beulich
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Beulich @ 2013-06-05  7:34 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 05.06.13 at 02:47, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 31 May 2013 10:46:45 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > Changes in V5:
> >> >   - Move iomem_access_permitted check to case statement from the
> >> > function as current doesn't point to dom0 during construct_dom0.
>> > 
>> > Changes in V6:
>> >   - Move iomem_access_permitted back to domctl_memory_mapping() as
>> > it should be after the sanity and wrap checks of mfns/gfns.
>> 
>> So you're undoing what you previously did? How can it work then?
>> Or is the whole patch perhaps no longer needed with Dom0 code
>> dropped for the time being?
>> 
>> Otherwise, rather than undoing what you did in v5, I'd think you'd
>> want to keep the sanity/wrap checks in the caller as well, since if
>> you need the function to be split out just for Dom0 building, you
>> can assume the inputs to be sane. And if so, the XSM check could
>> remain in the caller too. That would also make sure there really is
>> no change in functionality:
> 
> Well, that's where I had it originally, but one of the reviewers early on
> suggested moving it into the function, in case there's another caller in
> the future.
> 
> It's only for dom0, so I can drop the ' !is_idle_vcpu(current) ' check and
> add it back in the dom0 construction patch, or drop the whole patch and move
> it into the construct-dom0 patch series, now phase 1.5. Please let me know.

If you don't need it for the current phase, best would indeed seem
to be to drop it from the current series.

Jan

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
                   ` (17 preceding siblings ...)
  2013-05-25  1:25 ` [PATCH 18/18] PVH xen: Add and remove foreign pages Mukesh Rathor
@ 2013-06-05 15:23 ` Konrad Rzeszutek Wilk
  2013-06-05 15:25   ` George Dunlap
                     ` (2 more replies)
  18 siblings, 3 replies; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-05 15:23 UTC (permalink / raw)
  To: Mukesh Rathor, tim, ian.campbell, ian.jackson, george.dunlap; +Cc: Xen-devel

On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
> These patches are built on top of git
> c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> 
> V6:
> The biggest change in V6 is dropping of dom0 PVH. It will take some time
> to investigate and redo dom0 construct to use unmodified PV code. These 
> patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
> individual patches if there are no issues, so I know they have been looked
> at.

Ian, Tim, Ian, George,

Are you guys waiting for Jan to review all the patches or just
the hypervisor ones before looking at the rest (say the libxl ones)?

> 
> 
> Repeating from before:
> 
> Phase I:
>    - Establish a baseline of something working. These patches allow for
>      dom0 to be booted in PVH mode, and after that guests to be started
>      in PV, PVH, and HVM modes. I also tested booting dom0 in PV mode,
>      and starting PV, PVH, and HVM guests.
> 
>      Also, the disk must be specified as phy: in vm.cfg file:
>          > losetup /dev/loop1 guest.img
>          > vm.cfg file: disk = ['phy:/dev/loop1,xvda,w']          
> 
>      I've not tested anything else.
>      Note, HAP and iommu are required for PVH.
> 
> As a result of V3, there were two new action items on the linux side before
> it will boot as PVH: 1) MSI-X fixup and 2) load KERNEL_CS right after gdt switch.
> 
> As a result of V5 a new fixme:
>   - MMIO ranges above the highest covered e820 address must be mapped for dom0.
> 
> Following fixme's exist in the code:
>   - Add support for more memory types in arch/x86/hvm/mtrr.c.
>   - arch/x86/time.c: support more tsc modes.
>   - check_guest_io_breakpoint(): check/add support for IO breakpoint.
>   - implement arch_get_info_guest() for pvh.
>   - vmxit_msr_read(): during AMD port go thru hvm_msr_read_intercept() again.
>   - verify bp matching on emulated instructions will work same as HVM for
>     PVH guest. see instruction_done() and check_guest_io_breakpoint().
> 
> Following remain to be done for PVH:
>    - AMD port.
>    - Avail PVH dom0 of posted interrupts. (This will be a big win).
>    - 32bit support in both linux and xen. Xen changes are tagged "32bitfixme".
>    - Add support for monitoring guest behavior. See hvm_memory_event* functions
>      in hvm.c
>    - Change xl to support other modes other than "phy:".
>    - Hotplug support
>    - Migration of PVH guests.
> 
> Thanks for all the help,
> Mukesh
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
@ 2013-06-05 15:25   ` George Dunlap
  2013-06-05 15:36   ` Ian Campbell
  2013-06-05 17:14   ` Tim Deegan
  2 siblings, 0 replies; 80+ messages in thread
From: George Dunlap @ 2013-06-05 15:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: ian.jackson, Xen-devel, tim, ian.campbell

On 05/06/13 16:23, Konrad Rzeszutek Wilk wrote:
> On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
>> I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
>> These patches are built on top of git
>> c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
>>
>> V6:
>> The biggest change in V6 is dropping of dom0 PVH. It will take some time
>> to investigate and redo dom0 construct to use unmodified PV code. These
>> patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
>> individual patches if there are no issues, so I know they have been looked
>> at.
> Ian, Tim, Ian, George,
>
> Are you guys waiting for Jan to review all the patches or just
> the hypervisor ones before looking at the rest (say the libxl ones)?

TBH I'm waiting until I have all the 4.3 stuff sorted out before even 
looking at them.

  -George

* Re: [PATCH 12/18] PVH xen: support hypercalls for PVH
  2013-05-25  1:25 ` [PATCH 12/18] PVH xen: support hypercalls for PVH Mukesh Rathor
@ 2013-06-05 15:27   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-05 15:27 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel

On Fri, May 24, 2013 at 06:25:31PM -0700, Mukesh Rathor wrote:
> This patch replaces the old patch which created pvh.c. Instead, we modify
> hvm.c to add support for PVH also.

I think that is not a very helpful commit description.

In six months, someone looking at this will ask 'what old patch?'.
The 'we modify hvm.c to add support for PVH also' part is obvious, as
you are touching said file and the subject mentions that.

Perhaps just describe this patch's limitations: enumerate the
hypercalls that are implemented and explain why the other ones
are not?
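
As an aside, the per-guest-type dispatch table in the patch below relies on C
designated initializers: unlisted slots stay NULL, so a bounds check plus a
NULL check rejects unimplemented hypercalls. A toy version of the pattern,
with all names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef long (*hcall_fn_t)(long arg);

static long op_version(long arg) { return 0x040300 + arg; }
static long op_memory(long arg)  { return arg * 2; }

enum { HC_VERSION = 3, HC_MEMORY = 7, NR_HCALLS = 16 };

/* Designated initializers leave unlisted slots NULL, so the
 * dispatcher can reject anything not explicitly wired up. */
static hcall_fn_t const hcall_table[NR_HCALLS] = {
    [HC_VERSION] = op_version,
    [HC_MEMORY]  = op_memory,
};

static long dispatch(unsigned int nr, long arg)
{
    if (nr >= NR_HCALLS || hcall_table[nr] == NULL)
        return -38; /* -ENOSYS */
    return hcall_table[nr](arg);
}
```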


> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/hvm/hvm.c |   58 +++++++++++++++++++++++++++++++++++++++--------
>  1 files changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index a525080..74004bc 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3242,6 +3242,8 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          case PHYSDEVOP_get_free_pirq:
>              return do_physdev_op(cmd, arg);
>          default:
> +            if ( is_pvh_vcpu(current) && is_hardware_domain(current->domain) )
> +                return do_physdev_op(cmd, arg);
>              return -ENOSYS;
>      }
>  }
> @@ -3249,7 +3251,7 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  static long hvm_vcpu_op(
>      int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
> -    long rc;
> +    long rc = -ENOSYS;
>  
>      switch ( cmd )
>      {
> @@ -3262,6 +3264,14 @@ static long hvm_vcpu_op(
>      case VCPUOP_register_vcpu_info:
>          rc = do_vcpu_op(cmd, vcpuid, arg);
>          break;
> +
> +    case VCPUOP_is_up:
> +    case VCPUOP_up:
> +    case VCPUOP_initialise:
> +        if ( is_pvh_vcpu(current) )
> +            rc = do_vcpu_op(cmd, vcpuid, arg);
> +        break;
> +
>      default:
>          rc = -ENOSYS;
>          break;
> @@ -3381,12 +3391,31 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
>      HYPERCALL(tmem_op)
>  };
>  
> +/* PVH 32bitfixme */
> +static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
> +    HYPERCALL(platform_op),
> +    HYPERCALL(memory_op),
> +    HYPERCALL(xen_version),
> +    HYPERCALL(console_io),
> +    [ __HYPERVISOR_grant_table_op ]  = (hvm_hypercall_t *)hvm_grant_table_op,
> +    [ __HYPERVISOR_vcpu_op ]         = (hvm_hypercall_t *)hvm_vcpu_op,
> +    HYPERCALL(mmuext_op),
> +    HYPERCALL(xsm_op),
> +    HYPERCALL(sched_op),
> +    HYPERCALL(event_channel_op),
> +    [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
> +    HYPERCALL(hvm_op),
> +    HYPERCALL(sysctl),
> +    HYPERCALL(domctl)
> +};
> +
>  int hvm_do_hypercall(struct cpu_user_regs *regs)
>  {
>      struct vcpu *curr = current;
>      struct segment_register sreg;
>      int mode = hvm_guest_x86_mode(curr);
>      uint32_t eax = regs->eax;
> +    hvm_hypercall_t **hcall_table;
>  
>      switch ( mode )
>      {
> @@ -3407,7 +3436,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
>      if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
>          return viridian_hypercall(regs);
>  
> -    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
> +    if ( (eax >= NR_hypercalls) ||
> +         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
> +         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )
>      {
>          regs->eax = -ENOSYS;
>          return HVM_HCALL_completed;
> @@ -3421,17 +3452,24 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
>                      eax, regs->rdi, regs->rsi, regs->rdx,
>                      regs->r10, regs->r8, regs->r9);
>  
> +        if ( is_pvh_vcpu(curr) )
> +            hcall_table = (hvm_hypercall_t **)pvh_hypercall64_table;
> +        else
> +            hcall_table = (hvm_hypercall_t **)hvm_hypercall64_table;
> +
>          curr->arch.hvm_vcpu.hcall_64bit = 1;
> -        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
> -                                               regs->rsi,
> -                                               regs->rdx,
> -                                               regs->r10,
> -                                               regs->r8,
> -                                               regs->r9); 
> +        regs->rax = hcall_table[eax](regs->rdi,
> +                                     regs->rsi,
> +                                     regs->rdx,
> +                                     regs->r10,
> +                                     regs->r8,
> +                                     regs->r9);
>          curr->arch.hvm_vcpu.hcall_64bit = 0;
>      }
>      else
>      {
> +        ASSERT(!is_pvh_vcpu(curr));   /* PVH 32bitfixme */
> +
>          HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax,
>                      (uint32_t)regs->ebx, (uint32_t)regs->ecx,
>                      (uint32_t)regs->edx, (uint32_t)regs->esi,
> @@ -3855,7 +3893,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -ESRCH;
>  
>          rc = -EINVAL;
> -        if ( !is_hvm_domain(d) )
> +        if ( is_pv_domain(d) )
>              goto param_fail;
>  
>          rc = xsm_hvm_param(XSM_TARGET, d, op);
> @@ -4027,7 +4065,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  break;
>              }
>  
> -            if ( rc == 0 ) 
> +            if ( rc == 0 && !is_pvh_domain(d) )
>              {
>                  d->arch.hvm_domain.params[a.index] = a.value;
>  
> -- 
> 1.7.2.3
> 
> 

* Re: [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct
  2013-05-25  1:25 ` [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
@ 2013-06-05 15:33   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-05 15:33 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel

On Fri, May 24, 2013 at 06:25:25PM -0700, Mukesh Rathor wrote:
> This patch moves fields out of the pv_domain struct as they are used by
> PVH also.
> 
> Changes in V6:
>   - Don't base on guest type the initialization and cleanup.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
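
For what it's worth, the moved fields follow a common pattern: a lock guards a
heap-allocated map pointer together with its entry count, and an update frees
the old buffer and installs the new one under that lock. A hypothetical sketch
of that discipline, with a C11 atomic flag standing in for Xen's spinlock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct e820entry { unsigned long addr, size; };

struct arch_domain {
    atomic_flag e820_lock;      /* toy spinlock, not Xen's spinlock_t */
    struct e820entry *e820;
    unsigned int nr_e820;
};

static void lock(atomic_flag *l)   { while (atomic_flag_test_and_set(l)) ; }
static void unlock(atomic_flag *l) { atomic_flag_clear(l); }

/* Free the old map and publish the new (pointer, count) pair under
 * the lock, so readers never see a mismatched or freed map. */
static void set_memory_map(struct arch_domain *d,
                           struct e820entry *map, unsigned int nr)
{
    lock(&d->e820_lock);
    free(d->e820);
    d->e820 = map;
    d->nr_e820 = nr;
    unlock(&d->e820_lock);
}

static unsigned int nr_entries(struct arch_domain *d)
{
    lock(&d->e820_lock);
    unsigned int nr = d->e820 ? d->nr_e820 : 0;
    unlock(&d->e820_lock);
    return nr;
}
```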
> ---
>  xen/arch/x86/domain.c        |   10 ++++------
>  xen/arch/x86/mm.c            |   26 +++++++++++++-------------
>  xen/include/asm-x86/domain.h |   10 +++++-----
>  3 files changed, 22 insertions(+), 24 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index a5f2885..e53a937 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>          if ( (rc = iommu_domain_init(d)) != 0 )
>              goto fail;
>      }
> +    spin_lock_init(&d->arch.e820_lock);
>  
>      if ( is_hvm_domain(d) )
>      {
> @@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>          }
>      }
>      else
> -    {
>          /* 64-bit PV guest by default. */
>          d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>  
> -        spin_lock_init(&d->arch.pv_domain.e820_lock);
> -    }
> -
>      /* initialize default tsc behavior in case tools don't */
>      tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
>      spin_lock_init(&d->arch.vtsc_lock);
> @@ -592,8 +589,9 @@ void arch_domain_destroy(struct domain *d)
>  {
>      if ( is_hvm_domain(d) )
>          hvm_domain_destroy(d);
> -    else
> -        xfree(d->arch.pv_domain.e820);
> +
> +    if ( d->arch.e820 )
> +        xfree(d->arch.e820);
>  
>      free_domain_pirqs(d);
>      if ( !is_idle_domain(d) )
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 43f0769..bd1402e 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4827,11 +4827,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -EFAULT;
>          }
>  
> -        spin_lock(&d->arch.pv_domain.e820_lock);
> -        xfree(d->arch.pv_domain.e820);
> -        d->arch.pv_domain.e820 = e820;
> -        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
> -        spin_unlock(&d->arch.pv_domain.e820_lock);
> +        spin_lock(&d->arch.e820_lock);
> +        xfree(d->arch.e820);
> +        d->arch.e820 = e820;
> +        d->arch.nr_e820 = fmap.map.nr_entries;
> +        spin_unlock(&d->arch.e820_lock);
>  
>          rcu_unlock_domain(d);
>          return rc;
> @@ -4845,26 +4845,26 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&map, arg, 1) )
>              return -EFAULT;
>  
> -        spin_lock(&d->arch.pv_domain.e820_lock);
> +        spin_lock(&d->arch.e820_lock);
>  
>          /* Backwards compatibility. */
> -        if ( (d->arch.pv_domain.nr_e820 == 0) ||
> -             (d->arch.pv_domain.e820 == NULL) )
> +        if ( (d->arch.nr_e820 == 0) ||
> +             (d->arch.e820 == NULL) )
>          {
> -            spin_unlock(&d->arch.pv_domain.e820_lock);
> +            spin_unlock(&d->arch.e820_lock);
>              return -ENOSYS;
>          }
>  
> -        map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
> -        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
> +        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
> +        if ( copy_to_guest(map.buffer, d->arch.e820,
>                             map.nr_entries) ||
>               __copy_to_guest(arg, &map, 1) )
>          {
> -            spin_unlock(&d->arch.pv_domain.e820_lock);
> +            spin_unlock(&d->arch.e820_lock);
>              return -EFAULT;
>          }
>  
> -        spin_unlock(&d->arch.pv_domain.e820_lock);
> +        spin_unlock(&d->arch.e820_lock);
>          return 0;
>      }
>  
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index d79464d..c3f9f8e 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -234,11 +234,6 @@ struct pv_domain
>  
>      /* map_domain_page() mapping cache. */
>      struct mapcache_domain mapcache;
> -
> -    /* Pseudophysical e820 map (XENMEM_memory_map).  */
> -    spinlock_t e820_lock;
> -    struct e820entry *e820;
> -    unsigned int nr_e820;
>  };
>  
>  struct arch_domain
> @@ -313,6 +308,11 @@ struct arch_domain
>                                  (possibly other cases in the future */
>      uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
>      uint64_t vtsc_usercount; /* not used for hvm */
> +
> +    /* Pseudophysical e820 map (XENMEM_memory_map).  */
> +    spinlock_t e820_lock;
> +    struct e820entry *e820;
> +    unsigned int nr_e820;
>  } __cacheline_aligned;
>  
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
> -- 
> 1.7.2.3
> 
> 

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
  2013-06-05 15:25   ` George Dunlap
@ 2013-06-05 15:36   ` Ian Campbell
  2013-06-05 18:34     ` Konrad Rzeszutek Wilk
  2013-06-06 10:08     ` George Dunlap
  2013-06-05 17:14   ` Tim Deegan
  2 siblings, 2 replies; 80+ messages in thread
From: Ian Campbell @ 2013-06-05 15:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: george.dunlap, ian.jackson, Xen-devel, tim

On Wed, 2013-06-05 at 11:23 -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
> > These patches are built on top of git
> > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> > 
> > V6:
> > The biggest change in V6 is dropping of dom0 PVH. It will take some time
> > to investigate and redo dom0 construct to use unmodified PV code. These 
> > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
> > individual patches if there are no issues, so I know they have been looked
> > at.
> 
> Ian, Tim, Ian, George,
> 
> Are you guys waiting for Jan to review all the patches or just
> the hypervisor ones before looking at the rest (say the libxl ones)?

There are libxl ones? I thought this was dom0 only at the moment?

Ian.

* Re: [PATCH 16/18] PVH xen: Miscellaneous changes
  2013-05-25  1:25 ` [PATCH 16/18] PVH xen: Miscellaneous changes Mukesh Rathor
@ 2013-06-05 15:39   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-05 15:39 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel

On Fri, May 24, 2013 at 06:25:35PM -0700, Mukesh Rathor wrote:
> This patch contains misc changes like restricting iobitmap calls for PVH,
> restricting 32bit PVH guest, etc..

Could you please mention _why_ in the commit message? And enumerate which
hypercalls are restricted. From the look of it, they are:

PHYSDEVOP_set_iopl, PHYSDEVOP_set_iobitmap and XEN_DOMCTL_getvcpucontext?


> 
> Changes in V6:
>   - clear out vcpu_guest_context struct in arch_get_info_guest.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

I think, besides the commit description (which needs a bit more
explanation), you can also attach:

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> ---
>  xen/arch/x86/domain.c      |    7 +++++++
>  xen/arch/x86/domain_page.c |   10 +++++-----
>  xen/arch/x86/domctl.c      |    6 ++++++
>  xen/arch/x86/mm.c          |    2 +-
>  xen/arch/x86/physdev.c     |   13 +++++++++++++
>  xen/common/grant_table.c   |    4 ++--
>  6 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9953f80..8cff7c9 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -339,6 +339,13 @@ int switch_compat(struct domain *d)
>  
>      if ( d == NULL )
>          return -EINVAL;
> +
> +    if ( is_pvh_domain(d) )
> +    {
> +        gdprintk(XENLOG_G_ERR,
> +                 "Xen does not currently support 32bit PVH guests\n");
> +        return -EINVAL;
> +    }
>      if ( !may_switch_mode(d) )
>          return -EACCES;
>      if ( is_pv_32on64_domain(d) )
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index efda6af..7685416 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -34,7 +34,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>       * then it means we are running on the idle domain's page table and must
>       * therefore use its mapcache.
>       */
> -    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
> +    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
>      {
>          /* If we really are idling, perform lazy context switch now. */
>          if ( (v = idle_vcpu[smp_processor_id()]) == current )
> @@ -71,7 +71,7 @@ void *map_domain_page(unsigned long mfn)
>  #endif
>  
>      v = mapcache_current_vcpu();
> -    if ( !v || is_hvm_vcpu(v) )
> +    if ( !v || !is_pv_vcpu(v) )
>          return mfn_to_virt(mfn);
>  
>      dcache = &v->domain->arch.pv_domain.mapcache;
> @@ -175,7 +175,7 @@ void unmap_domain_page(const void *ptr)
>      ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>  
>      v = mapcache_current_vcpu();
> -    ASSERT(v && !is_hvm_vcpu(v));
> +    ASSERT(v && is_pv_vcpu(v));
>  
>      dcache = &v->domain->arch.pv_domain.mapcache;
>      ASSERT(dcache->inuse);
> @@ -242,7 +242,7 @@ int mapcache_domain_init(struct domain *d)
>      struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
>      unsigned int bitmap_pages;
>  
> -    if ( is_hvm_domain(d) || is_idle_domain(d) )
> +    if ( !is_pv_domain(d) || is_idle_domain(d) )
>          return 0;
>  
>  #ifdef NDEBUG
> @@ -273,7 +273,7 @@ int mapcache_vcpu_init(struct vcpu *v)
>      unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
>      unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
>  
> -    if ( is_hvm_vcpu(v) || !dcache->inuse )
> +    if ( !is_pv_vcpu(v) || !dcache->inuse )
>          return 0;
>  
>      if ( ents > dcache->entries )
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index ce32245..8b44061 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1305,6 +1305,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>              c.nat->gs_base_kernel = hvm_get_shadow_gs_base(v);
>          }
>      }
> +    else if ( is_pvh_vcpu(v) )
> +    {
> +        /* pvh fixme: punt it to phase II */
> +        printk(XENLOG_WARNING "PVH: fixme: arch_get_info_guest()\n");
> +        memset(c.nat, 0, sizeof(*c.nat));
> +    }
>      else
>      {
>          c(ldt_base = v->arch.pv_vcpu.ldt_base);
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index b190ad9..e992b4f 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2805,7 +2805,7 @@ static struct domain *get_pg_owner(domid_t domid)
>          goto out;
>      }
>  
> -    if ( unlikely(paging_mode_translate(curr)) )
> +    if ( !is_pvh_domain(curr) && unlikely(paging_mode_translate(curr)) )
>      {
>          MEM_LOG("Cannot mix foreign mappings with translated domains");
>          goto out;
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index 3733c7a..2fc7ae6 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -475,6 +475,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      case PHYSDEVOP_set_iopl: {
>          struct physdev_set_iopl set_iopl;
> +
> +        if ( is_pvh_vcpu(current) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
>          ret = -EFAULT;
>          if ( copy_from_guest(&set_iopl, arg, 1) != 0 )
>              break;
> @@ -488,6 +495,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      case PHYSDEVOP_set_iobitmap: {
>          struct physdev_set_iobitmap set_iobitmap;
> +
> +        if ( is_pvh_vcpu(current) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
>          ret = -EFAULT;
>          if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 )
>              break;
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 3f97328..a2073d2 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
>  
>      double_gt_lock(lgt, rgt);
>  
> -    if ( !is_hvm_domain(ld) && need_iommu(ld) )
> +    if ( is_pv_domain(ld) && need_iommu(ld) )
>      {
>          unsigned int wrc, rdc;
>          int err = 0;
> @@ -932,7 +932,7 @@ __gnttab_unmap_common(
>              act->pin -= GNTPIN_hstw_inc;
>      }
>  
> -    if ( !is_hvm_domain(ld) && need_iommu(ld) )
> +    if ( is_pv_domain(ld) && need_iommu(ld) )
>      {
>          unsigned int wrc, rdc;
>          int err = 0;
> -- 
> 1.7.2.3
> 
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
  2013-06-05 15:25   ` George Dunlap
  2013-06-05 15:36   ` Ian Campbell
@ 2013-06-05 17:14   ` Tim Deegan
  2013-06-06  7:29     ` Jan Beulich
  2 siblings, 1 reply; 80+ messages in thread
From: Tim Deegan @ 2013-06-05 17:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: george.dunlap, Xen-devel, ian.jackson, ian.campbell

At 11:23 -0400 on 05 Jun (1370431396), Konrad Rzeszutek Wilk wrote:
> On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
> > These patches are built on top of git
> > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> > 
> > V6:
> > The biggest change in V6 is dropping of dom0 PVH. It will take some time
> > to investigate and redo dom0 construct to use unmodified PV code. These 
> > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
> > individual patches if there are no issues, so I know they have been looked
> > at.
> 
> Ian, Tim, Ian, George,
> 
> Are you guys waiting for Jan to review all the patches or just
> the hypervisor ones before looking at the rest (say the libxl ones)?

I'm waiting for Jan right now (though I'll only be reviewing the
hypervisor side anyway).  It takes me basically a whole day (i.e. my
entire working week on Xen things right now) to review a series like
this in any detail, so I'm afraid I'm waiting for Jan's feedback to be
addressed before I read it again.  IIRC all the x86/mm concerns I had
with early versions have been sorted out but a lot of things have
changed since then.

Cheers,

Tim.

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 15:36   ` Ian Campbell
@ 2013-06-05 18:34     ` Konrad Rzeszutek Wilk
  2013-06-05 20:51       ` Ian Campbell
  2013-06-06 10:08     ` George Dunlap
  1 sibling, 1 reply; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-05 18:34 UTC (permalink / raw)
  To: Ian Campbell; +Cc: george.dunlap, ian.jackson, Xen-devel, tim

On Wed, Jun 05, 2013 at 04:36:02PM +0100, Ian Campbell wrote:
> On Wed, 2013-06-05 at 11:23 -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> > > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
> > > These patches are built on top of git
> > > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> > > 
> > > V6:
> > > The biggest change in V6 is dropping of dom0 PVH. It will take some time
> > > to investigate and redo dom0 construct to use unmodified PV code. These 
> > > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
> > > individual patches if there are no issues, so I know they have been looked
> > > at.
> > 
> > Ian, Tim, Ian, George,
> > 
> > Are you guys waiting for Jan to review all the patches or just
> > the hypervisor ones before looking at the rest (say the libxl ones)?
> 
> There are libxl ones? I thought this was dom0 only at the moment?

The patchset had support for both dom0 and domU. But Mukesh dropped
the dom0 ones and is focusing on domU only. The changes are to libxc
and libxl:
 [PATCH 08/18] PVH xen: tools changes to create PVH domain

* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-05  7:32       ` Jan Beulich
@ 2013-06-05 20:41         ` Mukesh Rathor
  2013-06-06  6:43           ` Jan Beulich
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-05 20:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, 05 Jun 2013 08:32:52 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 05.06.13 at 02:31, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> >> I also vaguely recall having pointed out in a much earlier review
> >> that this functionality is lacking a counterpart in
> >> compat_arch_memory_op().
> > 
> > Hmm, I'm confused: how/why would a 64-bit PVH guest go through
> > compat_arch_memory_op()? Can you please explain?
> 
> IIRC the new hypercall isn't restricted to PVH guests, and hence
> needs a compat implementation regardless of 32-bit PVH not
> existing yet.

This patch does not introduce the hcall; it was introduced much earlier.
It implements the portion needed for 64bit PVH. It also documents the 
intention in the public header file for the subcall it uses:

#define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another dom,
                                    * XENMEM_add_to_physmap_range only.
                                    * (PVH x86 only) */

thanks,
Mukesh

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 18:34     ` Konrad Rzeszutek Wilk
@ 2013-06-05 20:51       ` Ian Campbell
  2013-06-05 22:01         ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-06-05 20:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: george.dunlap, ian.jackson, Xen-devel, tim

On Wed, 2013-06-05 at 14:34 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 05, 2013 at 04:36:02PM +0100, Ian Campbell wrote:
> > On Wed, 2013-06-05 at 11:23 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> > > > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
> > > > These patches are built on top of git
> > > > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> > > > 
> > > > V6:
> > > > The biggest change in V6 is dropping of dom0 PVH. It will take some time
> > > > to investigate and redo dom0 construct to use unmodified PV code. These 
> > > > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
> > > > individual patches if there are no issues, so I know they have been looked
> > > > at.
> > > 
> > > Ian, Tim, Ian, George,
> > > 
> > > Are you guys waiting for Jan to review all the patches or just
> > > the hypervisor ones before looking at the rest (say the libxl ones)?
> > 
> > There are libxl ones? I thought this was dom0 only at the moment?
> 
> The patchset had support for both - dom0 and domU. But Mukesh dropped
> the dom0 ones and is focusing on domU only. The changes are to libxc
> and libxl:

I thought the series (up to v5 at least) was dom0 *only*, I hadn't
realised this had been flipped onto its head for the v6 posting.

>  [PATCH 08/18] PVH xen: tools changes to create PVH domain

I'll try and find time to take a look, but it might have to wait until
after 4.3.

Ian.


* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 20:51       ` Ian Campbell
@ 2013-06-05 22:01         ` Mukesh Rathor
  2013-06-06  8:46           ` Ian Campbell
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-05 22:01 UTC (permalink / raw)
  To: Ian Campbell
  Cc: george.dunlap, ian.jackson, Xen-devel, tim, Konrad Rzeszutek Wilk

On Wed, 5 Jun 2013 21:51:54 +0100
Ian Campbell <ian.campbell@citrix.com> wrote:

> On Wed, 2013-06-05 at 14:34 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 05, 2013 at 04:36:02PM +0100, Ian Campbell wrote:
> > > On Wed, 2013-06-05 at 11:23 -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
> > > > > I've version 6 of my patches for 64bit PVH guest for xen.
> > > > > This is Phase I. These patches are built on top of git
> > > > > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
> > > > > 
> > > > > V6:
> > > > > The biggest change in V6 is dropping of dom0 PVH. It will
> > > > > take some time to investigate and redo dom0 construct to use
> > > > > unmodified PV code. These patches in V6 will allow PV dom0 to
> > > > > create PVH domU. Please ack or indicate individual patches if
> > > > > there are no issues, so I know they have been looked at.
> > > > 
> > > > Ian, Tim, Ian, George,
> > > > 
> > > > Are you guys waiting for Jan to review all the patches or just
> > > > the hypervisor ones before looking at the rest (say the libxl
> > > > ones)?
> > > 
> > > There are libxl ones? I thought this was dom0 only at the moment?
> > 
> > The patchset had support for both - dom0 and domU. But Mukesh
> > dropped the dom0 ones and is focusing on domU only. The changes are
> > to libxc and libxl:
> 
> I thought the series (up to v5 at least) was dom0 *only*, I hadn't
> realised this had been flipped onto its head for the v6 posting.

There was never a dom0-only series; initially it was both, then dom0
got dropped, as Jan wants a different approach.


> >  [PATCH 08/18] PVH xen: tools changes to create PVH domain
> 
> I'll try and find time to take a look, but it might have to wait until
> after 4.3.

This patch was looked at in the very early versions... So I hope nothing
major turns up after all this time.

thanks
M-


* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-05-31 10:00   ` Jan Beulich
@ 2013-06-06  1:25     ` Mukesh Rathor
  2013-06-06  6:48       ` Jan Beulich
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-06  1:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 31 May 2013 11:00:12 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > @@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs
> > *regs) crs[2] = read_cr2();
> >      crs[3] = read_cr3();
> >      crs[4] = read_cr4();
> > -    regs->ds = read_segment_register(ds);
> > -    regs->es = read_segment_register(es);
> > -    regs->fs = read_segment_register(fs);
> > -    regs->gs = read_segment_register(gs);
> > +    regs->ds = read_segment_register(current, regs, ds);
> > +    regs->es = read_segment_register(current, regs, es);
> > +    regs->fs = read_segment_register(current, regs, fs);
> > +    regs->gs = read_segment_register(current, regs, gs);
> 
> In patch 9 you start using the first parameter of
> read_segment_register() in ways not compatible with the use of
> current here - the double fault handler (and in general all host side
> exception handling code, i.e. the change to show_registers() is
> questionable too) wants to use the real register value, not what's
> in regs->. Even more, with the VMEXIT code storing at best
> a known bad value into these fields, is it really valid to use them
> at all (i.e. things ought to work much like the if() portion of
> show_registers() which you _do not_ modify).

Right, in case of double fault we'd need the real values. 
The only thing that comes to mind:

#define read_segment_register(vcpu, regs, name)                   \
({  u16 __sel;                                                    \
    struct cpu_user_regs *_regs = (regs);                         \
                                                                  \
    if ( guest_mode(regs) && is_pvh_vcpu(vcpu) )         <==========
        __sel = _regs->name;                                      \
    else                                                          \
        asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
    __sel;                                                        \
})                  

but let me verify this would work for all possible context_switch ->
save_segments() calls.

BTW, I can't use current in the macro because of the call from save_segments().

> at all (i.e. things ought to work much like the if() portion of
> show_registers() which you _do not_ modify).

Yeah, it was on hold because I've been investigating guest_cr[] sanity,
and found that I was missing:

    v->arch.hvm_vcpu.guest_cr[4] = value;

So, my next version will add that and update show_registers() for PVH.
I can scratch off another fixme from my list.

BTW, in the process I realized that in the cr4 update intercept I am missing:

    if ( value & HVM_CR4_GUEST_RESERVED_BITS(v) )
    {
        HVM_DBG_LOG(DBG_LEVEL_1,
                    "Guest attempts to set reserved bit in CR4: %lx",
                    value);
        goto gpf;
    }

    if ( !(value & X86_CR4_PAE) && hvm_long_mode_enabled(v) )
    {
        HVM_DBG_LOG(DBG_LEVEL_1, "Guest cleared CR4.PAE while "
                    "EFER.LMA is set");
        goto gpf;
    }

I can't recall now whether I somehow concluded I didn't need to worry about
it for PVH since I was only thinking 64bit, or just missed it. 
I guess I should have the check even if I expect the guest to always
be in LME, right?

thanks
mukesh


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-05 20:41         ` Mukesh Rathor
@ 2013-06-06  6:43           ` Jan Beulich
  2013-06-06 22:19             ` Mukesh Rathor
  2013-06-07 15:08             ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 80+ messages in thread
From: Jan Beulich @ 2013-06-06  6:43 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 05.06.13 at 22:41, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Wed, 05 Jun 2013 08:32:52 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 05.06.13 at 02:31, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> >> I also vaguely recall having pointed out in a much earlier review
>> >> that this functionality is lacking a counterpart in
>> >> compat_arch_memory_op().
>> > 
>> > Hmm.. confused how/why a 64bit PVH go thru compat_arch_memory_op()?
>> > Can you pl explain?
>> 
>> Iirc the new hypercall isn't restricted to PVH guests, and hence
>> needs a compat implementation regardless of 32-bit PVH not
>> existing yet.
> 
> This patch does not introduce the hcall, it was introduced much earlier.
> It implements the portion needed for 64bit PVH.

So how is this statement of yours in line with this hunk of the
patch?

@@ -4716,6 +4751,39 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         return rc;
     }
 
+    case XENMEM_add_to_physmap_range:
+    {
+        struct xen_add_to_physmap_range xatpr;
+        struct domain *d;
+
+        if ( copy_from_guest(&xatpr, arg, 1) )
+            return -EFAULT;
+
+        /* This mapspace is redundant for this hypercall */
+        if ( xatpr.space == XENMAPSPACE_gmfn_range )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(xatpr.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
+        {
+            rcu_unlock_domain(d);
+            return -EPERM;
+        }
+
+        rc = xenmem_add_to_physmap_range(d, &xatpr);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -EAGAIN )
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "ih", op, arg);
+
+        return rc;
+    }
+
     case XENMEM_set_memory_map:
     {
         struct xen_foreign_memory_map fmap;

If the hypercall were handled before, adding a new case statement
to arch_memory_op() would cause a compilation error. All that was
there before this patch was the definition of the hypercall (for ARM),
but I'm quite strongly opposed to adding x86 support for this
hypercall only for one half of the possible set of PV (and HVM?)
guests; the fact that PVH is 64-bit only for the moment has nothing
to do with this. The only alternative would be to constrain the
specific sub-hypercall to PVH, but that would be rather contrived,
so I'm already trying to make clear that I wouldn't accept such a
solution to the original comment.

Jan


* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-06  1:25     ` Mukesh Rathor
@ 2013-06-06  6:48       ` Jan Beulich
  2013-06-07  1:43         ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-06-06  6:48 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 06.06.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 31 May 2013 11:00:12 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 25.05.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > @@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs
>> > *regs) crs[2] = read_cr2();
>> >      crs[3] = read_cr3();
>> >      crs[4] = read_cr4();
>> > -    regs->ds = read_segment_register(ds);
>> > -    regs->es = read_segment_register(es);
>> > -    regs->fs = read_segment_register(fs);
>> > -    regs->gs = read_segment_register(gs);
>> > +    regs->ds = read_segment_register(current, regs, ds);
>> > +    regs->es = read_segment_register(current, regs, es);
>> > +    regs->fs = read_segment_register(current, regs, fs);
>> > +    regs->gs = read_segment_register(current, regs, gs);
>> 
>> In patch 9 you start using the first parameter of
>> read_segment_register() in ways not compatible with the use of
>> current here - the double fault handler (and in general all host side
>> exception handling code, i.e. the change to show_registers() is
>> questionable too) wants to use the real register value, not what's
>> in regs->. Even more, with the VMEXIT code storing at best
>> a known bad value into these fields, is it really valid to use them
>> at all (i.e. things ought to work much like the if() portion of
>> show_registers() which you _do not_ modify).
> 
> Right, in case of double fault we'd need the real values. 
> The only thing comes to mind:
> 
> #define read_segment_register(vcpu, regs, name)                   \
> ({  u16 __sel;                                                    \
>     struct cpu_user_regs *_regs = (regs);                         \
>                                                                   \
>     if ( guest_mode(regs) && is_pvh_vcpu(vcpu) )         <==========

Might be okay, albeit if so I'd like the two sides of the && switched.
But see below...

>         __sel = _regs->name;                                      \
>     else                                                          \
>         asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
>     __sel;                                                        \
> })                  
> 
> but let me verify this would work for all possible contect_switch ->
> save_segments() calls.
> 
> BTW, I can't use current in the macro because of call from save_segments(). 
> 
>> at all (i.e. things ought to work much like the if() portion of
>> show_registers() which you _do not_ modify).
> 
> Yeah, it was on hold because I've been investigating guest_cr[] sanity,
> and found that I was missing:
> 
>     v->arch.hvm_vcpu.guest_cr[4] = value;
> 
> So, my next version will add that and update show_registers() for PVH.
> I can scratch off another fixme from my list.

But you don't answer the underlying question that I raised: Is
accessing the struct cpu_user_regs selector register fields valid
at all? Again, the #VMEXIT handling code only stores garbage
into them (in debug builds, in non-debug builds it simply leaves
the fields unaltered).

> BTW, In the process I realized in the cr4 update intercept I am missing:
> 
>     if ( value & HVM_CR4_GUEST_RESERVED_BITS(v) )
>     {
>         HVM_DBG_LOG(DBG_LEVEL_1,
>                     "Guest attempts to set reserved bit in CR4: %lx",
>                     value);
>         goto gpf;
>     }
> 
>     if ( !(value & X86_CR4_PAE) && hvm_long_mode_enabled(v) )
>     {
>         HVM_DBG_LOG(DBG_LEVEL_1, "Guest cleared CR4.PAE while "
>                     "EFER.LMA is set");
>         goto gpf;
>     }
> 
> I can't recall now whether I somehow concluded I didn't need to worry about
> it for PVH since I was only thinking 64bit, or just missed it. 
> I guess I should have the check even if I expect the guest to always
> be in LME, right?

Yes, I think this puts you on the safe side.

Jan


* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 17:14   ` Tim Deegan
@ 2013-06-06  7:29     ` Jan Beulich
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Beulich @ 2013-06-06  7:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Mukesh Rathor
  Cc: george.dunlap, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 05.06.13 at 19:14, Tim Deegan <tim@xen.org> wrote:
> At 11:23 -0400 on 05 Jun (1370431396), Konrad Rzeszutek Wilk wrote:
>> On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
>> > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
>> > These patches are built on top of git
>> > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
>> > 
>> > V6:
>> > The biggest change in V6 is dropping of dom0 PVH. It will take some time
>> > to investigate and redo dom0 construct to use unmodified PV code. These 
>> > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
>> > individual patches if there are no issues, so I know they have been looked
>> > at.
>> 
>> Ian, Tim, Ian, George,
>> 
>> Are you guys waiting for Jan to review all the patches or just
>> the hypervisor ones before looking at the rest (say the libxl ones)?
> 
> I'm waiting for Jan right now (though I'll only be reviewing the
> hypervisor side anyway).  It takes me basically a whole day (i.e. my
> entire working week on Xen things right now) to review a series like
> this in any detail, so I'm afraid I'm waiting for Jan's feedback to be
> addressed before I read it again.  IIRC all the x86/mm concerns I had
> with early versions have been sorted out but a lot of things have
> changed since then.

And just to clarify - after having gone through the first several
patches of the most recent posting, the amount of changes I
asked for was large enough to make me stop spending significant
amounts of time on the rest of the series the 6th time.

And just to clarify for the future - now that I complained the 6th
time about simple coding style things, I'm going to not repeat
this exercise on this series again, but silently refrain from
considering it ready for getting applied until I can see that some
meaningful cleanup was really done (i.e. this doesn't mean I'm
not willing to take care of the occasional violation while
committing, but I'm tired of pointing out things that are obvious
looking at the raw patches).

The same, to a lesser degree, goes about other comments - I'm
simply not having the time to reiterate the same (or similar
recurring) problems on every submission again. After this long
a time, and several iterations, some common sense must be
possible to get applied to how such a patch series ought to be
structured and implemented.

Jan


* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 22:01         ` Mukesh Rathor
@ 2013-06-06  8:46           ` Ian Campbell
  2013-06-07 13:56             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-06-06  8:46 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: george.dunlap, ian.jackson, Xen-devel, tim, Konrad Rzeszutek Wilk

On Wed, 2013-06-05 at 15:01 -0700, Mukesh Rathor wrote:
> On Wed, 5 Jun 2013 21:51:54 +0100
> Ian Campbell <ian.campbell@citrix.com> wrote:

> > I thought the series (up to v5 at least) was dom0 *only*, I hadn't
> > realised this had been flipped onto its head for the v6 posting.
> 
> There was never a dom0 only series, initially it was both, then dom0
> got dropped, as Jan wants a different approach.

I was obviously confused by the conversation which was had at February's
maintainer,committer,developer call:
http://wiki.xen.org/wiki/Xen_Maintainer,_Committer_and_Developer_Meeting/February_2013_Minutes
        "Konrad: Should be able to do dom0 only PVH as a tech preview in
        4.3."

> 
> 
> > >  [PATCH 08/18] PVH xen: tools changes to create PVH domain
> > 
> > I'll try and find time to take a look, but it might have to wait until
> > after 4.3.
> 
> This patch was looked at in the very early versions... So I hope nothing
> major after all this time.

OK, thanks.
Ian.

> 
> thanks
> M-
> 


* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-05 15:36   ` Ian Campbell
  2013-06-05 18:34     ` Konrad Rzeszutek Wilk
@ 2013-06-06 10:08     ` George Dunlap
  1 sibling, 0 replies; 80+ messages in thread
From: George Dunlap @ 2013-06-06 10:08 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim Deegan, xen-devel@lists.xensource.com, Ian Jackson,
	Konrad Rzeszutek Wilk

On Wed, Jun 5, 2013 at 4:36 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-06-05 at 11:23 -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, May 24, 2013 at 06:25:19PM -0700, Mukesh Rathor wrote:
>> > I've version 6 of my patches for 64bit PVH guest for xen.  This is Phase I.
>> > These patches are built on top of git
>> > c/s: 9204bc654562976c7cdebf21c6b5013f6e3057b3
>> >
>> > V6:
>> > The biggest change in V6 is dropping of dom0 PVH. It will take some time
>> > to investigate and redo dom0 construct to use unmodified PV code. These
>> > patches in V6 will allow PV dom0 to create PVH domU. Please ack or indicate
>> > individual patches if there are no issues, so I know they have been looked
>> > at.
>>
>> Ian, Tim, Ian, George,
>>
>> Are you guys waiting for Jan to review all the patches or just
>> the hypervisor ones before looking at the rest (say the libxl ones)?
>
> There are libxl ones? I thought this was dom0 only at the moment?

Mukesh,

If you're using git, you can add a line like the following to your commit:

CC: Ian Campbell <ian.campbell@citrix.com>

And when you do git send-email it will automatically cc them.  If you
use it judiciously (e.g., by only cc'ing IanC on patches relating to
the toolstack), it can avoid this kind of "I didn't realize there were
toolstack patches" issue.

 -George


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-06  6:43           ` Jan Beulich
@ 2013-06-06 22:19             ` Mukesh Rathor
  2013-06-07  6:13               ` Jan Beulich
  2013-06-07 15:08             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-06 22:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Thu, 06 Jun 2013 07:43:09 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 05.06.13 at 22:41, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > On Wed, 05 Jun 2013 08:32:52 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
..........
> +        if ( rc == -EAGAIN )
> +            rc = hypercall_create_continuation(
> +                __HYPERVISOR_memory_op, "ih", op, arg);
> +
> +        return rc;
> +    }
> +
>      case XENMEM_set_memory_map:
>      {
>          struct xen_foreign_memory_map fmap;
> 
> If the hypercall were handled before, adding a new case statement
> to arch_memory_op() would cause a compilation error. All that was

No, not really. The hcall was handled by Xen by returning -ENOSYS. 

> there before this patch was the definition of the hypercall (for ARM),
> but I'm quite strongly opposed to adding x86 support for this
> hypercall only for one half of the possible set of PV (and HVM?)
> guests; the fact that PVH is 64-bit only for the moment has nothing
> to do with this. The only alternative would be to constrain the
> specific sub-hypercall to PVH, but that would be rather contrived,
> so I'm already trying to make clear that I wouldn't accept such a
> solution to the original comment.

Let me add my perspective: 

32bit HVM and PV guest:
   Before my patch, if a call is made with XENMEM_add_to_physmap_range,
   which it can be, since it's an existing hcall, it will come into 
   compat_arch_memory_op() which will return -ENOSYS.

   After my patch, it will do the same, return -ENOSYS. 

So how does this patch cause a regression for existing 32bit 
guests? Moreover, in the past, you've asked to remove even the simplest
benign line space change because it was unrelated to the patch. So changing
something as part of the PVH patchset for PV and HVM guests, wouldn't that 
be unrelated and contradictory to the message you've sent? 

You've found some very genuine issues in the patchset, and I really 
appreciate that. But I struggle with this request.

Thanks,
Mukesh


* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-06  6:48       ` Jan Beulich
@ 2013-06-07  1:43         ` Mukesh Rathor
  2013-06-07  6:29           ` Jan Beulich
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-07  1:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Thu, 06 Jun 2013 07:48:49 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 06.06.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > On Fri, 31 May 2013 11:00:12 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> > save_segments() calls.
> > 
> > BTW, I can't use current in the macro because of call from
> > save_segments(). 
> > 
> >> at all (i.e. things ought to work much like the if() portion of
> >> show_registers() which you _do not_ modify).
> > 
> > Yeah, it was on hold because I've been investigating guest_cr[]
> > sanity, and found that I was missing:
> > 
> >     v->arch.hvm_vcpu.guest_cr[4] = value;
> > 
> > So, my next version will add that and update show_registers() for
> > PVH. I can scratch off another fixme from my list.
> 
> But you don't answer the underlying question that I raised: Is
> accessing the struct cpu_user_regs selector register fields valid
> at all? Again, the #VMEXIT handling code only stores garbage
> into them (in debug builds, in non-debug builds it simply leaves
> the fields unaltered).

Are you looking at the right version? The patch doesn't have much DEBUG
code anymore. The selectors are updated every VMExit whether DEBUG or not:

static void read_vmcs_selectors(struct cpu_user_regs *regs)
{
    regs->cs = __vmread(GUEST_CS_SELECTOR);
    regs->ss = __vmread(GUEST_SS_SELECTOR);
    regs->ds = __vmread(GUEST_DS_SELECTOR);
    regs->es = __vmread(GUEST_ES_SELECTOR);
    regs->gs = __vmread(GUEST_GS_SELECTOR);
    regs->fs = __vmread(GUEST_FS_SELECTOR);
}

void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
{
    unsigned long exit_qualification;
    unsigned int exit_reason = __vmread(VM_EXIT_REASON);
    int rc=0, ccpu = smp_processor_id();
    struct vcpu *v = current;

    dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%lx RSP:%lx EFLAGS:%lx CR0:%lx\n",
          ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
          __vmread(GUEST_CR0));

    /* guest_kernel_mode() needs cs. read_segment_register needs others */
    read_vmcs_selectors(regs);
.....


You are very thorough, so I'm sure it's not this obvious and I am just
missing something, so please bear with me if that's the case. I'll
continue staring at it... 

thanks
Mukesh


* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-06 22:19             ` Mukesh Rathor
@ 2013-06-07  6:13               ` Jan Beulich
  2013-06-07 20:46                 ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-06-07  6:13 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 07.06.13 at 00:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Thu, 06 Jun 2013 07:43:09 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 05.06.13 at 22:41, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > On Wed, 05 Jun 2013 08:32:52 +0100
>> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> > 
> ..........
>> +        if ( rc == -EAGAIN )
>> +            rc = hypercall_create_continuation(
>> +                __HYPERVISOR_memory_op, "ih", op, arg);
>> +
>> +        return rc;
>> +    }
>> +
>>      case XENMEM_set_memory_map:
>>      {
>>          struct xen_foreign_memory_map fmap;
>> 
>> If the hypercall were handled before, adding a new case statement
>> to arch_memory_op() would cause a compilation error. All that was
> 
> No, not really. The hcall was handled by xen by returning -ENOSYS. 

That's a completely bogus argument imo.

>> there before this patch was the definition of the hypercall (for ARM),
>> but I'm quite strongly opposed to adding x86 support for this
>> hypercall only for one half of the possible set of PV (and HVM?)
>> guests; the fact that PVH is 64-bit only for the moment has nothing
>> to do with this. The only alternative would be to constrain the
>> specific sub-hypercall to PVH, but that would be rather contrived,
>> so I'm already trying to make clear that I wouldn't accept such a
>> solution to the original comment.
> 
> Let me add my perpective: 
> 
> 32bit HVM and PV guest:
>    Before my patch, if a call is made with XENMEM_add_to_physmap_range,
>    which it can since its an existing hcall, it will come into 
>    compat_arch_memory_op() which will return -ENOSYS.
> 
>    After my patch, it will do the same, return -ENOSYS. 
> 
> So how does this patch cause a regression for existing 32bit 
> guests?

I never said it's a regression. It's incomplete functionality you add.
Someone adding a use of this to PV or PV drivers, testing it on
64-bit, may validly expect that this would work the same on 32-bit.

> Moreover, in the past, you've asked to remove even the simplest
> benign line space change because it was unrelated to the patch. So changing
> something as part of PVH patchset for PV and HVM guests, wouldn't that 
> be unrelated and contradictory to the message you've sent? 

In no way - I'm simply asking for consistency here. If you handled
the hypercall _only_ for PVH guests, and left the implementation
for 32-bit PVH out entirely because such guests can't work anyway
with the patch set in place, that would be consistent. But as said
before: This would artificially limit a hypercall, and that's a
separate reason not to accept such a partial implementation.

> You've found some very genunie issues in the patchset, and I really 
> appreciate that. But I struggle with this request.

Then leave it out, and I'll waste my time on getting it implemented
once the patch set is in. But please add a clear note of this state
to the patch description.

Jan


* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-07  1:43         ` Mukesh Rathor
@ 2013-06-07  6:29           ` Jan Beulich
  2013-06-08  0:45             ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-06-07  6:29 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 07.06.13 at 03:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Thu, 06 Jun 2013 07:48:49 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 06.06.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > On Fri, 31 May 2013 11:00:12 +0100
>> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> > 
>> > save_segments() calls.
>> > 
>> > BTW, I can't use current in the macro because of call from
>> > save_segments(). 
>> > 
>> >> at all (i.e. things ought to work much like the if() portion of
>> >> show_registers() which you _do not_ modify).
>> > 
>> > Yeah, it was on hold because I've been investigating guest_cr[]
>> > sanity, and found that I was missing:
>> > 
>> >     v->arch.hvm_vcpu.guest_cr[4] = value;
>> > 
>> > So, my next version will add that and update show_registers() for
>> > PVH. I can scratch off another fixme from my list.
>> 
>> But you don't answer the underlying question that I raised: Is
>> accessing the struct cpu_user_regs selector register fields valid
>> at all? Again, the #VMEXIT handling code only stores garbage
>> into them (in debug builds, in non-debug builds it simply leaves
>> the fields unaltered).
> 
> Are you looking at the right version? The patch doesn't have much DEBUG
> code anymore. The selectors are updated every VMExit whether DEBUG or not:
> 
> static void read_vmcs_selectors(struct cpu_user_regs *regs)
> {
>     regs->cs = __vmread(GUEST_CS_SELECTOR);
>     regs->ss = __vmread(GUEST_SS_SELECTOR);
>     regs->ds = __vmread(GUEST_DS_SELECTOR);
>     regs->es = __vmread(GUEST_ES_SELECTOR);
>     regs->gs = __vmread(GUEST_GS_SELECTOR);
>     regs->fs = __vmread(GUEST_FS_SELECTOR);
> }
> 
> void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
> {
>     unsigned long exit_qualification;
>     unsigned int exit_reason = __vmread(VM_EXIT_REASON);
>     int rc=0, ccpu = smp_processor_id();
>     struct vcpu *v = current;
> 
>     dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%lx RSP:%lx EFLAGS:%lx CR0:%lx\n",
>           ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
>           __vmread(GUEST_CR0));
> 
>     /* guest_kernel_mode() needs cs. read_segment_register needs others */
>     read_vmcs_selectors(regs);
> .....
> 
> 
> You are very thorough, so I'm sure it's not this obvious and I am just
> missing something, so please bear with me if that's the case. I'll
> continue staring at it... 

In fact I simply overlooked this, as I was looking at the assembly
entry point only (where those fields get written under !NDEBUG).
I'm sorry for that.

As a subsequent cleanup item, I think this assembly code should be
moved out to HVM-only C code, thus not being redundant with what
you do for PVH. In fact I was considering whether other things that
can be done in C code (reading/writing RIP, RSP, and RFLAGS) should be
removed from both respective entry.S files - it's not clear to me why
they were put there in the first place.

And anything we leave there should be optimized to hide latencies
between dependent instructions (the worst example here likely is
the CR2 read followed immediately by the intermediate register
getting written to memory, even though the control register read
could be done almost first thing in the handler).

One question though is why HVM code gets away without always
filling those fields, but PVH code doesn't. Could you say a word
on this?

Jan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/18][V6]: PVH xen: version 6 patches...
  2013-06-06  8:46           ` Ian Campbell
@ 2013-06-07 13:56             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-07 13:56 UTC (permalink / raw)
  To: Ian Campbell; +Cc: george.dunlap, ian.jackson, Xen-devel, tim

On Thu, Jun 06, 2013 at 09:46:54AM +0100, Ian Campbell wrote:
> On Wed, 2013-06-05 at 15:01 -0700, Mukesh Rathor wrote:
> > On Wed, 5 Jun 2013 21:51:54 +0100
> > Ian Campbell <ian.campbell@citrix.com> wrote:
> 
> > > I thought the series (up to v5 at least) was dom0 *only*, I hadn't
> > > realised this had been flipped onto its head for the v6 posting.
> > 
> > There was never a dom0 only series, initially it was both, then dom0
> > got dropped, as Jan wants a different approach.
> 
> I was obviously confused by the conversation which was had at February's
> maintainer,committer,developer call:
> http://wiki.xen.org/wiki/Xen_Maintainer,_Committer_and_Developer_Meeting/February_2013_Minutes
>         "Konrad: Should be able to do dom0 only PVH as a tech preview in
>         4.3."
> 

That was the primary goal, with the intention of squeezing it into Xen 4.3.

As it is not going in 4.3, we can change the goals.

> > 
> > 
> > > >  [PATCH 08/18] PVH xen: tools changes to create PVH domain
> > > 
> > > I'll try and find time to take a look, but it might have to wait until
> > > after 4.3.
> > 
> > This patch was looked at in the very early versions... So I hope nothing
> > major after all this time.
> 
> OK, thanks.
> Ian.
> 
> > 
> > thanks
> > M-
> > 
> 
> 

* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-06  6:43           ` Jan Beulich
  2013-06-06 22:19             ` Mukesh Rathor
@ 2013-06-07 15:08             ` Konrad Rzeszutek Wilk
  2013-06-07 15:48               ` Jan Beulich
  1 sibling, 1 reply; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-07 15:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Thu, Jun 06, 2013 at 07:43:09AM +0100, Jan Beulich wrote:
> >>> On 05.06.13 at 22:41, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > On Wed, 05 Jun 2013 08:32:52 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> >> >>> On 05.06.13 at 02:31, Mukesh Rathor <mukesh.rathor@oracle.com>
> >> >>> wrote:
> >> >> I also vaguely recall having pointed out in a much earlier review
> >> >> that this functionality is lacking a counterpart in
> >> >> compat_arch_memory_op().
> >> > 
> >> > Hmm.. confused how/why a 64bit PVH go thru compat_arch_memory_op()?
> >> > Can you pl explain?
> >> 
> >> Iirc the new hypercall isn't restricted to PVH guests, and hence
> >> needs a compat implementation regardless of 32-bit PVH not
> >> existing yet.
> > 
> > This patch does not introduce the hcall, it was introduced much earlier.
> > It implements the portion needed for 64bit PVH.
> 
> So how is this statement of yours in line with this hunk of the
> patch?
> 
> @@ -4716,6 +4751,39 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          return rc;
>      }
>  
> +    case XENMEM_add_to_physmap_range:
> +    {
> +        struct xen_add_to_physmap_range xatpr;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&xatpr, arg, 1) )
> +            return -EFAULT;
> +
> +        /* This mapspace is redundant for this hypercall */
> +        if ( xatpr.space == XENMAPSPACE_gmfn_range )
> +            return -EINVAL;
> +
> +        d = rcu_lock_domain_by_any_id(xatpr.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
> +        {
> +            rcu_unlock_domain(d);
> +            return -EPERM;
> +        }
> +
> +        rc = xenmem_add_to_physmap_range(d, &xatpr);
> +
> +        rcu_unlock_domain(d);
> +
> +        if ( rc == -EAGAIN )
> +            rc = hypercall_create_continuation(
> +                __HYPERVISOR_memory_op, "ih", op, arg);
> +
> +        return rc;
> +    }
> +
>      case XENMEM_set_memory_map:
>      {
>          struct xen_foreign_memory_map fmap;
> 
> If the hypercall were handled before, adding a new case statement
> to arch_memory_op() would cause a compilation error. All that was

Meaning if it was handled in do_memory_op, right?

> there before this patch was the definition of the hypercall (for ARM),
> but I'm quite strongly opposed to adding x86 support for this
> hypercall only for one half of the possible set of PV (and HVM?)
> guests; the fact that PVH is 64-bit only for the moment has nothing
> to do with this. The only alternative would be to constrain the
> specific sub-hypercall to PVH, but that would be rather contrived,
> so I'm already trying to make clear that I wouldn't accept such a
> solution to the original comment.

Pardon my misunderstanding - what you are saying is that the location
of this hypercall in arch_memory_op is correct - but only if it
works with 32-bit guests as well?

Which would imply at least doing something in compat_arch_memory_op 
(to copy the arguments properly), and in the arch_memory_op
return -ENOSYS if the guest is 32-bit (at least for right now).

In the future the return -ENOSYS would be removed so that 32-bit
guests can use this hypercall.

Or perhaps it makes sense to squash the x86 and ARM versions of this
hypercall into the generic code? (later on)

* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-07 15:08             ` Konrad Rzeszutek Wilk
@ 2013-06-07 15:48               ` Jan Beulich
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Beulich @ 2013-06-07 15:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

>>> On 07.06.13 at 17:08, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Pardon my misunderstanding - what you are saying is that the location
> of this hypercall in arch_memory_op is correct - but only if it
> works with 32-bit guests as well?

Correct.

> Which would imply at least doing something in compat_arch_memory_op 
> (to copy the arguments properly),

Yes.

> and in the arch_memory_op
> return -ENOSYS if the guest is 32-bit (at least for right now).

No, or else the argument copying is pointless. In fact I don't think
arch_memory_op() needs any further tweaking for 32-bit guests
to work, it's solely the argument copying/translation that's missing.

> In the future the return -ENOSYS would be removed so that 32-bit
> guests can use this hypercall.
> 
> Or perhaps it makes sense to squash the x86 and ARM versions of this
> hypercall into the generic code? (later on)

That would be the ultimate goal, if feasible.

Jan

* Re: [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range
  2013-06-07  6:13               ` Jan Beulich
@ 2013-06-07 20:46                 ` Mukesh Rathor
  0 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-07 20:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 07 Jun 2013 07:13:04 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 07.06.13 at 00:19, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
...
> I never said it's a regression. It's incomplete functionality you add.
> Someone adding a use of this to PV or PV drivers, testing it on
> 64-bit, may validly expect that this would work the same on 32-bit.
> 
> > Moreover, in the past, you've asked to remove even the simplest
> > benign line space change because it was unrelated to the patch. So
> > changing something as part of PVH patchset for PV and HVM guests,
> > wouldn't that be unrelated and contradictory to the message you've
> > sent? 
> 
> In no way - I'm simply asking for consistency here. If you handled
> the hypercall _only_ for PVH guests, and left the implementation
> for 32-bit PVH out entirely because such guests can't work anyway
> with the patch set in place, that would be consistent. But as said
> before: This would artificially limit a hypercall, and that's a
> separate reason not to accept such a partial implementation.

One last time :), the hcall is *already* limited, I'm unlimiting 
it *one step at a time*.

> > You've found some very genuine issues in the patchset, and I really 
> > appreciate that. But I struggle with this request.
> 
> Then leave it out, and I'll waste my time on getting it implemented
> once the patch set is in. But please add a clear note of this state
> to the patch description.

Ok, I'll add a note to the patch description.

thanks
Mukesh

* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-07  6:29           ` Jan Beulich
@ 2013-06-08  0:45             ` Mukesh Rathor
  2013-06-10  8:01               ` Jan Beulich
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-08  0:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 07 Jun 2013 07:29:40 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 07.06.13 at 03:43, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > On Thu, 06 Jun 2013 07:48:49 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> >> >>> On 06.06.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
> >> >>> wrote:
> >> > On Fri, 31 May 2013 11:00:12 +0100
> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> > 
........ 
> And anything we leave there should be optimized to hide latencies
> between dependent instructions (the worst example here likely is
> the CR2 read followed immediately by the intermediate register
> getting written to memory, even though the control register read
> could be done almost first thing in the handler).
> 
> One question though is why HVM code gets away without always
> filling those fields, but PVH code doesn't. Could you say a word
> on this?

Yeah, sure. The HVM IO instr emulation goes thru handle_mmio/handle_pio,
and I've not looked at them, since PVH goes thru emulate_privileged_op(),
which calls the read_segment_register macro. The HVM functions do not
call the read_segment_register macro. 

Hope that helps.

Mukesh

* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-08  0:45             ` Mukesh Rathor
@ 2013-06-10  8:01               ` Jan Beulich
  2013-06-10 23:10                 ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Beulich @ 2013-06-10  8:01 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 08.06.13 at 02:45, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 07 Jun 2013 07:29:40 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 07.06.13 at 03:43, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > On Thu, 06 Jun 2013 07:48:49 +0100
>> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> > 
>> >> >>> On 06.06.13 at 03:25, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >> >>> wrote:
>> >> > On Fri, 31 May 2013 11:00:12 +0100
>> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> >> > 
> ........ 
>> And anything we leave there should be optimized to hide latencies
>> between dependent instructions (the worst example here likely is
>> the CR2 read followed immediately by the intermediate register
>> getting written to memory, even though the control register read
>> could be done almost first thing in the handler).
>> 
>> One question though is why HVM code gets away without always
>> filling those fields, but PVH code doesn't. Could you say a word
>> on this?
> 
> Yeah, sure. The HVM IO instr emulation goes thru handle_mmio/handle_pio,
> and I've not looked at them since the PVH goes thru emulate_privileged_op(),
> which calls read_segment_register macro. The HVM functions do not call
> the read_segment_register macro. 

Then is there any reason why, rather than always reading all
selector registers, you couldn't defer the reading to when this
macro actually gets used? I'd assume there are going to be many
exits that don't need any of them...

Jan

* Re: [PATCH 04/18] PVH xen: add params to read_segment_register
  2013-06-10  8:01               ` Jan Beulich
@ 2013-06-10 23:10                 ` Mukesh Rathor
  0 siblings, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-10 23:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, 10 Jun 2013 09:01:37 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 08.06.13 at 02:45, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > On Fri, 07 Jun 2013 07:29:40 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> >> >>> On 07.06.13 at 03:43, Mukesh Rathor <mukesh.rathor@oracle.com>
> >> >>> wrote:
> >> > On Thu, 06 Jun 2013 07:48:49 +0100
> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> > 
> >> >> >>> On 06.06.13 at 03:25, Mukesh Rathor
> >> >> >>> <mukesh.rathor@oracle.com> wrote:
> >> >> > On Fri, 31 May 2013 11:00:12 +0100
> >> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> >> > 
> > ........ 
> >> And anything we leave there should be optimized to hide latencies
> >> between dependent instructions (the worst example here likely is
> >> the CR2 read followed immediately by the intermediate register
> >> getting written to memory, even though the control register read
> >> could be done almost first thing in the handler).
> >> 
> >> One question though is why HVM code gets away without always
> >> filling those fields, but PVH code doesn't. Could you say a word
> >> on this?
> > 
> > Yeah, sure. The HVM IO instr emulation goes thru
> > handle_mmio/handle_pio, and I've not looked at them since the PVH
> > goes thru emulate_privileged_op(), which calls
> > read_segment_register macro. The HVM functions do not call the
> > read_segment_register macro. 
> 
> Then is there any reason why, rather than always reading all
> selector registers, you couldn't defer the reading to when this
> macro actually gets used? I'd assume there are going to be many
> exits that don't need any of them...


I realized the day after that I should have clarified. CS is the only
one needed for most of the intercepts, mostly in guest_kernel_mode. So
that is a good one to just read always. I moved
read_vmcs_selectors back to vmxit_io_instr() which is where it
was originally. I went thru all the intercepts to make sure it was not
needed anywhere else.

thanks
Mukesh

* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-05-25  1:25 ` [PATCH 08/18] PVH xen: tools changes to create PVH domain Mukesh Rathor
@ 2013-06-12 14:58   ` Ian Campbell
  2013-06-15  0:14     ` Mukesh Rathor
  2013-07-31  1:06     ` Mukesh Rathor
  0 siblings, 2 replies; 80+ messages in thread
From: Ian Campbell @ 2013-06-12 14:58 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel

On Fri, 2013-05-24 at 18:25 -0700, Mukesh Rathor wrote:
> This patch contains tools changes for PVH. For now, only one mode is
> supported/tested:
>     dom0> losetup /dev/loop1 guest.img
>     dom0> In vm.cfg file: disk = ['phy:/dev/loop1,xvda,w']
> 
> Changes in V2: None
> Changes in V3:
>   - Document pvh boolean flag in xl.cfg.pod.5
>   - Rename ci_pvh and bi_pvh to pvh, and domcr_is_pvh to pvh_enabled.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  docs/man/xl.cfg.pod.5             |    3 +++
>  tools/debugger/gdbsx/xg/xg_main.c |    4 +++-
>  tools/libxc/xc_dom.h              |    1 +
>  tools/libxc/xc_dom_x86.c          |    7 ++++---
>  tools/libxl/libxl_create.c        |    2 ++
>  tools/libxl/libxl_dom.c           |   18 +++++++++++++++++-
>  tools/libxl/libxl_types.idl       |    2 ++
>  tools/libxl/libxl_x86.c           |    4 +++-
>  tools/libxl/xl_cmdimpl.c          |   11 +++++++++++
>  tools/xenstore/xenstored_domain.c |   12 +++++++-----

I think these should be split into
	libxc (dombuilder) changes
	libxl changes
	xenstore changes
	misc other (== gdbsx) changes.

Most of the changes here are not mentioned at all in the commit
message. (It was the xenstore change, which IMHO requires an
explanation, that prompted this.)

>  10 files changed, 53 insertions(+), 11 deletions(-)

> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index f1be43b..24f6759 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -832,7 +833,7 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
>          }
>  
>          /* Map grant table frames into guest physmap. */
> -        for ( i = 0; ; i++ )
> +        for ( i = 0; !dom->pvh_enabled; i++ )

This is a bit of an odd way to do this (unless pvh_enabled somehow
changes in this loop, which I doubt). Can we just get a surrounding if
please.

> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index b38d0a7..cefbf76 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>      struct xc_dom_image *dom;
>      int ret;
>      int flags = 0;
> +    int is_pvh = libxl_defbool_val(info->pvh);
>  
>      xc_dom_loginit(ctx->xch);
>  
> +    if (is_pvh) {
> +        char *pv_feats = "writable_descriptor_tables|auto_translated_physmap"
> +                         "|supervisor_mode_kernel|hvm_callback_vector";
> +
> +        if (info->u.pv.features && info->u.pv.features[0] != '\0')
> +        {
> +            LOG(ERROR, "Didn't expect info->u.pv.features to contain string\n");
> +            LOG(ERROR, "String: %s\n", info->u.pv.features);
> +            return ERROR_FAIL;
> +        }
> +        info->u.pv.features = strdup(pv_feats);

What is this trying to achieve? I think the requirement for certain
features to be present if pvh is enabled needs to be handled in the
xc_dom library and not here. This field is (I think) for the user to
specify other features which they may wish to require.

> +    }
> +
>      dom = xc_dom_allocate(ctx->xch, state->pv_cmdline, info->u.pv.features);
>      if (!dom) {
>          LOGE(ERROR, "xc_dom_allocate failed");
> @@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>      }
>  
>      dom->flags = flags;
> +    dom->pvh_enabled = is_pvh;

Not part of the flags?

>      dom->console_evtchn = state->console_port;
>      dom->console_domid = state->console_domid;
>      dom->xenstore_evtchn = state->store_port;
> @@ -400,7 +415,8 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>          LOGE(ERROR, "xc_dom_boot_image failed");
>          goto out;
>      }
> -    if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) {
> +    /* PVH sets up its own grant during boot via hvm mechanisms */
> +    if ( !is_pvh && (ret = xc_dom_gnttab_init(dom)) != 0 ) {
>          LOGE(ERROR, "xc_dom_gnttab_init failed");
>          goto out;
>      }
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 8262cba..43e6d95 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -245,6 +245,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
>      ("platformdata", libxl_key_value_list),
>      ("poolid",       uint32),
>      ("run_hotplug_scripts",libxl_defbool),
> +    ("pvh",          libxl_defbool),
>      ], dir=DIR_IN)
>  
>  MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
> @@ -346,6 +347,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                        ])),
>                   ("invalid", Struct(None, [])),
>                   ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
> +    ("pvh",       libxl_defbool),

I'm not quite convinced of the need for both of these bools in both
create and build; it's a bit of an odd quirk in our API which I need to
consider a bit deeper.

In any case if this one in build_info should exist it belongs in the pv
part of the preceding union since it is PV specific.

>      ], dir=DIR_IN
>  )
>  
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index a17f6ae..424bc68 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -290,7 +290,9 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>      if (rtc_timeoffset)
>          xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset);
>  
> -    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
> +    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM ||
> +        libxl_defbool_val(d_config->b_info.pvh)) {
> +
>          unsigned long shadow;
>          shadow = (d_config->b_info.shadow_memkb + 1023) / 1024;
>          xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL);
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index e13a64e..e032668 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -610,8 +610,18 @@ static void parse_config_data(const char *config_source,
>          !strncmp(buf, "hvm", strlen(buf)))
>          c_info->type = LIBXL_DOMAIN_TYPE_HVM;
>  
> +    libxl_defbool_setdefault(&c_info->pvh, false);
> +    libxl_defbool_setdefault(&c_info->hap, false);

These belong in libxl__domain_create_info_setdefault() not here.

> +    xlu_cfg_get_defbool(config, "pvh", &c_info->pvh, 0);
>      xlu_cfg_get_defbool(config, "hap", &c_info->hap, 0);
>  
> +    if (libxl_defbool_val(c_info->pvh) &&
> +        !libxl_defbool_val(c_info->hap)) {
> +
> +        fprintf(stderr, "hap is required for PVH domain\n");
> +        exit(1);

This check belongs in setdefault or one of the functions in libxl which
consumes the create_info
> +    }
> +
>      if (xlu_cfg_replace_string (config, "name", &c_info->name, 0)) {
>          fprintf(stderr, "Domain name must be specified.\n");
>          exit(1);
> @@ -918,6 +928,7 @@ static void parse_config_data(const char *config_source,
>  
>          b_info->u.pv.cmdline = cmdline;
>          xlu_cfg_replace_string (config, "ramdisk", &b_info->u.pv.ramdisk, 0);
> +        libxl_defbool_set(&b_info->pvh, libxl_defbool_val(c_info->pvh));

b_info->pvh = c_info->pvh is the right way to do this. If possible I'd
like to remove one or the other from and handle it internally to the
library. As I say I need to chew on this one a bit more.


>          break;
>      }
>      default:
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index bf83d58..10c23a1 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -168,13 +168,15 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
>  static void *map_interface(domid_t domid, unsigned long mfn)
>  {
>  	if (*xcg_handle != NULL) {
> -		/* this is the preferred method */
> -		return xc_gnttab_map_grant_ref(*xcg_handle, domid,
> +                void *addr;
> +                /* this is the preferred method */
> +                addr = xc_gnttab_map_grant_ref(*xcg_handle, domid,
>  			GNTTAB_RESERVED_XENSTORE, PROT_READ|PROT_WRITE);
> -	} else {
> -		return xc_map_foreign_range(*xc_handle, domid,
> -			getpagesize(), PROT_READ|PROT_WRITE, mfn);
> +                if (addr)
> +                        return addr;
>  	}
> +	return xc_map_foreign_range(*xc_handle, domid,
> +		        getpagesize(), PROT_READ|PROT_WRITE, mfn);
>  }
>  
>  static void unmap_interface(void *interface)

* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-06-12 14:58   ` Ian Campbell
@ 2013-06-15  0:14     ` Mukesh Rathor
  2013-06-17 11:11       ` Ian Campbell
  2013-07-31  1:06     ` Mukesh Rathor
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-06-15  0:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel

On Wed, 12 Jun 2013 15:58:08 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Fri, 2013-05-24 at 18:25 -0700, Mukesh Rathor wrote:
>---
........
ols/xenstore/xenstored_domain.c |   12 +++++++-----
> 
> I think these should be split into
> 	libxc (dombuilder) changes
> 	libxl changes
> 	xenstore changes
> 	misc other (== gdbsx) changes.
> 
> Since most of the changes here are not mentioned at all in the commit
> message (it was the xenstore change, which IMHO requires an
> explanation, which prompted this)

Ok, I'll just do a separate tools patch, and separate them.

> >  10 files changed, 53 insertions(+), 11 deletions(-)
> 
> > diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> > index f1be43b..24f6759 100644
> > --- a/tools/libxc/xc_dom_x86.c
> > +++ b/tools/libxc/xc_dom_x86.c
> > @@ -832,7 +833,7 @@ int arch_setup_bootlate(struct xc_dom_image
> > *dom) }
> >  
> >          /* Map grant table frames into guest physmap. */
> > -        for ( i = 0; ; i++ )
> > +        for ( i = 0; !dom->pvh_enabled; i++ )
> 
> This is a bit of an odd way to do this (unless pvh_enabled somehow
> changes in this loop, which I doubt). Can we just get a surrounding if
> please.

Sure (will indent more tho). Are you ok with a forward goto?


> > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > index b38d0a7..cefbf76 100644
> > --- a/tools/libxl/libxl_dom.c
> > +++ b/tools/libxl/libxl_dom.c
> > @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > domid, struct xc_dom_image *dom;
> >      int ret;
> >      int flags = 0;
> > +    int is_pvh = libxl_defbool_val(info->pvh);
> >  
> >      xc_dom_loginit(ctx->xch);
> >  
> > +    if (is_pvh) {
> > +        char *pv_feats =
> > "writable_descriptor_tables|auto_translated_physmap"
> > +
> > "|supervisor_mode_kernel|hvm_callback_vector"; +
> > +        if (info->u.pv.features && info->u.pv.features[0] != '\0')
> > +        {
> > +            LOG(ERROR, "Didn't expect info->u.pv.features to
> > contain string\n");
> > +            LOG(ERROR, "String: %s\n", info->u.pv.features);
> > +            return ERROR_FAIL;
> > +        }
> > +        info->u.pv.features = strdup(pv_feats);
> 
> What is this trying to achieve? I think the requirement for certain
> features to be present if pvh is enabled needs to be handled in the
> xc_dom library and not here. This field is (I think) for the user to
> specify other features which they may wish to require.

I had asked for assistance on this long ago. But anyways, basically here
I want to make sure the kernel has all those features, because the user
has asked that a PVH guest be created (by pvh=1 in vm.cfg file). Can you 
kindly advise the best way to do this? 


> > @@ -245,6 +245,7 @@ libxl_domain_create_info =
> > Struct("domain_create_info",[ ("platformdata",
> > libxl_key_value_list), ("poolid",       uint32),
> >      ("run_hotplug_scripts",libxl_defbool),
> > +    ("pvh",          libxl_defbool),
> >      ], dir=DIR_IN)
> >  
> >  MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
> > @@ -346,6 +347,7 @@ libxl_domain_build_info =
> > Struct("domain_build_info",[ ])),
> >                   ("invalid", Struct(None, [])),
> >                   ], keyvar_init_val =
> > "LIBXL_DOMAIN_TYPE_INVALID")),
> > +    ("pvh",       libxl_defbool),
> 
> I'm not quite convinced if the need for both of these bools in both
> create and build, it's a bit of an odd quirk in our API which I need
> to consider a bit deeper.

Ok, please let me know.

> >      if (xlu_cfg_replace_string (config, "name", &c_info->name, 0))
> > { fprintf(stderr, "Domain name must be specified.\n");
> >          exit(1);
> > @@ -918,6 +928,7 @@ static void parse_config_data(const char
> > *config_source, 
> >          b_info->u.pv.cmdline = cmdline;
> >          xlu_cfg_replace_string (config, "ramdisk",
> > &b_info->u.pv.ramdisk, 0);
> > +        libxl_defbool_set(&b_info->pvh,
> > libxl_defbool_val(c_info->pvh));
> 
> b_info->pvh = c_info->pvh is the right way to do this. If possible I'd
> like to remove one or the other from and handle it internally to the
> library. As I say I need to chew on this one a bit more.

Ok, please let me know what you come up with.

thanks,
Mukesh

* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-06-15  0:14     ` Mukesh Rathor
@ 2013-06-17 11:11       ` Ian Campbell
  2013-07-30 23:47         ` Mukesh Rathor
  2013-08-29  0:14         ` Mukesh Rathor
  0 siblings, 2 replies; 80+ messages in thread
From: Ian Campbell @ 2013-06-17 11:11 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Fri, 2013-06-14 at 17:14 -0700, Mukesh Rathor wrote:

> > >  10 files changed, 53 insertions(+), 11 deletions(-)
> > 
> > > diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> > > index f1be43b..24f6759 100644
> > > --- a/tools/libxc/xc_dom_x86.c
> > > +++ b/tools/libxc/xc_dom_x86.c
> > > @@ -832,7 +833,7 @@ int arch_setup_bootlate(struct xc_dom_image
> > > *dom) }
> > >  
> > >          /* Map grant table frames into guest physmap. */
> > > -        for ( i = 0; ; i++ )
> > > +        for ( i = 0; !dom->pvh_enabled; i++ )
> > 
> > This is a bit of an odd way to do this (unless pvh_enabled somehow
> > changes in this loop, which I doubt). Can we just get a surrounding if
> > please.
> 
> Sure (will indent more tho). Are you ok with a forward goto?

Not madly keen. I'd rather pull that loop out into a function.

> 
> 
> > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > > index b38d0a7..cefbf76 100644
> > > --- a/tools/libxl/libxl_dom.c
> > > +++ b/tools/libxl/libxl_dom.c
> > > @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > > domid, struct xc_dom_image *dom;
> > >      int ret;
> > >      int flags = 0;
> > > +    int is_pvh = libxl_defbool_val(info->pvh);
> > >  
> > >      xc_dom_loginit(ctx->xch);
> > >  
> > > +    if (is_pvh) {
> > > +        char *pv_feats =
> > > "writable_descriptor_tables|auto_translated_physmap"
> > > +
> > > "|supervisor_mode_kernel|hvm_callback_vector"; +
> > > +        if (info->u.pv.features && info->u.pv.features[0] != '\0')
> > > +        {
> > > +            LOG(ERROR, "Didn't expect info->u.pv.features to
> > > contain string\n");
> > > +            LOG(ERROR, "String: %s\n", info->u.pv.features);
> > > +            return ERROR_FAIL;
> > > +        }
> > > +        info->u.pv.features = strdup(pv_feats);
> > 
> > What is this trying to achieve? I think the requirement for certain
> > features to be present if pvh is enabled needs to be handled in the
> > xc_dom library and not here. This field is (I think) for the user to
> > specify other features which they may wish to require.
> 
> I had asked for assistance on this long ago. But anyway, basically here
> I want to make sure the kernel has all those features, because the user
> has asked that a PVH guest be created (by pvh=1 in the vm.cfg file). Can you
> kindly advise the best way to do this?

This should be done in xc_dom build stuff not in libxl. Basically libxl
should call xc_dom_foo with a kernel and pvh=yes (or =ifpossible) and
the builder is then responsible internally for knowing which features
are therefore required from the kernel.

> > > @@ -245,6 +245,7 @@ libxl_domain_create_info =
> > > Struct("domain_create_info",[ ("platformdata",
> > > libxl_key_value_list), ("poolid",       uint32),
> > >      ("run_hotplug_scripts",libxl_defbool),
> > > +    ("pvh",          libxl_defbool),
> > >      ], dir=DIR_IN)
> > >  
> > >  MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
> > > @@ -346,6 +347,7 @@ libxl_domain_build_info =
> > > Struct("domain_build_info",[ ])),
> > >                   ("invalid", Struct(None, [])),
> > >                   ], keyvar_init_val =
> > > "LIBXL_DOMAIN_TYPE_INVALID")),
> > > +    ("pvh",       libxl_defbool),
> > 
> > I'm not quite convinced if the need for both of these bools in both
> > create and build, it's a bit of an odd quirk in our API which I need
> > to consider a bit deeper.
> 
> Ok, please let me know.

Which places need the one in c_info and which the one in b_info?

c_info is presumably for the createdomain domctl call while b_info is
stuff spread around the build process to handle the various differences?

Perhaps libxl__domain_create_state is the right place for the b_info
one, initialised internally to libxl from the c_info one? In that
context it would be a straight bool and not a defbool since we would
know exactly what the domain was by this point.

An alternative would be for the code to query the domain's type (i.e.
from the hypervisor) when it needs to know -- e.g. libxl__domain_type
does this for HVM vs PV.

Ian.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-06-17 11:11       ` Ian Campbell
@ 2013-07-30 23:47         ` Mukesh Rathor
  2013-07-31 12:00           ` Ian Campbell
  2013-08-29  0:14         ` Mukesh Rathor
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-07-30 23:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Mon, 17 Jun 2013 12:11:34 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Fri, 2013-06-14 at 17:14 -0700, Mukesh Rathor wrote:
....
> > > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > > > index b38d0a7..cefbf76 100644
> > > > --- a/tools/libxl/libxl_dom.c
> > > > +++ b/tools/libxl/libxl_dom.c
> > > > @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > > > domid, struct xc_dom_image *dom;
> > > >      int ret;
> > > >      int flags = 0;
> > > > +    int is_pvh = libxl_defbool_val(info->pvh);
> > > >  
> > > >      xc_dom_loginit(ctx->xch);
> > > >  
> > > > +    if (is_pvh) {
> > > > +        char *pv_feats =
> > > > "writable_descriptor_tables|auto_translated_physmap"
> > > > +
> > > > "|supervisor_mode_kernel|hvm_callback_vector"; +
> > > > +        if (info->u.pv.features && info->u.pv.features[0] !=
> > > > '\0')
> > > > +        {
> > > > +            LOG(ERROR, "Didn't expect info->u.pv.features to
> > > > contain string\n");
> > > > +            LOG(ERROR, "String: %s\n", info->u.pv.features);
> > > > +            return ERROR_FAIL;
> > > > +        }
> > > > +        info->u.pv.features = strdup(pv_feats);
> > > 
> > > What is this trying to achieve? I think the requirement for
> > > certain features to be present if pvh is enabled needs to be
> > > handled in the xc_dom library and not here. This field is (I
> > > think) for the user to specify other features which they may wish
> > > to require.
> > 
> > I had asked for assistance on this long ago. But anyway, basically
> > here I want to make sure the kernel has all those features because
> > the user has asked that a PVH guest be created (by pvh=1 in the vm.cfg
> > file). Can you kindly advise the best way to do this? 
> 
> This should be done in xc_dom build stuff not in libxl. Basically
> libxl should call xc_dom_foo with a kernel and pvh=yes (or
> =ifpossible) and the builder is then responsible internally for
> knowing which features are therefore required from the kernel.

Alright, I'm still not able to figure this out. I was able to instrument
the libraries to see what goes on for PV. But I see that for PV both
dom->f_requested and dom->parms.f_required are null in xc_dom_parse_image().
Also, in the same function dom->parms.f_supported is checked, but I can't
tell from grepping for f_supported where it's set! I am using xl and not xm,
so I don't care what xm/python is setting. I was expecting to see some
features for PV set in those strings.

It looks like elf_xen_parse_features sets the feature bits, but it's not
being called for PV in xc_dom_allocate() because the features parameter
is null.

So, that brings me back to setting the feature string somewhere before
xc_dom_allocate() is called. I'm at a loss as to where to set it. The feature
string should be set to the following for PVH when xc_dom_allocate() is called:

       "writable_descriptor_tables|auto_translated_physmap"
       "|supervisor_mode_kernel|hvm_callback_vector"

and if the kernel elf doesn't provide those features, the create should fail.
libxl__build_pv calls xc_dom_allocate(), but you don't want me to add it
in libxl.

I can just put the damn thing in xc_dom_allocate() if features is NULL,
or re-malloc if features is not NULL, for a PVH domain?

thanks
mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-06-12 14:58   ` Ian Campbell
  2013-06-15  0:14     ` Mukesh Rathor
@ 2013-07-31  1:06     ` Mukesh Rathor
  2013-07-31 11:32       ` Ian Campbell
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-07-31  1:06 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Wed, 12 Jun 2013 15:58:08 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Fri, 2013-05-24 at 18:25 -0700, Mukesh Rathor wrote:
... 
> > +    }
> > +
> >      dom = xc_dom_allocate(ctx->xch, state->pv_cmdline,
> > info->u.pv.features); if (!dom) {
> >          LOGE(ERROR, "xc_dom_allocate failed");
> > @@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > domid, }
> >  
> >      dom->flags = flags;
> > +    dom->pvh_enabled = is_pvh;
> 
> Not part of the flags?

I'd have liked to use the flags, but with all due respect, when a field
is named "flags", a search for which yields 800 hits, and there is no
comment, and the type is ulong, it is nearly impossible for someone to
understand and reverse-engineer it. Can you tell me whether it's a bit field,
an enum of some sort, or even where it's used? I can't grep/cscope such a
generic field name...

thanks
mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-07-31  1:06     ` Mukesh Rathor
@ 2013-07-31 11:32       ` Ian Campbell
  0 siblings, 0 replies; 80+ messages in thread
From: Ian Campbell @ 2013-07-31 11:32 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Tue, 2013-07-30 at 18:06 -0700, Mukesh Rathor wrote:
> On Wed, 12 Jun 2013 15:58:08 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Fri, 2013-05-24 at 18:25 -0700, Mukesh Rathor wrote:
> ... 
> > > +    }
> > > +
> > >      dom = xc_dom_allocate(ctx->xch, state->pv_cmdline,
> > > info->u.pv.features); if (!dom) {
> > >          LOGE(ERROR, "xc_dom_allocate failed");
> > > @@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > > domid, }
> > >  
> > >      dom->flags = flags;
> > > +    dom->pvh_enabled = is_pvh;
> > 
> > Not part of the flags?
> 
> I'd have liked to use the flags, but with all due respect, when a field
> is named "flags", a search for which yields 800 hits, and there is no
> comment, and the type is ulong, it is nearly impossible for someone to
> understand and reverse-engineer it. Can you tell me whether it's a bit field,
> an enum of some sort, or even where it's used? I can't grep/cscope such a
> generic field name...

Well, obviously you grep/cscope for the type of the containing struct
and then look within it, it's hardly rocket science.

As it happens these flags correspond exactly to the struct start_info
flags which are a bunch of SIF_* things, so in this case putting pvh
there isn't appropriate.

Ian.


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-07-30 23:47         ` Mukesh Rathor
@ 2013-07-31 12:00           ` Ian Campbell
  2013-08-01  2:02             ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-07-31 12:00 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Tue, 2013-07-30 at 16:47 -0700, Mukesh Rathor wrote:
> On Mon, 17 Jun 2013 12:11:34 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Fri, 2013-06-14 at 17:14 -0700, Mukesh Rathor wrote:
> ....
> > > > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > > > > index b38d0a7..cefbf76 100644
> > > > > --- a/tools/libxl/libxl_dom.c
> > > > > +++ b/tools/libxl/libxl_dom.c
> > > > > @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
> > > > > domid, struct xc_dom_image *dom;
> > > > >      int ret;
> > > > >      int flags = 0;
> > > > > +    int is_pvh = libxl_defbool_val(info->pvh);
> > > > >  
> > > > >      xc_dom_loginit(ctx->xch);
> > > > >  
> > > > > +    if (is_pvh) {
> > > > > +        char *pv_feats =
> > > > > "writable_descriptor_tables|auto_translated_physmap"
> > > > > +
> > > > > "|supervisor_mode_kernel|hvm_callback_vector"; +
> > > > > +        if (info->u.pv.features && info->u.pv.features[0] !=
> > > > > '\0')
> > > > > +        {
> > > > > +            LOG(ERROR, "Didn't expect info->u.pv.features to
> > > > > contain string\n");
> > > > > +            LOG(ERROR, "String: %s\n", info->u.pv.features);
> > > > > +            return ERROR_FAIL;
> > > > > +        }
> > > > > +        info->u.pv.features = strdup(pv_feats);
> > > > 
> > > > What is this trying to achieve? I think the requirement for
> > > > certain features to be present if pvh is enabled needs to be
> > > > handled in the xc_dom library and not here. This field is (I
> > > > think) for the user to specify other features which they may wish
> > > > to require.
> > > 
> > > I had asked for assistance on this long ago. But anyway, basically
> > > here I want to make sure the kernel has all those features because
> > > the user has asked that a PVH guest be created (by pvh=1 in the vm.cfg
> > > file). Can you kindly advise the best way to do this? 
> > 
> > This should be done in xc_dom build stuff not in libxl. Basically
> > libxl should call xc_dom_foo with a kernel and pvh=yes (or
> > =ifpossible) and the builder is then responsible internally for
> > knowing which features are therefore required from the kernel.
> 
> Alright, I'm still not able to figure this out. I was able to instrument
> the libraries to see what goes on for PV. But I see that for PV both
> dom->f_requested and dom->parms.f_required are null in xc_dom_parse_image().
> Also, in the same function dom->parms.f_supported is checked, but I can't
> tell from grepping for f_supported where it's set! I am using xl and not xm,
> so I don't care what xm/python is setting. I was expecting to see some
> features for PV set in those strings. 
> 
> It looks like elf_xen_parse_features sets the feature bits, but it's not 
> being called for PV in xc_dom_allocate() because features parameter is 
> null. 
> 
> So, that brings me back to setting the feature string somewhere before
> xc_dom_allocate() is called. I'm at a loss where to set it? The feature
> string should be set to following for pvh when xc_dom_allocate() is called:
> 
>        "writable_descriptor_tables|auto_translated_physmap"
>        "|supervisor_mode_kernel|hvm_callback_vector"
> 
> and if the kernel elf doesn't provide those features, the create should fail.
> libxl__build_pv calls xc_dom_allocate(), but you don't want me to add it
> in libxl.
> 
> I can just put the damn thing in xc_dom_allocate() if features is NULL,
> or re-malloc if feature is not NULL for pvh domain?

I think there's a bit of confusion because the current libxc interface
is there to support user-driven direct override of the required feature
flags. IOW a user literally writes "features = FOO" in their config file
and that ends up being f_requested. Although libxl supports this concept
it is not plumbed into xl, I don't know/care what xend does either.

In any case this is not the use case you are looking for. What we want
for PVH is for libxc internally to decide on a set of features which are
required to launch a domain with a specific configuration, in this case
PVH. That's slightly orthogonal to the existing stuff. This isn't
something which has come up yet and so PVH will be the first to go down
this path, which is why you aren't finding all the necessary bits there
out of the box.

I suspect it would be sufficient for libxc (likely xc_dom_allocate) to
call elf_xen_parse_features a second time (or first if features == NULL)
to union the required features into f_requested. You might also need to
blacklist features which PVH is not comfortable with and error out if
the user asked for them at the appropriate time. You will need to do
something similar for kernels which declare a requirement for a feature
which PVH doesn't coexist with (are there any such XENFEAT_*?).

Actually, I think you might want to add a second array of f_required,
that is the set of features which absolutely have to be there and plumb
that down. This corresponds to the third parameter to
elf_xen_parse_features which is currently unused at the xc_dom_allocate
call site. The distinction probably becomes relevant when you support
pvh=no|yes|ifpossible? IOW if yes then the features are required, if
just ifpossible then they are only requested. Not sure, hopefully it
will come out in the wash.

Or maybe it actually makes sense to separate out the user requested
f_{requested,required} field from the libxc internal feature
preferences/requirements. I'm not sure. I'd probably start by reusing
the f_foo ones but if that becomes unwieldy because you find yourself
needing to know whose preference it is then back off into using a
separate pair of fields.

I'm not sure how you are currently signalling to the hypervisor that a
new domain is a PVH domain? I had a look through this patch and must be
being thick because I don't see it.

Anyway, at the point where you set that then you can check the set of
features which the kernel actually ended up supporting vs. those
required to enable PVH and determine if it is actually to enable PVH or
not. I suppose handling the yes vs ifpossible cases might mean two
checks. if yes then maybe in xc_dom_parse_image with the other feature
checks, if it is "ifpossible" I guess it will be near whatever point you
inform the hypervisor that this is to be a PVH domain?

I hope that helps.

Ian.


> 
> thanks
> mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-07-31 12:00           ` Ian Campbell
@ 2013-08-01  2:02             ` Mukesh Rathor
  2013-08-01  8:01               ` Ian Campbell
  2013-08-29 11:13               ` George Dunlap
  0 siblings, 2 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-01  2:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Wed, 31 Jul 2013 13:00:57 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Tue, 2013-07-30 at 16:47 -0700, Mukesh Rathor wrote:
> > On Mon, 17 Jun 2013 12:11:34 +0100
> > Ian Campbell <Ian.Campbell@citrix.com> wrote:
......
> 
> I think there's a bit of confusion because the current libxc interface
> is there to support user-driven direct override of the required
> feature flags. IOW a user literally writes "features = FOO" in their
> config file and that ends up being f_requested. Although libxl
> supports this concept it is not plumbed into xl, I don't know/care
> what xend does either.
> 
> In any case this is not the use case you are looking for. What we want
> for PVH is for libxc internally to decide on a set of features which
> are required to launch a domain with a specific configuration, in
> this case PVH. That's slightly orthogonal to the existing stuff. This
> isn't something which has come up yet and so PVH will be the first to
> go down this path, which is why you aren't finding all the necessary
> bits there out of the box.

> I suspect it would be sufficient for libxc (likely xc_dom_allocate) to
> call elf_xen_parse_features a second time (or first if features ==
> NULL) to union the required features into f_requested. You might also
> need to blacklist features which PVH is not comfortable with and
> error out if the user asked for them at the appropriate time. You
> will need to do something similar for kernels which declare a
> requirement for a feature which PVH doesn't coexist with (are there
> any such XENFEAT_*?).

If libxl is already parsing and supposed to be passing the features
parameter to xc_dom_allocate(), why can't we just let it set the
string for PVH when calling xc_dom_allocate in libxl__build_pv? That way
libxc can remain transparent. For tools, PVH is a PV guest with some
features like auto-xlate etc., so the more we hide it, the better IMO.

If the answer is still no, it appears that xc_dom_allocate is the best
place to put the feature strings. Since, for PVH, features are pre-determined,
features not being NULL would be an error. I can just use the existing
xc_interface_core.flags? (would like to rename it to xc_flags so one can easily
find its usages please :)). So:

xc_dom_allocate:

    if (xch->flags & PVH)
    {
        if (features)
        {
            /* error */
            return NULL;
        }
        features = "writable_descriptor_tables|auto_translated_physmap"
                   "|supervisor_mode_kernel|hvm_callback_vector";
    }
    if ( features )
        elf_xen_parse_features(features, dom->f_requested, NULL);

what do you think?

Actually, wait! Looking at the code more, I think I found the place we need
to put the check.  In xc_dom_parse_elf_kernel:

After:
    if ( elf_xen_feature_get(XENFEAT_dom0, dom->parms.f_required) )
    {
        xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: Kernel does not"
                     " support unprivileged (DomU) operation", __FUNCTION__);
        rc = -EINVAL;
        goto out;
    }
Add:
    if (dom->pvh)
    {
        if ( !elf_xen_feature_get(XENFEAT_hvm_callback_vector,
                                  dom->parms.f_supported)   ||
             !elf_xen_feature_get(XENFEAT_auto_translated_physmap,
                                  dom->parms.f_supported)   ||
             ... )
        {
            xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
                         "%s: Kernel does not support PVH", ...);
            rc = -EINVAL;
            goto out;
        }
    }

BTW, I think the check should be against f_supported and not f_required.
This seems like the best solution to me. Agree?

Also, it's pvh=yes/no for now. For the experimental phase we don't want
"ifpossible". There was a discussion, and a decision IIRC, about just booting
PV in PVH mode "if possible" by default in the future, but not now while we
are in the experimental phase.

> Actually, I think you might want to add a second array of f_required,
> that is the set of features which absolutely have to be there and
> plumb that down. This corresponds to the third parameter to
> elf_xen_parse_features which is currently unused at the
> xc_dom_allocate call site. The distinction probably becomes relevant
> when you support pvh=no|yes|ifpossible? IOW if yes then the features
> are required, if just ifpossible then they are only requested. Not
> sure, hopefully it will come out in the wash.
> 
> Or maybe it actually makes sense to separate out the user requested
> f_{requested,required} field from the libxc internal feature
> preferences/requirements. I'm not sure. I'd probably start by reusing
> the f_foo ones but if that becomes unwieldy because you find yourself
> needing to know whose preference it is then back off into using a
> separate pair of fields.
> 
> I'm not sure how you are currently signalling to the hypervisor that a
> new domain is a PVH domain? I had a look through this patch and must
> be being thick because I don't see it.

I had a flag set, but it was recommended during RFC to remove it. So,
now in xen, a PV with HAP is a PVH guest:

do_domctl():
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
+            domcr_flags |= DOMCRF_pvh;     /* PV with HAP is a PVH guest */
+

thanks for your help.
Mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-01  2:02             ` Mukesh Rathor
@ 2013-08-01  8:01               ` Ian Campbell
  2013-08-02  1:12                 ` Mukesh Rathor
  2013-08-29  1:51                 ` Mukesh Rathor
  2013-08-29 11:13               ` George Dunlap
  1 sibling, 2 replies; 80+ messages in thread
From: Ian Campbell @ 2013-08-01  8:01 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> On Wed, 31 Jul 2013 13:00:57 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Tue, 2013-07-30 at 16:47 -0700, Mukesh Rathor wrote:
> > > On Mon, 17 Jun 2013 12:11:34 +0100
> > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> ......
> > 
> > I think there's a bit of confusion because the current libxc interface
> > is there to support user-driven direct override of the required
> > feature flags. IOW a user literally writes "features = FOO" in their
> > config file and that ends up being f_requested. Although libxl
> > supports this concept it is not plumbed into xl, I don't know/care
> > what xend does either.
> > 
> > In any case this is not the use case you are looking for. What we want
> > for PVH is for libxc internally to decide on a set of features which
> > are required to launch a domain with a specific configuration, in
> > this case PVH. That's slightly orthogonal to the existing stuff. This
> > isn't something which has come up yet and so PVH will be the first to
> > go down this path, which is why you aren't finding all the necessary
> > bits there out of the box.
> 
> > I suspect it would be sufficient for libxc (likely xc_dom_allocate) to
> > call elf_xen_parse_features a second time (or first if features ==
> > NULL) to union the required features into f_requested. You might also
> > need to blacklist features which PVH is not comfortable with and
> > error out if the user asked for them at the appropriate time. You
> > will need to do something similar for kernels which declare a
> > requirement for a feature which PVH doesn't coexist with (are there
> > any such XENFEAT_*?).
> 
> If libxl is already parsing and supposed to be passing features
> parameter to xc_dom_allocate(), why can't we just let it set the
> string for PVH when calling xc_dom_allocate in libxl__build_pv?

Because I think that is the wrong layer.

In any case libxl isn't parsing any features, it is passing a user
provided string through uninspected, libxl doesn't know what those
features mean.

> That way  libxc can remain transparent.  For tools, PVH is a PV guest with some 
> features like auto-xlate etc.., so the more we hide it, the better IMO.

Yes, that's my point. Forcing libxl to set a string for PVH when calling
xc_dom_allocate is not making libxc transparent or hiding things, it's
the opposite.

> If the answer is still no,. it appears that xc_dom_allocate is the best 
> place to put the feature strings. Since, for PVH, features are pre-determined, 
> features not being NULL would be an error.

I think it is valid for a user to both request PVH mode and request some
features of their own which they want to enable for this guest. The code
should be set up to deal with this, even if that currently means
rejecting any user request for a feature not in the PVH set (although
I'd prefer to see a stronger rationale for rejecting a requested feature
than that).

>  I can just use the existing 
> xc_interface_core.flags? (would like to rename it to xc_flags so one can easily
> find its usages please :)).

Pointless churn. You've made this argument countless times about various
fields and been told no by several different people; please just drop it.

>  So:
> 
> xc_dom_allocate:
> 
>     if (xch->flags & PVH)
>     {
>         if (features)
>         {
>             error
>             return NULL;

No, please handle the case of the user asking for features.

At the very least if they only ask for things in the set you are forcing
then it is fine.

>         }
>         features = "writable_descriptor_tables|auto_translated_physmap"
>                    "|supervisor_mode_kernel|hvm_callback_vector";

Don't do this. Instead, drop this whole if and add below:

>     }
>     if ( features )
>         elf_xen_parse_features(features, dom->f_requested, NULL);

	if ( xch->flags & PVH )
		elf_xen_parse_features("writable_desc...|etc",
		                       dom->f_requested, dom->f_required);

I'm surprised that these features are only requested for PVH and not
required (more on this below).

I see PVH is a flag, are you not intending to handle it as a tristate,
i.e.:
	PVH = no      : do not use PVH ever
	PVH = yes     : either enable PVH or fail
	PVH = default : libxc decides if PVH is possible, based on
		the kernel's supported feature set.

Long term "default" should be the, err, default e.g. so that people can
boot PVH or PV kernels via pygrub at their whim.

(I see you've just answered this below, nevermind)

> what do you think?
> 
> Acutally, wait! Looking at code more, I think I found the place we need
> to put the check.  In xc_dom_parse_elf_kernel:
> 
> After:
>     if ( elf_xen_feature_get(XENFEAT_dom0, dom->parms.f_required) )
>     {
>         xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: Kernel does not"
>                      " support unprivileged (DomU) operation", __FUNCTION__);
>         rc = -EINVAL;
>         goto out;
>     }
> Add:
>     if (dom->pvh)
>     {
>         if ( !elf_xen_feature_get(XENFEAT_hvm_callback_vector,
>                                  dom->parms.f_supported)   ||
>              !elf_xen_feature_get(XENFEAT_auto_translated_physmap,
>                                  dom->parms.f_supported)   ||
>              ... )
>         {
>             xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
>                          "%s: Kernel does not support PVH", ...);
>             rc = -EINVAL;
>             goto out;
>         }
>     }

This is the sort of thing I was imagining, it's obviously going to need
to be more complex if you want to support optional PVH though.

> BTW, i think the check should be against f_supported and not f_required.
> This seems like the best solution to me. Agree?

I think you may need to separate out the concept of "required/supported
by the kernel" from "required/supported by the domain configuration" in
your thinking (if not the code). This is why I was suggesting you might
need to leave the existing fields for the kernel supported set and have a
separate pair of fields for the set required by the configuration -- I'm
starting to think that makes sense, especially once you start to consider
the PVH = default case.

> Also, it's pvh=yes/no for now. For experimental phases we don't want if 
> possible. There was a discussion, and a decision IIRC, about just booting 
> PV in PVH mode "if possible" by default in future, but not now when we
> are in the experimental phase.

Hrm, at the libxc API level this is acceptable (since we can evolve that
API) but it needs to be a libxl_defbool at the libxl level because we
cannot change that once it goes in (not easily at least). This should
just be a case of libxl_defbool_setdefault(false) in the appropriate
setdefaults function in libxl.

> > Actually, I think you might want to add a second array of f_required,
> > that is the set of features which absolutely have to be there and
> > plumb that down. This corresponds to the third parameter to
> > elf_xen_parse_features which is currently unused at the
> > xc_dom_allocate call site. The distinction probably becomes relevant
> > when you support pvh=no|yes|ifpossible? IOW if yes then the features
> > are required, if just ifpossible then they are only requested. Not
> > sure, hopefully it will come out in the wash.
> > 
> > Or maybe it actually makes sense to separate out the user requested
> > f_{requested,required} field from the libxc internal feature
> > preferences/requirements. I'm not sure. I'd probably start by reusing
> > the f_foo ones but if that becomes unwieldy because you find yourself
> > needing to know whose preference it is then back off into using a
> > separate pair of fields.
> > 
> > I'm not sure how you are currently signalling to the hypervisor that a
> > new domain is a PVH domain? I had a look through this patch and must
> > be being thick because I don't see it.
> 
> I had a flag set, but it was recommended during RFC to remove it. So,
> now in xen, a PV with HAP is a PVH guest:

Ah, that makes sense.

Ian.

> 
> do_domctl():
>          if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
>              domcr_flags |= DOMCRF_hvm;
> +        else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
> +            domcr_flags |= DOMCRF_pvh;     /* PV with HAP is a PVH guest */
> +
> 
> thanks for your help.
> Mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-01  8:01               ` Ian Campbell
@ 2013-08-02  1:12                 ` Mukesh Rathor
  2013-08-29  1:51                 ` Mukesh Rathor
  1 sibling, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-02  1:12 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Thu, 1 Aug 2013 09:01:31 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> > On Wed, 31 Jul 2013 13:00:57 +0100
.....
> > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > 
> No, please handle the case of the user asking for features.
> 
> At the very least if they only ask for things in the set you are
> forcing then it is fine.
> 
> >         }
> >         features = "writable_descriptor_tables|auto_translated_physmap"
> >                    "|supervisor_mode_kernel|hvm_callback_vector";
> 
> Don't do this. Instead, drop this whole if and add below:
> 
> >     }
> >     if ( features )
> >         elf_xen_parse_features(features, dom->f_requested, NULL);
> 
> 	if ( xch->flags & PVH )
> 		elf_xen_parse_features("writable_desc...|etc",
> 		                       dom->f_requested, dom->f_required);
> 
> I'm surprised that these features are only requested for PVH and not
> required (more on this below).
> 
> I see PVH is a flag, are you not intending to handle it as a tristate
> i.e.:
> 	PVH = no  : do not use PVH ever
> 	PVH = yes : either enable PVH or fail
>         PVH = default : libxc decides if PVH is possible, based on 
> 		kernels supported feature set.

I wasn't planning on it, since the original plan was to keep this
temporary while in the experimental phase. The previous discussion was to
just boot in PVH mode if the kernel ELF supports it.

> > Acutally, wait! Looking at code more, I think I found the place we
> > need to put the check.  In xc_dom_parse_elf_kernel:
> > 
> > After:
> >     if ( elf_xen_feature_get(XENFEAT_dom0, dom->parms.f_required) )
> >     {
> >         xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: Kernel does
> > not" " support unprivileged (DomU) operation", __FUNCTION__);
> >         rc = -EINVAL;
> >         goto out;
> >     }
> > Add:
> >     if (dom->pvh)
> >     {
> >         if ( !elf_xen_feature_get(XENFEAT_hvm_callback_vector, 
> >                                  dom->parms.f_supported)   ||
> >              !elf_xen_feature_get(XENFEAT_auto_translated_physmap
> >                                  dom->parms.f_supported)   ||
> >              ...
> >         {
> >             xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: Kernel
> > does not support PVH"......
> >             rc = -EINVAL;
> >             goto out;
> >         }
> >     }
> 
> This is the sort of thing I was imagining, it's obviously going to
> need to be more complex if you want to support optional PVH though.

OK. With the limited time I have to work on PVH right now, I'll hack up
a patch with a binary (yes/no) pvh flag and post it.

thanks
mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-06-17 11:11       ` Ian Campbell
  2013-07-30 23:47         ` Mukesh Rathor
@ 2013-08-29  0:14         ` Mukesh Rathor
  1 sibling, 0 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-29  0:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Mon, 17 Jun 2013 12:11:34 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:
...... 
> > > > @@ -245,6 +245,7 @@ libxl_domain_create_info =
> > > > Struct("domain_create_info",[ ("platformdata",
> > > > libxl_key_value_list), ("poolid",       uint32),
> > > >      ("run_hotplug_scripts",libxl_defbool),
> > > > +    ("pvh",          libxl_defbool),
> > > >      ], dir=DIR_IN)
> > > >  
> > > >  MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
> > > > @@ -346,6 +347,7 @@ libxl_domain_build_info =
> > > > Struct("domain_build_info",[ ])),
> > > >                   ("invalid", Struct(None, [])),
> > > >                   ], keyvar_init_val =
> > > > "LIBXL_DOMAIN_TYPE_INVALID")),
> > > > +    ("pvh",       libxl_defbool),
> > > 
> > > I'm not quite convinced if the need for both of these bools in
> > > both create and build, it's a bit of an odd quirk in our API
> > > which I need to consider a bit deeper.
> > 
> > Ok, please let me know.
> 
> Which places need the one in c_info and which the one in b_info?
> 
> c_info is presumably for the createdomain domctl call while b_info is
> stuff spread around the build process to handle the various
> differences?
> 
> Perhaps libxl__domain_create_state is the right place for the b_info
> one, initialised internally to libxl from the c_info one? In that
> context it would be a straight bool and not a defbool since we would
> know exactly what the domain was by this point.

I think you mean libxl__domain_build_state, and not
libxl__domain_create_state, right? I think I can set a field in
libxl__domain_build_state in libxl__build_pre(), get rid of the pvh bool
field from b_info, and make it work...

thanks
mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-01  8:01               ` Ian Campbell
  2013-08-02  1:12                 ` Mukesh Rathor
@ 2013-08-29  1:51                 ` Mukesh Rathor
  2013-08-29  9:01                   ` Ian Campbell
  1 sibling, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-29  1:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Thu, 1 Aug 2013 09:01:31 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> > On Wed, 31 Jul 2013 13:00:57 +0100
> > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > 
> > > On Tue, 2013-07-30 at 16:47 -0700, Mukesh Rathor wrote:
> > > > On Mon, 17 Jun 2013 12:11:34 +0100
> > > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
...
> I think it is valid for a user to both request PVH mode and request
> some features of their own which they want to enable for this guest.
> The code should be set up to deal with  this, even if that currently
> means rejecting any user request for a feature not in the PVH set
> (although I'd prefer to see a stronger rationale for rejecting a
> requested feature than that).
> 
> >  I can juse use the existing 
> > xc_interface_core.flags? (would like to rename it to xc_flags so
> > one can easily find its usages please :)).
> 
> Pointless churn. You've made this argument countless times about
> various things and been told no by several different people; please
> just drop it.
> 
> >  So:
> > 
> > xc_dom_allocate:
> > 
> >     if (xch->flags & PVH)
> >     {
> >         if (features)
> >         {
> >             error
> >             return NULL;
> 
> No, please handle the case of the user asking for features.
> 
> At the very least if they only ask for things in the set you are
> forcing then it is fine.
> 
> >         }
> >         features =
> > writable_descriptor_tables|auto_translated_physmap"
> > "|supervisor_mode_kernel|hvm_callback_vector;
> 
> Don't do this. Instead, drop this whole if and add below:
> 
> >     }
> >     if ( features )
> >         elf_xen_parse_features(features, dom->f_requested, NULL);
> 
> 	if ( xch->flags & PVH )
> 		elf_xen_parse_features("writable_desc...|etc",
> dom->f_requested, dom->f_required)

Hmm... the problem I am running into now is setting the PVH flag in
xch->flags from libxl. struct xch seems to be private to libxc, so
short of creating a new xc_ interface just to set it (since I can't
add any parameters to xc_dom_allocate()), I can't think of any other
way. Also, at the time xc_interface is allocated, we have not yet
parsed the config file, so we can't set the flag then.

thanks
mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-29  1:51                 ` Mukesh Rathor
@ 2013-08-29  9:01                   ` Ian Campbell
  2013-08-30  0:45                     ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-08-29  9:01 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Wed, 2013-08-28 at 18:51 -0700, Mukesh Rathor wrote:
> On Thu, 1 Aug 2013 09:01:31 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> > > On Wed, 31 Jul 2013 13:00:57 +0100
> > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > 
> > > > On Tue, 2013-07-30 at 16:47 -0700, Mukesh Rathor wrote:
> > > > > On Mon, 17 Jun 2013 12:11:34 +0100
> > > > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> ...
> > I think it is valid for a user to both request PVH mode and request
> > some features of their own which they want to enable for this guest.
> > The code should be set up to deal with  this, even if that currently
> > means rejecting any user request for a feature not in the PVH set
> > (although I'd prefer to see a stronger rationale for rejecting a
> > requested feature than that).
> > 
> > >  I can juse use the existing 
> > > xc_interface_core.flags? (would like to rename it to xc_flags so
> > > one can easily find its usages please :)).
> > 
> > Pointless churn. You've made this argument countless times about
> > various things and been told no by several different people; please
> > just drop it.
> > 
> > >  So:
> > > 
> > > xc_dom_allocate:
> > > 
> > >     if (xch->flags & PVH)
> > >     {
> > >         if (features)
> > >         {
> > >             error
> > >             return NULL;
> > 
> > No, please handle the case of the user asking for features.
> > 
> > At the very least if they only ask for things in the set you are
> > forcing then it is fine.
> > 
> > >         }
> > >         features =
> > > writable_descriptor_tables|auto_translated_physmap"
> > > "|supervisor_mode_kernel|hvm_callback_vector;
> > 
> > Don't do this. Instead, drop this whole if and add below:
> > 
> > >     }
> > >     if ( features )
> > >         elf_xen_parse_features(features, dom->f_requested, NULL);
> > 
> > 	if ( xch->flags & PVH )
> > 		elf_xen_parse_features("writable_desc...|etc",
> > dom->f_requested, dom->f_required)
> 
> Hmm.. the problem I am running here now is setting of PVH flag in 
> xch->flags from libxl? struct xch seems to be private to libxc.

xch is the libxc handle used by all the api calls, so it can't be
private to libxc. There is an xch inside the libxl ctx, use either
ctx->xch or CTX->xch depending on whether you have a ctx or a gc in the
function in question.

Actually, xch->flags & PVH is not the right place. xch is a handle onto
an open libxc instance, it is not per-domain, so adding PVH to
xch->flags is wrong. Not sure how I missed that initially.

I think you need to add the flag to the dom->flags in libxl__build_pv. I
don't think anything before the existing setting of that field needs to
know if the guest is PVH or not. The calls between xc_dom_allocate and
there are xc_dom_(kernel|ramdisk)_(file|mem) which are just setting up
internal state and not touching the guest yet. If I'm wrong about that
then I think the block setting all of those dom->fields can be moved up.

Ian.


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-01  2:02             ` Mukesh Rathor
  2013-08-01  8:01               ` Ian Campbell
@ 2013-08-29 11:13               ` George Dunlap
  2013-08-29 11:29                 ` Ian Campbell
  1 sibling, 1 reply; 80+ messages in thread
From: George Dunlap @ 2013-08-29 11:13 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson, Ian Campbell

On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> I'm not sure how you are currently signalling to the hypervisor that a
>> new domain is a PVH domain? I had a look through this patch and must
>> be being thick because I don't see it.
>
> I had a flag set, but it was recommended during RFC to remove it. So,
> now in xen, a PV with HAP is a PVH guest:

Why was it recommended to remove it?

"PVH == PV + HAP" is a ridiculous interface, and one which will make
it hard to import shadow in the future.  In my series I'm planning on
adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.

The reason I didn't mention it before was that, not seeing any
toolstack patches, I thought it was just a hack to be able to create a
PVH guest without needing to modify xl/libxl to support it.

 -George


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-29 11:13               ` George Dunlap
@ 2013-08-29 11:29                 ` Ian Campbell
  2013-08-30  1:24                   ` Mukesh Rathor
  0 siblings, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-08-29 11:29 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xen-devel, Ian Jackson

On Thu, 2013-08-29 at 12:13 +0100, George Dunlap wrote:
> On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> I'm not sure how you are currently signalling to the hypervisor that a
> >> new domain is a PVH domain? I had a look through this patch and must
> >> be being thick because I don't see it.
> >
> > I had a flag set, but it was recommended during RFC to remove it. So,
> > now in xen, a PV with HAP is a PVH guest:
> 
> Why was it recommended to remove it?
> 
> "PVH == PV + HAP" is a ridiculous interface, and one which will make
> it hard to import shadow in the future.  In my series I'm planning on
> adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.

These are not stable ABI interfaces, so if someone wants to do PVH with
Shadow then they can just change it.


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-29  9:01                   ` Ian Campbell
@ 2013-08-30  0:45                     ` Mukesh Rathor
  2013-08-30  9:56                       ` Ian Campbell
  0 siblings, 1 reply; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-30  0:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On Thu, 29 Aug 2013 10:01:25 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Wed, 2013-08-28 at 18:51 -0700, Mukesh Rathor wrote:
> > On Thu, 1 Aug 2013 09:01:31 +0100
> > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > 
> > > On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> > > > On Wed, 31 Jul 2013 13:00:57 +0100
> > > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
...
> > Hmm.. the problem I am running here now is setting of PVH flag in 
> > xch->flags from libxl? struct xch seems to be private to libxc.
> 
> xch is the libxc handle used by all the api calls, so it can't be
> private to libxc. There is an xch inside the libxl ctx, use either
> ctx->xch or CTX->xch depending on whether you have a ctx or a gc in
> the function in question.
> 
> Actually, xch->flags & PVH is not the right place. xch is a handle
> onto an open libxc instance, it is not per-domain, so adding PVH to
> xch->flags is wrong. Not sure how I missed that initially.
> 
> I think you need to add the flag to the dom->flags in
> libxl__build_pv. I don't think anything before the existing setting
> of that field needs to know if the guest is PVH or not. The calls
> between xc_dom_allocate and there are
> xc_dom_(kernel|ramdisk)_(file|mem) which are just setting up internal
> state and not touching the guest yet. If I'm wrong about that then I
> think the block setting all of those dom->fields can be moved up.

The problem is that I need to tell xc_dom_allocate() it's a PVH guest
somehow so it can also call elf_xen_parse_features() for PVH. Since
that's not feasible, I can set the pvh flag in libxl__build_pv, and
xc_dom_parse_image() can then parse PVH features by calling
elf_xen_parse_features(). Let me know if that's not OK.

thanks
Mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-29 11:29                 ` Ian Campbell
@ 2013-08-30  1:24                   ` Mukesh Rathor
  2013-08-30  9:53                     ` Ian Campbell
  2013-08-30 10:27                     ` George Dunlap
  0 siblings, 2 replies; 80+ messages in thread
From: Mukesh Rathor @ 2013-08-30  1:24 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Xen-devel, Ian Jackson

On Thu, 29 Aug 2013 12:29:44 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Thu, 2013-08-29 at 12:13 +0100, George Dunlap wrote:
> > On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor
> > <mukesh.rathor@oracle.com> wrote:
> > >> I'm not sure how you are currently signalling to the hypervisor
> > >> that a new domain is a PVH domain? I had a look through this
> > >> patch and must be being thick because I don't see it.
> > >
> > > I had a flag set, but it was recommended during RFC to remove it.
> > > So, now in xen, a PV with HAP is a PVH guest:
> > 
> > Why was it recommended to remove it?
> > 
> > "PVH == PV + HAP" is a ridiculous interface, and one which will make
> > it hard to import shadow in the future.  In my series I'm planning
> > on adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.
> 
> These are not stable ABI interfaces, so if someone wants to do PVH
> with Shadow then they can just change it.

I thought we named PVH for PV with HAP :).. for shadow, are we going
to rename it to PVS? :).. Besides, for shadow the tools already do the
right thing:

arch_setup_meminit():
    if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled )
    {
        dom->shadow_enabled = 1;
        rc = x86_shadow(dom->xch, dom->guest_domid);
        ..

In any case, I am ok either way...

Mukesh


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-30  1:24                   ` Mukesh Rathor
@ 2013-08-30  9:53                     ` Ian Campbell
  2013-08-30 10:22                       ` George Dunlap
  2013-08-30 10:27                     ` George Dunlap
  1 sibling, 1 reply; 80+ messages in thread
From: Ian Campbell @ 2013-08-30  9:53 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: George Dunlap, Xen-devel, Ian Jackson

On Thu, 2013-08-29 at 18:24 -0700, Mukesh Rathor wrote:
> On Thu, 29 Aug 2013 12:29:44 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Thu, 2013-08-29 at 12:13 +0100, George Dunlap wrote:
> > > On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor
> > > <mukesh.rathor@oracle.com> wrote:
> > > >> I'm not sure how you are currently signalling to the hypervisor
> > > >> that a new domain is a PVH domain? I had a look through this
> > > >> patch and must be being thick because I don't see it.
> > > >
> > > > I had a flag set, but it was recommended during RFC to remove it.
> > > > So, now in xen, a PV with HAP is a PVH guest:
> > > 
> > > Why was it recommended to remove it?
> > > 
> > > "PVH == PV + HAP" is a ridiculous interface, and one which will make
> > > it hard to import shadow in the future.  In my series I'm planning
> > > on adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.
> > 
> > These are not stable ABI interfaces, so if someone wants to do PVH
> > with Shadow then they can just change it.
> 
> I thought we named PVH for PV with HAP :)

I thought it was H for HVM myself ;-)

Ian.


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-30  0:45                     ` Mukesh Rathor
@ 2013-08-30  9:56                       ` Ian Campbell
  0 siblings, 0 replies; 80+ messages in thread
From: Ian Campbell @ 2013-08-30  9:56 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson

On Thu, 2013-08-29 at 17:45 -0700, Mukesh Rathor wrote:
> On Thu, 29 Aug 2013 10:01:25 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
> 
> > On Wed, 2013-08-28 at 18:51 -0700, Mukesh Rathor wrote:
> > > On Thu, 1 Aug 2013 09:01:31 +0100
> > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > 
> > > > On Wed, 2013-07-31 at 19:02 -0700, Mukesh Rathor wrote:
> > > > > On Wed, 31 Jul 2013 13:00:57 +0100
> > > > > Ian Campbell <Ian.Campbell@citrix.com> wrote:
> ...
> > > Hmm.. the problem I am running here now is setting of PVH flag in 
> > > xch->flags from libxl? struct xch seems to be private to libxc.
> > 
> > xch is the libxc handle used by all the api calls, so it can't be
> > private to libxc. There is an xch inside the libxl ctx, use either
> > ctx->xch or CTX->xch depending on whether you have a ctx or a gc in
> > the function in question.
> > 
> > Actually, xch->flags & PVH is not the right place. xch is a handle
> > onto an open libxc instance, it is not per-domain, so adding PVH to
> > xch->flags is wrong. Not sure how I missed that initially.
> > 
> > I think you need to add the flag to the dom->flags in
> > libxl__build_pv. I don't think anything before the existing setting
> > of that field needs to know if the guest is PVH or not. The calls
> > between xc_dom_allocate and there are
> > xc_dom_(kernel|ramdisk)_(file|mem) which are just setting up internal
> > state and not touching the guest yet. If I'm wrong about that then I
> > think the block setting all of those dom->fields can be moved up.
> 
> The problem is I need to tell xc_dom_allocate() it's a PVH guest somehow 
> so it can call elf_xen_parse_features for PVH also.

I think you can do the PVH version of this call later, at whichever
point consuming dom->flags&PVH makes sense.

>  Since,
> thats not feasible, I can set the pvh flag in libxl__build_pv, and 
> xc_dom_parse_image() can then parse PVH features by calling
> elf_xen_parse_features(). LMK if thats not OK.

I can see how that would make sense, yes.

Ian.


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-30  9:53                     ` Ian Campbell
@ 2013-08-30 10:22                       ` George Dunlap
  0 siblings, 0 replies; 80+ messages in thread
From: George Dunlap @ 2013-08-30 10:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Ian Jackson

On 30/08/13 10:53, Ian Campbell wrote:
> On Thu, 2013-08-29 at 18:24 -0700, Mukesh Rathor wrote:
>> On Thu, 29 Aug 2013 12:29:44 +0100
>> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>
>>> On Thu, 2013-08-29 at 12:13 +0100, George Dunlap wrote:
>>>> On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor
>>>> <mukesh.rathor@oracle.com> wrote:
>>>>>> I'm not sure how you are currently signalling to the hypervisor
>>>>>> that a new domain is a PVH domain? I had a look through this
>>>>>> patch and must be being thick because I don't see it.
>>>>> I had a flag set, but it was recommended during RFC to remove it.
>>>>> So, now in xen, a PV with HAP is a PVH guest:
>>>> Why was it recommended to remove it?
>>>>
>>>> "PVH == PV + HAP" is a ridiculous interface, and one which will make
>>>> it hard to import shadow in the future.  In my series I'm planning
>>>> on adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.
>>> These are not stable ABI interfaces, so if someone wants to do PVH
>>> with Shadow then they can just change it.
>> I thought we named PVH for PV with HAP :)
> I thought it was H for HVM myself ;-)

We always talked about an "HVM container", in part to gain back the 
extra protection levels lost when they took away the segmentation limits 
for x86-64.

From a Linux maintenance perspective, autotranslate is of course a big 
win; but there's no reason in principle that we couldn't have used 
shadow pagetables for that.  HAP is a big win in some cases, but a loss 
in others; it is not, as far as I'm concerned, the primary reason for 
introducing this mode.

  -George


* Re: [PATCH 08/18] PVH xen: tools changes to create PVH domain
  2013-08-30  1:24                   ` Mukesh Rathor
  2013-08-30  9:53                     ` Ian Campbell
@ 2013-08-30 10:27                     ` George Dunlap
  1 sibling, 0 replies; 80+ messages in thread
From: George Dunlap @ 2013-08-30 10:27 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Jackson, Ian Campbell

On 30/08/13 02:24, Mukesh Rathor wrote:
> On Thu, 29 Aug 2013 12:29:44 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
>> On Thu, 2013-08-29 at 12:13 +0100, George Dunlap wrote:
>>> On Thu, Aug 1, 2013 at 3:02 AM, Mukesh Rathor
>>> <mukesh.rathor@oracle.com> wrote:
>>>>> I'm not sure how you are currently signalling to the hypervisor
>>>>> that a new domain is a PVH domain? I had a look through this
>>>>> patch and must be being thick because I don't see it.
>>>> I had a flag set, but it was recommended during RFC to remove it.
>>>> So, now in xen, a PV with HAP is a PVH guest:
>>> Why was it recommended to remove it?
>>>
>>> "PVH == PV + HAP" is a ridiculous interface, and one which will make
>>> it hard to import shadow in the future.  In my series I'm planning
>>> on adding XEN_DOMCTL_CDF_pvh_guest, and using that instead.
>> These are not stable ABI interfaces, so if someone wants to do PVH
>> with Shadow then they can just change it.
> I thought we named PVH for PV with HAP :).. for shadow are we going
> to rename it to PVS?? :)..... Besides for shadow the tools do the right
> thing:
>
> arch_setup_meminit():
>      if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled )
>      {
>          dom->shadow_enabled = 1;
>          rc = x86_shadow(dom->xch, dom->guest_domid);
>          ..
>
> In any case, I am ok either way...

But you said "it was recommended to remove it". Who recommended removing 
it and why?

  -George


end of thread, other threads:[~2013-08-30 10:27 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-25  1:25 [PATCH 00/18][V6]: PVH xen: version 6 patches Mukesh Rathor
2013-05-25  1:25 ` [PATCH 01/18] PVH xen: turn gdb_frames/gdt_ents into union Mukesh Rathor
2013-05-31  9:13   ` Jan Beulich
2013-05-25  1:25 ` [PATCH 02/18] PVH xen: add XENMEM_add_to_physmap_range Mukesh Rathor
2013-05-31  9:28   ` Jan Beulich
2013-05-31  9:38     ` Ian Campbell
2013-05-31 10:14       ` Jan Beulich
2013-05-31 10:40         ` Ian Campbell
2013-06-05  0:24         ` Mukesh Rathor
2013-06-05  0:31     ` Mukesh Rathor
2013-06-05  7:32       ` Jan Beulich
2013-06-05 20:41         ` Mukesh Rathor
2013-06-06  6:43           ` Jan Beulich
2013-06-06 22:19             ` Mukesh Rathor
2013-06-07  6:13               ` Jan Beulich
2013-06-07 20:46                 ` Mukesh Rathor
2013-06-07 15:08             ` Konrad Rzeszutek Wilk
2013-06-07 15:48               ` Jan Beulich
2013-05-25  1:25 ` [PATCH 03/18] PVH xen: create domctl_memory_mapping() function Mukesh Rathor
2013-05-31  9:46   ` Jan Beulich
2013-06-05  0:47     ` Mukesh Rathor
2013-06-05  7:34       ` Jan Beulich
2013-05-25  1:25 ` [PATCH 04/18] PVH xen: add params to read_segment_register Mukesh Rathor
2013-05-31 10:00   ` Jan Beulich
2013-06-06  1:25     ` Mukesh Rathor
2013-06-06  6:48       ` Jan Beulich
2013-06-07  1:43         ` Mukesh Rathor
2013-06-07  6:29           ` Jan Beulich
2013-06-08  0:45             ` Mukesh Rathor
2013-06-10  8:01               ` Jan Beulich
2013-06-10 23:10                 ` Mukesh Rathor
2013-05-25  1:25 ` [PATCH 05/18] PVH xen: vmx realted preparatory changes for PVH Mukesh Rathor
2013-05-25  1:25 ` [PATCH 06/18] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
2013-06-05 15:33   ` Konrad Rzeszutek Wilk
2013-05-25  1:25 ` [PATCH 07/18] PVH xen: Introduce PVH guest type Mukesh Rathor
2013-05-25  1:25 ` [PATCH 08/18] PVH xen: tools changes to create PVH domain Mukesh Rathor
2013-06-12 14:58   ` Ian Campbell
2013-06-15  0:14     ` Mukesh Rathor
2013-06-17 11:11       ` Ian Campbell
2013-07-30 23:47         ` Mukesh Rathor
2013-07-31 12:00           ` Ian Campbell
2013-08-01  2:02             ` Mukesh Rathor
2013-08-01  8:01               ` Ian Campbell
2013-08-02  1:12                 ` Mukesh Rathor
2013-08-29  1:51                 ` Mukesh Rathor
2013-08-29  9:01                   ` Ian Campbell
2013-08-30  0:45                     ` Mukesh Rathor
2013-08-30  9:56                       ` Ian Campbell
2013-08-29 11:13               ` George Dunlap
2013-08-29 11:29                 ` Ian Campbell
2013-08-30  1:24                   ` Mukesh Rathor
2013-08-30  9:53                     ` Ian Campbell
2013-08-30 10:22                       ` George Dunlap
2013-08-30 10:27                     ` George Dunlap
2013-08-29  0:14         ` Mukesh Rathor
2013-07-31  1:06     ` Mukesh Rathor
2013-07-31 11:32       ` Ian Campbell
2013-05-25  1:25 ` [PATCH 09/18] PVH xen: domain creation code changes Mukesh Rathor
2013-05-25  1:25 ` [PATCH 10/18] PVH xen: create PVH vmcs, and also initialization Mukesh Rathor
2013-05-25  1:25 ` [PATCH 11/18] PVH xen: create read_descriptor_sel() Mukesh Rathor
2013-05-25  1:25 ` [PATCH 12/18] PVH xen: support hypercalls for PVH Mukesh Rathor
2013-06-05 15:27   ` Konrad Rzeszutek Wilk
2013-05-25  1:25 ` [PATCH 13/18] PVH xen: introduce vmx_pvh.c Mukesh Rathor
2013-05-25  1:25 ` [PATCH 14/18] PVH xen: some misc changes like mtrr, intr, msi Mukesh Rathor
2013-05-25  1:25 ` [PATCH 15/18] PVH xen: hcall page initialize, create PVH guest type, etc Mukesh Rathor
2013-05-25  1:25 ` [PATCH 16/18] PVH xen: Miscellaneous changes Mukesh Rathor
2013-06-05 15:39   ` Konrad Rzeszutek Wilk
2013-05-25  1:25 ` [PATCH 17/18] PVH xen: Introduce p2m_map_foreign Mukesh Rathor
2013-05-25  1:25 ` [PATCH 18/18] PVH xen: Add and remove foreign pages Mukesh Rathor
2013-06-05 15:23 ` [PATCH 00/18][V6]: PVH xen: version 6 patches Konrad Rzeszutek Wilk
2013-06-05 15:25   ` George Dunlap
2013-06-05 15:36   ` Ian Campbell
2013-06-05 18:34     ` Konrad Rzeszutek Wilk
2013-06-05 20:51       ` Ian Campbell
2013-06-05 22:01         ` Mukesh Rathor
2013-06-06  8:46           ` Ian Campbell
2013-06-07 13:56             ` Konrad Rzeszutek Wilk
2013-06-06 10:08     ` George Dunlap
2013-06-05 17:14   ` Tim Deegan
2013-06-06  7:29     ` Jan Beulich
