* Introduce PVH domU support
@ 2013-09-13 16:25 George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 01/21] Fix failure path in hvm_vcpu_initialise George Dunlap
                   ` (21 more replies)
  0 siblings, 22 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

This patch series is a reworking of a series developed by Mukesh
Rathor at Oracle.  The entirety of the design and development was done
by him; I have only reworked, reorganized, and simplified things in a
way that I think makes more sense.  The vast majority of the credit
for this effort therefore goes to him.  This version is labelled v12
because it is based on his most recent series, v11.

Because this is based on his work, I retain the "Signed-off-by" in
patches which are based on his code.  This is not meant to imply that
he supports the modified version, only that he is involved in
certifying the origin of the code for copyright purposes.

This is still an RFC, and in several places I have kept code as I
found it even though I didn't understand it or didn't agree with it
because I wanted to be able to discuss it.

This patch series is broken down into several broad strokes:
* Miscellaneous fixes or tweaks
* Code motion, so future patches are simpler
* Introduction of the "hvm_container" concept, which will form the
  basis for sharing codepaths between hvm and pvh
* Start with HVM as an HVM container
* Disable unneeded HVM functionality
* Enable PV functionality
* Disable not-yet-implemented functionality

This patch series can also be pulled from this git tree:
 git://xenbits.xen.org/people/gdunlap/xen.git out/pvh-v12

This series does *not* include the toolstack-side patches; those
patches must be applied, and in addition must be modified to set
XEN_DOMCTL_CDF_pvh_guest as well as _hap for PVH guests.  (I'll
probably include those in this series at some point, but I wanted to
get a first draft out there.)
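
For illustration, a rough sketch of what that means on the toolstack
side (hypothetical call site; the flag names are the ones introduced
in patch 06, where domain_create() rejects the pvh flag without hap):

    /* Sketch only: a PVH guest needs both flags set. */
    uint32_t flags = XEN_DOMCTL_CDF_pvh_guest | XEN_DOMCTL_CDF_hap;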

The kernel code for PVH guests can be found here:
 git://oss.oracle.com/git/mrathor/linux.git pvh.v9-muk-1
(That repo/branch also contains a config file, pvh-config-file)

For those who have been following the series as it develops, here is a
summary of the major changes from Mukesh's series:

* Introduction of "has_hvm_container_*()" macros, rather than using
  "!is_pv_*".  The patch which introduces this also does the vast
  majority of the "heavy lifting" in terms of defining PVH.
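
  For reference, these predicates (as defined later in patches 05 and
  06) boil down to:

      #define is_pv_domain(d)             ((d)->guest_type == guest_type_pv)
      #define is_hvm_domain(d)            ((d)->guest_type == guest_type_hvm)
      #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)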

* Effort is made to use as much common code as possible.  No separate
  vmcs constructor, no separate vmexit handlers.  More of a "start
  with everything and disable if necessary" approach rather than a
  "start with nothing and enable as needed" approach.

* One exception is arch_set_info_guest(), where a small amount of code
  duplication meant a lot fewer "if(!is_pvh_domain())"s in awkward
  places.

* I rely on things being disabled at a higher level and passed down.
  For instance, I no longer explicitly disable rdtsc exiting in
  construct_vmcs(), since that will happen automatically when we're in
  NEVER_EMULATE mode (which is currently enforced for PVH).  Similarly
  for nested vmx and things relating to HAP mode.
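
  (A hedged sketch of the mechanism relied on here; the call below is
  from the existing tree, not this series.  tsc_set_info() in
  NEVER_EMULATE mode clears d->arch.vtsc, after which the generic
  code does roughly:

      if ( is_hvm_domain(d) )
          hvm_set_rdtsc_exiting(d, d->arch.vtsc);

  so rdtsc exiting is already disabled without construct_vmcs() having
  to do anything.)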

* I have also done a slightly more extensive audit of is_pv_* and
  is_hvm_*, and tried to tighten the restrictions.

* I changed the "enable PVH by setting PV + HAP" interface, replacing
  it with a separate flag, just like the HVM case, since it makes
  sense to plan on using shadow in the future (although it is not yet
  implemented).

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>

* [PATCH RFC v12 01/21] Fix failure path in hvm_vcpu_initialise
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 02/21] Fix failure path in construct_vmcs George Dunlap
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

It looks like one of the failure cases in hvm_vcpu_initialise jumps to
the wrong label; this could lead to slow leaks if something isn't
cleaned up properly.

I will probably change these labels in a future patch, but I figured
it was better to have this fix separately.

This is also a candidate for backport.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1fcaed0..1680a49 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1097,7 +1097,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
         /* Create bufioreq event channel. */
         rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
         if ( rc < 0 )
-            goto fail2;
+            goto fail4;
         d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
     }
 
-- 
1.7.9.5

* [PATCH RFC v12 02/21] Fix failure path in construct_vmcs
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 01/21] Fix failure path in hvm_vcpu_initialise George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state George Dunlap
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

If the allocation fails, make sure to call vmx_vmcs_exit().

This is a candidate for backport.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/vmx/vmcs.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 79efb3f..0620f87 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -894,7 +894,10 @@ static int construct_vmcs(struct vcpu *v)
         unsigned long *msr_bitmap = alloc_xenheap_page();
 
         if ( msr_bitmap == NULL )
+        {
+            vmx_vmcs_exit(v);
             return -ENOMEM;
+        }
 
         memset(msr_bitmap, ~0, PAGE_SIZE);
         v->arch.hvm_vmx.msr_bitmap = msr_bitmap;
-- 
1.7.9.5

* [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 01/21] Fix failure path in hvm_vcpu_initialise George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 02/21] Fix failure path in construct_vmcs George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-16 21:09   ` Mukesh Rathor
  2013-09-13 16:25 ` [PATCH RFC v12 04/21] pvh prep: code motion George Dunlap
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

As far as I can tell, there's nothing here that requires v to be
current.  vmx_update_exception_bitmap() is called in other situations
where v != current.

Removing this assertion allows the PVH code to call
vmx_update_debug_state() during vmcs construction in a later patch,
making the code more robust by removing duplicate code.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8ed7026..f02e47a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
 {
     unsigned long mask;
 
-    ASSERT(v == current);
-
     mask = 1u << TRAP_int3;
     if ( !cpu_has_monitor_trap_flag )
         mask |= 1u << TRAP_debug;
-- 
1.7.9.5

* [PATCH RFC v12 04/21] pvh prep: code motion
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (2 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-18 12:59   ` Jan Beulich
  2013-09-13 16:25 ` [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros George Dunlap
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

There are many functions where PVH requires some code in common with
HVM.  Rearrange some of these functions so that the code is together.

In general, the HVM code that PVH also uses includes:
 - cacheattr functionality
 - paging
 - hvm_funcs
 - hvm_assert_evtchn_irq tasklet
 - tm_list

And code that PVH shares with PV but not with HVM:
 - updating the domain wallclock
 - setting v->is_initialised

There should be no end-to-end changes in behavior.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/domain.c  |   11 +++---
 xen/arch/x86/hvm/hvm.c |   89 +++++++++++++++++++++++++-----------------------
 2 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 874742c..a9e2383 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -945,17 +945,16 @@ int arch_set_info_guest(
 
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
-    if ( v->vcpu_id == 0 )
-        update_domain_wallclock_time(d);
-
-    /* Don't redo final setup */
-    v->is_initialised = 1;
-
     if ( paging_mode_enabled(d) )
         paging_update_paging_modes(v);
 
     update_cr3(v);
 
+    if ( v->vcpu_id == 0 )
+        update_domain_wallclock_time(d);
+
+    /* Don't redo final setup */
+    v->is_initialised = 1;
  out:
     if ( flags & VGCF_online )
         clear_bit(_VPF_down, &v->pause_flags);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1680a49..1764b78 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -525,6 +525,12 @@ int hvm_domain_initialise(struct domain *d)
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
+    hvm_init_cacheattr_region_list(d);
+
+    rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
+    if ( rc != 0 )
+        goto fail0;
+
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
     spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
 
@@ -534,7 +540,7 @@ int hvm_domain_initialise(struct domain *d)
     rc = -ENOMEM;
     if ( !d->arch.hvm_domain.pbuf || !d->arch.hvm_domain.params ||
          !d->arch.hvm_domain.io_handler )
-        goto fail0;
+        goto fail1;
     d->arch.hvm_domain.io_handler->num_slot = 0;
 
     hvm_init_guest_time(d);
@@ -542,12 +548,6 @@ int hvm_domain_initialise(struct domain *d)
     d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
     d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
 
-    hvm_init_cacheattr_region_list(d);
-
-    rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
-    if ( rc != 0 )
-        goto fail1;
-
     vpic_init(d);
 
     rc = vioapic_init(d);
@@ -574,11 +574,11 @@ int hvm_domain_initialise(struct domain *d)
     stdvga_deinit(d);
     vioapic_deinit(d);
  fail1:
-    hvm_destroy_cacheattr_region_list(d);
- fail0:
     xfree(d->arch.hvm_domain.io_handler);
     xfree(d->arch.hvm_domain.params);
     xfree(d->arch.hvm_domain.pbuf);
+ fail0:
+    hvm_destroy_cacheattr_region_list(d);
     return rc;
 }
 
@@ -608,11 +608,11 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
 void hvm_domain_destroy(struct domain *d)
 {
+    hvm_destroy_cacheattr_region_list(d);
     hvm_funcs.domain_destroy(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
-    hvm_destroy_cacheattr_region_list(d);
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1070,24 +1070,46 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    domid_t dm_domid;
 
     hvm_asid_flush_vcpu(v);
 
-    if ( (rc = vlapic_init(v)) != 0 )
+    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
+    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
+
+    rc = hvm_vcpu_cacheattr_init(v); /* teardown: hvm_vcpu_cacheattr_destroy */
+    if ( rc != 0 )
         goto fail1;
 
-    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 )
+    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) /* teardown: hvm_funcs.vcpu_destroy */
         goto fail2;
 
-    if ( nestedhvm_enabled(d) 
-         && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) 
+    softirq_tasklet_init(
+        &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+        (void(*)(unsigned long))hvm_assert_evtchn_irq,
+        (unsigned long)v);
+
+    v->arch.user_regs.eflags = 2;
+
+    v->arch.hvm_vcpu.inject_trap.vector = -1;
+
+    rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */
+    if ( rc != 0 )
         goto fail3;
 
+    if ( (rc = vlapic_init(v)) != 0 ) /* teardown: vlapic_destroy */
+        goto fail4;
+
+    if ( nestedhvm_enabled(d) 
+         && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
+        goto fail5;
+
+    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+
     /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
+    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
     if ( rc < 0 )
-        goto fail4;
+        goto fail6;
 
     /* Register ioreq event channel. */
     v->arch.hvm_vcpu.xen_port = rc;
@@ -1095,9 +1117,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
     {
         /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
+        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
         if ( rc < 0 )
-            goto fail4;
+            goto fail6;
         d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
     }
 
@@ -1106,26 +1128,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
     spin_unlock(&d->arch.hvm_domain.ioreq.lock);
 
-    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
-
-    v->arch.hvm_vcpu.inject_trap.vector = -1;
-
-    rc = setup_compat_arg_xlat(v);
-    if ( rc != 0 )
-        goto fail4;
-
-    rc = hvm_vcpu_cacheattr_init(v);
-    if ( rc != 0 )
-        goto fail5;
-
-    softirq_tasklet_init(
-        &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
-        (void(*)(unsigned long))hvm_assert_evtchn_irq,
-        (unsigned long)v);
-
-    v->arch.user_regs.eflags = 2;
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1143,14 +1145,17 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     return 0;
 
+ fail6:
+    nestedhvm_vcpu_destroy(v);
  fail5:
-    free_compat_arg_xlat(v);
+    vlapic_destroy(v);
  fail4:
-    nestedhvm_vcpu_destroy(v);
+    free_compat_arg_xlat(v);
  fail3:
+    tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
     hvm_funcs.vcpu_destroy(v);
  fail2:
-    vlapic_destroy(v);
+    hvm_vcpu_cacheattr_destroy(v);
  fail1:
     return rc;
 }
-- 
1.7.9.5

* [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (3 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 04/21] pvh prep: code motion George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-18 13:46   ` Jan Beulich
  2013-09-19 16:58   ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 06/21] pvh: Introduce PVH guest type George Dunlap
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

The goal of this patch is to classify conditionals more clearly,
according to whether they relate to pv guests, hvm-only guests, or
guests with an "hvm container" (which will eventually include PVH).

This patch introduces an enum for guest type, as well as two new macros
for switching behavior on and off: is_pv_* and has_hvm_container_*.

In general, a switch should use is_pv_* (or !is_pv_*) if the code in
question relates directly to a PV guest.  Examples include use of
pv_vcpu structs or other behavior directly related to PV domains (see
the examples after the list below).

hvm_container is more of a fuzzy concept, but in general:

* Most core HVM behavior will be included in this.  Behavior not
appropriate for PVH mode will be disabled in later patches

* Hypercalls related to HVM guests will *not* be included by default;
functionality needed by PVH guests will be enabled in future patches

* The following functionality is not considered part of the HVM
container, and PVH will end up behaving like PV by default: Event
channels, vtsc offset, code related to emulated timers, nested HVM,
emuirq, PoD

* Some features are left to implement for PVH later: vpmu, shadow mode
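
To illustrate the intended split (both examples are taken from hunks
later in this patch):

    if ( is_pv_vcpu(v) )              /* touches PV-only state */
        xfree(v->arch.pv_vcpu.trap_ctxt);

    if ( has_hvm_container_vcpu(v) )  /* generic "HVM container" path */
        hvm_assert_evtchn_irq(v);
    else
        vcpu_kick(v);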

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/acpi/suspend.c        |    2 +-
 xen/arch/x86/cpu/mcheck/vmce.c     |    6 ++--
 xen/arch/x86/debug.c               |    2 +-
 xen/arch/x86/domain.c              |   54 ++++++++++++++++++------------------
 xen/arch/x86/domain_page.c         |   10 +++----
 xen/arch/x86/domctl.c              |   10 +++----
 xen/arch/x86/efi/runtime.c         |    4 +--
 xen/arch/x86/hvm/vmx/vmcs.c        |    4 +--
 xen/arch/x86/mm.c                  |    6 ++--
 xen/arch/x86/mm/shadow/common.c    |    6 ++--
 xen/arch/x86/mm/shadow/multi.c     |    7 +++--
 xen/arch/x86/physdev.c             |    4 +--
 xen/arch/x86/traps.c               |    5 ++--
 xen/arch/x86/x86_64/traps.c        |    8 +++---
 xen/common/domain.c                |    2 +-
 xen/common/grant_table.c           |    4 +--
 xen/common/kernel.c                |    2 +-
 xen/include/asm-x86/domain.h       |    2 +-
 xen/include/asm-x86/event.h        |    2 +-
 xen/include/asm-x86/guest_access.h |   12 ++++----
 xen/include/asm-x86/guest_pt.h     |    4 +--
 xen/include/xen/sched.h            |   14 ++++++++--
 xen/include/xen/tmem_xen.h         |    2 +-
 23 files changed, 91 insertions(+), 81 deletions(-)

diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
index c690b45..2b7aa3b 100644
--- a/xen/arch/x86/acpi/suspend.c
+++ b/xen/arch/x86/acpi/suspend.c
@@ -78,7 +78,7 @@ void restore_rest_processor_state(void)
     }
 
     /* Maybe load the debug registers. */
-    BUG_ON(is_hvm_vcpu(curr));
+    BUG_ON(has_hvm_container_vcpu(curr));
     if ( !is_idle_vcpu(curr) && curr->arch.debugreg[7] )
     {
         write_debugreg(0, curr->arch.debugreg[0]);
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index af3b491..f6c35db 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -83,7 +83,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
     {
         dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
                 " %#" PRIx64 " for d%d:v%u (supported: %#Lx)\n",
-                is_hvm_vcpu(v) ? "HVM" : "PV", ctxt->caps,
+                has_hvm_container_vcpu(v) ? "HVM" : "PV", ctxt->caps,
                 v->domain->domain_id, v->vcpu_id,
                 guest_mcg_cap & ~MCG_CAP_COUNT);
         return -EPERM;
@@ -357,7 +357,7 @@ int inject_vmce(struct domain *d, int vcpu)
         if ( vcpu != VMCE_INJECT_BROADCAST && vcpu != v->vcpu_id )
             continue;
 
-        if ( (is_hvm_domain(d) ||
+        if ( (has_hvm_container_domain(d) ||
               guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) &&
              !test_and_set_bool(v->mce_pending) )
         {
@@ -439,7 +439,7 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn)
     if (!mfn_valid(mfn_x(mfn)))
         return -EINVAL;
 
-    if ( !is_hvm_domain(d) || !paging_mode_hap(d) )
+    if ( !has_hvm_container_domain(d) || !paging_mode_hap(d) )
         return -ENOSYS;
 
     rc = -1;
diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index e67473e..3e21ca8 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp,
 
         pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len);
 
-        mfn = (dp->is_hvm
+        mfn = (has_hvm_container_domain(dp)
                ? dbg_hvm_va2mfn(addr, dp, toaddr, &gfn)
                : dbg_pv_va2mfn(addr, dp, pgd3));
         if ( mfn == INVALID_MFN ) 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a9e2383..5c38cb1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -167,7 +167,7 @@ void dump_pageframe_info(struct domain *d)
         spin_unlock(&d->page_alloc_lock);
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         p2m_pod_dump_data(d);
 
     spin_lock(&d->page_alloc_lock);
@@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v)
 
     vmce_init_vcpu(v);
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
         goto done;
@@ -438,7 +438,7 @@ int vcpu_initialise(struct vcpu *v)
     {
         vcpu_destroy_fpu(v);
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             xfree(v->arch.pv_vcpu.trap_ctxt);
     }
 
@@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v)
 
     vcpu_destroy_fpu(v);
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
         xfree(v->arch.pv_vcpu.trap_ctxt);
@@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     int rc = -ENOMEM;
 
     d->arch.hvm_domain.hap_enabled =
-        is_hvm_domain(d) &&
+        has_hvm_container_domain(d) &&
         hvm_funcs.hap_supported &&
         (domcr_flags & DOMCRF_hap);
     d->arch.hvm_domain.mem_sharing_enabled = 0;
@@ -490,7 +490,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
                d->domain_id);
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
     else if ( is_idle_domain(d) )
         rc = 0;
@@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     mapcache_domain_init(d);
 
     HYPERVISOR_COMPAT_VIRT_START(d) =
-        is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START;
+        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
 
     if ( (rc = paging_domain_init(d, domcr_flags)) != 0 )
         goto fail;
@@ -554,7 +554,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
             goto fail;
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d)) != 0 )
         {
@@ -583,14 +583,14 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     if ( paging_initialised )
         paging_final_teardown(d);
     free_perdomain_mappings(d);
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
     return rc;
 }
 
 void arch_domain_destroy(struct domain *d)
 {
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_domain_destroy(d);
     else
         xfree(d->arch.pv_domain.e820);
@@ -602,7 +602,7 @@ void arch_domain_destroy(struct domain *d)
     paging_final_teardown(d);
 
     free_perdomain_mappings(d);
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
 
     free_xenheap_page(d->shared_info);
@@ -653,7 +653,7 @@ int arch_set_info_guest(
 #define c(fld) (compat ? (c.cmp->fld) : (c.nat->fld))
     flags = c(flags);
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         if ( !compat )
         {
@@ -706,7 +706,7 @@ int arch_set_info_guest(
     v->fpu_initialised = !!(flags & VGCF_I387_VALID);
 
     v->arch.flags &= ~TF_kernel_mode;
-    if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ )
+    if ( (flags & VGCF_in_kernel) || has_hvm_container_vcpu(v)/*???*/ )
         v->arch.flags |= TF_kernel_mode;
 
     v->arch.vgc_flags = flags;
@@ -721,7 +721,7 @@ int arch_set_info_guest(
     if ( !compat )
     {
         memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -737,7 +737,7 @@ int arch_set_info_guest(
 
     v->arch.user_regs.eflags |= 2;
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
     {
         hvm_set_info_guest(v);
         goto out;
@@ -966,7 +966,7 @@ int arch_set_info_guest(
 
 int arch_vcpu_reset(struct vcpu *v)
 {
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         destroy_gdt(v);
         return vcpu_destroy_pagetables(v);
@@ -1316,7 +1316,7 @@ static void update_runstate_area(struct vcpu *v)
 
 static inline int need_full_gdt(struct vcpu *v)
 {
-    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
+    return (is_pv_vcpu(v) && !is_idle_vcpu(v));
 }
 
 static void __context_switch(void)
@@ -1438,9 +1438,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     {
         __context_switch();
 
-        if ( !is_hvm_vcpu(next) &&
+        if ( is_pv_vcpu(next) &&
              (is_idle_vcpu(prev) ||
-              is_hvm_vcpu(prev) ||
+              has_hvm_container_vcpu(prev) ||
               is_pv_32on64_vcpu(prev) != is_pv_32on64_vcpu(next)) )
         {
             uint64_t efer = read_efer();
@@ -1451,13 +1451,13 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
-        if ( !is_hvm_vcpu(next) )
+        if ( is_pv_vcpu(next) )
         {
             load_LDT(next);
             load_segments(next);
         }
 
-        set_cpuid_faulting(!is_hvm_vcpu(next) &&
+        set_cpuid_faulting(is_pv_vcpu(next) &&
                            (next->domain->domain_id != 0));
     }
 
@@ -1540,7 +1540,7 @@ void hypercall_cancel_continuation(void)
     }
     else
     {
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip += 2; /* skip re-execute 'syscall' / 'int $xx' */
         else
             current->arch.hvm_vcpu.hcall_preempted = 0;
@@ -1577,12 +1577,12 @@ unsigned long hypercall_create_continuation(
         regs->eax  = op;
 
         /* Ensure the hypercall trap instruction is re-executed. */
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip -= 2;  /* re-execute 'syscall' / 'int $xx' */
         else
             current->arch.hvm_vcpu.hcall_preempted = 1;
 
-        if ( !is_hvm_vcpu(current) ?
+        if ( is_pv_vcpu(current) ?
              !is_pv_32on64_vcpu(current) :
              (hvm_guest_x86_mode(current) == 8) )
         {
@@ -1850,7 +1850,7 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
         {
             for_each_vcpu ( d, v )
             {
@@ -1923,7 +1923,7 @@ int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_domain_relinquish_resources(d);
 
     return 0;
@@ -2007,7 +2007,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
     if ( already_pending )
         return;
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
         hvm_assert_evtchn_irq(v);
     else
         vcpu_kick(v);
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index bc18263..b9db322 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
-    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
+    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || is_hvm_vcpu(v) )
+    if ( !v || has_hvm_container_vcpu(v) )
         return mfn_to_virt(mfn);
 
     dcache = &v->domain->arch.pv_domain.mapcache;
@@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && !is_hvm_vcpu(v));
+    ASSERT(v && is_pv_vcpu(v));
 
     dcache = &v->domain->arch.pv_domain.mapcache;
     ASSERT(dcache->inuse);
@@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d)
     struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
     unsigned int bitmap_pages;
 
-    if ( is_hvm_domain(d) || is_idle_domain(d) )
+    if ( has_hvm_container_domain(d) || is_idle_domain(d) )
         return 0;
 
 #ifdef NDEBUG
@@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( is_hvm_vcpu(v) || !dcache->inuse )
+    if ( has_hvm_container_vcpu(v) || !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index c2a04c4..e0e3fe7 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -800,7 +800,7 @@ long arch_do_domctl(
         if ( domctl->cmd == XEN_DOMCTL_get_ext_vcpucontext )
         {
             evc->size = sizeof(*evc);
-            if ( !is_hvm_domain(d) )
+            if ( is_pv_domain(d) )
             {
                 evc->sysenter_callback_cs      =
                     v->arch.pv_vcpu.sysenter_callback_cs;
@@ -833,7 +833,7 @@ long arch_do_domctl(
             ret = -EINVAL;
             if ( evc->size < offsetof(typeof(*evc), vmce) )
                 goto ext_vcpucontext_out;
-            if ( !is_hvm_domain(d) )
+            if ( is_pv_domain(d) )
             {
                 if ( !is_canonical_address(evc->sysenter_callback_eip) ||
                      !is_canonical_address(evc->syscall32_callback_eip) )
@@ -1237,7 +1237,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     bool_t compat = is_pv_32on64_domain(v->domain);
 #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
         memset(c.nat, 0, sizeof(*c.nat));
     memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
     c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
@@ -1248,7 +1248,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     if ( !compat )
     {
         memcpy(&c.nat->user_regs, &v->arch.user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(c.nat->trap_ctxt, v->arch.pv_vcpu.trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -1263,7 +1263,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     for ( i = 0; i < ARRAY_SIZE(v->arch.debugreg); ++i )
         c(debugreg[i] = v->arch.debugreg[i]);
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
     {
         struct segment_register sreg;
 
diff --git a/xen/arch/x86/efi/runtime.c b/xen/arch/x86/efi/runtime.c
index 37bb535..d7c884b 100644
--- a/xen/arch/x86/efi/runtime.c
+++ b/xen/arch/x86/efi/runtime.c
@@ -52,7 +52,7 @@ unsigned long efi_rs_enter(void)
     /* prevent fixup_page_fault() from doing anything */
     irq_enter();
 
-    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
+    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
             .limit = LAST_RESERVED_GDT_BYTE,
@@ -71,7 +71,7 @@ unsigned long efi_rs_enter(void)
 void efi_rs_leave(unsigned long cr3)
 {
     write_cr3(cr3);
-    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
+    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
             .limit = LAST_RESERVED_GDT_BYTE,
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0620f87..7087630 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -637,7 +637,7 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( is_hvm_vcpu(current) )
+        if ( has_hvm_container_vcpu(current) )
             vmx_load_vmcs(current);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -1477,7 +1477,7 @@ static void vmcs_dump(unsigned char ch)
 
     for_each_domain ( d )
     {
-        if ( !is_hvm_domain(d) )
+        if ( !has_hvm_container_domain(d) )
             continue;
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7f0e13..120599a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -181,7 +181,7 @@ static uint32_t base_disallow_mask;
      (rangeset_is_empty((d)->iomem_caps) &&                     \
       rangeset_is_empty((d)->arch.ioport_caps) &&               \
       !has_arch_pdevs(d) &&                                     \
-      !is_hvm_domain(d)) ?                                      \
+      is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
 static void __init init_frametable_chunk(void *start, void *end)
@@ -433,7 +433,7 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         return p2m_get_hostp2m(d)->max_mapped_pfn;
     /* NB. PV guests specify nr_pfns rather than max_pfn so we adjust here. */
     return (arch_get_max_pfn(d) ?: 1) - 1;
@@ -2379,7 +2379,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
     {
         /* Special pages should not be accessible from devices. */
         struct domain *d = page_get_owner(page);
-        if ( d && !is_hvm_domain(d) && unlikely(need_iommu(d)) )
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
         {
             if ( (x & PGT_type_mask) == PGT_writable_page )
                 iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index adffa06..0bfa595 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -367,7 +367,7 @@ const struct x86_emulate_ops *shadow_init_emulation(
     sh_ctxt->ctxt.regs = regs;
     sh_ctxt->ctxt.force_writeback = 0;
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         sh_ctxt->ctxt.addr_size = sh_ctxt->ctxt.sp_size = BITS_PER_LONG;
         return &pv_shadow_emulator_ops;
@@ -964,7 +964,7 @@ int sh_unsync(struct vcpu *v, mfn_t gmfn)
     if ( pg->shadow_flags & 
          ((SHF_page_type_mask & ~SHF_L1_ANY) | SHF_out_of_sync) 
          || sh_page_has_multiple_shadows(pg)
-         || !is_hvm_domain(v->domain)
+         || is_pv_domain(v->domain)
          || !v->domain->arch.paging.shadow.oos_active )
         return 0;
 
@@ -2753,7 +2753,7 @@ static void sh_update_paging_modes(struct vcpu *v)
     if ( v->arch.paging.mode )
         v->arch.paging.mode->shadow.detach_old_tables(v);
 
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
     {
         ///
         /// PV guest
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4c4c2ba..a17386f 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -711,8 +711,9 @@ _sh_propagate(struct vcpu *v,
     // PV guests in 64-bit mode use two different page tables for user vs
     // supervisor permissions, making the guest's _PAGE_USER bit irrelevant.
     // It is always shadowed as present...
-    if ( (GUEST_PAGING_LEVELS == 4) && !is_pv_32on64_domain(d) 
-         && !is_hvm_domain(d) )
+    if ( (GUEST_PAGING_LEVELS == 4)
+         && !is_pv_32on64_domain(d) 
+         && is_pv_domain(d) )
     {
         sflags |= _PAGE_USER;
     }
@@ -3918,7 +3919,7 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 #endif
 
     /* Don't do anything on an uninitialised vcpu */
-    if ( !is_hvm_domain(d) && !v->is_initialised )
+    if ( is_pv_domain(d) && !v->is_initialised )
     {
         ASSERT(v->arch.cr3 == 0);
         return;
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 4835ed7..dab6213 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -310,10 +310,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             spin_unlock(&v->domain->event_lock);
             break;
         }
-        if ( !is_hvm_domain(v->domain) &&
+        if ( is_pv_domain(v->domain) &&
              v->domain->arch.pv_domain.auto_unmask )
             evtchn_unmask(pirq->evtchn);
-        if ( !is_hvm_domain(v->domain) ||
+        if ( is_pv_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
         if ( is_hvm_domain(v->domain) &&
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b445b2f..e4f080c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -119,6 +119,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
     unsigned long *stack, addr;
     unsigned long mask = STACK_SIZE;
 
+    /* Avoid HVM as we don't know what the stack looks like */
     if ( is_hvm_vcpu(v) )
         return;
 
@@ -555,7 +556,7 @@ static inline void do_trap(
     }
 
     if ( ((trapnr == TRAP_copro_error) || (trapnr == TRAP_simd_error)) &&
-         is_hvm_vcpu(curr) && curr->arch.hvm_vcpu.fpu_exception_callback )
+         has_hvm_container_vcpu(curr) && curr->arch.hvm_vcpu.fpu_exception_callback )
     {
         curr->arch.hvm_vcpu.fpu_exception_callback(
             curr->arch.hvm_vcpu.fpu_exception_callback_arg, regs);
@@ -721,7 +722,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
             *ebx = 0x40000200;
         *ecx = 0;          /* Features 1 */
         *edx = 0;          /* Features 2 */
-        if ( !is_hvm_vcpu(current) )
+        if ( !has_hvm_container_vcpu(current) )
             *ecx |= XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD;
         break;
 
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 1cc977c..ef5c470 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -86,7 +86,7 @@ void show_registers(struct cpu_user_regs *regs)
     enum context context;
     struct vcpu *v = current;
 
-    if ( is_hvm_vcpu(v) && guest_mode(regs) )
+    if ( has_hvm_container_vcpu(v) && guest_mode(regs) )
     {
         struct segment_register sreg;
         context = CTXT_hvm_guest;
@@ -147,8 +147,8 @@ void vcpu_show_registers(const struct vcpu *v)
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
 
-    /* No need to handle HVM for now. */
-    if ( is_hvm_vcpu(v) )
+    /* Only handle PV guests for now */
+    if ( !is_pv_vcpu(v) )
         return;
 
     crs[0] = v->arch.pv_vcpu.ctrlreg[0];
@@ -622,7 +622,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 6c264a5..38b1bad 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -236,7 +236,7 @@ struct domain *domain_create(
         goto fail;
 
     if ( domcr_flags & DOMCRF_hvm )
-        d->is_hvm = 1;
+        d->guest_type = guest_type_hvm;
 
     if ( domid == 0 )
     {
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index eb50288..c51da30 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
@@ -932,7 +932,7 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 72fb905..f6aafad 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -289,7 +289,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( current->domain == dom0 )
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_X86
-            if ( !is_hvm_vcpu(current) )
+            if ( is_pv_vcpu(current) )
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..4cbad4e 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@
 #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) (has_hvm_container_domain(d) && \
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
 
diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index 06057c7..7600cf4 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return (has_hvm_container_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..88edb3f 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -14,27 +14,27 @@
 
 /* Raw access functions: no type checking. */
 #define raw_copy_to_guest(dst, src, len)        \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      copy_to_user((dst), (src), (len)))
 #define raw_copy_from_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      copy_from_user((dst), (src), (len)))
 #define raw_clear_guest(dst,  len)              \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 #define __raw_copy_to_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      __copy_to_user((dst), (src), (len)))
 #define __raw_copy_from_guest(dst, src, len)    \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      __copy_from_user((dst), (src), (len)))
 #define __raw_clear_guest(dst,  len)            \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index b62bc6a..d2a8250 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -196,7 +196,7 @@ guest_supports_superpages(struct vcpu *v)
     /* The _PAGE_PSE bit must be honoured in HVM guests, whenever
      * CR4.PSE is set or the guest is in PAE or long mode. 
      * It's also used in the dummy PT for vcpus with CR4.PG cleared. */
-    return (!is_hvm_vcpu(v)
+    return (is_pv_vcpu(v)
             ? opt_allow_superpage
             : (GUEST_PAGING_LEVELS != 2 
                || !hvm_paging_enabled(v)
@@ -214,7 +214,7 @@ guest_supports_nx(struct vcpu *v)
 {
     if ( GUEST_PAGING_LEVELS == 2 || !cpu_has_nx )
         return 0;
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
         return cpu_has_nx;
     return hvm_nx_enabled(v);
 }
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ae6a3b8..d395087 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -238,6 +238,10 @@ struct mem_event_per_domain
     struct mem_event_domain access;
 };
 
+enum guest_type {
+    guest_type_pv, guest_type_hvm
+};
+
 struct domain
 {
     domid_t          domain_id;
@@ -285,8 +289,8 @@ struct domain
     struct rangeset *iomem_caps;
     struct rangeset *irq_caps;
 
-    /* Is this an HVM guest? */
-    bool_t           is_hvm;
+    enum guest_type guest_type;
+
 #ifdef HAS_PASSTHROUGH
     /* Does this guest need iommu mappings? */
     bool_t           need_iommu;
@@ -732,8 +736,12 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
 
-#define is_hvm_domain(d) ((d)->is_hvm)
+#define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
+#define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
+#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
+#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
+#define has_hvm_container_vcpu(v)   (has_hvm_container_domain((v)->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
                            cpumask_weight((v)->cpu_affinity) == 1)
 #ifdef HAS_PASSTHROUGH
diff --git a/xen/include/xen/tmem_xen.h b/xen/include/xen/tmem_xen.h
index ad1ddd5..9fb7446 100644
--- a/xen/include/xen/tmem_xen.h
+++ b/xen/include/xen/tmem_xen.h
@@ -442,7 +442,7 @@ typedef XEN_GUEST_HANDLE_PARAM(char) tmem_cli_va_param_t;
 static inline int tmh_get_tmemop_from_client(tmem_op_t *op, tmem_cli_op_t uops)
 {
 #ifdef CONFIG_COMPAT
-    if ( is_hvm_vcpu(current) ?
+    if ( has_hvm_container_vcpu(current) ?
          hvm_guest_x86_mode(current) != 8 :
          is_pv_32on64_vcpu(current) )
     {
-- 
1.7.9.5

* [PATCH RFC v12 06/21] pvh: Introduce PVH guest type
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (4 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-18 14:10   ` Jan Beulich
  2013-09-13 16:25 ` [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers George Dunlap
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Introduce a new PVH guest type, flags to create it, and ways to identify it.

To begin with, it will inherit functionality marked hvm_container.
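
Concretely, for a PVH domain the predicates from the previous patch
and this one evaluate as follows:

    is_pv_domain(d)             == 0
    is_pvh_domain(d)            == 1
    is_hvm_domain(d)            == 0
    has_hvm_container_domain(d) == 1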

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/common/domain.c         |   11 +++++++++++
 xen/common/domctl.c         |   15 +++++++++++++--
 xen/include/public/domctl.h |    8 +++++++-
 xen/include/xen/sched.h     |   11 ++++++++++-
 4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 38b1bad..2899021 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -237,6 +237,17 @@ struct domain *domain_create(
 
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
+    else if ( domcr_flags & DOMCRF_pvh )
+    {
+        if ( !(domcr_flags & DOMCRF_hap) )
+        {
+            err = -EOPNOTSUPP;
+            printk(XENLOG_INFO "PVH guest must have HAP on\n");
+            goto fail;
+        }
+        d->guest_type = guest_type_pvh;
+        printk("Creating PVH guest d%d\n", d->domain_id);
+    }
 
     if ( domid == 0 )
     {
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index c653efb..04fcf47 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     if ( is_hvm_domain(d) )
         info->flags |= XEN_DOMINF_hvm_guest;
+    else if ( is_pvh_domain(d) )
+        info->flags |= XEN_DOMINF_pvh_guest;
 
     xsm_security_domaininfo(d, info);
 
@@ -412,8 +414,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         ret = -EINVAL;
         if ( supervisor_mode_kernel ||
              (op->u.createdomain.flags &
-             ~(XEN_DOMCTL_CDF_hvm_guest | XEN_DOMCTL_CDF_hap |
-               XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off)) )
+             ~(XEN_DOMCTL_CDF_hvm_guest
+               | XEN_DOMCTL_CDF_pvh_guest
+               | XEN_DOMCTL_CDF_hap
+               | XEN_DOMCTL_CDF_s3_integrity
+               | XEN_DOMCTL_CDF_oos_off)) )
             break;
 
         dom = op->domain;
@@ -440,9 +445,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             rover = dom;
         }
 
+        if ( (op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest)
+             && (op->u.createdomain.flags & XEN_DOMCTL_CDF_pvh_guest) )
+            return -EINVAL;
+
         domcr_flags = 0;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_pvh_guest )
+            domcr_flags |= DOMCRF_pvh;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
             domcr_flags |= DOMCRF_hap;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 4c5b2bb..e471179 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -47,7 +47,7 @@ struct xen_domctl_createdomain {
     /* IN parameters */
     uint32_t ssidref;
     xen_domain_handle_t handle;
- /* Is this an HVM guest (as opposed to a PV guest)? */
+ /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
 #define _XEN_DOMCTL_CDF_hvm_guest     0
 #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
  /* Use hardware-assisted paging if available? */
@@ -59,6 +59,9 @@ struct xen_domctl_createdomain {
  /* Disable out-of-sync shadow page tables? */
 #define _XEN_DOMCTL_CDF_oos_off       3
 #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
+ /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
+#define _XEN_DOMCTL_CDF_pvh_guest     4
+#define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
     uint32_t flags;
 };
 typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
@@ -89,6 +92,9 @@ struct xen_domctl_getdomaininfo {
  /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest   (1U<<_XEN_DOMINF_pvh_guest)
  /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d395087..c804925 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -238,8 +238,12 @@ struct mem_event_per_domain
     struct mem_event_domain access;
 };
 
+/*
+ * PVH is a PV guest running in an HVM container.  is_hvm_* checks
+ * will be false, but has_hvm_container_* checks will be true.
+ */
 enum guest_type {
-    guest_type_pv, guest_type_hvm
+    guest_type_pv, guest_type_pvh, guest_type_hvm
 };
 
 struct domain
@@ -468,6 +472,9 @@ struct domain *domain_create(
  /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */
 #define _DOMCRF_oos_off         4
 #define DOMCRF_oos_off          (1U<<_DOMCRF_oos_off)
+ /* DOMCRF_pvh: Create PV domain in HVM container. */
+#define _DOMCRF_pvh            5
+#define DOMCRF_pvh             (1U<<_DOMCRF_pvh)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
@@ -738,6 +745,8 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
 #define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
+#define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh)
+#define is_pvh_vcpu(v)   (is_pvh_domain((v)->domain))
 #define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
 #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
-- 
1.7.9.5

* [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (5 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 06/21] pvh: Introduce PVH guest type George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:36   ` George Dunlap
  2013-09-18 14:18   ` Jan Beulich
  2013-09-13 16:25 ` [PATCH RFC v12 08/21] pvh: vmx-specific changes George Dunlap
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Things kept:
* cacheattr_region lists
* irq-related structures
* paging
* tm_list

Things disabled for now:
* compat xlation

Things disabled:
* Emulated timers and clock sources
* IO/MMIO emulation
* msix tables
* hvm params
* hvm_funcs
* nested HVM
* Fast-path for emulated lapic accesses

Getting rid of the hvm_params struct required a couple of other places
to check for its existence before attempting to read the params.
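
The pattern at those sites is just a NULL check ahead of the array
access.  A minimal sketch (get_hvm_param_checked is a hypothetical
helper shown for illustration; the patch open-codes the check at each
site):

    static uint64_t get_hvm_param_checked(const struct domain *d,
                                          unsigned int idx)
    {
        /* PVH domains never allocate the params array. */
        if ( !d->arch.hvm_domain.params )
            return 0;
        return d->arch.hvm_domain.params[idx];
    }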

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c      |   37 ++++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/io.c       |    4 ++++
 xen/arch/x86/hvm/irq.c      |    3 +++
 xen/arch/x86/hvm/mtrr.c     |    3 ++-
 xen/arch/x86/hvm/vmx/intr.c |    3 ++-
 xen/arch/x86/hvm/vmx/vmcs.c |    5 +++--
 xen/arch/x86/hvm/vmx/vmx.c  |   10 ++++++++--
 7 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1764b78..6a7a006 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -301,6 +301,10 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v)
 
 void hvm_migrate_timers(struct vcpu *v)
 {
+    /* PVH doesn't use the RTC or emulated timers; it uses the pvclock mechanism. */
+    if ( is_pvh_vcpu(v) )
+        return;
+
     rtc_migrate_timers(v);
     pt_migrate(v);
 }
@@ -342,10 +346,13 @@ void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p;
 
-    pt_restore_timer(v);
-
     check_wakeup_from_wait();
 
+    if ( is_pvh_vcpu(v) )
+        goto check_inject_trap;
+
+    pt_restore_timer(v);
+
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
     p = get_ioreq(v);
     while ( p->state != STATE_IOREQ_NONE )
@@ -368,6 +375,7 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
+  check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -521,6 +529,7 @@ int hvm_domain_initialise(struct domain *d)
         return -EINVAL;
     }
 
+    /* PVH: pbuf_lock and uc_lock unused, but won't hurt */
     spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
@@ -531,6 +540,9 @@ int hvm_domain_initialise(struct domain *d)
     if ( rc != 0 )
         goto fail0;
 
+    if ( is_pvh_domain(d) )
+        return 0;
+
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
     spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
 
@@ -584,6 +596,9 @@ int hvm_domain_initialise(struct domain *d)
 
 void hvm_domain_relinquish_resources(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+        return;
+
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
@@ -609,6 +624,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
 void hvm_domain_destroy(struct domain *d)
 {
     hvm_destroy_cacheattr_region_list(d);
+
+    if ( is_pvh_domain(d) )
+        return;
+
     hvm_funcs.domain_destroy(d);
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1093,6 +1112,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     v->arch.hvm_vcpu.inject_trap.vector = -1;
 
+    if ( is_pvh_vcpu(v) )
+    {
+        v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
+        /* This is for hvm_long_mode_enabled(v). */
+        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
+        return 0;
+    }
+
     rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */
     if ( rc != 0 )
         goto fail3;
@@ -1168,7 +1195,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
     tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
     hvm_vcpu_cacheattr_destroy(v);
-    vlapic_destroy(v);
+
+    if ( is_hvm_vcpu(v) )
+        vlapic_destroy(v);
+
     hvm_funcs.vcpu_destroy(v);
 
     /* Event channel is already freed by evtchn_destroy(). */
@@ -1369,6 +1399,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
     /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
      * a fast path for LAPIC accesses, skipping the p2m lookup. */
     if ( !nestedhvm_vcpu_in_guestmode(v)
+         && is_hvm_vcpu(v)
          && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
     {
         if ( !handle_mmio() )
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 4ae2c0c..3af4b34 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -175,6 +175,10 @@ int handle_mmio(void)
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     int rc;
 
+    /* No MMIO for PVH vcpus */
+    if ( is_pvh_vcpu(curr) )
+        return 0;
+
     hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
 
     rc = hvm_emulate_one(&ctxt);
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 9eae5de..92fb245 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
          && vcpu_info(v, evtchn_upcall_pending) )
         return hvm_intack_vector(plat->irq.callback_via.vector);
 
+    if ( is_pvh_vcpu(v) )
+        return hvm_intack_none;
+
     if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
         return hvm_intack_pic(0);
 
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index ef51a8d..df888a6 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,7 +693,8 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
          ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
         return MTRR_TYPE_WRBACK;
 
-    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
+    if ( v->domain->arch.hvm_domain.params
+         && !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
         return MTRR_TYPE_WRBACK;
 
     if ( (v == current) && v->domain->arch.hvm_domain.is_in_uc_mode )
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index e376f3c..5a7a62e 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -220,7 +220,8 @@ void vmx_intr_assist(void)
     }
 
     /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
+    if ( is_hvm_vcpu(v) )
+        pt_vector = pt_update_irq(v);
 
     do {
         intack = hvm_vcpu_has_pending_irq(v);
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7087630..fa90493 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1295,8 +1295,9 @@ void vmx_do_resume(struct vcpu *v)
     }
 
     debug_state = v->domain->debugger_attached
-                  || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
-                  || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];
+                  || (v->domain->arch.hvm_domain.params
+                      && (v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
+                          || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP]));
 
     if ( unlikely(v->arch.hvm_vcpu.debug_state_latch != debug_state) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f02e47a..0ac96ab 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -788,7 +788,8 @@ static void vmx_set_segment_register(struct vcpu *v, enum x86_segment seg,
         
         if ( seg == x86_seg_tr ) 
         {
-            if ( v->domain->arch.hvm_domain.params[HVM_PARAM_VM86_TSS] )
+            if ( v->domain->arch.hvm_domain.params
+                 && v->domain->arch.hvm_domain.params[HVM_PARAM_VM86_TSS] )
             {
                 sel = 0;
                 attr = vm86_tr_attr;
@@ -1078,7 +1079,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                 v->arch.hvm_vmx.exec_control |= cr3_ctls;
 
             /* Trap CR3 updates if CR3 memory events are enabled. */
-            if ( v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_CR3] )
+            if ( v->domain->arch.hvm_domain.params
+                 && v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_CR3] )
                 v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING;
 
             vmx_update_cpu_exec_control(v);
@@ -1145,8 +1147,12 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( paging_mode_hap(v->domain) )
         {
             if ( !hvm_paging_enabled(v) )
+            {
+                /* We should never get here for pvh domains */
+                ASSERT(v->domain->arch.hvm_domain.params);
                 v->arch.hvm_vcpu.hw_cr[3] =
                     v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT];
+            }
             vmx_load_pdptrs(v);
         }
  
-- 
1.7.9.5


* [PATCH RFC v12 08/21] pvh: vmx-specific changes
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Changes:
* Enforce HAP mode for now
* Disable exits related to virtual interrupts or emulated APICs
* Disable changing paging mode
 - "unrestricted guest" (i.e., real mode for EPT) disabled
 - write guest EFER disabled
* Start in 64-bit mode
* Force TSC mode to be "none"
* Paging mode update to happen in arch_set_info_guest
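
Distilled from the construct_vmcs() hunk below, the PVH-specific
adjustments amount roughly to the following mask operations (a sketch
using abbreviated local names; the real code additionally ASSERTs that
secondary controls, the MSR bitmap, and no-rdtsc-exiting are already
in effect):

    secondary_exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
                                SECONDARY_EXEC_APIC_REGISTER_VIRT |
                                SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
                                SECONDARY_EXEC_UNRESTRICTED_GUEST |
                                SECONDARY_EXEC_WBINVD_EXITING);
    exec_control &= ~(CPU_BASED_TPR_SHADOW | CPU_BASED_VIRTUAL_NMI_PENDING);
    vmentry_ctl  &= ~(VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_SMM |
                      VM_ENTRY_DEACT_DUAL_MONITOR);
    vmentry_ctl  |= VM_ENTRY_IA32E_MODE;    /* start in 64-bit mode */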

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/vmx/vmcs.c |  140 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index fa90493..f016343 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -828,6 +828,58 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+static int pvh_check_requirements(struct vcpu *v)
+{
+    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+
+    if ( !paging_mode_hap(v->domain) )
+    {
+        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
+        return -EINVAL;
+    }
+    if ( !cpu_has_vmx_ept )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_pat )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_msr_bitmap )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_secondary_exec_control )
+    {
+        printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n");
+        return -ENOSYS;
+    }
+
+    /*
+     * If rdtsc exiting is turned on and it goes through emulate_privileged_op,
+     * then pv_vcpu.ctrlreg must be added to the pvh struct.
+     */
+    if ( v->domain->arch.vtsc )
+    {
+        printk(XENLOG_G_INFO
+               "At present PVH only supports the default timer mode\n");
+        return -ENOSYS;
+    }
+
+    required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR;
+    if ( (tmpval & required) != required )
+    {
+        printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n",
+               required);
+        return -ENOSYS;
+    }
+
+    return 0;
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -836,6 +888,13 @@ static int construct_vmcs(struct vcpu *v)
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
 
+    if ( is_pvh_domain(d) )
+    {
+        int rc = pvh_check_requirements(v);
+        if ( rc )
+            return rc;
+    }
+
     vmx_vmcs_enter(v);
 
     /* VMCS controls. */
@@ -874,7 +933,44 @@ static int construct_vmcs(struct vcpu *v)
     /* Do not enable Monitor Trap Flag unless start single step debug */
     v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
+    if ( is_pvh_domain(d) )
+    {
+        /* Disable virtual apics, TPR */
+        v->arch.hvm_vmx.secondary_exec_control &=
+            ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
+              | SECONDARY_EXEC_APIC_REGISTER_VIRT
+              | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
+
+        /* Disable wbinvd (only necessary for MMIO),
+         * unrestricted guest (real mode for EPT) */
+        v->arch.hvm_vmx.secondary_exec_control &=
+            ~(SECONDARY_EXEC_UNRESTRICTED_GUEST
+              | SECONDARY_EXEC_WBINVD_EXITING);
+
+        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP);
+        ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING));
+
+        /*
+         * Note: we run with the default VM_ENTRY_LOAD_DEBUG_CTLS of 1: on VM
+         * entry the CPU loads DR7 and DEBUGCTL from the VMCS rather than
+         * keeping the host values; 0 would cause the VMCS values to be ignored.
+         */
+
+        /* PVH: I don't think these are necessary */
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
+        vmentry_ctl &= ~VM_ENTRY_SMM;
+        vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
+
+        /* Start in 64-bit mode.
+         * PVH 32bitfixme. */
+        vmentry_ctl |= VM_ENTRY_IA32E_MODE;       /* GUEST_EFER.LME/LMA ignored */
+    }
+
     vmx_update_cpu_exec_control(v);
+
     __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
     __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);
 
@@ -910,6 +1006,17 @@ static int construct_vmcs(struct vcpu *v)
         vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, MSR_TYPE_R | MSR_TYPE_W);
         if ( cpu_has_vmx_pat && paging_mode_hap(d) )
             vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, MSR_TYPE_R | MSR_TYPE_W);
+        if ( is_pvh_domain(d) )
+            vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE, MSR_TYPE_R | MSR_TYPE_W);
+
+        /*
+         * PVH: We don't disable intercepts for MSRs: MSR_STAR, MSR_LSTAR,
+         *      MSR_CSTAR, and MSR_SYSCALL_MASK because we need to specify
+         *      save/restore area to save/restore at every VM exit and entry.
+         *      Instead, let the intercept functions save them into
+         *      vmx_msr_state fields. See comment in vmx_restore_host_msrs().
+         *      See also vmx_restore_guest_msrs().
+         */
     }
 
     /* I/O access bitmap. */
@@ -1002,7 +1109,11 @@ static int construct_vmcs(struct vcpu *v)
     __vmwrite(GUEST_DS_AR_BYTES, 0xc093);
     __vmwrite(GUEST_FS_AR_BYTES, 0xc093);
     __vmwrite(GUEST_GS_AR_BYTES, 0xc093);
-    __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */
+    if ( is_pvh_domain(d) )
+        /* CS.L == 1, exec/read, accessed. PVH 32bitfixme. */
+        __vmwrite(GUEST_CS_AR_BYTES, 0xa09b);
+    else
+        __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */
 
     /* Guest IDT. */
     __vmwrite(GUEST_IDTR_BASE, 0);
@@ -1032,12 +1143,26 @@ static int construct_vmcs(struct vcpu *v)
               | (1U << TRAP_no_device);
     vmx_update_exception_bitmap(v);
 
-    v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
+    /* In HVM domains, this happens on the realmode->paging
+     * transition.  Since PVH never goes through this transition, we
+     * need to do it at start-of-day. */
+    if ( is_pvh_domain(d) )
+        vmx_update_debug_state(v);
+
+    /* PVH: Why is ET not set? */
+    v->arch.hvm_vcpu.guest_cr[0] = is_pvh_domain(d) ?
+        ( X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP )
+        : ( X86_CR0_PE | X86_CR0_ET );
     hvm_update_guest_cr(v, 0);
 
-    v->arch.hvm_vcpu.guest_cr[4] = 0;
+    v->arch.hvm_vcpu.guest_cr[4] = is_pvh_domain(d) ?
+        real_cr4_to_pv_guest_cr4(mmu_cr4_features)
+        : 0;
     hvm_update_guest_cr(v, 4);
 
+    if ( is_pvh_domain(d) )
+        v->arch.hvm_vmx.vmx_realmode = 0;
+
     if ( cpu_has_vmx_tpr_shadow )
     {
         __vmwrite(VIRTUAL_APIC_PAGE_ADDR,
@@ -1067,9 +1192,14 @@ static int construct_vmcs(struct vcpu *v)
 
     vmx_vmcs_exit(v);
 
-    paging_update_paging_modes(v); /* will update HOST & GUEST_CR3 as reqd */
+    /* PVH: paging mode is updated by arch_set_info_guest(). */
+    if ( is_hvm_vcpu(v) )
+    {
+        /* will update HOST & GUEST_CR3 as reqd */
+        paging_update_paging_modes(v);
 
-    vmx_vlapic_msr_changed(v);
+        vmx_vlapic_msr_changed(v);
+    }
 
     return 0;
 }
-- 
1.7.9.5


* [PATCH RFC v12 09/21] pvh: Do not allow PVH guests to change paging modes
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6a7a006..b59ff52 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1743,7 +1743,17 @@ int hvm_set_cr0(unsigned long value)
          (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
         goto gpf;
 
-    if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )
+
+
+    /* A PVH guest is not expected to change to real mode. */
+    if ( is_pvh_vcpu(v)
+         && (value & (X86_CR0_PE | X86_CR0_PG)) != (X86_CR0_PG | X86_CR0_PE) )
+    {
+        printk(XENLOG_G_WARNING
+               "PVH attempting to turn off PE/PG. CR0:%lx\n", value);
+        goto gpf;
+    }
+    else if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )
     {
         if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
         {
-- 
1.7.9.5


* [PATCH RFC v12 10/21] pvh: PVH access to hypercalls
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Hypercalls where we now have unrestricted access:
* memory_op
* console_io
* vcpu_op
* mmuext_op

We also restrict PVH domain access to HVMOP_*_param to writing
HVM_PARAM_CALLBACK_IRQ.

Most hvm_op functions require "is_hvm_domain()" and will default to
-EINVAL; exceptions are HVMOP_get_time and HVMOP_xentrace.

Finally, we restrict setting IOPL permissions for a PVH domain.
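
The dispatch side of this is a plain table lookup; schematically (a
paraphrase of the hvm_do_hypercall() hunk below, not new logic):

    /* A hypercall a PVH guest may not use has a NULL slot in
     * pvh_hypercall64_table and so fails with -ENOSYS. */
    if ( (eax >= NR_hypercalls) ||
         (is_pvh_vcpu(curr) ? !pvh_hypercall64_table[eax]
                            : !hvm_hypercall32_table[eax]) )
    {
        regs->eax = -ENOSYS;
        return HVM_HCALL_completed;
    }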

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c  |   49 ++++++++++++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/mtrr.c |    1 +
 xen/arch/x86/physdev.c  |   10 ++++++++++
 xen/common/kernel.c     |    4 ++++
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b59ff52..a61a4a6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3352,6 +3352,24 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(tmem_op)
 };
 
+/* PVH 32bitfixme. */
+static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
+    HYPERCALL(platform_op),
+    HYPERCALL(memory_op),
+    HYPERCALL(xen_version),
+    HYPERCALL(console_io),
+    [ __HYPERVISOR_grant_table_op ]  = (hvm_hypercall_t *)hvm_grant_table_op,
+    HYPERCALL(vcpu_op),
+    HYPERCALL(mmuext_op),
+    HYPERCALL(xsm_op),
+    HYPERCALL(sched_op),
+    HYPERCALL(event_channel_op),
+    [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
+    HYPERCALL(hvm_op),
+    HYPERCALL(sysctl),
+    HYPERCALL(domctl)
+};
+
 int hvm_do_hypercall(struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -3378,7 +3396,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
     if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
         return viridian_hypercall(regs);
 
-    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
+    if ( (eax >= NR_hypercalls) ||
+         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
+         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )
     {
         regs->eax = -ENOSYS;
         return HVM_HCALL_completed;
@@ -3393,16 +3413,20 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
                     regs->r10, regs->r8, regs->r9);
 
         curr->arch.hvm_vcpu.hcall_64bit = 1;
-        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
-                                               regs->rsi,
-                                               regs->rdx,
-                                               regs->r10,
-                                               regs->r8,
-                                               regs->r9); 
+        if ( is_pvh_vcpu(curr) )
+            regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
+        else
+            regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
         curr->arch.hvm_vcpu.hcall_64bit = 0;
     }
     else
     {
+        ASSERT(!is_pvh_vcpu(curr));   /* PVH 32bitfixme. */
+
         HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax,
                     (uint32_t)regs->ebx, (uint32_t)regs->ecx,
                     (uint32_t)regs->edx, (uint32_t)regs->esi,
@@ -3827,7 +3851,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ESRCH;
 
         rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
+            goto param_fail;
+
+        if ( is_pvh_domain(d)
+             && (a.index != HVM_PARAM_CALLBACK_IRQ
+                 || op != HVMOP_set_param) )
             goto param_fail;
 
         rc = xsm_hvm_param(XSM_TARGET, d, op);
@@ -4001,7 +4030,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
             if ( rc == 0 ) 
             {
-                d->arch.hvm_domain.params[a.index] = a.value;
+                /* PVH domains don't have a params struct */
+                if ( d->arch.hvm_domain.params )
+                    d->arch.hvm_domain.params[a.index] = a.value;
 
                 switch( a.index )
                 {
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index df888a6..4c5e6a5 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -578,6 +578,7 @@ int32_t hvm_set_mem_pinned_cacheattr(
 {
     struct hvm_mem_pinned_cacheattr_range *range;
 
+    /* Side note: A PVH guest writes to MSR_IA32_CR_PAT natively. */
     if ( !((type == PAT_TYPE_UNCACHABLE) ||
            (type == PAT_TYPE_WRCOMB) ||
            (type == PAT_TYPE_WRTHROUGH) ||
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index dab6213..103b648 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -519,6 +519,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iopl: {
         struct physdev_set_iopl set_iopl;
+
+        ret = -EINVAL;
+        if ( is_pvh_vcpu(current) )
+            break;
+
         ret = -EFAULT;
         if ( copy_from_guest(&set_iopl, arg, 1) != 0 )
             break;
@@ -532,6 +537,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iobitmap: {
         struct physdev_set_iobitmap set_iobitmap;
+
+        ret = -EINVAL;
+        if ( is_pvh_vcpu(current) )
+            break;
+
         ret = -EFAULT;
         if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 )
             break;
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index f6aafad..91a5b1a 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -293,6 +293,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);
+            else if ( is_pvh_vcpu(current) )
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_supervisor_mode_kernel) |
+                             (1U << XENFEAT_hvm_callback_vector);
             else
                 fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                              (1U << XENFEAT_hvm_callback_vector) |
-- 
1.7.9.5


* [PATCH RFC v12 11/21] pvh: Use PV e820
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Allow the PV e820 map to be set and read from a PVH domain.  This
requires moving the PV e820 struct out of the PV-specific domain
struct and into the arch domain struct.
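
From the guest's point of view nothing changes: the map is still read
with XENMEM_memory_map.  A hypothetical guest-side sketch, assuming
Linux-style helpers (HYPERVISOR_memory_op, set_xen_guest_handle) and a
caller-chosen buffer size:

    struct e820entry map[128];
    struct xen_memory_map memmap = { .nr_entries = 128 };
    int rc;

    set_xen_guest_handle(memmap.buffer, map);
    /* Returns -ENOSYS if no map was ever set (backwards compatibility). */
    rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
    /* On success, map[0 .. memmap.nr_entries-1] holds the e820 entries. */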

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/domain.c        |    9 +++------
 xen/arch/x86/mm.c            |   26 ++++++++++++--------------
 xen/include/asm-x86/domain.h |   10 +++++-----
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 5c38cb1..c75798b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         if ( (rc = iommu_domain_init(d)) != 0 )
             goto fail;
     }
+    spin_lock_init(&d->arch.e820_lock);
 
     if ( has_hvm_container_domain(d) )
     {
@@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         }
     }
     else
-    {
         /* 64-bit PV guest by default. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
-        spin_lock_init(&d->arch.pv_domain.e820_lock);
-    }
-
     /* initialize default tsc behavior in case tools don't */
     tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
     spin_lock_init(&d->arch.vtsc_lock);
@@ -592,8 +589,8 @@ void arch_domain_destroy(struct domain *d)
 {
     if ( has_hvm_container_domain(d) )
         hvm_domain_destroy(d);
-    else
-        xfree(d->arch.pv_domain.e820);
+
+    xfree(d->arch.e820);
 
     free_domain_pirqs(d);
     if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 120599a..f25df92 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4760,11 +4760,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         }
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
-        xfree(d->arch.pv_domain.e820);
-        d->arch.pv_domain.e820 = e820;
-        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
+        xfree(d->arch.e820);
+        d->arch.e820 = e820;
+        d->arch.nr_e820 = fmap.map.nr_entries;
+        spin_unlock(&d->arch.e820_lock);
 
         rcu_unlock_domain(d);
         return rc;
@@ -4778,26 +4778,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&map, arg, 1) )
             return -EFAULT;
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
 
         /* Backwards compatibility. */
-        if ( (d->arch.pv_domain.nr_e820 == 0) ||
-             (d->arch.pv_domain.e820 == NULL) )
+        if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -ENOSYS;
         }
 
-        map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
-        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
-                           map.nr_entries) ||
+        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
+        if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) ||
              __copy_to_guest(arg, &map, 1) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -EFAULT;
         }
 
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_unlock(&d->arch.e820_lock);
         return 0;
     }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4cbad4e..23923df 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -234,11 +234,6 @@ struct pv_domain
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
-
-    /* Pseudophysical e820 map (XENMEM_memory_map).  */
-    spinlock_t e820_lock;
-    struct e820entry *e820;
-    unsigned int nr_e820;
 };
 
 struct arch_domain
@@ -313,6 +308,11 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Pseudophysical e820 map (XENMEM_memory_map).  */
+    spinlock_t e820_lock;
+    struct e820entry *e820;
+    unsigned int nr_e820;
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
-- 
1.7.9.5


* [PATCH RFC v12 12/21] pvh: Support guest_kernel_mode for PVH
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

This will be required in order to do emulated privops for PVH.
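
Restated as a function, the macro below distinguishes three cases (a
sketch only; the actual change keeps the macro form):

    static inline bool_t sketch_guest_kernel_mode(struct vcpu *v,
                                                  const struct cpu_user_regs *r)
    {
        if ( is_pvh_vcpu(v) )
            return hvm_kernel_mode(v);  /* CPL 0, i.e. SS.DPL == 0 */
        if ( !is_pv_32bit_vcpu(v) )     /* 64-bit PV kernels run in ring 3 */
            return ring_3(r) && !!(v->arch.flags & TF_kernel_mode);
        return ring_1(r);               /* 32-bit PV kernels run in ring 1 */
    }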

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c            |    8 ++++++++
 xen/arch/x86/x86_64/traps.c       |    2 +-
 xen/include/asm-x86/domain.h      |    2 +-
 xen/include/asm-x86/x86_64/regs.h |   11 +++++++----
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a61a4a6..8b87271 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4717,6 +4717,14 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
     return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+bool_t hvm_kernel_mode(struct vcpu *v)
+{
+    struct segment_register seg;
+
+    hvm_get_segment_register(v, x86_seg_ss, &seg);
+    return (seg.attr.fields.dpl == 0);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index ef5c470..8644aaf 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -142,7 +142,7 @@ void show_registers(struct cpu_user_regs *regs)
     }
 }
 
-void vcpu_show_registers(const struct vcpu *v)
+void vcpu_show_registers(struct vcpu *v)
 {
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 23923df..ea7074c 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -447,7 +447,7 @@ struct arch_vcpu
 #define hvm_svm         hvm_vcpu.u.svm
 
 void vcpu_show_execution_state(struct vcpu *);
-void vcpu_show_registers(const struct vcpu *);
+void vcpu_show_registers(struct vcpu *);
 
 /* Clean up CR4 bits that are not under guest control. */
 unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
diff --git a/xen/include/asm-x86/x86_64/regs.h b/xen/include/asm-x86/x86_64/regs.h
index 3cdc702..d91a84b 100644
--- a/xen/include/asm-x86/x86_64/regs.h
+++ b/xen/include/asm-x86/x86_64/regs.h
@@ -10,10 +10,13 @@
 #define ring_2(r)    (((r)->cs & 3) == 2)
 #define ring_3(r)    (((r)->cs & 3) == 3)
 
-#define guest_kernel_mode(v, r)                                 \
-    (!is_pv_32bit_vcpu(v) ?                                     \
-     (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :        \
-     (ring_1(r)))
+bool_t hvm_kernel_mode(struct vcpu *);
+
+#define guest_kernel_mode(v, r)                                   \
+    (is_pvh_vcpu(v) ? hvm_kernel_mode(v) :                        \
+     (!is_pv_32bit_vcpu(v) ?                                      \
+      (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :         \
+      (ring_1(r))))
 
 #define permit_softint(dpl, v, r) \
     ((dpl) >= (guest_kernel_mode(v, r) ? 1 : 3))
-- 
1.7.9.5


* [PATCH RFC v12 13/21] pvh: Support read_segment_register for PVH
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

This will be necessary to do PV-style emulated operations for PVH guests.
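
At runtime the rewritten macro picks one of two paths; a sketch of what
it expands to for, e.g., the ds case:

    u16 sel;

    if ( is_pvh_vcpu(v) && guest_mode(regs) )
        /* PVH: the guest's selector lives in the VMCS, not in the
         * hardware register, so go through hvm_funcs.read_selector(),
         * which __vmread()s GUEST_DS_SELECTOR. */
        sel = pvh_get_selector(v, x86_seg_ds);
    else
        /* PV (or Xen context): the hardware segment register holds it. */
        asm volatile ( "movw %%ds, %0" : "=r" (sel) );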

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/domain.c         |    8 ++++----
 xen/arch/x86/hvm/vmx/vmx.c    |   40 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c          |   26 ++++++++++++--------------
 xen/arch/x86/x86_64/traps.c   |   16 ++++++++--------
 xen/include/asm-x86/hvm/hvm.h |    6 ++++++
 xen/include/asm-x86/system.h  |   19 +++++++++++++++----
 6 files changed, 85 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index c75798b..02a6479 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1217,10 +1217,10 @@ static void save_segments(struct vcpu *v)
     struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned int dirty_segment_mask = 0;
 
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(v, regs, ds);
+    regs->es = read_segment_register(v, regs, es);
+    regs->fs = read_segment_register(v, regs, fs);
+    regs->gs = read_segment_register(v, regs, gs);
 
     if ( regs->ds )
         dirty_segment_mask |= DIRTY_DS;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 0ac96ab..c2156af 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -664,6 +664,45 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
         .fields = { .type = 0xb, .s = 0, .dpl = 0, .p = 1, .avl = 0,    \
                     .l = 0, .db = 0, .g = 0, .pad = 0 } }).bytes)
 
+u16 vmx_read_selector(struct vcpu *v, enum x86_segment seg)
+{
+    u16 sel = 0;
+
+    vmx_vmcs_enter(v);
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        sel = __vmread(GUEST_CS_SELECTOR);
+        break;
+
+    case x86_seg_ss:
+        sel = __vmread(GUEST_SS_SELECTOR);
+        break;
+
+    case x86_seg_es:
+        sel = __vmread(GUEST_ES_SELECTOR);
+        break;
+
+    case x86_seg_ds:
+        sel = __vmread(GUEST_DS_SELECTOR);
+        break;
+
+    case x86_seg_fs:
+        sel = __vmread(GUEST_FS_SELECTOR);
+        break;
+
+    case x86_seg_gs:
+        sel = __vmread(GUEST_GS_SELECTOR);
+        break;
+
+    default:
+        BUG();
+    }
+    vmx_vmcs_exit(v);
+
+    return sel;
+}
+
 void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg,
                               struct segment_register *reg)
 {
@@ -1566,6 +1605,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .sync_pir_to_irr      = vmx_sync_pir_to_irr,
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
+    .read_selector        = vmx_read_selector,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e4f080c..a5d8349 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1833,8 +1833,6 @@ static inline uint64_t guest_misc_enable(uint64_t val)
     }                                                                       \
     (eip) += sizeof(_x); _x; })
 
-#define read_sreg(regs, sr) read_segment_register(sr)
-
 static int is_cpufreq_controller(struct domain *d)
 {
     return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
@@ -1879,7 +1877,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         goto fail;
 
     /* emulating only opcodes not allowing SS to be default */
-    data_sel = read_sreg(regs, ds);
+    data_sel = read_segment_register(v, regs, ds);
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1897,17 +1895,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             data_sel = regs->cs;
             continue;
         case 0x3e: /* DS override */
-            data_sel = read_sreg(regs, ds);
+            data_sel = read_segment_register(v, regs, ds);
             continue;
         case 0x26: /* ES override */
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             continue;
         case 0x64: /* FS override */
-            data_sel = read_sreg(regs, fs);
+            data_sel = read_segment_register(v, regs, fs);
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
-            data_sel = read_sreg(regs, gs);
+            data_sel = read_segment_register(v, regs, gs);
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
@@ -1954,7 +1952,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
         if ( !(opcode & 2) )
         {
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             lm_ovr = lm_seg_none;
         }
 
@@ -2687,22 +2685,22 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             ASSERT(opnd_sel);
             continue;
         case 0x3e: /* DS override */
-            opnd_sel = read_sreg(regs, ds);
+            opnd_sel = read_segment_register(v, regs, ds);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x26: /* ES override */
-            opnd_sel = read_sreg(regs, es);
+            opnd_sel = read_segment_register(v, regs, es);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x64: /* FS override */
-            opnd_sel = read_sreg(regs, fs);
+            opnd_sel = read_segment_register(v, regs, fs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x65: /* GS override */
-            opnd_sel = read_sreg(regs, gs);
+            opnd_sel = read_segment_register(v, regs, gs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
@@ -2755,7 +2753,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             switch ( modrm & 7 )
                             {
                             default:
-                                opnd_sel = read_sreg(regs, ds);
+                                opnd_sel = read_segment_register(v, regs, ds);
                                 break;
                             case 4: case 5:
                                 opnd_sel = regs->ss;
@@ -2783,7 +2781,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             break;
                         }
                         if ( !opnd_sel )
-                            opnd_sel = read_sreg(regs, ds);
+                            opnd_sel = read_segment_register(v, regs, ds);
                         switch ( modrm & 7 )
                         {
                         case 0: case 2: case 4:
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 8644aaf..3dfb309 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -123,10 +123,10 @@ void show_registers(struct cpu_user_regs *regs)
         fault_crs[0] = read_cr0();
         fault_crs[3] = read_cr3();
         fault_crs[4] = read_cr4();
-        fault_regs.ds = read_segment_register(ds);
-        fault_regs.es = read_segment_register(es);
-        fault_regs.fs = read_segment_register(fs);
-        fault_regs.gs = read_segment_register(gs);
+        fault_regs.ds = read_segment_register(v, regs, ds);
+        fault_regs.es = read_segment_register(v, regs, es);
+        fault_regs.fs = read_segment_register(v, regs, fs);
+        fault_regs.gs = read_segment_register(v, regs, gs);
     }
 
     print_xen_info();
@@ -239,10 +239,10 @@ void do_double_fault(struct cpu_user_regs *regs)
     crs[2] = read_cr2();
     crs[3] = read_cr3();
     crs[4] = read_cr4();
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(current, regs, ds);
+    regs->es = read_segment_register(current, regs, es);
+    regs->fs = read_segment_register(current, regs, fs);
+    regs->gs = read_segment_register(current, regs, gs);
 
     printk("CPU:    %d\n", cpu);
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 00489cf..401fa4c 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -193,6 +193,7 @@ struct hvm_function_table {
                                 paddr_t *L1_gpa, unsigned int *page_order,
                                 uint8_t *p2m_acc, bool_t access_r,
                                 bool_t access_w, bool_t access_x);
+    u16 (*read_selector)(struct vcpu *v, enum x86_segment seg);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -344,6 +345,11 @@ static inline int hvm_event_pending(struct vcpu *v)
     return hvm_funcs.event_pending(v);
 }
 
+static inline u16 pvh_get_selector(struct vcpu *v, enum x86_segment seg)
+{
+    return hvm_funcs.read_selector(v, seg);
+}
+
 /* These reserved bits in lower 32 remain 0 after any load of CR0 */
 #define HVM_CR0_GUEST_RESERVED_BITS             \
     (~((unsigned long)                          \
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 6ab7d56..1242657 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,10 +4,21 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(name)                             \
-({  u16 __sel;                                                  \
-    asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
-    __sel;                                                      \
+/*
+ * We need the vcpu because during a context switch from PV to PVH,
+ * current has already been updated to the next vcpu in save_segments(),
+ * so it no longer points to the PV vcpu whose selector we want.
+ * Checking is_pvh_vcpu(current) would yield incorrect results there.
+ */
+#define read_segment_register(vcpu, regs, name)                   \
+({  u16 __sel;                                                    \
+    struct cpu_user_regs *_regs = (regs);                         \
+                                                                  \
+    if ( is_pvh_vcpu(vcpu) && guest_mode(_regs) )                 \
+        __sel = pvh_get_selector(vcpu, x86_seg_##name);           \
+    else                                                          \
+        asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
+    __sel;                                                        \
 })
 
 #define wbinvd() \
-- 
1.7.9.5


* [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests.
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

This is in preparation for enabling emulated privops for PVH guests.

This one unfortunately has a bit more of an impedance mismatch:

* For PV, the selector is hard-coded in by passing #name in the
  #define.  For PVH, we have to do a switch statement.

* For PV, given the descriptor, it can read the resulting base, limit,
  &c directly; for PVH, we have to read the values currently loaded in the VMCS.

This leads to a rather awkward construct where we *both* read the
descriptor, *and* specify a selector, and introduce an intermediate function,
read_descriptor_sel.

Unfortunately, without introducing a rather pointless switch()
statement to the PV path similar to the one in the PVH path, there's
no way to make this less awkward.
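
To make the "fix it up" step concrete: hvm_get_segment_register()
returns the attributes packed as in segment_attributes_t (type/s/dpl/p
in bits 0-7, avl/l/db/g in bits 8-11), while callers of
read_descriptor() expect "desc.b & 0x00f0ff00".  Worked through for a
64-bit CPL-0 code segment (values derived from that packing, shown for
illustration only):

    /* type=0xb, s=1, dpl=0, p=1 (low byte 0x9b); l=1 (bit 9),
     * g=1 (bit 11)  =>  attr.bytes == 0xa9b. */
    unsigned int ar = 0xa9b;

    ar = (ar & 0xff) | ((ar & 0xf00) << 4);  /* 0x9b | 0xa000 == 0xa09b */
    ar <<= 8;                                /* == 0xa09b00 */
    /* The same value read_descriptor() yields for such a segment, so
     * the existing _SEGMENT_* tests keep working unchanged. */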

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/traps.c       |   73 ++++++++++++++++++++++++++++++++++++++------
 xen/include/asm-x86/desc.h |    4 ++-
 2 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a5d8349..35d4a3e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1516,6 +1516,49 @@ static int read_descriptor(unsigned int sel,
     return 1;
 }
 
+static int read_descriptor_sel(unsigned int sel,
+                               enum x86_segment which_sel,
+                               struct vcpu *v,
+                               const struct cpu_user_regs *regs,
+                               unsigned long *base,
+                               unsigned long *limit,
+                               unsigned int *ar,
+                               unsigned int vm86attr)
+{
+    struct segment_register seg;
+    bool_t long_mode;
+
+    if ( !is_pvh_vcpu(v) )
+        return read_descriptor(sel, v, regs, base, limit, ar, vm86attr);
+
+    hvm_get_segment_register(v, x86_seg_cs, &seg);
+    long_mode = seg.attr.fields.l;
+
+    if ( which_sel != x86_seg_cs )
+        hvm_get_segment_register(v, which_sel, &seg);
+
+    /* "ar" is returned packed as in segment_attributes_t. Fix it up. */
+    *ar = seg.attr.bytes;
+    *ar = (*ar & 0xff) | ((*ar & 0xf00) << 4);
+    *ar <<= 8;
+
+    if ( long_mode )
+    {
+        *limit = ~0UL;
+
+        if ( which_sel < x86_seg_fs )
+        {
+            *base = 0UL;
+            return 1;
+        }
+    }
+    else
+        *limit = seg.limit;
+
+    *base = seg.base;
+    return 1;
+}
+
 static int read_gate_descriptor(unsigned int gate_sel,
                                 const struct vcpu *v,
                                 unsigned int *sel,
@@ -1843,6 +1886,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 static int emulate_privileged_op(struct cpu_user_regs *regs)
 {
+    enum x86_segment which_sel;
     struct vcpu *v = current;
     unsigned long *reg, eip = regs->eip;
     u8 opcode, modrm_reg = 0, modrm_rm = 0, rep_prefix = 0, lock = 0, rex = 0;
@@ -1865,9 +1909,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
 
-    if ( !read_descriptor(regs->cs, v, regs,
-                          &code_base, &code_limit, &ar,
-                          _SEGMENT_CODE|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P) )
+    if ( !read_descriptor_sel(regs->cs, x86_seg_cs, v, regs,
+                              &code_base, &code_limit, &ar,
+                              _SEGMENT_CODE|_SEGMENT_S|
+                              _SEGMENT_DPL|_SEGMENT_P) )
         goto fail;
     op_default = op_bytes = (ar & (_SEGMENT_L|_SEGMENT_DB)) ? 4 : 2;
     ad_default = ad_bytes = (ar & _SEGMENT_L) ? 8 : op_default;
@@ -1878,6 +1923,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
     /* emulating only opcodes not allowing SS to be default */
     data_sel = read_segment_register(v, regs, ds);
+    which_sel = x86_seg_ds;
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1893,23 +1939,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             continue;
         case 0x2e: /* CS override */
             data_sel = regs->cs;
+            which_sel = x86_seg_cs;
             continue;
         case 0x3e: /* DS override */
             data_sel = read_segment_register(v, regs, ds);
+            which_sel = x86_seg_ds;
             continue;
         case 0x26: /* ES override */
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             continue;
         case 0x64: /* FS override */
             data_sel = read_segment_register(v, regs, fs);
+            which_sel = x86_seg_fs;
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
             data_sel = read_segment_register(v, regs, gs);
+            which_sel = x86_seg_gs;
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
             data_sel = regs->ss;
+            which_sel = x86_seg_ss;
             continue;
         case 0xf0: /* LOCK */
             lock = 1;
@@ -1953,15 +2005,16 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         if ( !(opcode & 2) )
         {
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             lm_ovr = lm_seg_none;
         }
 
         if ( !(ar & _SEGMENT_L) )
         {
-            if ( !read_descriptor(data_sel, v, regs,
-                                  &data_base, &data_limit, &ar,
-                                  _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
-                                  _SEGMENT_P) )
+            if ( !read_descriptor_sel(data_sel, which_sel, v, regs,
+                                      &data_base, &data_limit, &ar,
+                                      _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
+                                      _SEGMENT_P) )
                 goto fail;
             if ( !(ar & _SEGMENT_S) ||
                  !(ar & _SEGMENT_P) ||
@@ -1991,9 +2044,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                 }
             }
             else
-                read_descriptor(data_sel, v, regs,
-                                &data_base, &data_limit, &ar,
-                                0);
+                read_descriptor_sel(data_sel, which_sel, v, regs,
+                                    &data_base, &data_limit, &ar,
+                                    0);
             data_limit = ~0UL;
             ar = _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P;
         }
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 354b889..041e9d3 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -38,7 +38,9 @@
 
 #ifndef __ASSEMBLY__
 
-#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
+/* PVH 32bitfixme : see emulate_gate_op call from do_general_protection */
+#define GUEST_KERNEL_RPL(d) ({ ASSERT(is_pv_domain(d)); \
+                               is_pv_32bit_domain(d) ? 1 : 3; })
 
 /* Fix up the RPL of a guest segment selector. */
 #define __fixup_guest_selector(d, sel)                             \
-- 
1.7.9.5


* [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
From: George Dunlap @ 2013-09-13 16:25 UTC
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

Allow the guest to set up a few more things when bringing up a vcpu.

This includes cr3 and gs_base.

Also set up the wallclock, and only initialize a vcpu once.
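
For a PVH vcpu, the context passed to VCPUOP_initialise (or to
XEN_DOMCTL_setvcpucontext) therefore boils down to a handful of fields.
A hypothetical guest-side sketch; secondary_startup, boot_stack_top,
pgd_gfn, per_cpu_area and cpu are illustrative names, not part of this
series:

    struct vcpu_guest_context ctxt;

    memset(&ctxt, 0, sizeof(ctxt));
    ctxt.user_regs.rip  = (unsigned long)secondary_startup;
    ctxt.user_regs.rsp  = (unsigned long)boot_stack_top;
    ctxt.ctrlreg[3]     = xen_pfn_to_cr3(pgd_gfn);      /* guest page tables */
    ctxt.gs_base_kernel = (unsigned long)per_cpu_area;  /* -> GUEST_GS_BASE */

    /* Selectors, ldt_*, gdt_*, fs_base and gs_base_user must all be
     * zero, or vmx_pvh_vcpu_boot_set_info() below returns -EINVAL. */
    HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, &ctxt);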

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/domain.c             |   27 ++++++++++++++++++++++++++-
 xen/arch/x86/hvm/vmx/vmx.c        |   34 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/hvm.h     |    8 ++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |    2 ++
 xen/include/public/arch-x86/xen.h |    4 ++++
 5 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 02a6479..d046790 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -737,7 +737,31 @@ int arch_set_info_guest(
     if ( has_hvm_container_vcpu(v) )
     {
         hvm_set_info_guest(v);
-        goto out;
+
+        if ( is_hvm_vcpu(v) || v->is_initialised )
+            goto out;
+
+        /* PVH */
+        if ( c.nat->ctrlreg[1] )
+            return -EINVAL;
+
+        ASSERT(!compat);
+
+        cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]);
+        cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);
+
+        v->arch.cr3 = page_to_maddr(cr3_page);
+        v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
+        
+        if ( paging_mode_enabled(d) )
+            paging_update_paging_modes(v);
+
+        update_cr3(v);
+
+        if ( (rc = pvh_vcpu_boot_set_info(v, c.nat)) != 0 )
+            return rc;
+
+        goto pvh_skip_pv_stuff;
     }
 
     init_int80_direct_trap(v);
@@ -947,6 +971,7 @@ int arch_set_info_guest(
 
     update_cr3(v);
 
+ pvh_skip_pv_stuff:
     if ( v->vcpu_id == 0 )
         update_domain_wallclock_time(d);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c2156af..b3f933a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -122,6 +122,39 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     return 0;
 }
 
+/*
+ * Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest.
+ *
+ * Boot vcpu call is from tools via:
+ *     do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest
+ *
+ * Secondary vcpus are brought up by the guest itself via:
+ *     do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest
+ *     (In the case of linux, the call comes from cpu_initialize_context()).
+ *
+ * Note, PVH save/restore is expected to happen the HVM way, ie,
+ *        do_domctl -> XEN_DOMCTL_sethvmcontext -> hvm_load/save
+ * and not get here.
+ *
+ * PVH 32bitfixme: this function needs to be modified for 32bit guest.
+ */
+int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
+                               struct vcpu_guest_context *ctxtp)
+{
+    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
+         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
+         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
+         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
+         ctxtp->fs_base || ctxtp->gs_base_user )
+        return -EINVAL;
+
+    vmx_vmcs_enter(v);
+    __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel);
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
+
 static void vmx_vcpu_destroy(struct vcpu *v)
 {
     vmx_destroy_vmcs(v);
@@ -1606,6 +1639,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .read_selector        = vmx_read_selector,
+    .pvh_vcpu_boot_set_info = vmx_pvh_vcpu_boot_set_info,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 401fa4c..d81a744 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -194,6 +194,8 @@ struct hvm_function_table {
                                 uint8_t *p2m_acc, bool_t access_r,
                                 bool_t access_w, bool_t access_x);
     u16 (*read_selector)(struct vcpu *v, enum x86_segment seg);
+    int (*pvh_vcpu_boot_set_info)(struct vcpu *v,
+                                  struct vcpu_guest_context *ctxtp);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -327,6 +329,12 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
     return hvm_funcs.get_shadow_gs_base(v);
 }
 
+static inline int pvh_vcpu_boot_set_info(struct vcpu *v,
+                                         struct vcpu_guest_context *ctxtp)
+{
+    return hvm_funcs.pvh_vcpu_boot_set_info(v, ctxtp);
+}
+
 #define is_viridian_domain(_d)                                             \
  (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN]))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c33b9f9..623e4ce 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -458,6 +458,8 @@ void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
 void update_guest_eip(void);
+int  vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
+                                struct vcpu_guest_context *ctxtp);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index b7f6a51..4f12f50 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -150,6 +150,10 @@ typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */
 /*
  * The following is all CPU context. Note that the fpu_ctxt block is filled 
  * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used.
+ *
+ * PVH 64bit: In the vcpu boot path, for vmcs context, only gs_base_kernel
+ *            is honored. Other fields like gdt, ldt, and selectors must be
+ *            zeroed. See vmx_pvh_vcpu_boot_set_info.
  */
 struct vcpu_guest_context {
     /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (14 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-18 15:31   ` Jan Beulich
  2013-09-13 16:25 ` [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now George Dunlap
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

This means putting hooks into the vmexit handlers to call them instead
of the HVM ones.

It also means calling raw_copy_from_guest() rather than
copy_from_user(), and delivering page faults during PV emulation to
guests using the HVM injection path rather than the PV one for PVH
guests.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c      |   58 +++++++++++++++++++++++++++++----------
 xen/arch/x86/traps.c            |   28 ++++++++++++++-----
 xen/include/asm-x86/processor.h |    2 ++
 xen/include/asm-x86/traps.h     |    3 ++
 4 files changed, 70 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b3f933a..0fa2d0b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -56,6 +56,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/event.h>
+#include <asm/traps.h>
 
 enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
 
@@ -2669,8 +2670,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             /* Already handled above. */
             break;
         case TRAP_invalid_op:
-            HVMTRACE_1D(TRAP, vector);
-            vmx_vmexit_ud_intercept(regs);
+            if ( is_pvh_vcpu(v) )
+            {
+                if ( !emulate_forced_invalid_op(regs) )
+                    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+            }
+            else
+            {
+                HVMTRACE_1D(TRAP, vector);
+                vmx_vmexit_ud_intercept(regs);
+            }
             break;
         default:
             HVMTRACE_1D(TRAP, vector);
@@ -2719,8 +2728,8 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
     case EXIT_REASON_CPUID:
+        is_pvh_vcpu(v) ? pv_cpuid(regs) : vmx_do_cpuid(regs);
         update_guest_eip(); /* Safe: CPUID */
-        vmx_do_cpuid(regs);
         break;
     case EXIT_REASON_HLT:
         update_guest_eip(); /* Safe: HLT */
@@ -2868,21 +2877,42 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     }
 
     case EXIT_REASON_IO_INSTRUCTION:
-        exit_qualification = __vmread(EXIT_QUALIFICATION);
-        if ( exit_qualification & 0x10 )
+        if ( is_pvh_vcpu(v) )
         {
-            /* INS, OUTS */
-            if ( !handle_mmio() )
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            /*
+             * Note: A PVH guest sets IOPL natively by setting bits in
+             *       the eflags, and not via the hypercalls used by a PV guest.
+             */
+            struct segment_register seg;
+            int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
+            int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
+            
+            if ( curr_lvl == 0 )
+            {
+                hvm_get_segment_register(current, x86_seg_ss, &seg);
+                curr_lvl = seg.attr.fields.dpl;
+            }
+            if ( requested < curr_lvl || !emulate_privileged_op(regs) )
+                hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         }
         else
         {
-            /* IN, OUT */
-            uint16_t port = (exit_qualification >> 16) & 0xFFFF;
-            int bytes = (exit_qualification & 0x07) + 1;
-            int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE;
-            if ( handle_pio(port, bytes, dir) )
-                update_guest_eip(); /* Safe: IN, OUT */
+            exit_qualification = __vmread(EXIT_QUALIFICATION);
+            if ( exit_qualification & 0x10 )
+            {
+                /* INS, OUTS */
+                if ( !handle_mmio() )
+                    hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            }
+            else
+            {
+                /* IN, OUT */
+                uint16_t port = (exit_qualification >> 16) & 0xFFFF;
+                int bytes = (exit_qualification & 0x07) + 1;
+                int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE;
+                if ( handle_pio(port, bytes, dir) )
+                    update_guest_eip(); /* Safe: IN, OUT */
+            }
         }
         break;
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 35d4a3e..683ef8f 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -738,7 +738,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
     return 1;
 }
 
-static void pv_cpuid(struct cpu_user_regs *regs)
+void pv_cpuid(struct cpu_user_regs *regs)
 {
     uint32_t a, b, c, d;
 
@@ -915,7 +915,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     return EXCRET_fault_fixed;
 }
 
-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 {
     char sig[5], instr[2];
     unsigned long eip, rc;
@@ -923,7 +923,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip = regs->eip;
 
     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    if ( (rc = raw_copy_from_guest(sig, (char *)eip, sizeof(sig))) != 0 )
     {
         propagate_page_fault(eip + sizeof(sig) - rc, 0);
         return EXCRET_fault_fixed;
@@ -933,7 +933,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip += sizeof(sig);
 
     /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    if ( ( rc = raw_copy_from_guest(instr, (char *)eip, sizeof(instr))) != 0 )
     {
         propagate_page_fault(eip + sizeof(instr) - rc, 0);
         return EXCRET_fault_fixed;
@@ -1072,7 +1072,7 @@ static void reserved_bit_page_fault(
     show_execution_state(regs);
 }
 
-void propagate_page_fault(unsigned long addr, u16 error_code)
+static void pv_inject_page_fault(unsigned long addr, u16 error_code)
 {
     struct trap_info *ti;
     struct vcpu *v = current;
@@ -1106,6 +1106,13 @@ void propagate_page_fault(unsigned long addr, u16 error_code)
         reserved_bit_page_fault(addr, guest_cpu_user_regs());
 }
 
+void propagate_page_fault(unsigned long addr, u16 error_code)
+{
+    is_pvh_vcpu(current)
+        ? hvm_inject_page_fault(error_code, addr)
+        : pv_inject_page_fault(addr, error_code);
+}
+
 static int handle_gdt_ldt_mapping_fault(
     unsigned long offset, struct cpu_user_regs *regs)
 {
@@ -1624,6 +1631,13 @@ static int guest_io_okay(
     int user_mode = !(v->arch.flags & TF_kernel_mode);
 #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
 
+    /*
+     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
+     * and so don't need to check again here.
+     */
+    if ( is_pvh_vcpu(v) )
+        return 1;
+
     if ( !vm86_mode(regs) &&
          (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
         return 1;
@@ -1869,7 +1883,7 @@ static inline uint64_t guest_misc_enable(uint64_t val)
         _ptr = (unsigned int)_ptr;                                          \
     if ( (limit) < sizeof(_x) - 1 || (eip) > (limit) - (sizeof(_x) - 1) )   \
         goto fail;                                                          \
-    if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
+    if ( (_rc = raw_copy_from_guest(&_x, (type *)_ptr, sizeof(_x))) != 0 )  \
     {                                                                       \
         propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
         goto skip;                                                          \
@@ -1884,7 +1898,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 #include "x86_64/mmconfig.h"
 
-static int emulate_privileged_op(struct cpu_user_regs *regs)
+int emulate_privileged_op(struct cpu_user_regs *regs)
 {
     enum x86_segment which_sel;
     struct vcpu *v = current;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5cdacc7..22a9653 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -566,6 +566,8 @@ void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(int cpu);
 
+void pv_cpuid(struct cpu_user_regs *regs);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 82cbcee..20c9151 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -49,4 +49,7 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
 				unsigned int trap_nr);
 
+int emulate_privileged_op(struct cpu_user_regs *regs);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);
+
 #endif /* ASM_TRAP_H */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (15 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-18 15:36   ` Jan Beulich
  2013-09-13 16:25 ` [PATCH RFC v12 18/21] pvh: Restrict tsc_mode to NEVER_EMULATE " George Dunlap
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

To be implemented.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/domain.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d046790..5e43a67 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,6 +339,14 @@ int switch_compat(struct domain *d)
 
     if ( d == NULL )
         return -EINVAL;
+
+    if ( is_pvh_domain(d) )
+    {
+        printk(XENLOG_INFO
+               "Xen currently does not support 32bit PVH guests\n");
+        return -EINVAL;
+    }
+
     if ( !may_switch_mode(d) )
         return -EACCES;
     if ( is_pv_32on64_domain(d) )
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 18/21] pvh: Restrict tsc_mode to NEVER_EMULATE for now
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (16 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 19/21] pvh: Disable debug traps when doing pv emulation for PVH domains George Dunlap
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

To be implemented.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/time.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index f047cb3..4589d43 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1891,6 +1891,14 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc = 0;
         return;
     }
+    if ( is_pvh_domain(d) && tsc_mode != TSC_MODE_NEVER_EMULATE )
+    {
+        /* PVH fixme: support more tsc modes. */
+        printk(XENLOG_WARNING
+               "PVH currently does not support tsc emulation. Setting timer_mode = native\n");
+        d->arch.vtsc = 0;
+        return;
+    }
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 19/21] pvh: Disable debug traps when doing pv emulation for PVH domains
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (17 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 18/21] pvh: Restrict tsc_mode to NEVER_EMULATE " George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 20/21] pvh: Disable memevents for PVH guests for now George Dunlap
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

I'm not exactly sure why these were singled out in Mukesh's patch
series.  check_guest_io_breakpoint() accesses the pv_cpu structure
(v->arch.pv_vcpu.ctrlreg[4]), which PVH vcpus don't have, but
instruction_done() just accesses eflags and debugreg state.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/traps.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 683ef8f..9fdeb14 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -461,6 +461,11 @@ static void instruction_done(
     struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch)
 {
     regs->eip = eip;
+
+    /* PVH fixme: Make debug traps work */
+    if ( is_pvh_vcpu(current) )
+        return;
+
     regs->eflags &= ~X86_EFLAGS_RF;
     if ( bpmatch || (regs->eflags & X86_EFLAGS_TF) )
     {
@@ -477,6 +482,10 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
     unsigned int width, i, match = 0;
     unsigned long start;
 
+    /* PVH fixme: support io breakpoint. */
+    if ( is_pvh_vcpu(v) )
+        return 0;
+
     if ( !(v->arch.debugreg[5]) ||
          !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
         return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 20/21] pvh: Disable memevents for PVH guests for now
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (18 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 19/21] pvh: Disable debug traps when doing pv emulation for PVH domains George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:25 ` [PATCH RFC v12 21/21] pvh: Documentation George Dunlap
  2013-09-13 16:41 ` Introduce PVH domU support George Dunlap
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser, Tim Deegan, Jan Beulich

To be implemented.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jan.beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8b87271..73a0c1e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3015,7 +3015,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     hvm_cpuid(1, &cpuid[0], &cpuid[1], &cpuid[2], &cpuid[3]);
     mtrr = !!(cpuid[3] & cpufeat_mask(X86_FEATURE_MTRR));
 
-    hvm_memory_event_msr(msr, msr_content);
+    if ( !is_pvh_vcpu(v) )
+        hvm_memory_event_msr(msr, msr_content);
 
     switch ( msr )
     {
@@ -4601,8 +4602,11 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
     return 1;
 }
 
+/* PVH fixme: add support for monitoring guest behaviour in below functions. */
 void hvm_memory_event_cr0(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR0],
                            MEM_EVENT_REASON_CR0,
@@ -4611,6 +4615,8 @@ void hvm_memory_event_cr0(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr3(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR3],
                            MEM_EVENT_REASON_CR3,
@@ -4619,6 +4625,8 @@ void hvm_memory_event_cr3(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr4(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR4],
                            MEM_EVENT_REASON_CR4,
@@ -4627,6 +4635,8 @@ void hvm_memory_event_cr4(unsigned long value, unsigned long old)
 
 void hvm_memory_event_msr(unsigned long msr, unsigned long value)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_MSR],
                            MEM_EVENT_REASON_MSR,
@@ -4639,6 +4649,8 @@ int hvm_memory_event_int3(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
                                     .params[HVM_PARAM_MEMORY_EVENT_INT3],
                                   MEM_EVENT_REASON_INT3,
@@ -4651,6 +4663,8 @@ int hvm_memory_event_single_step(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
             .params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP],
             MEM_EVENT_REASON_SINGLESTEP,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v12 21/21] pvh: Documentation
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (19 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 20/21] pvh: Disable memevents for PVH guests for now George Dunlap
@ 2013-09-13 16:25 ` George Dunlap
  2013-09-13 16:41 ` Introduce PVH domU support George Dunlap
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

---
 docs/misc/pvh-readme.txt |   59 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 docs/misc/pvh-readme.txt

diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
new file mode 100644
index 0000000..fc63f7d
--- /dev/null
+++ b/docs/misc/pvh-readme.txt
@@ -0,0 +1,59 @@
+
+PVH : an x86 PV guest running in an HVM container.
+
+See: http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-part-1-the-ends-of-the-spectrum/
+
+At the moment HAP is required for PVH.
+
+At present the only PVH guest is an x86 64bit PV linux. Patches are at:
+   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
+
+A PVH guest kernel must support following features, as defined for linux
+in arch/x86/xen/xen-head.S:
+
+   #define FEATURES_PVH "|writable_descriptor_tables" \
+                        "|auto_translated_physmap"    \
+                        "|supervisor_mode_kernel"     \
+                        "|hvm_callback_vector"
+
+In a nutshell: the guest is auto-translated, ie, the p2m is managed by
+xen; it uses event callbacks rather than vlapic emulation; and the page
+tables are native, so the mmu_update hcall is N/A for a PVH guest.
+Moreover the IDT is native, so the set_trap_table hcall is also N/A for
+a PVH guest. For a full list of hcalls supported for PVH, see
+pvh_hypercall64_table in arch/x86/hvm/hvm.c in xen.  From the ABI
+perspective, it's mostly a PV guest with auto translate, although it
+does use hvm_op for setting the callback vector.
+
+The initial phase targets the booting of a 64bit UP/SMP linux guest in PVH
+mode. This is done by adding: pvh=1 in the config file. xl, and not xm, is
+supported. Phase I patches are broken into three parts:
+   - xen changes for booting of 64bit PVH guest
+   - tools changes for creating a PVH guest
+   - boot of 64bit dom0 in PVH mode.
+
+Following fixme's exist in the code:
+  - Add support for more memory types in arch/x86/hvm/mtrr.c.
+  - arch/x86/time.c: support more tsc modes.
+  - check_guest_io_breakpoint(): check/add support for IO breakpoint.
+  - implement arch_get_info_guest() for pvh.
+  - verify bp matching on emulated instructions will work same as HVM for
+    PVH guest. see instruction_done() and check_guest_io_breakpoint().
+
+Following remain to be done for PVH:
+   - Investigate what else needs to be done for VMI support.
+   - AMD port.
+   - 32bit PVH guest support in both linux and xen. Xen changes are tagged
+     "32bitfixme".
+   - Add support for monitoring guest behavior. See hvm_memory_event* functions
+     in hvm.c
+   - vcpu hotplug support
+   - Live migration of PVH guests.
+   - Make posted interrupts available to PVH dom0. (This will be a big win.)
+
+
+Note, any emails to me must be cc'd to xen devel mailing list. OTOH, please
+cc me on PVH emails to the xen devel mailing list.
+
+Mukesh Rathor
+mukesh.rathor [at] oracle [dot] com
-- 
1.7.9.5
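
As an aside, the pvh=1 switch described above sits in an otherwise
ordinary xl domU config.  A minimal sketch (the kernel path, sizes and
device names here are placeholders, not taken from the series):

    # PVH domU: 64bit kernel built with the PVH patches
    kernel = "/boot/vmlinuz-pvh"            # placeholder path
    memory = 1024
    vcpus  = 2
    pvh    = 1                              # new flag; HAP-capable hw required
    disk   = [ 'phy:/dev/vg0/pvh,xvda,w' ]
    vif    = [ 'bridge=xenbr0' ]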

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-13 16:25 ` [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers George Dunlap
@ 2013-09-13 16:36   ` George Dunlap
       [not found]     ` <CAGU+aus16muryVYd-aOzv-CAXPk_xxVh_e-R7Ug1RxGRJ_MAfQ@mail.gmail.com>
  2013-09-16 23:17     ` Mukesh Rathor
  2013-09-18 14:18   ` Jan Beulich
  1 sibling, 2 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:36 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 13/09/13 17:25, George Dunlap wrote:
> Things kept:
> * cacheattr_region lists
> * irq-related structures
> * paging
> * tm_list
>
> Things disabled for now:
> * compat xlation
>
> Things disabled:
> * Emulated timers and clock sources
> * IO/MMIO emulation
> * msix tables
> * hvm params
> * hvm_funcs
> * nested HVM
> * Fast-path for emulated lapic accesses
>
> Getting rid of the hvm_params struct required a couple other places to
> check for its existence before attempting to read the params.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> CC: Jan Beulich <jan.beulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>   xen/arch/x86/hvm/hvm.c      |   37 ++++++++++++++++++++++++++++++++++---
>   xen/arch/x86/hvm/io.c       |    4 ++++
>   xen/arch/x86/hvm/irq.c      |    3 +++
>   xen/arch/x86/hvm/mtrr.c     |    3 ++-
>   xen/arch/x86/hvm/vmx/intr.c |    3 ++-
>   xen/arch/x86/hvm/vmx/vmcs.c |    5 +++--
>   xen/arch/x86/hvm/vmx/vmx.c  |   10 ++++++++--
>   7 files changed, 56 insertions(+), 9 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 1764b78..6a7a006 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -301,6 +301,10 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v)
>   
>   void hvm_migrate_timers(struct vcpu *v)
>   {
> +    /* PVH doesn't use rtc and emulated timers, it uses pvclock mechanism. */
> +    if ( is_pvh_vcpu(v) )
> +        return;
> +
>       rtc_migrate_timers(v);
>       pt_migrate(v);
>   }
> @@ -342,10 +346,13 @@ void hvm_do_resume(struct vcpu *v)
>   {
>       ioreq_t *p;
>   
> -    pt_restore_timer(v);
> -
>       check_wakeup_from_wait();
>   
> +    if ( is_pvh_vcpu(v) )
> +        goto check_inject_trap;
> +
> +    pt_restore_timer(v);
> +
>       /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
>       p = get_ioreq(v);
>       while ( p->state != STATE_IOREQ_NONE )
> @@ -368,6 +375,7 @@ void hvm_do_resume(struct vcpu *v)
>           }
>       }
>   
> +  check_inject_trap:
>       /* Inject pending hw/sw trap */
>       if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
>       {
> @@ -521,6 +529,7 @@ int hvm_domain_initialise(struct domain *d)
>           return -EINVAL;
>       }
>   
> +    /* PVH: pbuf_lock and uc_lock unused, but won't hurt */
>       spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
>       spin_lock_init(&d->arch.hvm_domain.irq_lock);
>       spin_lock_init(&d->arch.hvm_domain.uc_lock);
> @@ -531,6 +540,9 @@ int hvm_domain_initialise(struct domain *d)
>       if ( rc != 0 )
>           goto fail0;
>   
> +    if ( is_pvh_domain(d) )
> +        return 0;
> +
>       INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>       spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
>   
> @@ -584,6 +596,9 @@ int hvm_domain_initialise(struct domain *d)
>   
>   void hvm_domain_relinquish_resources(struct domain *d)
>   {
> +    if ( is_pvh_domain(d) )
> +        return;
> +
>       if ( hvm_funcs.nhvm_domain_relinquish_resources )
>           hvm_funcs.nhvm_domain_relinquish_resources(d);
>   
> @@ -609,6 +624,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
>   void hvm_domain_destroy(struct domain *d)
>   {
>       hvm_destroy_cacheattr_region_list(d);
> +
> +    if ( is_pvh_domain(d) )
> +        return;
> +
>       hvm_funcs.domain_destroy(d);
>       rtc_deinit(d);
>       stdvga_deinit(d);
> @@ -1093,6 +1112,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
>   
>       v->arch.hvm_vcpu.inject_trap.vector = -1;
>   
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
> +        /* This for hvm_long_mode_enabled(v). */
> +        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
> +        return 0;
> +    }
> +
>       rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */
>       if ( rc != 0 )
>           goto fail3;
> @@ -1168,7 +1195,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
>   
>       tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
>       hvm_vcpu_cacheattr_destroy(v);
> -    vlapic_destroy(v);
> +
> +    if ( is_hvm_vcpu(v) )
> +        vlapic_destroy(v);
> +
>       hvm_funcs.vcpu_destroy(v);
>   
>       /* Event channel is already freed by evtchn_destroy(). */
> @@ -1369,6 +1399,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>       /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>        * a fast path for LAPIC accesses, skipping the p2m lookup. */
>       if ( !nestedhvm_vcpu_in_guestmode(v)
> +         && is_hvm_vcpu(v)
>            && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
>       {
>           if ( !handle_mmio() )
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 4ae2c0c..3af4b34 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -175,6 +175,10 @@ int handle_mmio(void)
>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>       int rc;
>   
> +    /* No MMIO for PVH vcpus */
> +    if ( is_pvh_vcpu(curr) )
> +        return 0;
> +
>       hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
>   
>       rc = hvm_emulate_one(&ctxt);
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index 9eae5de..92fb245 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
>            && vcpu_info(v, evtchn_upcall_pending) )
>           return hvm_intack_vector(plat->irq.callback_via.vector);
>   
> +    if ( is_pvh_vcpu(v) )
> +        return hvm_intack_none;
> +
>       if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
>           return hvm_intack_pic(0);
>   
> diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> index ef51a8d..df888a6 100644
> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -693,7 +693,8 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
>            ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
>           return MTRR_TYPE_WRBACK;
>   
> -    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
> +    if ( v->domain->arch.hvm_domain.params
> +         && !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )

This is one thing I want to discuss: I can see why initially it was 
decided to disable hvm_params, as at the moment PVH only uses one of 
them, and this saves allocating the array for PVH domains.  But the 
result is a lot of fairly ugly things like this.  There's also (in 
another patch) a hack that allows a guest to *set* the IRQ callback via 
hvmop hvm_param_set, but not *read* it.

Additionally, as far as I can tell, the only reason we can't support mem 
events is because we don't have hvm_params.

Since I think at some point we will want to use mem events on PVH 
guests, we should probably just allocate the array now and avoid having 
things like this in the first place.
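
Concretely, the alternative would be something like the following in
hvm_domain_initialise() -- a sketch of the proposal, not code from the
series, with error unwinding elided -- after which sites like the
mtrr.c check above could go back to indexing params directly:

    /* Allocate hvm_params for PVH domains too, not just full HVM ones. */
    d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
    if ( d->arch.hvm_domain.params == NULL )
        return -ENOMEM;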

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-13 16:25 ` [PATCH RFC v12 08/21] pvh: vmx-specific changes George Dunlap
@ 2013-09-13 16:38   ` George Dunlap
  2013-09-16  7:37     ` Jan Beulich
  2013-09-16 23:12     ` Mukesh Rathor
  2013-09-18 14:25   ` Jan Beulich
  1 sibling, 2 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:38 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 13/09/13 17:25, George Dunlap wrote:
> Changes:
> * Enforce HAP mode for now
> * Disable exits related to virtual interrupts or emulated APICs
> * Disable changing paging mode
>   - "unrestricted guest" (i.e., real mode for EPT) disabled
>   - write guest EFER disabled
> * Start in 64-bit mode
> * Force TSC mode to be "none"
> * Paging mode update to happen in arch_set_info_guest
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> CC: Jan Beulich <jan.beulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>   xen/arch/x86/hvm/vmx/vmcs.c |  140 +++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 135 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index fa90493..f016343 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -828,6 +828,58 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
>       virtual_vmcs_exit(vvmcs);
>   }
>   
> +static int pvh_check_requirements(struct vcpu *v)
> +{
> +    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
> +
> +    if ( !paging_mode_hap(v->domain) )
> +    {
> +        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
> +        return -EINVAL;
> +    }
> +    if ( !cpu_has_vmx_ept )
> +    {
> +        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
> +        return -ENOSYS;
> +    }
> +    if ( !cpu_has_vmx_pat )
> +    {
> +        printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n");
> +        return -ENOSYS;
> +    }
> +    if ( !cpu_has_vmx_msr_bitmap )
> +    {
> +        printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n");
> +        return -ENOSYS;
> +    }
> +    if ( !cpu_has_vmx_secondary_exec_control )
> +    {
> +        printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n");
> +        return -ENOSYS;
> +    }
> +
> +    /*
> +     * If rdtsc exiting is turned on and it goes thru emulate_privileged_op,
> +     * then pv_vcpu.ctrlreg must be added to the pvh struct.
> +     */
> +    if ( v->domain->arch.vtsc )
> +    {
> +        printk(XENLOG_G_INFO
> +               "At present PVH only supports the default timer mode\n");
> +        return -ENOSYS;
> +    }
> +
> +    required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR;
> +    if ( (tmpval & required) != required )
> +    {
> +        printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n",
> +               required);
> +        return -ENOSYS;
> +    }
> +
> +    return 0;
> +}
> +
>   static int construct_vmcs(struct vcpu *v)
>   {
>       struct domain *d = v->domain;
> @@ -836,6 +888,13 @@ static int construct_vmcs(struct vcpu *v)
>       u32 vmexit_ctl = vmx_vmexit_control;
>       u32 vmentry_ctl = vmx_vmentry_control;
>   
> +    if ( is_pvh_domain(d) )
> +    {
> +        int rc = pvh_check_requirements(v);
> +        if ( rc )
> +            return rc;
> +    }
> +
>       vmx_vmcs_enter(v);
>   
>       /* VMCS controls. */
> @@ -874,7 +933,44 @@ static int construct_vmcs(struct vcpu *v)
>       /* Do not enable Monitor Trap Flag unless start single step debug */
>       v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
>   
> +    if ( is_pvh_domain(d) )
> +    {
> +        /* Disable virtual apics, TPR */
> +        v->arch.hvm_vmx.secondary_exec_control &=
> +            ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
> +              | SECONDARY_EXEC_APIC_REGISTER_VIRT
> +              | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
> +
> +        /* Disable wbinvd (only necessary for MMIO),
> +         * unrestricted guest (real mode for EPT) */
> +        v->arch.hvm_vmx.secondary_exec_control &=
> +            ~(SECONDARY_EXEC_UNRESTRICTED_GUEST
> +              | SECONDARY_EXEC_WBINVD_EXITING);
> +
> +        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
> +        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP);
> +        ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING));
> +
> +        /*
> +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
> +         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and not
> +         * use the host values. 0 would cause it to not use the VMCS values.
> +         */
> +
> +        /* PVH: I don't think these are necessary */
> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
> +        vmentry_ctl &= ~VM_ENTRY_SMM;
> +        vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;

Mukesh, I couldn't figure out why you had set these.  What are these 
about, and are they really necessary?

> +    /* PVH: Why is ET not set? */
> +    v->arch.hvm_vcpu.guest_cr[0] = is_pvh_domain(d) ?
> +        ( X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP )
> +        : ( X86_CR0_PE | X86_CR0_ET );

I also couldn't figure out why CR0_ET wasn't set for PVH vms; is it not 
set for PV vms?

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests.
  2013-09-13 16:25 ` [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests George Dunlap
@ 2013-09-13 16:40   ` George Dunlap
  2013-09-18 15:00   ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:40 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 13/09/13 17:25, George Dunlap wrote:
> This is in preparation for enabling emulated privops for PVH guests.
>
> This one unfortunately has a bit more of an impedance mismatch:
>
> * For PV, the selector is hard-coded in by passing #name in the
>    #define.  For PVH, we have to do a switch statement.
>
> * For PV, given the desrciptor, it can read the resulting base, limit,
>    &c directly; for PVH, we have to read the values currently loaded in the vmcs.
>
> This leads to a rather awkward construct where we *both* read the
> descriptor, *and* specify a selector, and introduce an intermediate function,
> read_descriptor_sel.
>
> Unfortunately, without introducing a rather pointless switch()
> statement to the PV path similar to the one in the PVH path, there's
> no way to make this less awkward.

If anyone has any clever ideas about this one, feel free to chime in...

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Introduce PVH domU support
  2013-09-13 16:25 Introduce PVH domU support George Dunlap
                   ` (20 preceding siblings ...)
  2013-09-13 16:25 ` [PATCH RFC v12 21/21] pvh: Documentation George Dunlap
@ 2013-09-13 16:41 ` George Dunlap
  21 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-13 16:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 13/09/13 17:25, George Dunlap wrote:
> CC: Jan Beulich <jan.beulich@suse.com>

Oops, that should be JBeulich@suse.com...

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
       [not found]     ` <CAGU+aus16muryVYd-aOzv-CAXPk_xxVh_e-R7Ug1RxGRJ_MAfQ@mail.gmail.com>
@ 2013-09-13 21:33       ` Aravindh Puthiyaparambil (aravindp)
  0 siblings, 0 replies; 76+ messages in thread
From: Aravindh Puthiyaparambil (aravindp) @ 2013-09-13 21:33 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim Deegan, Keir Fraser, Jan Beulich (JBeulich@suse.com), xen-devel

> From: George Dunlap <george.dunlap@eu.citrix.com>
> Date: Fri, Sep 13, 2013 at 9:36 AM
> Subject: Re: [Xen-devel] [PATCH RFC v12 07/21] pvh: Disable unneeded
> features of HVM containers
> To: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Keir Fraser <keir@xen.org>, Tim Deegan <tim@xen.org>, Jan Beulich
> <jan.beulich@suse.com>, xen-devel@lists.xen.org
> 
> 
> On 13/09/13 17:25, George Dunlap wrote:
> >
> > Things kept:
> > * cacheattr_region lists
> > * irq-related structures
> > * paging
> > * tm_list
> >
> > Things disabled for now:
> > * compat xlation
> >
> > Things disabled:
> > * Emulated timers and clock sources
> > * IO/MMIO emulation
> > * msix tables
> > * hvm params
> > * hvm_funcs
> > * nested HVM
> > * Fast-path for emulated lapic accesses
> >
> > Getting rid of the hvm_params struct required a couple other places to
> > check for its existence before attempting to read the params.
> >
> > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > CC: Jan Beulich <jan.beulich@suse.com>
> > CC: Tim Deegan <tim@xen.org>
> > CC: Keir Fraser <keir@xen.org>
> > ---
> >   xen/arch/x86/hvm/hvm.c      |   37
> ++++++++++++++++++++++++++++++++++---
> >   xen/arch/x86/hvm/io.c       |    4 ++++
> >   xen/arch/x86/hvm/irq.c      |    3 +++
> >   xen/arch/x86/hvm/mtrr.c     |    3 ++-
> >   xen/arch/x86/hvm/vmx/intr.c |    3 ++-
> >   xen/arch/x86/hvm/vmx/vmcs.c |    5 +++--
> >   xen/arch/x86/hvm/vmx/vmx.c  |   10 ++++++++--
> >   7 files changed, 56 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index
> > 1764b78..6a7a006 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -301,6 +301,10 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v)
> >     void hvm_migrate_timers(struct vcpu *v)
> >   {
> > +    /* PVH doesn't use rtc and emulated timers, it uses pvclock mechanism.
> */
> > +    if ( is_pvh_vcpu(v) )
> > +        return;
> > +
> >       rtc_migrate_timers(v);
> >       pt_migrate(v);
> >   }
> > @@ -342,10 +346,13 @@ void hvm_do_resume(struct vcpu *v)
> >   {
> >       ioreq_t *p;
> >   -    pt_restore_timer(v);
> > -
> >       check_wakeup_from_wait();
> >   +    if ( is_pvh_vcpu(v) )
> > +        goto check_inject_trap;
> > +
> > +    pt_restore_timer(v);
> > +
> >       /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> >       p = get_ioreq(v);
> >       while ( p->state != STATE_IOREQ_NONE ) @@ -368,6 +375,7 @@ void
> > hvm_do_resume(struct vcpu *v)
> >           }
> >       }
> >   +  check_inject_trap:
> >       /* Inject pending hw/sw trap */
> >       if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> >       {
> > @@ -521,6 +529,7 @@ int hvm_domain_initialise(struct domain *d)
> >           return -EINVAL;
> >       }
> >   +    /* PVH: pbuf_lock and uc_lock unused, but won't hurt */
> >       spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
> >       spin_lock_init(&d->arch.hvm_domain.irq_lock);
> >       spin_lock_init(&d->arch.hvm_domain.uc_lock);
> > @@ -531,6 +540,9 @@ int hvm_domain_initialise(struct domain *d)
> >       if ( rc != 0 )
> >           goto fail0;
> >   +    if ( is_pvh_domain(d) )
> > +        return 0;
> > +
> >       INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
> >       spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
> >   @@ -584,6 +596,9 @@ int hvm_domain_initialise(struct domain *d)
> >     void hvm_domain_relinquish_resources(struct domain *d)
> >   {
> > +    if ( is_pvh_domain(d) )
> > +        return;
> > +
> >       if ( hvm_funcs.nhvm_domain_relinquish_resources )
> >           hvm_funcs.nhvm_domain_relinquish_resources(d);
> >   @@ -609,6 +624,10 @@ void hvm_domain_relinquish_resources(struct
> domain *d)
> >   void hvm_domain_destroy(struct domain *d)
> >   {
> >       hvm_destroy_cacheattr_region_list(d);
> > +
> > +    if ( is_pvh_domain(d) )
> > +        return;
> > +
> >       hvm_funcs.domain_destroy(d);
> >       rtc_deinit(d);
> >       stdvga_deinit(d);
> > @@ -1093,6 +1112,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >         v->arch.hvm_vcpu.inject_trap.vector = -1;
> >   +    if ( is_pvh_vcpu(v) )
> > +    {
> > +        v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
> > +        /* This is for hvm_long_mode_enabled(v). */
> > +        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
> > +        return 0;
> > +    }
> > +
> >       rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat()
> */
> >       if ( rc != 0 )
> >           goto fail3;
> > @@ -1168,7 +1195,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
> >         tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
> >       hvm_vcpu_cacheattr_destroy(v);
> > -    vlapic_destroy(v);
> > +
> > +    if ( is_hvm_vcpu(v) )
> > +        vlapic_destroy(v);
> > +
> >       hvm_funcs.vcpu_destroy(v);
> >         /* Event channel is already freed by evtchn_destroy(). */ @@
> > -1369,6 +1399,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >       /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> >        * a fast path for LAPIC accesses, skipping the p2m lookup. */
> >       if ( !nestedhvm_vcpu_in_guestmode(v)
> > +         && is_hvm_vcpu(v)
> >            && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> >       {
> >           if ( !handle_mmio() )
> > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index
> > 4ae2c0c..3af4b34 100644
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -175,6 +175,10 @@ int handle_mmio(void)
> >       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> >       int rc;
> >   +    /* No MMIO for PVH vcpus */
> > +    if ( is_pvh_vcpu(curr) )
> > +        return 0;
> > +
> >       hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
> >         rc = hvm_emulate_one(&ctxt);
> > diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index
> > 9eae5de..92fb245 100644
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > @@ -405,6 +405,9 @@ struct hvm_intack
> hvm_vcpu_has_pending_irq(struct vcpu *v)
> >            && vcpu_info(v, evtchn_upcall_pending) )
> >           return hvm_intack_vector(plat->irq.callback_via.vector);
> >   +    if ( is_pvh_vcpu(v) )
> > +        return hvm_intack_none;
> > +
> >       if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
> >           return hvm_intack_pic(0);
> >   diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c index
> > ef51a8d..df888a6 100644
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -693,7 +693,8 @@ uint8_t epte_get_entry_emt(struct domain *d,
> unsigned long gfn, mfn_t mfn,
> >            ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
> >           return MTRR_TYPE_WRBACK;
> >   -    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
> > +    if ( v->domain->arch.hvm_domain.params
> > +         && !v->domain-
> >arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
> 
> 
> This is one thing I want to discuss: I can see why initially it was decided to
> disable hvm_params, as at the moment PVH only uses one of them, and this
> saves allocating the array for PVH domains.  But the result is a lot of fairly ugly
> things like this.  There's also (in another patch) a hack that allows a guest to
> *set* the IRQ callback via hvmop hvm_param_set, but not *read* it.
> 
> Additionally, as far as I can tell, the only reason we can't support mem events
> is because we don't have hvm_params.
> 
> Since I think at some point we will want to use mem events on PVH
> guests, we should probably just allocate the array now and avoid having
> things like this in the first place.

I definitely would like to use mem_access with PVH domains, so it would be good if this were not disabled. I am willing to test and help make it work once these patches go in.

Thanks,
Aravindh  

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-13 16:38   ` George Dunlap
@ 2013-09-16  7:37     ` Jan Beulich
  2013-09-16  9:15       ` George Dunlap
  2013-09-16 23:12     ` Mukesh Rathor
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-16  7:37 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

>>> On 13.09.13 at 18:38, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 13/09/13 17:25, George Dunlap wrote:
>> +    /* PVH: Why is ET not set? */
>> +    v->arch.hvm_vcpu.guest_cr[0] = is_pvh_domain(d) ?
>> +        ( X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP )
>> +        : ( X86_CR0_PE | X86_CR0_ET );
> 
> I also couldn't figure out why CR0_ET wasn't set for PVH vms; is it not 
> set for PV vms?

This one's certainly wrong - it shouldn't even be possible to clear
CR0.ET in hardware on anything post-486 (i.e. trying to clear it
would be silently ignored by the hardware, and not setting it in
the code above is merely confusing, but not actively breaking
anything).

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-16  7:37     ` Jan Beulich
@ 2013-09-16  9:15       ` George Dunlap
  0 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-16  9:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

On 16/09/13 08:37, Jan Beulich wrote:
>>>> On 13.09.13 at 18:38, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> On 13/09/13 17:25, George Dunlap wrote:
>>> +    /* PVH: Why is ET not set? */
>>> +    v->arch.hvm_vcpu.guest_cr[0] = is_pvh_domain(d) ?
>>> +        ( X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP )
>>> +        : ( X86_CR0_PE | X86_CR0_ET );
>> I also couldn't figure out why CR0_ET wasn't set for PVH vms; is it not
>> set for PV vms?
> This one's certainly wrong - it shouldn't even be possible to clear
> CR0.ET in hardware on anything post-486 (i.e. trying to clear it
> would be silently ignored by the hardware, and not setting it in
> the code above is merely confusing, but not actively breaking
> anything).

Right -- so it would be better if this were something like:

v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
if ( is_pvh_domain(d) )
    v->arch.hvm_vcpu.guest_cr[0] |= X86_CR0_PG | X86_CR0_NE | X86_CR0_WP;

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-13 16:25 ` [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state George Dunlap
@ 2013-09-16 21:09   ` Mukesh Rathor
  2013-09-18 10:39     ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Mukesh Rathor @ 2013-09-16 21:09 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On Fri, 13 Sep 2013 17:25:03 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> As far as I can tell, there's nothing here that requires v to be
> current.  vmx_update_exception_bitmap() is updated in other situations
> where v!=current.
> 
> Removing this allows the PVH code to call this during vmcs
> construction in a later patch, making the code more robust by removing
> duplicate code.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
> CC: Jan Beulich <jan.beulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 8ed7026..f02e47a 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
>  {
>      unsigned long mask;
>  
> -    ASSERT(v == current);
> -
>      mask = 1u << TRAP_int3;
>      if ( !cpu_has_monitor_trap_flag )
>          mask |= 1u << TRAP_debug;


vmx_update_exception_bitmap() writes EXCEPTION_BITMAP of the current
vmcs, ie of the current vcpu. If v != current, you'd be writing the
bitmap of the wrong vcpu. If the assertion is removed, then the
function would need to vmx enter first...
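
Something along these lines -- a sketch only, following the
vmx_vmcs_enter/exit pattern used elsewhere in this series; the
exception_bitmap field name here is an assumption:

    static void vmx_update_exception_bitmap(struct vcpu *v)
    {
        u32 bitmap = v->arch.hvm_vmx.exception_bitmap;  /* assumed field */

        vmx_vmcs_enter(v);    /* loads v's VMCS even when v != current */
        __vmwrite(EXCEPTION_BITMAP, bitmap);
        vmx_vmcs_exit(v);
    }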

mukesh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-13 16:38   ` George Dunlap
  2013-09-16  7:37     ` Jan Beulich
@ 2013-09-16 23:12     ` Mukesh Rathor
  2013-09-17  8:48       ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Mukesh Rathor @ 2013-09-16 23:12 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On Fri, 13 Sep 2013 17:38:41 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 13/09/13 17:25, George Dunlap wrote:
> > Changes:
> > * Enforce HAP mode for now
> > * Disable exits related to virtual interrupts or emulated APICs
> > * Disable changing paging mode
> >   - "unrestricted guest" (i.e., real mode for EPT) disabled
........
> > +        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP);
> > +        ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING));
> > +
> > +        /*
> > +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
> > +         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and not
> > +         * use the host values. 0 would cause it to not use the VMCS values.
> > +         */
> > +
> > +        /* PVH: I don't think these are necessary */
> > +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> > +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
> > +        vmentry_ctl &= ~VM_ENTRY_SMM;
> > +        vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
> 
> Mukesh, I couldn't figure out why you had set these.  What are these 
> about, and are they really necessary?

We clear those because:
  - VM_ENTRY_LOAD_GUEST_EFER: "Intel sdm 23.3.2.1 Loading Guest Control...."
     states that setting to 0 would cause the LMA to be loaded from
     VM_ENTRY_IA32E_MODE.
  - VM_ENTRY_SMM: also according to the SDM must be 0 if not entering from
                   SMM.
  - VM_ENTRY_DEACT_DUAL_MONITOR: same, must be 0 for entry outside SMM.

sorry, looking at hard copy so can't cut and paste.

> > +    /* PVH: Why is ET not set? */
> > +    v->arch.hvm_vcpu.guest_cr[0] = is_pvh_domain(d) ?
> > +        ( X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP )
> > +        : ( X86_CR0_PE | X86_CR0_ET );
> 
> I also couldn't figure out why CR0_ET wasn't set for PVH vms; is it
> not set for PV vms?

I was confused by the section on it in the SDM; it appeared it
was not relevant after the 486. But reading further, as Jan said, we
should just set it.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-13 16:36   ` George Dunlap
       [not found]     ` <CAGU+aus16muryVYd-aOzv-CAXPk_xxVh_e-R7Ug1RxGRJ_MAfQ@mail.gmail.com>
@ 2013-09-16 23:17     ` Mukesh Rathor
  2013-09-18 10:50       ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Mukesh Rathor @ 2013-09-16 23:17 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On Fri, 13 Sep 2013 17:36:06 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 13/09/13 17:25, George Dunlap wrote:
> > Things kept:
> > * cacheattr_region lists
> > * irq-related structures
> > * paging
> > * tm_list
> >
> > Things disabled for now:
> > * compat xlation
> >
> > Things disabled:
> > * Emulated timers and clock sources
> > * IO/MMIO emulation
> > * msix tables
> > * hvm params
> > * hvm_funcs
......
> > ...unsigned long gfn, mfn_t mfn, ...
> >      if ( (d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL) )
> >          return MTRR_TYPE_WRBACK;
> >  
> > -    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
> > +    if ( v->domain->arch.hvm_domain.params
> > +         && !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
> 
> This is one thing I want to discuss: I can see why initially it was 
> decided to disable hvm_params, as at the moment PVH only uses one of 
> them, and this saves allocating the array for PVH domains.  But the 
> result is a lot of fairly ugly things like this.  There's also (in 
> another patch) a hack that allows a guest to *set* the IRQ callback
> via hvmop hvm_param_set, but not *read* it.
> 
> Additionally, as far as I can tell, the only reason we can't support
> mem events is because we don't have hvm_params.
> 
> Since I think at some point we will want to use the mem events on PVH 
> guests, we should probably just allocate it now and avoid having
> things like this in the first place.

That would work too. My goal was to add things incrementally so they
got properly tested.

Mukesh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-16 23:12     ` Mukesh Rathor
@ 2013-09-17  8:48       ` George Dunlap
  2013-09-18  0:13         ` Mukesh Rathor
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-17  8:48 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 09/17/2013 12:12 AM, Mukesh Rathor wrote:
> On Fri, 13 Sep 2013 17:38:41 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 13/09/13 17:25, George Dunlap wrote:
>>> Changes:
>>> * Enforce HAP mode for now
>>> * Disable exits related to virtual interrupts or emulated APICs
>>> * Disable changing paging mode
>>>    - "unrestricted guest" (i.e., real mode for EPT) disabled
> ........
>>> +        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP);
>>> +        ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING));
>>> +
>>> +        /*
>>> +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
>>> +         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and not
>>> +         * use the host values. 0 would cause it to not use the VMCS values.
>>> +         */
>>> +
>>> +        /* PVH: I don't think these are necessary */
>>> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
>>> +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
>>> +        vmentry_ctl &= ~VM_ENTRY_SMM;
>>> +        vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
>>
>> Mukesh, I couldn't figure out why you had set these.  What are these
>> about, and are they really necessary?
>
> We clear those because:
>    - VM_ENTRY_LOAD_GUEST_EFER: "Intel sdm 23.3.2.1 Loading Guest Control...."
>       states that setting to 0 would cause the LMA to be loaded from
>       VM_ENTRY_IA32E_MODE.

OK -- but if we just set the LMA properly in the vmcs or wherever, and 
prevent the guest from changing it, then it will DTRT, won't it?

>    - VM_ENTRY_SMM: also according to the SDM must be 0 if not entering from
>                     SMM.
>    - VM_ENTRY_DEACT_DUAL_MONITOR: same, must be 0 for entry outside SMM.

But those have nothing to do with PVH -- Plain HVM guests needs that as 
well.  So they're already being cleared (or never being set) -- we don't 
need to add these lines in a PVH-specific section.

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-17  8:48       ` George Dunlap
@ 2013-09-18  0:13         ` Mukesh Rathor
  0 siblings, 0 replies; 76+ messages in thread
From: Mukesh Rathor @ 2013-09-18  0:13 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On Tue, 17 Sep 2013 09:48:08 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 09/17/2013 12:12 AM, Mukesh Rathor wrote:
> > On Fri, 13 Sep 2013 17:38:41 +0100
> > George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >
> >> On 13/09/13 17:25, George Dunlap wrote:
> >>> Changes:
> >>> * Enforce HAP mode for now
> >>> * Disable exits related to virtual interrupts or emulated APICs
> >>> * Disable changing paging mode
> >>>    - "unrestricted guest" (i.e., real mode for EPT) disabled
> > ........
> >>> +        ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP);
> >>> +        ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING));
> >>> +
> >>> +        /*
> >>> +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
> >>> +         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and not
> >>> +         * use the host values. 0 would cause it to not use the VMCS values.
> >>> +         */
> >>> +
> >>> +        /* PVH: I don't think these are necessary */
> >>> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> >>> +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
> >>> +        vmentry_ctl &= ~VM_ENTRY_SMM;
> >>> +        vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
> >>
> >> Mukesh, I couldn't figure out why you had set these.  What are
> >> these about, and are they really necessary?
> >
> > We clear those because:
> >    - VM_ENTRY_LOAD_GUEST_EFER: "Intel sdm 23.3.2.1 Loading Guest
> > Control...." states that setting to 0 would cause the LMA to be
> > loaded from VM_ENTRY_IA32E_MODE.
> 
> OK -- but if we just set the LMA properly in the vmcs or wherever,
> and prevent the guest from changing it, then it will DTRT, won't it?

Hmm... I gathered from reading the SDMs that LMA is set via
VM_ENTRY_IA32E_MODE. We start the guest in long mode. Maybe there are
other ways; I am not sure.
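
For illustration, the corresponding entry-control handling for a 64-bit PVH
guest might then look like this in the vmcs construction path (a sketch using
the existing Xen flag names; the exact placement is an assumption):

    /* Start the guest directly in 64-bit (long) mode.  With
     * VM_ENTRY_LOAD_GUEST_EFER clear, the CPU derives EFER.LMA from
     * the IA-32e-mode entry control on each vmentry. */
    vmentry_ctl |= VM_ENTRY_IA32E_MODE;
    vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;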

> >    - VM_ENTRY_SMM: also according to the SDM must be 0 if not
> > entering from SMM.
> >    - VM_ENTRY_DEACT_DUAL_MONITOR: same, must be 0 for entry outside
> > SMM.
> 
> But those have nothing to do with PVH -- Plain HVM guests needs that
> as well.  So they're already being cleared (or never being set) -- we
> don't need to add these lines in a PVH-specific section.

Right. Initially PVH was separate from HVM, hence these lines.

thanks
mukesh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-16 21:09   ` Mukesh Rathor
@ 2013-09-18 10:39     ` George Dunlap
  2013-09-18 12:38       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-18 10:39 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

On Mon, Sep 16, 2013 at 10:09 PM, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Fri, 13 Sep 2013 17:25:03 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> As far as I can tell, there's nothing here that requires v to be
>> current.  vmx_update_exception_bitmap() is updated in other situations
>> where v!=current.
>>
>> Removing this allows the PVH code to call this during vmcs
>> construction in a later patch, making the code more robust by removing
>> duplicate code.
>>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
>> CC: Jan Beulich <jan.beulich@suse.com>
>> CC: Tim Deegan <tim@xen.org>
>> CC: Keir Fraser <keir@xen.org>
>> ---
>>  xen/arch/x86/hvm/vmx/vmx.c |    2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 8ed7026..f02e47a 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
>>  {
>>      unsigned long mask;
>>
>> -    ASSERT(v == current);
>> -
>>      mask = 1u << TRAP_int3;
>>      if ( !cpu_has_monitor_trap_flag )
>>          mask |= 1u << TRAP_debug;
>
>
> vmx_update_exception_bitmap() writes EXCEPTION_BITMAP of current vmcs,
> ie of current vcpu. If v != current, you'd be writing the bitmap of
> wrong vcpu. If the assertion is removed, then the function would need
> to vmx enter...

You mean, callers would need to call vmcs_enter before calling it.
Hmm... is there an assert for that?

 -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-16 23:17     ` Mukesh Rathor
@ 2013-09-18 10:50       ` George Dunlap
  0 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-18 10:50 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

On Tue, Sep 17, 2013 at 12:17 AM, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Fri, 13 Sep 2013 17:36:06 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 13/09/13 17:25, George Dunlap wrote:
>> > Things kept:
>> > * cacheattr_region lists
>> > * irq-related structures
>> > * paging
>> > * tm_list
>> >
>> > Things disabled for now:
>> > * compat xlation
>> >
>> > Things disabled:
>> > * Emulated timers and clock sources
>> > * IO/MMIO emulation
>> > * msix tables
>> > * hvm params
>> > * hvm_funcs
> ......
>> > ...unsigned long gfn, mfn_t mfn, ...
>> >      if ( (d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL) )
>> >          return MTRR_TYPE_WRBACK;
>> >
>> > -    if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
>> > +    if ( v->domain->arch.hvm_domain.params
>> > +         && !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
>>
>> This is one thing I want to discuss: I can see why initially it was
>> decided to disable hvm_params, as at the moment PVH only uses one of
>> them, and this saves allocating the array for PVH domains.  But the
>> result is a lot of fairly ugly things like this.  There's also (in
>> another patch) a hack that allows a guest to *set* the IRQ callback
>> via hvmop hvm_param_set, but not *read* it.
>>
>> Additionally, as far as I can tell, the only reason we can't support
>> mem events is because we don't have hvm_params.
>>
>> Since I think at some point we will want to use the mem events on PVH
>> guests, we should probably just allocate it now and avoid having
>> things like this in the first place.
>
> That would work too. My goal was to add things incrementally so they
> got properly tested.

Sure, and there's some sense in that -- but how you implement it
depends on the cost.  You're not implementing shadow pagetables, but
all it takes is to just enforce hap=1 at the beginning and you're
done.  In this case you're adding 20-30 checks and other codepath
changes all over the place.  It seems like in this case it would be
better just to document that the memory interface hasn't been tested,
and let people take their chances.

 -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-18 10:39     ` George Dunlap
@ 2013-09-18 12:38       ` Jan Beulich
  2013-09-18 12:53         ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 12:38 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tim Deegan, Keir Fraser, xen-devel

>>> On 18.09.13 at 12:39, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Mon, Sep 16, 2013 at 10:09 PM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
>> On Fri, 13 Sep 2013 17:25:03 +0100
>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>>> As far as I can tell, there's nothing here that requires v to be
>>> current.  vmx_update_exception_bitmap() is updated in other situations
>>> where v!=current.
>>>
>>> Removing this allows the PVH code to call this during vmcs
>>> construction in a later patch, making the code more robust by removing
>>> duplicate code.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
>>> CC: Jan Beulich <jan.beulich@suse.com>
>>> CC: Tim Deegan <tim@xen.org>
>>> CC: Keir Fraser <keir@xen.org>
>>> ---
>>>  xen/arch/x86/hvm/vmx/vmx.c |    2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 8ed7026..f02e47a 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
>>>  {
>>>      unsigned long mask;
>>>
>>> -    ASSERT(v == current);
>>> -
>>>      mask = 1u << TRAP_int3;
>>>      if ( !cpu_has_monitor_trap_flag )
>>>          mask |= 1u << TRAP_debug;
>>
>>
>> vmx_update_exception_bitmap() writes EXCEPTION_BITMAP of current vmcs,
>> ie of current vcpu. If v != current, you'd be writing the bitmap of
>> wrong vcpu. If the assertion is removed, then the function would need
>> to vmx enter...
> 
> You mean, callers would need to call vmcs_enter before calling it.
> Hmm... is there an assert for that?

This is basically what the assertion you're removing does, albeit
restricting it more than it needs to be. But - do you have a use
case for the function when v != current?

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-18 12:38       ` Jan Beulich
@ 2013-09-18 12:53         ` George Dunlap
  2013-09-18 13:51           ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-18 12:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Keir Fraser, xen-devel

On 18/09/13 13:38, Jan Beulich wrote:
>>>> On 18.09.13 at 12:39, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> On Mon, Sep 16, 2013 at 10:09 PM, Mukesh Rathor
>> <mukesh.rathor@oracle.com> wrote:
>>> On Fri, 13 Sep 2013 17:25:03 +0100
>>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>
>>>> As far as I can tell, there's nothing here that requires v to be
>>>> current.  vmx_update_exception_bitmap() is updated in other situations
>>>> where v!=current.
>>>>
>>>> Removing this allows the PVH code to call this during vmcs
>>>> construction in a later patch, making the code more robust by removing
>>>> duplicate code.
>>>>
>>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>>> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>> CC: Jan Beulich <jan.beulich@suse.com>
>>>> CC: Tim Deegan <tim@xen.org>
>>>> CC: Keir Fraser <keir@xen.org>
>>>> ---
>>>>   xen/arch/x86/hvm/vmx/vmx.c |    2 --
>>>>   1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>>> index 8ed7026..f02e47a 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
>>>>   {
>>>>       unsigned long mask;
>>>>
>>>> -    ASSERT(v == current);
>>>> -
>>>>       mask = 1u << TRAP_int3;
>>>>       if ( !cpu_has_monitor_trap_flag )
>>>>           mask |= 1u << TRAP_debug;
>>>
>>> vmx_update_exception_bitmap() writes EXCEPTION_BITMAP of current vmcs,
>>> ie of current vcpu. If v != current, you'd be writing the bitmap of
>>> wrong vcpu. If the assertion is removed, then the function would need
>>> to vmx enter...
>> You mean, callers would need to call vmcs_enter before calling it.
>> Hmm... is there an assert for that?
> This is basically what the assertion you're removing does, albeit
> restricting it more than it needs to be. But - do you have a use
> case for the function when v != current?

Yes -- in patch 8, I add a call to this from
xen/arch/x86/hvm/vmx/vmcs.c:construct_vmcs().  As the comment there
says, this update normally happens when the guest enables paging. But
since PVH guests start out with paging enabled, that never happens, so
we do it at start-of-day instead. (Mukesh's code did the same, but by
duplicating the code inside the function; I followed suit but called
the function itself instead.)  At that point vmx_vmcs_enter() has
already been called, so the ASSERT is overly restrictive.
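
In other words, roughly this (a sketch of the construct_vmcs() hunk in
question):

    /* PVH guests start with paging enabled, so do at start-of-day what
     * would otherwise happen when the guest enables paging. */
    if ( is_pvh_domain(d) )
        vmx_update_debug_state(v);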

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 04/21] pvh prep: code motion
  2013-09-13 16:25 ` [PATCH RFC v12 04/21] pvh prep: code motion George Dunlap
@ 2013-09-18 12:59   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 12:59 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -1143,14 +1145,17 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  
>      return 0;
>  
> + fail6:
> +    nestedhvm_vcpu_destroy(v);
>   fail5:
> -    free_compat_arg_xlat(v);
> +    vlapic_destroy(v);
>   fail4:
> -    nestedhvm_vcpu_destroy(v);
> +    free_compat_arg_xlat(v);
>   fail3:
> +    tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);

While everything else here looks fine, this call seems pretty
pointless for a tasklet that was merely initialized (i.e. just the
pointed to structure filled suitably), but never used for
anything.

Jan

>      hvm_funcs.vcpu_destroy(v);
>   fail2:
> -    vlapic_destroy(v);
> +    hvm_vcpu_cacheattr_destroy(v);
>   fail1:
>      return rc;
>  }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-13 16:25 ` [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros George Dunlap
@ 2013-09-18 13:46   ` Jan Beulich
  2013-09-19 16:27     ` George Dunlap
  2013-09-19 16:58   ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 13:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
>  #endif
>  
>      v = mapcache_current_vcpu();
> -    if ( !v || is_hvm_vcpu(v) )
> +    if ( !v || has_hvm_container_vcpu(v) )
>          return mfn_to_virt(mfn);
>  
>      dcache = &v->domain->arch.pv_domain.mapcache;
> @@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
>      ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>  
>      v = mapcache_current_vcpu();
> -    ASSERT(v && !is_hvm_vcpu(v));
> +    ASSERT(v && is_pv_vcpu(v));

This is inconsistent with the above change to map_domain_page()
and the changes later in this file.

> @@ -1237,7 +1237,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>      bool_t compat = is_pv_32on64_domain(v->domain);
>  #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
>  
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>          memset(c.nat, 0, sizeof(*c.nat));
>      memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));

I think best would be to drop the condition here altogether.

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -119,6 +119,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
>      unsigned long *stack, addr;
>      unsigned long mask = STACK_SIZE;
>  
> +    /* Avoid HVM as we don't know what the stack looks like */
>      if ( is_hvm_vcpu(v) )

So why do you not change this one to !is_pv_vcpu()? In particular
the do_page_walk() further down is inherently PV.

> @@ -732,8 +736,12 @@ void watchdog_domain_destroy(struct domain *d);
>  
>  #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
>  
> -#define is_hvm_domain(d) ((d)->is_hvm)
> +#define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
> +#define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
> +#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
>  #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
> +#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
> +#define has_hvm_container_vcpu(v)   (has_hvm_container_domain((v)->domain))

So in the end is_pv_...() = !has_hvm_container_...(), i.e. I
don't really see the point in having both.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state
  2013-09-18 12:53         ` George Dunlap
@ 2013-09-18 13:51           ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 13:51 UTC (permalink / raw)
  To: George Dunlap; +Cc: Tim Deegan, Keir Fraser, xen-devel

>>> On 18.09.13 at 14:53, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 13:38, Jan Beulich wrote:
>>>>> On 18.09.13 at 12:39, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>> On Mon, Sep 16, 2013 at 10:09 PM, Mukesh Rathor
>>> <mukesh.rathor@oracle.com> wrote:
>>>> On Fri, 13 Sep 2013 17:25:03 +0100
>>>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>
>>>>> As far as I can tell, there's nothing here that requires v to be
>>>>> current.  vmx_update_exception_bitmap() is updated in other situations
>>>>> where v!=current.
>>>>>
>>>>> Removing this allows the PVH code to call this during vmcs
>>>>> construction in a later patch, making the code more robust by removing
>>>>> duplicate code.
>>>>>
>>>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>>>> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>>> CC: Jan Beulich <jan.beulich@suse.com>
>>>>> CC: Tim Deegan <tim@xen.org>
>>>>> CC: Keir Fraser <keir@xen.org>
>>>>> ---
>>>>>   xen/arch/x86/hvm/vmx/vmx.c |    2 --
>>>>>   1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> index 8ed7026..f02e47a 100644
>>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> @@ -1041,8 +1041,6 @@ void vmx_update_debug_state(struct vcpu *v)
>>>>>   {
>>>>>       unsigned long mask;
>>>>>
>>>>> -    ASSERT(v == current);
>>>>> -
>>>>>       mask = 1u << TRAP_int3;
>>>>>       if ( !cpu_has_monitor_trap_flag )
>>>>>           mask |= 1u << TRAP_debug;
>>>>
>>>> vmx_update_exception_bitmap() writes EXCEPTION_BITMAP of current vmcs,
>>>> ie of current vcpu. If v != current, you'd be writing the bitmap of
>>>> wrong vcpu. If the assertion is removed, then the function would need
>>>> to vmx enter...
>>> You mean, callers would need to call vmcs_enter before calling it.
>>> Hmm... is there an assert for that?
>> This is basically what the assertion you're removing does, albeit
>> restricting it more than it needs to be. But - do you have a use
>> case for the function when v != current?
> 
> Yes -- in patch 8, I add a call to this from
> xen/arch/x86/hvm/vmx/vmcs.c:construct_vmcs().  As the comment there
> says, this update normally happens when the guest enables paging. But
> since PVH guests start out with paging enabled, that never happens, so
> we do it at start-of-day instead. (Mukesh's code did the same, but by
> duplicating the code inside the function; I followed suit but called
> the function itself instead.)  At that point vmx_vmcs_enter() has
> already been called, so the ASSERT is overly restrictive.

Perhaps it's then better to indeed call vmx_vmcs_enter() here
(as I think Mukesh also already suggested), which is mostly a
no-op when called redundantly.
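
Something like this, keeping the existing bitmap-latch logic and just
bracketing the write (a sketch of the suggested shape; the latch lines are
from the surrounding function as I read it):

    void vmx_update_debug_state(struct vcpu *v)
    {
        unsigned long mask = 1u << TRAP_int3;

        if ( !cpu_has_monitor_trap_flag )
            mask |= 1u << TRAP_debug;

        if ( v->arch.hvm_vcpu.debug_state_latch )
            v->arch.hvm_vmx.exception_bitmap |= mask;
        else
            v->arch.hvm_vmx.exception_bitmap &= ~mask;

        vmx_vmcs_enter(v);   /* mostly a no-op if the vmcs is already loaded */
        vmx_update_exception_bitmap(v);
        vmx_vmcs_exit(v);
    }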

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 06/21] pvh: Introduce PVH guest type
  2013-09-13 16:25 ` [PATCH RFC v12 06/21] pvh: Introduce PVH guest type George Dunlap
@ 2013-09-18 14:10   ` Jan Beulich
  2013-09-20 10:01     ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:10 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Introduce new PVH guest type, flags to create it, and ways to identify it.
> 
> To begin with, it will inherit functionality marked hvm_container.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

with one minor comment:

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>  
>      if ( is_hvm_domain(d) )
>          info->flags |= XEN_DOMINF_hvm_guest;
> +    else if ( is_pvh_domain(d) )
> +        info->flags |= XEN_DOMINF_pvh_guest;

I'd prefer constructs like this to be done as a switch statement.
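
E.g. (a sketch; guest_type_pvh being the enum value this series adds
alongside guest_type_pv and guest_type_hvm):

    switch ( d->guest_type )
    {
    case guest_type_hvm:
        info->flags |= XEN_DOMINF_hvm_guest;
        break;
    case guest_type_pvh:
        info->flags |= XEN_DOMINF_pvh_guest;
        break;
    case guest_type_pv:
        break;
    }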

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-13 16:25 ` [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers George Dunlap
  2013-09-13 16:36   ` George Dunlap
@ 2013-09-18 14:18   ` Jan Beulich
  2013-09-18 14:43     ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -368,6 +375,7 @@ void hvm_do_resume(struct vcpu *v)
>          }
>      }
>  
> +  check_inject_trap:

Labels are commonly indented by just one space.

> @@ -521,6 +529,7 @@ int hvm_domain_initialise(struct domain *d)
>          return -EINVAL;
>      }
>  
> +    /* PVH: pbut_lock and uc_lock unused, but won't hurt */
>      spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);

Typo (pbuf_lock). But the comment is pretty pointless anyway.
And if any future patch starts using either of the mentioned
locks, it'll likely forget to update the comment (and hence it'll get
stale).

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -175,6 +175,10 @@ int handle_mmio(void)
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      int rc;
>  
> +    /* No MMIO for PVH vcpus */
> +    if ( is_pvh_vcpu(curr) )
> +        return 0;
> +

So why does hvm_pio() not get adjusted similarly?

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-13 16:25 ` [PATCH RFC v12 08/21] pvh: vmx-specific changes George Dunlap
  2013-09-13 16:38   ` George Dunlap
@ 2013-09-18 14:25   ` Jan Beulich
  2013-09-20 13:07     ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:25 UTC (permalink / raw)
  To: George Dunlap
  Cc: xen-devel, Keir Fraser, Eddie Dong, Jun Nakajima, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> +static int pvh_check_requirements(struct vcpu *v)
> +{
> +    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
> +
> +    if ( !paging_mode_hap(v->domain) )
> +    {
> +        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
> +        return -EINVAL;
> +    }
> +    if ( !cpu_has_vmx_ept )
> +    {
> +        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
> +        return -ENOSYS;

While I realize that this was perhaps in what you took
from Mukesh, I think this and similar uses of -ENOSYS are
misguided; -EOPNOTSUPP would seem more appropriate.
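
I.e. (sketch):

    if ( !cpu_has_vmx_ept )
    {
        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
        return -EOPNOTSUPP;
    }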

> +        /* PVH: I don't think these are necessary */
> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
> +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;

Particularly these two would be nice to be confirmed by our VMX
maintainers, who you should have Cc-ed.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 09/21] pvh: Do not allow PVH guests to change paging modes
  2013-09-13 16:25 ` [PATCH RFC v12 09/21] pvh: Do not allow PVH guests to change paging modes George Dunlap
@ 2013-09-18 14:32   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:32 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

once again with a minor comment:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1743,7 +1743,17 @@ int hvm_set_cr0(unsigned long value)
>           (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG )
>          goto gpf;
>  
> -    if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )
> +    
> +
> +    /* A pvh is not expected to change to real mode. */
> +    if ( is_pvh_vcpu(v)
> +         && (value & (X86_CR0_PE | X86_CR0_PG)) != (X86_CR0_PG | X86_CR0_PE) )
> +    {
> +        printk(XENLOG_G_WARNING
> +               "PVH attempting to turn off PE/PG. CR0:%lx\n", value);
> +        goto gpf;
> +    }
> +    else if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )

It is pointless to follow a "goto" with an "else if".
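
I.e. the second branch can simply stand on its own (sketch):

    /* A PVH guest is not expected to change to real mode. */
    if ( is_pvh_vcpu(v) &&
         (value & (X86_CR0_PE | X86_CR0_PG)) != (X86_CR0_PE | X86_CR0_PG) )
    {
        printk(XENLOG_G_WARNING
               "PVH attempting to turn off PE/PG. CR0:%lx\n", value);
        goto gpf;
    }

    if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) )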

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-18 14:18   ` Jan Beulich
@ 2013-09-18 14:43     ` George Dunlap
  2013-09-18 14:47       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-18 14:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 15:18, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -368,6 +375,7 @@ void hvm_do_resume(struct vcpu *v)
>>           }
>>       }
>>   
>> +  check_inject_trap:
> Labels are commonly indented by just one space.

Ack -- sorry, emacs has its own ideas about how they should be indented.  I 
try to catch these but I don't always notice.

>
>> @@ -521,6 +529,7 @@ int hvm_domain_initialise(struct domain *d)
>>           return -EINVAL;
>>       }
>>   
>> +    /* PVH: pbut_lock and uc_lock unused, but won't hurt */
>>       spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
>>       spin_lock_init(&d->arch.hvm_domain.irq_lock);
>>       spin_lock_init(&d->arch.hvm_domain.uc_lock);
> Typo (pbuf_lock). But the comment is pretty pointless anyway.
> And if any future patch starts using either of the mentioned
> locks, it'll likely forget to update the comment (and hence it get
> stale).
>
>> --- a/xen/arch/x86/hvm/io.c
>> +++ b/xen/arch/x86/hvm/io.c
>> @@ -175,6 +175,10 @@ int handle_mmio(void)
>>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>       int rc;
>>   
>> +    /* No MMIO for PVH vcpus */
>> +    if ( is_pvh_vcpu(curr) )
>> +        return 0;
>> +
> So why does hvm_pio() not get adjusted similarly?

At the moment I don't think hvm_pio() will be called for PVH guests.  IO 
instructions for PVH domains are handled by the PV io emulation engine.  
The check here is to make sure that Xen doesn't try to do MMIO for PVH 
domains on an EPT violation.  Mukesh had an "if(!is_pvh_domain)" in the 
EPT handler before calling hvm_hap_nested_page_fault(), which not only 
filtered out MMIO, but also any other reason the fault may have happened 
(e.g., paging or a mem event).

I put the check in handle_mmio() instead of hvm_hap_nested_page_fault() 
just in case there was another path to that function.  I suppose the 
same logic applies to hvm_pio(). We could put either an ASSERT or just a 
check that returns failure.  (I would lean toward an ASSERT.)
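
E.g. at the top of hvm_pio() (a sketch; the exact placement is a guess):

    /* PVH I/O instructions go through the PV emulation path, so this
     * should be unreachable for PVH vcpus. */
    ASSERT(!is_pvh_vcpu(current));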

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 10/21] pvh: PVH access to hypercalls
  2013-09-13 16:25 ` [PATCH RFC v12 10/21] pvh: PVH access to hypercalls George Dunlap
@ 2013-09-18 14:45   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:45 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Hypercalls where we now have unrestricted access:
> * memory_op
> * console_io
> * vcpu_op
> * mmuext_op
> 
> We also restrict PVH domain access to HVMOP_*_param to writing
> HVM_PARAM_CALLBACK_IRQ.
> 
> Most hvm_op functions require "is_hvm_domain()" and will default to
> -EINVAL; exceptions are HVMOP_get_time and HVMOP_xentrace.
> 
> Finally, we restrict setting IOPL permissions for a PVH domain.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

again with a couple of comments:

> @@ -3378,7 +3396,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
>      if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
>          return viridian_hypercall(regs);
>  
> -    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
> +    if ( (eax >= NR_hypercalls) ||
> +         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
> +         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )

This sort of gives the impression that the is_pv_() case isn't handled.
Realizing that PV can't make it here, doing this with the ?: operator
would seem more natural (matching the if/else further down around
doing the actual call).
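
E.g. (a sketch; the failure handling inside the braces is assumed to stay as
in the existing code):

    if ( (eax >= NR_hypercalls) ||
         !(is_pvh_vcpu(curr) ? pvh_hypercall64_table[eax]
                             : hvm_hypercall32_table[eax]) )
    {
        regs->eax = -ENOSYS;
        return HVM_HCALL_completed;
    }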

> @@ -3393,16 +3413,20 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
>                      regs->r10, regs->r8, regs->r9);
>  
>          curr->arch.hvm_vcpu.hcall_64bit = 1;
> -        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
> -                                               regs->rsi,
> -                                               regs->rdx,
> -                                               regs->r10,
> -                                               regs->r8,
> -                                               regs->r9); 
> +        if ( is_pvh_vcpu(curr) )
> +            regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi,
> +                                                   regs->rdx, regs->r10,
> +                                                   regs->r8, regs->r9);
> +        else
> +            regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi,
> +                                                   regs->rdx, regs->r10,
> +                                                   regs->r8, regs->r9);
>          curr->arch.hvm_vcpu.hcall_64bit = 0;
>      }
>      else
>      {
> +        ASSERT(!is_pvh_vcpu(curr));   /* PVH 32bitfixme. */
> +

And this would then possibly better be

        ASSERT(is_hvm_vcpu(curr));   /* PVH 32bitfixme. */

> @@ -3827,7 +3851,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -ESRCH;
>  
>          rc = -EINVAL;
> -        if ( !is_hvm_domain(d) )
> +        if ( is_pv_domain(d) )

This would be a case where I would see better fit for
!has_hvm_container...(), if we already need to have that.

> +            goto param_fail;
> +
> +        if ( is_pvh_domain(d)
> +             && ( a.index != HVM_PARAM_CALLBACK_IRQ
> +                  ||  op != HVMOP_set_param ) )

Extra blanks inside the inner parentheses.

> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -519,6 +519,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      case PHYSDEVOP_set_iopl: {
>          struct physdev_set_iopl set_iopl;
> +        
> +        ret = -EINVAL;
> +        if ( is_pvh_vcpu(current) )
> +            break;

Now this is a case where -ENOSYS would actually make sense: The
sub-hypercall is to be considered not implemented for PVH guests.
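
I.e. (sketch):

    case PHYSDEVOP_set_iopl: {
        struct physdev_set_iopl set_iopl;

        ret = -ENOSYS;
        if ( is_pvh_vcpu(current) )
            break;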

> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -293,6 +293,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
>                               (1U << XENFEAT_highmem_assist) |
>                               (1U << XENFEAT_gnttab_map_avail_bits);
> +            else if ( is_pvh_vcpu(current) )
> +                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> +                             (1U << XENFEAT_supervisor_mode_kernel) |
> +                             (1U << XENFEAT_hvm_callback_vector);
>              else
>                  fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
>                               (1U << XENFEAT_hvm_callback_vector) |

This once again would better be a switch statement now that
we're dealing with an enumerated type.
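
E.g. (a sketch; the HVM branch keeps its existing bits, elided here):

    switch ( current->domain->guest_type )
    {
    case guest_type_pv:
        fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                     (1U << XENFEAT_highmem_assist) |
                     (1U << XENFEAT_gnttab_map_avail_bits);
        break;
    case guest_type_pvh:
        fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                     (1U << XENFEAT_supervisor_mode_kernel) |
                     (1U << XENFEAT_hvm_callback_vector);
        break;
    case guest_type_hvm:
        /* ... existing HVM bits, unchanged ... */
        fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                     (1U << XENFEAT_hvm_callback_vector);
        break;
    }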

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers
  2013-09-18 14:43     ` George Dunlap
@ 2013-09-18 14:47       ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:47 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 18.09.13 at 16:43, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 15:18, Jan Beulich wrote:
>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> --- a/xen/arch/x86/hvm/io.c
>>> +++ b/xen/arch/x86/hvm/io.c
>>> @@ -175,6 +175,10 @@ int handle_mmio(void)
>>>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>>       int rc;
>>>   
>>> +    /* No MMIO for PVH vcpus */
>>> +    if ( is_pvh_vcpu(curr) )
>>> +        return 0;
>>> +
>> So why does hvm_pio() not get adjusted similarly?
> 
> At the moment I don't think hvm_pio() will be called for PVH guests.  IO 
> instructions for PVH domains are handled by the PV io emulation engine.  
> The check here is to make sure that Xen doesn't try to do MMIO for PVH 
> domains on an EPT violation.  Mukesh had an "if(!is_pvh_domain)" in the 
> EPT handler before calling hvm_hap_nested_page_fault(), which not only 
> filtered out MMIO, but also any other reason the fault may have happened 
> (e.g., paging or a mem event).
> 
> I put the check in handle_mmio() instead of hvm_hap_nested_page_fault() 
> just in case there was another path to that function.  I suppose the 
> same logic applies to hvm_pio(). We could put either an ASSERT or just a 
> check that returns failure.  (I would lean toward an ASSERT.)

Yes, an ASSERT() would be fine, restoring consistency.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 11/21] pvh: Use PV e820
  2013-09-13 16:25 ` [PATCH RFC v12 11/21] pvh: Use PV e820 George Dunlap
@ 2013-09-18 14:48   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:48 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Allow PV e820 map to be set and read from a PVH domain.  This requires
> moving the pv e820 struct out from the pv-specific domain struct and
> into the arch domain struct.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

I think this basically was already
Reviewed-by: Jan Beulich <jbeulich@suse.com>

> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>  xen/arch/x86/domain.c        |    9 +++------
>  xen/arch/x86/mm.c            |   26 ++++++++++++--------------
>  xen/include/asm-x86/domain.h |   10 +++++-----
>  3 files changed, 20 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 5c38cb1..c75798b 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>          if ( (rc = iommu_domain_init(d)) != 0 )
>              goto fail;
>      }
> +    spin_lock_init(&d->arch.e820_lock);
>  
>      if ( has_hvm_container_domain(d) )
>      {
> @@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>          }
>      }
>      else
> -    {
>          /* 64-bit PV guest by default. */
>          d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>  
> -        spin_lock_init(&d->arch.pv_domain.e820_lock);
> -    }
> -
>      /* initialize default tsc behavior in case tools don't */
>      tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
>      spin_lock_init(&d->arch.vtsc_lock);
> @@ -592,8 +589,8 @@ void arch_domain_destroy(struct domain *d)
>  {
>      if ( has_hvm_container_domain(d) )
>          hvm_domain_destroy(d);
> -    else
> -        xfree(d->arch.pv_domain.e820);
> +
> +    xfree(d->arch.e820);
>  
>      free_domain_pirqs(d);
>      if ( !is_idle_domain(d) )
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 120599a..f25df92 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4760,11 +4760,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -EFAULT;
>          }
>  
> -        spin_lock(&d->arch.pv_domain.e820_lock);
> -        xfree(d->arch.pv_domain.e820);
> -        d->arch.pv_domain.e820 = e820;
> -        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
> -        spin_unlock(&d->arch.pv_domain.e820_lock);
> +        spin_lock(&d->arch.e820_lock);
> +        xfree(d->arch.e820);
> +        d->arch.e820 = e820;
> +        d->arch.nr_e820 = fmap.map.nr_entries;
> +        spin_unlock(&d->arch.e820_lock);
>  
>          rcu_unlock_domain(d);
>          return rc;
> @@ -4778,26 +4778,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&map, arg, 1) )
>              return -EFAULT;
>  
> -        spin_lock(&d->arch.pv_domain.e820_lock);
> +        spin_lock(&d->arch.e820_lock);
>  
>          /* Backwards compatibility. */
> -        if ( (d->arch.pv_domain.nr_e820 == 0) ||
> -             (d->arch.pv_domain.e820 == NULL) )
> +        if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) )
>          {
> -            spin_unlock(&d->arch.pv_domain.e820_lock);
> +            spin_unlock(&d->arch.e820_lock);
>              return -ENOSYS;
>          }
>  
> -        map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
> -        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
> -                           map.nr_entries) ||
> +        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
> +        if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) ||
>               __copy_to_guest(arg, &map, 1) )
>          {
> -            spin_unlock(&d->arch.pv_domain.e820_lock);
> +            spin_unlock(&d->arch.e820_lock);
>              return -EFAULT;
>          }
>  
> -        spin_unlock(&d->arch.pv_domain.e820_lock);
> +        spin_unlock(&d->arch.e820_lock);
>          return 0;
>      }
>  
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 4cbad4e..23923df 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -234,11 +234,6 @@ struct pv_domain
>  
>      /* map_domain_page() mapping cache. */
>      struct mapcache_domain mapcache;
> -
> -    /* Pseudophysical e820 map (XENMEM_memory_map).  */
> -    spinlock_t e820_lock;
> -    struct e820entry *e820;
> -    unsigned int nr_e820;
>  };
>  
>  struct arch_domain
> @@ -313,6 +308,11 @@ struct arch_domain
>                                  (possibly other cases in the future */
>      uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
>      uint64_t vtsc_usercount; /* not used for hvm */
> +
> +    /* Pseudophysical e820 map (XENMEM_memory_map).  */
> +    spinlock_t e820_lock;
> +    struct e820entry *e820;
> +    unsigned int nr_e820;
>  } __cacheline_aligned;
>  
>  #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 12/21] pvh: Support guest_kernel_mode for PVH
  2013-09-13 16:25 ` [PATCH RFC v12 12/21] pvh: Support guest_kernel_mode for PVH George Dunlap
@ 2013-09-18 14:52   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:52 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> This will be required in order to do emulated_privops for PVH.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

albeit for generality's sake ...

> +#define guest_kernel_mode(v, r)                                   \
> +    (is_pvh_vcpu(v) ? hvm_kernel_mode(v) :                        \

... this could probably become !is_pv_vcpu() (or
has_hvm_container_vcpu(), if that doesn't go away).

> +     (!is_pv_32bit_vcpu(v) ?                                      \
> +      (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :         \
> +      (ring_1(r))))

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 13/21] pvh: Support read_segment_register for PVH
  2013-09-13 16:25 ` [PATCH RFC v12 13/21] pvh: Support read_segment_register " George Dunlap
@ 2013-09-18 14:56   ` Jan Beulich
  2013-09-20 14:18     ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 14:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> This will be necessary to do PV-style emulated operations for PVH guests.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

provided you ...

> @@ -1566,6 +1605,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .sync_pir_to_irr      = vmx_sync_pir_to_irr,
>      .handle_eoi           = vmx_handle_eoi,
>      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
> +    .read_selector        = vmx_read_selector,
>  };

... put this and ...

> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -193,6 +193,7 @@ struct hvm_function_table {
>                                  paddr_t *L1_gpa, unsigned int *page_order,
>                                  uint8_t *p2m_acc, bool_t access_r,
>                                  bool_t access_w, bool_t access_x);
> +    u16 (*read_selector)(struct vcpu *v, enum x86_segment seg);
>  };

this alongside the other segment register related hooks.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests.
  2013-09-13 16:25 ` [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests George Dunlap
  2013-09-13 16:40   ` George Dunlap
@ 2013-09-18 15:00   ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 15:00 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> This is in preparation for enabling emulated privops for PVH guests.
> 
> This one unfortunately has a bit more of an impedance mismatch:
> 
> * For PV, the selector is hard-coded in by passing #name in the
>   #define.  For PVH, we have to do a switch statement.
> 
> * For PV, given the descriptor, it can read the resulting base, limit,
>   &c directly; for PVH, we have to read the values currently loaded in the 
> vmcs.
> 
> This leads to a rather awkward construct where we *both* read the
> descriptor, *and* specify a selector, and introduce an intermediate 
> function,
> read_descriptor_sel.
> 
> Unfortunately, without introducing a rather pointless switch()
> statement to the PV path similar to the one in the PVH path, there's
> no way to make this less awkward.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

with one really minor comment:

> +static int read_descriptor_sel(unsigned int sel,
> +                               enum x86_segment which_sel,
> +                               struct vcpu *v,
> +                               const struct cpu_user_regs *regs,
> +                               unsigned long *base,
> +                               unsigned long *limit,
> +                               unsigned int *ar,
> +                               unsigned int vm86attr)
> +{
> +    struct segment_register seg;
> +    bool_t long_mode;
> +
> +    if ( !is_pvh_vcpu(v) )
> +        return read_descriptor(sel, v, regs, base, limit, ar, vm86attr);
> +
> +    hvm_get_segment_register(v, x86_seg_cs, &seg);
> +    long_mode = seg.attr.fields.l;
> +
> +    if ( which_sel != x86_seg_cs )
> +        hvm_get_segment_register(v, which_sel, &seg);
> +
> +    /* "ar" is returned packed as in segment_attributes_t. Fix it up. */
> +    *ar = seg.attr.bytes;
> +    *ar = (*ar & 0xff ) | ((*ar & 0xf00) << 4);

Stray blank before the first closing parenthesis.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
  2013-09-13 16:25 ` [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest George Dunlap
@ 2013-09-18 15:17   ` Jan Beulich
  2013-09-20 14:50     ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 15:17 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -737,7 +737,31 @@ int arch_set_info_guest(
>      if ( has_hvm_container_vcpu(v) )
>      {
>          hvm_set_info_guest(v);
> -        goto out;
> +
> +        if ( is_hvm_vcpu(v) || v->is_initialised )
> +            goto out;
> +
> +        /* PVH */
> +        if ( c.nat->ctrlreg[1] )
> +            return -EINVAL;
> +
> +        ASSERT(!compat);

This lacks a pvh-32bit-fixme annotation.

> +
> +        cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]);
> +        cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);
> +
> +        v->arch.cr3 = page_to_maddr(cr3_page);
> +        v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
> +        
> +        if ( paging_mode_enabled(d) )

Is the opposite really possible? I think this ought to be an
assertion.

> +int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
> +                               struct vcpu_guest_context *ctxtp)
> +{
> +    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
> +         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
> +         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
> +         *ctxtp->gdt_frames || ctxtp->gdt_ents ||

Don't know why I didn't spot this earlier, but the gdt_frames check
is pointless when gdt_ents is verified to be zero.

> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -150,6 +150,10 @@ typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */
>  /*
>   * The following is all CPU context. Note that the fpu_ctxt block is filled
>   * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used.
> + *
> + * PVH 64bit: In the vcpu boot path, for vmcs context, only gs_base_kernel
> + *            is honored. Other fields like gdt, ldt, and selectors must be
> + *            zeroed. See vmx_pvh_vcpu_boot_set_info.

In the current form I hope the comment is misleading: Surely
general purpose and FPU registers get honored too?

And referring to an internal function from a public header is sort
of bogus too.

>   */
>  struct vcpu_guest_context {
>      /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. */

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-13 16:25 ` [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO George Dunlap
@ 2013-09-18 15:31   ` Jan Beulich
  2013-09-19  1:02     ` Mukesh Rathor
  2013-09-23 13:48     ` George Dunlap
  0 siblings, 2 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 15:31 UTC (permalink / raw)
  To: George Dunlap, Mukesh Rathor; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>      case EXIT_REASON_IO_INSTRUCTION:
> -        exit_qualification = __vmread(EXIT_QUALIFICATION);
> -        if ( exit_qualification & 0x10 )
> +        if ( is_pvh_vcpu(v) )
>          {
> -            /* INS, OUTS */
> -            if ( !handle_mmio() )
> -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +            /*
> +             * Note: A PVH guest sets IOPL natively by setting bits in
> +             *       the eflags, and not via hypercalls used by a PV.
> +             */
> +            struct segment_register seg;
> +            int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
> +            int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
> +            
> +            if ( curr_lvl == 0 )
> +            {
> +                hvm_get_segment_register(current, x86_seg_ss, &seg);
> +                curr_lvl = seg.attr.fields.dpl;
> +            }
> +            if ( requested < curr_lvl || !emulate_privileged_op(regs) )
> +                hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);

Now that I think about it once more, that's actually rather
questionable. First of all - does a PVH guest see translated or
untranslated I/O port space? With there not being a PV MMU, the
former might seem more natural...

And then for the majority of I/O ports where Xen simply carries
out the access on behalf of the guest, we could as well allow the
guest to do the port access itself by clearing the respective flags
in the bitmap. Once that is done, the question would then be
whether any legitimate cases remain that require a call to
emulate_privileged_op() here.
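
Just to illustrate the direction (a sketch only, with a made-up helper
name, and assuming a per-domain copy of the I/O bitmap - currently the
VMCS points at the shared hvm_io_bitmap, so some plumbing would be
needed):

    /*
     * Grant the guest direct access to ports [first, first + nr):
     * clear the intercept bits so IN/OUT to these ports no longer
     * cause EXIT_REASON_IO_INSTRUCTION vmexits (bit set => intercept).
     */
    static void pvh_allow_io_ports(unsigned long *io_bitmap,
                                   unsigned int first, unsigned int nr)
    {
        unsigned int port;

        for ( port = first; port < first + nr; port++ )
            __clear_bit(port, io_bitmap);
    }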
 
> +void propagate_page_fault(unsigned long addr, u16 error_code)
> +{
> +    is_pvh_vcpu(current)
> +        ? hvm_inject_page_fault(error_code, addr)
> +        : pv_inject_page_fault(addr, error_code);

Even if not written down in CODINGSTYLE, the majority of other
cases in the code have the operator as the last thing on a line
rather than the first.
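
I.e. something like (illustrative only):

    is_pvh_vcpu(current) ?
        hvm_inject_page_fault(error_code, addr) :
        pv_inject_page_fault(addr, error_code);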

> @@ -1624,6 +1631,13 @@ static int guest_io_okay(
>      int user_mode = !(v->arch.flags & TF_kernel_mode);
>  #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
>  
> +    /*
> +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
> +     * and so don't need to check again here.
> +     */
> +    if ( is_pvh_vcpu(v) )
> +        return 1;
> +
>      if ( !vm86_mode(regs) &&
>           (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
>          return 1;

Hmm, am I missing something here? The check in the VMEXIT
handler is just a privilege level one - where's the bitmap being
consulted? _If_ the bitmap is being maintained properly for the
guest (which I don't recall having seen), anything leading here
would be for ports the guest was not permitted access to. Yet
we would happily emulate the access for it then.

Jan

* Re: [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now
  2013-09-13 16:25 ` [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now George Dunlap
@ 2013-09-18 15:36   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-18 15:36 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> To be implemented.
> 
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>  xen/arch/x86/domain.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index d046790..5e43a67 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -339,6 +339,14 @@ int switch_compat(struct domain *d)
>  
>      if ( d == NULL )
>          return -EINVAL;
> +
> +    if ( is_pvh_domain(d) )
> +    {
> +        printk(XENLOG_INFO
> +               "Xen currently does not support 32bit PVH guests\n");
> +        return -EINVAL;
> +    }
> +
>      if ( !may_switch_mode(d) )
>          return -EACCES;
>      if ( is_pv_32on64_domain(d) )
> -- 
> 1.7.9.5
> 
> 

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-18 15:31   ` Jan Beulich
@ 2013-09-19  1:02     ` Mukesh Rathor
  2013-09-19 10:09       ` Jan Beulich
  2013-09-23 13:48     ` George Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Mukesh Rathor @ 2013-09-19  1:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Keir Fraser, Tim Deegan

On Wed, 18 Sep 2013 16:31:17 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com>
> >>> wrote:
> >      case EXIT_REASON_IO_INSTRUCTION:
> > -        exit_qualification = __vmread(EXIT_QUALIFICATION);
> > -        if ( exit_qualification & 0x10 )
> > +        if ( is_pvh_vcpu(v) )
> >          {
> > -            /* INS, OUTS */
> > -            if ( !handle_mmio() )
> > -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > +            /*
> > +             * Note: A PVH guest sets IOPL natively by setting bits in
> > +             *       the eflags, and not via hypercalls used by a PV.
> > +             */
> > +            struct segment_register seg;
> > +            int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
> > +            int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
> > +
> > +            if ( curr_lvl == 0 )
> > +            {
> > +                hvm_get_segment_register(current, x86_seg_ss, &seg);
> > +                curr_lvl = seg.attr.fields.dpl;
> > +            }
> > +            if ( requested < curr_lvl || !emulate_privileged_op(regs) )
> > +                hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
> 
> Now that I think about it once more, that's actually rather
> questionable. First of all - does a PVH guest see translated or
> untranslated I/O port space? With there not being a PV MMU, the
> former might seem more natural...
> 
> And then for the majority of I/O ports where Xen simply carries
> out the access on behalf of the guest, we could as well allow the
> guest to do the port access itself by clearing the respective flags
> in the bitmap. Once that is done, the question would then be
> whether any legitimate cases remain that require a call to
> emulate_privileged_op() here.

Good idea; I didn't know enough about which ports are allowed access
in order to do that.

> > @@ -1624,6 +1631,13 @@ static int guest_io_okay(
> >      int user_mode = !(v->arch.flags & TF_kernel_mode);
> >  #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
> >  
> > +    /*
> > +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
> > +     * and so don't need to check again here.
> > +     */
> > +    if ( is_pvh_vcpu(v) )
> > +        return 1;
> > +
> >      if ( !vm86_mode(regs) &&
> >           (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
> >          return 1;
> 
> Hmm, am I missing something here? The check in the VMEXIT
> handler is just a privilege level one - where's the bitmap being
> consulted? _If_ the bitmap is being maintained properly for the
> guest (which I don't recall having seen), anything leading here
> would be for ports the guest was not permitted access to. Yet
> we would happily emulate the access for it then.

Not sure I understand which bitmap needs to be consulted. The bitmap
hvm_io_bitmap is used to set the intercepts which PVH also uses, with
HVM defaults.

thanks
mukesh

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-19  1:02     ` Mukesh Rathor
@ 2013-09-19 10:09       ` Jan Beulich
  2013-09-20 17:03         ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-19 10:09 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: George Dunlap, xen-devel, Keir Fraser, Tim Deegan

>>> On 19.09.13 at 03:02, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Wed, 18 Sep 2013 16:31:17 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> > @@ -1624,6 +1631,13 @@ static int guest_io_okay(
>> >      int user_mode = !(v->arch.flags & TF_kernel_mode);
>> >  #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
>> >  
>> > +    /*
>> > +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
>> > +     * and so don't need to check again here.
>> > +     */
>> > +    if ( is_pvh_vcpu(v) )
>> > +        return 1;
>> > +
>> >      if ( !vm86_mode(regs) &&
>> >           (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
>> >          return 1;
>> 
>> Hmm, am I missing something here? The check in the VMEXIT
>> handler is just a privilege level one - where's the bitmap being
>> consulted? _If_ the bitmap is being maintained properly for the
>> guest (which I don't recall having seen), anything leading here
>> would be for ports the guest was not permitted access to. Yet
>> we would happily emulate the access for it then.
> 
> Not sure I understand which bitmap needs to be consulted. The bitmap
> hvm_io_bitmap is used to set the intercepts which PVH also uses, with
> HVM defaults.

Actually I think I got confused by the IOPL checking in your
VM exit handling code: _If_ you need to check the IOPL, then
you also need to check the bitmap hanging off of the TSS. But
aiui all the privilege level checks get done inside the guest, and
only if those pass would the I/O bitmap hanging off of the VMCS
be consulted. Hence neither the IOPL check nor the TSS-based
bitmap check ought to be necessary here.
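
In sketch form (a made-up helper purely to illustrate the ordering
described above, not actual Xen code):

    static bool_t io_insn_causes_vmexit(unsigned int port, unsigned int cpl,
                                        unsigned int iopl,
                                        bool_t tss_bitmap_denies,
                                        const unsigned long *vmcs_io_bitmap)
    {
        /* Guest-side privilege checks come first: failing them raises
         * #GP inside the guest, with no VM exit at all. */
        if ( cpl > iopl && tss_bitmap_denies )
            return 0;

        /* Only then is the VMCS I/O bitmap consulted: bit set => exit. */
        return !!test_bit(port, vmcs_io_bitmap);
    }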

Jan

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-18 13:46   ` Jan Beulich
@ 2013-09-19 16:27     ` George Dunlap
  2013-09-20  8:11       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-19 16:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 14:46, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
>>   #endif
>>   
>>       v = mapcache_current_vcpu();
>> -    if ( !v || is_hvm_vcpu(v) )
>> +    if ( !v || has_hvm_container_vcpu(v) )
>>           return mfn_to_virt(mfn);
>>   
>>       dcache = &v->domain->arch.pv_domain.mapcache;
>> @@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
>>       ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>>   
>>       v = mapcache_current_vcpu();
>> -    ASSERT(v && !is_hvm_vcpu(v));
>> +    ASSERT(v && is_pv_vcpu(v));
> This is inconsistent with the above change to map_domain_page()
> and the changes later in this file.

You mean, making this "is_pv_vcpu" instead of 
"!has_hvm_container_vcpu"?  I guess that's true.

>
>> @@ -1237,7 +1237,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>>       bool_t compat = is_pv_32on64_domain(v->domain);
>>   #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
>>   
>> -    if ( is_hvm_vcpu(v) )
>> +    if ( has_hvm_container_vcpu(v) )
>>           memset(c.nat, 0, sizeof(*c.nat));
>>       memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
> I think best would be to drop the condition here altogether.

Ack.

>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -119,6 +119,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
>>       unsigned long *stack, addr;
>>       unsigned long mask = STACK_SIZE;
>>   
>> +    /* Avoid HVM as we don't know what the stack looks like */
>>       if ( is_hvm_vcpu(v) )
> So why do you not change this one to !is_pv_vcpu()? In particular
> the do_page_walk() further down is inherently PV.

I guess it depends.  I was thinking that the reason we didn't do this 
for HVM guests was that there was no guarantee what the stack looked 
like -- it looks like another aspect is that for VMs that are not 
current, whether the PTs are controlled by Xen or the guest is a factor.

But if we make do_page_walk gated on is_pv_vcpu(), then it seems like it 
should work when v==current, and fail gracefully when v!=current.

>
>> @@ -732,8 +736,12 @@ void watchdog_domain_destroy(struct domain *d);
>>   
>>   #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
>>   
>> -#define is_hvm_domain(d) ((d)->is_hvm)
>> +#define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
>> +#define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
>> +#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
>>   #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
>> +#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
>> +#define has_hvm_container_vcpu(v)   (has_hvm_container_domain((v)->domain))
> So in the end is_pv_...() = !has_hvm_container_...(), i.e. I
> don't really see the point in having both.

Yes, is_pv <=> !has_hvm_container.  My original goal with having 
something different between the two was to try to hint to the coder 
whether the check had something to do with PV stuff or with HVM stuff.  
For example, in arch_set_info_guest(), at the top, it uses "is_pv" 
because the various selectors are only fixed up for PV guests; in 
arch_vcpu_reset() it's "is_pv" because it's related to destroying 
PV-only structures.  Whereas in, say, domain_relinquish_resources, the 
call to hvm_domain_relinquish_resources() is gated on has_hvm_container, 
because it has to do with HVM structures; and in vmcs_dump() it's 
"!has_hvm_container" instead if "is_pv" for the same reasons.

I can see now that this hasn't been consistently applied; for one, in 
many places the patch still assumes two alternatives.  And in the 
map_domain_page() functions, it looks like it should be "!is_pv", since 
it appears only PV guests have per-domain mapcache structures.

It still makes more sense to me to have two different labels, because to 
me they don't seem like the same thing: Taking a codepath because you 
*don't* have PV structures is different than taking a codepath because 
you *do* have HVM structures, even if it happens that at the moment the 
two happen to coincide 100% of the time.
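
To illustrate with hunks from this very patch:

    /* arch_vcpu_reset(): tearing down PV-only structures -> is_pv. */
    if ( is_pv_vcpu(v) )
    {
        destroy_gdt(v);
        return vcpu_destroy_pagetables(v);
    }

    /* domain_relinquish_resources(): HVM-side state -> has_hvm_container. */
    if ( has_hvm_container_domain(d) )
        hvm_domain_relinquish_resources(d);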

  -George

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-13 16:25 ` [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros George Dunlap
  2013-09-18 13:46   ` Jan Beulich
@ 2013-09-19 16:58   ` George Dunlap
  2013-09-20  8:38     ` Jan Beulich
  1 sibling, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-19 16:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Tim Deegan, Jan Beulich, xen-devel

On 13/09/13 17:25, George Dunlap wrote:
> The goal of this patch is to classify conditionals more clearly, as to
> whether they relate to pv guests, hvm-only guests, or guests with an
> "hvm container" (which will eventually include PVH).
>
> This patch introduces an enum for guest type, as well as two new macros
> for switching behavior on and off: is_pv_* and has_hvm_container_*.
>
> In general, a switch should use is_pv_* (or !is_pv_*) if the code in question
> relates directly to a PV guest.  Examples include use of pv_vcpu structs or
> other behavior directly related to PV domains.
>
> hvm_container is more of a fuzzy concept, but in general:
>
> * Most core HVM behavior will be included in this.  Behavior not
> appropriate for PVH mode will be disabled in later patches
>
> * Hypercalls related to HVM guests will *not* be included by default;
> functionality needed by PVH guests will be enabled in future patches
>
> * The following functionality are not considered part of the HVM
> container, and PVH will end up behaving like PV by default: Event
> channel, vtsc offset, code related to emulated timers, nested HVM,
> emuirq, PoD
>
> * Some features are left to implement for PVH later: vpmu, shadow mode
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> CC: Jan Beulich <jan.beulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Keir Fraser <keir@xen.org>
> ---
>   xen/arch/x86/acpi/suspend.c        |    2 +-
>   xen/arch/x86/cpu/mcheck/vmce.c     |    6 ++--
>   xen/arch/x86/debug.c               |    2 +-
>   xen/arch/x86/domain.c              |   54 ++++++++++++++++++------------------
>   xen/arch/x86/domain_page.c         |   10 +++----
>   xen/arch/x86/domctl.c              |   10 +++----
>   xen/arch/x86/efi/runtime.c         |    4 +--
>   xen/arch/x86/hvm/vmx/vmcs.c        |    4 +--
>   xen/arch/x86/mm.c                  |    6 ++--
>   xen/arch/x86/mm/shadow/common.c    |    6 ++--
>   xen/arch/x86/mm/shadow/multi.c     |    7 +++--
>   xen/arch/x86/physdev.c             |    4 +--
>   xen/arch/x86/traps.c               |    5 ++--
>   xen/arch/x86/x86_64/traps.c        |    8 +++---
>   xen/common/domain.c                |    2 +-
>   xen/common/grant_table.c           |    4 +--
>   xen/common/kernel.c                |    2 +-
>   xen/include/asm-x86/domain.h       |    2 +-
>   xen/include/asm-x86/event.h        |    2 +-
>   xen/include/asm-x86/guest_access.h |   12 ++++----
>   xen/include/asm-x86/guest_pt.h     |    4 +--
>   xen/include/xen/sched.h            |   14 ++++++++--
>   xen/include/xen/tmem_xen.h         |    2 +-
>   23 files changed, 91 insertions(+), 81 deletions(-)
>
> diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
> index c690b45..2b7aa3b 100644
> --- a/xen/arch/x86/acpi/suspend.c
> +++ b/xen/arch/x86/acpi/suspend.c
> @@ -78,7 +78,7 @@ void restore_rest_processor_state(void)
>       }
>   
>       /* Maybe load the debug registers. */
> -    BUG_ON(is_hvm_vcpu(curr));
> +    BUG_ON(has_hvm_container_vcpu(curr));
>       if ( !is_idle_vcpu(curr) && curr->arch.debugreg[7] )
>       {
>           write_debugreg(0, curr->arch.debugreg[0]);
> diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
> index af3b491..f6c35db 100644
> --- a/xen/arch/x86/cpu/mcheck/vmce.c
> +++ b/xen/arch/x86/cpu/mcheck/vmce.c
> @@ -83,7 +83,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
>       {
>           dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
>                   " %#" PRIx64 " for d%d:v%u (supported: %#Lx)\n",
> -                is_hvm_vcpu(v) ? "HVM" : "PV", ctxt->caps,
> +                has_hvm_container_vcpu(v) ? "HVM" : "PV", ctxt->caps,
>                   v->domain->domain_id, v->vcpu_id,
>                   guest_mcg_cap & ~MCG_CAP_COUNT);
>           return -EPERM;
> @@ -357,7 +357,7 @@ int inject_vmce(struct domain *d, int vcpu)
>           if ( vcpu != VMCE_INJECT_BROADCAST && vcpu != v->vcpu_id )
>               continue;
>   
> -        if ( (is_hvm_domain(d) ||
> +        if ( (has_hvm_container_domain(d) ||
>                 guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) &&
>                !test_and_set_bool(v->mce_pending) )
>           {
> @@ -439,7 +439,7 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn)
>       if (!mfn_valid(mfn_x(mfn)))
>           return -EINVAL;
>   
> -    if ( !is_hvm_domain(d) || !paging_mode_hap(d) )
> +    if ( !has_hvm_container_domain(d) || !paging_mode_hap(d) )
>           return -ENOSYS;
>   
>       rc = -1;
> diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
> index e67473e..3e21ca8 100644
> --- a/xen/arch/x86/debug.c
> +++ b/xen/arch/x86/debug.c
> @@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp,
>   
>           pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len);
>   
> -        mfn = (dp->is_hvm
> +        mfn = (has_hvm_container_domain(dp)
>                  ? dbg_hvm_va2mfn(addr, dp, toaddr, &gfn)
>                  : dbg_pv_va2mfn(addr, dp, pgd3));
>           if ( mfn == INVALID_MFN )
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index a9e2383..5c38cb1 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -167,7 +167,7 @@ void dump_pageframe_info(struct domain *d)
>           spin_unlock(&d->page_alloc_lock);
>       }
>   
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>           p2m_pod_dump_data(d);
>   
>       spin_lock(&d->page_alloc_lock);
> @@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v)
>   
>       vmce_init_vcpu(v);
>   
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>       {
>           rc = hvm_vcpu_initialise(v);
>           goto done;
> @@ -438,7 +438,7 @@ int vcpu_initialise(struct vcpu *v)
>       {
>           vcpu_destroy_fpu(v);
>   
> -        if ( !is_hvm_domain(d) )
> +        if ( is_pv_domain(d) )
>               xfree(v->arch.pv_vcpu.trap_ctxt);
>       }
>   
> @@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v)
>   
>       vcpu_destroy_fpu(v);
>   
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>           hvm_vcpu_destroy(v);
>       else
>           xfree(v->arch.pv_vcpu.trap_ctxt);
> @@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>       int rc = -ENOMEM;
>   
>       d->arch.hvm_domain.hap_enabled =
> -        is_hvm_domain(d) &&
> +        has_hvm_container_domain(d) &&
>           hvm_funcs.hap_supported &&
>           (domcr_flags & DOMCRF_hap);
>       d->arch.hvm_domain.mem_sharing_enabled = 0;
> @@ -490,7 +490,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>                  d->domain_id);
>       }
>   
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>           rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
>       else if ( is_idle_domain(d) )
>           rc = 0;
> @@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>       mapcache_domain_init(d);
>   
>       HYPERVISOR_COMPAT_VIRT_START(d) =
> -        is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START;
> +        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>   
>       if ( (rc = paging_domain_init(d, domcr_flags)) != 0 )
>           goto fail;
> @@ -554,7 +554,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>               goto fail;
>       }
>   
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>       {
>           if ( (rc = hvm_domain_initialise(d)) != 0 )
>           {
> @@ -583,14 +583,14 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>       if ( paging_initialised )
>           paging_final_teardown(d);
>       free_perdomain_mappings(d);
> -    if ( !is_hvm_domain(d) )
> +    if ( is_pv_domain(d) )
>           free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
>       return rc;
>   }
>   
>   void arch_domain_destroy(struct domain *d)
>   {
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>           hvm_domain_destroy(d);
>       else
>           xfree(d->arch.pv_domain.e820);
> @@ -602,7 +602,7 @@ void arch_domain_destroy(struct domain *d)
>       paging_final_teardown(d);
>   
>       free_perdomain_mappings(d);
> -    if ( !is_hvm_domain(d) )
> +    if ( is_pv_domain(d) )
>           free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
>   
>       free_xenheap_page(d->shared_info);
> @@ -653,7 +653,7 @@ int arch_set_info_guest(
>   #define c(fld) (compat ? (c.cmp->fld) : (c.nat->fld))
>       flags = c(flags);
>   
> -    if ( !is_hvm_vcpu(v) )
> +    if ( is_pv_vcpu(v) )
>       {
>           if ( !compat )
>           {
> @@ -706,7 +706,7 @@ int arch_set_info_guest(
>       v->fpu_initialised = !!(flags & VGCF_I387_VALID);
>   
>       v->arch.flags &= ~TF_kernel_mode;
> -    if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ )
> +    if ( (flags & VGCF_in_kernel) || has_hvm_container_vcpu(v)/*???*/ )
>           v->arch.flags |= TF_kernel_mode;
>   
>       v->arch.vgc_flags = flags;
> @@ -721,7 +721,7 @@ int arch_set_info_guest(
>       if ( !compat )
>       {
>           memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs));
> -        if ( !is_hvm_vcpu(v) )
> +        if ( is_pv_vcpu(v) )
>               memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt,
>                      sizeof(c.nat->trap_ctxt));
>       }
> @@ -737,7 +737,7 @@ int arch_set_info_guest(
>   
>       v->arch.user_regs.eflags |= 2;
>   
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>       {
>           hvm_set_info_guest(v);
>           goto out;
> @@ -966,7 +966,7 @@ int arch_set_info_guest(
>   
>   int arch_vcpu_reset(struct vcpu *v)
>   {
> -    if ( !is_hvm_vcpu(v) )
> +    if ( is_pv_vcpu(v) )
>       {
>           destroy_gdt(v);
>           return vcpu_destroy_pagetables(v);
> @@ -1316,7 +1316,7 @@ static void update_runstate_area(struct vcpu *v)
>   
>   static inline int need_full_gdt(struct vcpu *v)
>   {
> -    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
> +    return (is_pv_vcpu(v) && !is_idle_vcpu(v));
>   }
>   
>   static void __context_switch(void)
> @@ -1438,9 +1438,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>       {
>           __context_switch();
>   
> -        if ( !is_hvm_vcpu(next) &&
> +        if ( is_pv_vcpu(next) &&
>                (is_idle_vcpu(prev) ||
> -              is_hvm_vcpu(prev) ||
> +              has_hvm_container_vcpu(prev) ||
>                 is_pv_32on64_vcpu(prev) != is_pv_32on64_vcpu(next)) )
>           {
>               uint64_t efer = read_efer();
> @@ -1451,13 +1451,13 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>           /* Re-enable interrupts before restoring state which may fault. */
>           local_irq_enable();
>   
> -        if ( !is_hvm_vcpu(next) )
> +        if ( is_pv_vcpu(next) )
>           {
>               load_LDT(next);
>               load_segments(next);
>           }
>   
> -        set_cpuid_faulting(!is_hvm_vcpu(next) &&
> +        set_cpuid_faulting(is_pv_vcpu(next) &&
>                              (next->domain->domain_id != 0));
>       }
>   
> @@ -1540,7 +1540,7 @@ void hypercall_cancel_continuation(void)
>       }
>       else
>       {
> -        if ( !is_hvm_vcpu(current) )
> +        if ( is_pv_vcpu(current) )
>               regs->eip += 2; /* skip re-execute 'syscall' / 'int $xx' */
>           else
>               current->arch.hvm_vcpu.hcall_preempted = 0;
> @@ -1577,12 +1577,12 @@ unsigned long hypercall_create_continuation(
>           regs->eax  = op;
>   
>           /* Ensure the hypercall trap instruction is re-executed. */
> -        if ( !is_hvm_vcpu(current) )
> +        if ( is_pv_vcpu(current) )
>               regs->eip -= 2;  /* re-execute 'syscall' / 'int $xx' */
>           else
>               current->arch.hvm_vcpu.hcall_preempted = 1;
>   
> -        if ( !is_hvm_vcpu(current) ?
> +        if ( is_pv_vcpu(current) ?
>                !is_pv_32on64_vcpu(current) :
>                (hvm_guest_x86_mode(current) == 8) )
>           {
> @@ -1850,7 +1850,7 @@ int domain_relinquish_resources(struct domain *d)
>                   return ret;
>           }
>   
> -        if ( !is_hvm_domain(d) )
> +        if ( is_pv_domain(d) )
>           {
>               for_each_vcpu ( d, v )
>               {
> @@ -1923,7 +1923,7 @@ int domain_relinquish_resources(struct domain *d)
>           BUG();
>       }
>   
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>           hvm_domain_relinquish_resources(d);
>   
>       return 0;
> @@ -2007,7 +2007,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
>       if ( already_pending )
>           return;
>   
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>           hvm_assert_evtchn_irq(v);
>       else
>           vcpu_kick(v);
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index bc18263..b9db322 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>        * then it means we are running on the idle domain's page table and must
>        * therefore use its mapcache.
>        */
> -    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
> +    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
>       {
>           /* If we really are idling, perform lazy context switch now. */
>           if ( (v = idle_vcpu[smp_processor_id()]) == current )
> @@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
>   #endif
>   
>       v = mapcache_current_vcpu();
> -    if ( !v || is_hvm_vcpu(v) )
> +    if ( !v || has_hvm_container_vcpu(v) )
>           return mfn_to_virt(mfn);
>   
>       dcache = &v->domain->arch.pv_domain.mapcache;
> @@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
>       ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>   
>       v = mapcache_current_vcpu();
> -    ASSERT(v && !is_hvm_vcpu(v));
> +    ASSERT(v && is_pv_vcpu(v));
>   
>       dcache = &v->domain->arch.pv_domain.mapcache;
>       ASSERT(dcache->inuse);
> @@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d)
>       struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
>       unsigned int bitmap_pages;
>   
> -    if ( is_hvm_domain(d) || is_idle_domain(d) )
> +    if ( has_hvm_container_domain(d) || is_idle_domain(d) )
>           return 0;
>   
>   #ifdef NDEBUG
> @@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v)
>       unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
>       unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
>   
> -    if ( is_hvm_vcpu(v) || !dcache->inuse )
> +    if ( has_hvm_container_vcpu(v) || !dcache->inuse )
>           return 0;
>   
>       if ( ents > dcache->entries )
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index c2a04c4..e0e3fe7 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -800,7 +800,7 @@ long arch_do_domctl(
>           if ( domctl->cmd == XEN_DOMCTL_get_ext_vcpucontext )
>           {
>               evc->size = sizeof(*evc);
> -            if ( !is_hvm_domain(d) )
> +            if ( is_pv_domain(d) )
>               {
>                   evc->sysenter_callback_cs      =
>                       v->arch.pv_vcpu.sysenter_callback_cs;
> @@ -833,7 +833,7 @@ long arch_do_domctl(
>               ret = -EINVAL;
>               if ( evc->size < offsetof(typeof(*evc), vmce) )
>                   goto ext_vcpucontext_out;
> -            if ( !is_hvm_domain(d) )
> +            if ( is_pv_domain(d) )
>               {
>                   if ( !is_canonical_address(evc->sysenter_callback_eip) ||
>                        !is_canonical_address(evc->syscall32_callback_eip) )
> @@ -1237,7 +1237,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>       bool_t compat = is_pv_32on64_domain(v->domain);
>   #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
>   
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>           memset(c.nat, 0, sizeof(*c.nat));
>       memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
>       c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
> @@ -1248,7 +1248,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>       if ( !compat )
>       {
>           memcpy(&c.nat->user_regs, &v->arch.user_regs, sizeof(c.nat->user_regs));
> -        if ( !is_hvm_vcpu(v) )
> +        if ( is_pv_vcpu(v) )
>               memcpy(c.nat->trap_ctxt, v->arch.pv_vcpu.trap_ctxt,
>                      sizeof(c.nat->trap_ctxt));
>       }
> @@ -1263,7 +1263,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>       for ( i = 0; i < ARRAY_SIZE(v->arch.debugreg); ++i )
>           c(debugreg[i] = v->arch.debugreg[i]);
>   
> -    if ( is_hvm_vcpu(v) )
> +    if ( has_hvm_container_vcpu(v) )
>       {
>           struct segment_register sreg;
>   
> diff --git a/xen/arch/x86/efi/runtime.c b/xen/arch/x86/efi/runtime.c
> index 37bb535..d7c884b 100644
> --- a/xen/arch/x86/efi/runtime.c
> +++ b/xen/arch/x86/efi/runtime.c
> @@ -52,7 +52,7 @@ unsigned long efi_rs_enter(void)
>       /* prevent fixup_page_fault() from doing anything */
>       irq_enter();
>   
> -    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
> +    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
>       {
>           struct desc_ptr gdt_desc = {
>               .limit = LAST_RESERVED_GDT_BYTE,
> @@ -71,7 +71,7 @@ unsigned long efi_rs_enter(void)
>   void efi_rs_leave(unsigned long cr3)
>   {
>       write_cr3(cr3);
> -    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
> +    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
>       {
>           struct desc_ptr gdt_desc = {
>               .limit = LAST_RESERVED_GDT_BYTE,
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 0620f87..7087630 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -637,7 +637,7 @@ void vmx_vmcs_exit(struct vcpu *v)
>       {
>           /* Don't confuse vmx_do_resume (for @v or @current!) */
>           vmx_clear_vmcs(v);
> -        if ( is_hvm_vcpu(current) )
> +        if ( has_hvm_container_vcpu(current) )
>               vmx_load_vmcs(current);
>   
>           spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
> @@ -1477,7 +1477,7 @@ static void vmcs_dump(unsigned char ch)
>   
>       for_each_domain ( d )
>       {
> -        if ( !is_hvm_domain(d) )
> +        if ( !has_hvm_container_domain(d) )
>               continue;
>           printk("\n>>> Domain %d <<<\n", d->domain_id);
>           for_each_vcpu ( d, v )
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index e7f0e13..120599a 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -181,7 +181,7 @@ static uint32_t base_disallow_mask;
>        (rangeset_is_empty((d)->iomem_caps) &&                     \
>         rangeset_is_empty((d)->arch.ioport_caps) &&               \
>         !has_arch_pdevs(d) &&                                     \
> -      !is_hvm_domain(d)) ?                                      \
> +      is_pv_domain(d)) ?                                        \
>        L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
>   
>   static void __init init_frametable_chunk(void *start, void *end)
> @@ -433,7 +433,7 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
>   
>   unsigned long domain_get_maximum_gpfn(struct domain *d)
>   {
> -    if ( is_hvm_domain(d) )
> +    if ( has_hvm_container_domain(d) )
>           return p2m_get_hostp2m(d)->max_mapped_pfn;
>       /* NB. PV guests specify nr_pfns rather than max_pfn so we adjust here. */
>       return (arch_get_max_pfn(d) ?: 1) - 1;
> @@ -2379,7 +2379,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>       {
>           /* Special pages should not be accessible from devices. */
>           struct domain *d = page_get_owner(page);
> -        if ( d && !is_hvm_domain(d) && unlikely(need_iommu(d)) )
> +        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>           {
>               if ( (x & PGT_type_mask) == PGT_writable_page )
>                   iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));

This is one in particular where I couldn't quite figure out what the 
PV/HVM split was for, and so whether PVH should take this conditional 
or not.

I think this is probably correct, as in common/grant_table.c, there is 
in one location under "need_iommu" the following two lines:

         /* Shouldn't happen, because you can't use iommu in a HVM domain. */
         BUG_ON(paging_mode_translate(ld));

Not sure exactly what the comment means...

  -George

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-19 16:27     ` George Dunlap
@ 2013-09-20  8:11       ` Jan Beulich
  2013-09-20  9:23         ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-20  8:11 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 19.09.13 at 18:27, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 14:46, Jan Beulich wrote:
>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> @@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
>>>   #endif
>>>   
>>>       v = mapcache_current_vcpu();
>>> -    if ( !v || is_hvm_vcpu(v) )
>>> +    if ( !v || has_hvm_container_vcpu(v) )
>>>           return mfn_to_virt(mfn);
>>>   
>>>       dcache = &v->domain->arch.pv_domain.mapcache;
>>> @@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
>>>       ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>>>   
>>>       v = mapcache_current_vcpu();
>>> -    ASSERT(v && !is_hvm_vcpu(v));
>>> +    ASSERT(v && is_pv_vcpu(v));
>> This is inconsistent with the above change to map_domain_page()
>> and the changes later in this file.
> 
> You mean, making this "is_pv_vcpu" instead of 
> "!has_hvm_container_vcpu"?  I guess that's true.

Actually I meant the cited one above to be converted - the map
cache, following your argumentation further down, is something
that is _not_ needed when there is a HVM container (i.e. external
paging); the need for it is not tied to the PV-ness of a guest.

But anyway, this is-PV vs. doesn't-have-HVM-container distinction
is sort of fuzzy, and you seem to halfway agree in later
parts of your response.

>>> --- a/xen/arch/x86/traps.c
>>> +++ b/xen/arch/x86/traps.c
>>> @@ -119,6 +119,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
>>>       unsigned long *stack, addr;
>>>       unsigned long mask = STACK_SIZE;
>>>   
>>> +    /* Avoid HVM as we don't know what the stack looks like */
>>>       if ( is_hvm_vcpu(v) )
>> So why do you not change this one to !is_pv_vcpu()? In particular
>> the do_page_walk() further down is inherently PV.
> 
> I guess it depends.  I was thinking that the reason we didn't do this 
> for HVM guests was that there was no guarantee what the stack looked 
> like -- it looks like another aspect is that for VMs that are not 
> current, whether the PTs are controlled by Xen or the guest is a factor.
> 
> But if we make do_page_walk gated on is_pv_vcpu(), then it seems like it 
> should work when v==current, and fail gracefully when v!=current.

No, the right (long term) thing would in fact be to do the right
form of page walk in the PVH case here too. After all you're right
that the original reason for filtering out HVM guests was that we
can't make assumptions about their stacks (whether making any
such assumptions of PV guests is really appropriate is another
question). Hence I wouldn't want to see do_page_walk() to be
fiddled with; suppressing the call in show_guest_stack() would be
fine with me.

Jan

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-19 16:58   ` George Dunlap
@ 2013-09-20  8:38     ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-20  8:38 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 19.09.13 at 18:58, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 13/09/13 17:25, George Dunlap wrote:
>> @@ -2379,7 +2379,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>>       {
>>           /* Special pages should not be accessible from devices. */
>>           struct domain *d = page_get_owner(page);
>> -        if ( d && !is_hvm_domain(d) && unlikely(need_iommu(d)) )
>> +        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>           {
>>               if ( (x & PGT_type_mask) == PGT_writable_page )
>>                   iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> 
> This is one in particular I couldn't quite figure out what the PV/HVM 
> split was for, so I couldn't figure out whether to take this conditional 
> for PVH or not.

Yes, IOMMU mappings should follow the HVM model for PVH
afaict.

Jan

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-20  8:11       ` Jan Beulich
@ 2013-09-20  9:23         ` George Dunlap
  2013-09-20  9:44           ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-20  9:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 20/09/13 09:11, Jan Beulich wrote:
>>>> On 19.09.13 at 18:27, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> On 18/09/13 14:46, Jan Beulich wrote:
>>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> @@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
>>>>    #endif
>>>>    
>>>>        v = mapcache_current_vcpu();
>>>> -    if ( !v || is_hvm_vcpu(v) )
>>>> +    if ( !v || has_hvm_container_vcpu(v) )
>>>>            return mfn_to_virt(mfn);
>>>>    
>>>>        dcache = &v->domain->arch.pv_domain.mapcache;
>>>> @@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
>>>>        ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
>>>>    
>>>>        v = mapcache_current_vcpu();
>>>> -    ASSERT(v && !is_hvm_vcpu(v));
>>>> +    ASSERT(v && is_pv_vcpu(v));
>>> This is inconsistent with the above change to map_domain_page()
>>> and the changes later in this file.
>> You mean, making this "is_pv_vcpu" instead of
>> "!has_hvm_container_vcpu"?  I guess that's true.
> Actually I meant the cited one above to be converted - the map
> cache, following your argumentation further down, is something
> that is _not_ needed when there is a HVM container (i.e. external
> paging); the need for it is not tied to the PV-ness of a guest.
>
> But anyway, this is-PV and doesn't-have-HVM-container distinction
> is sort of fuzzy anyway, and you seem to half way agree following
> later parts of your response.
>
>>>> --- a/xen/arch/x86/traps.c
>>>> +++ b/xen/arch/x86/traps.c
>>>> @@ -119,6 +119,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
>>>>        unsigned long *stack, addr;
>>>>        unsigned long mask = STACK_SIZE;
>>>>    
>>>> +    /* Avoid HVM as we don't know what the stack looks like */
>>>>        if ( is_hvm_vcpu(v) )
>>> So why do you not change this one to !is_pv_vcpu()? In particular
>>> the do_page_walk() further down is inherently PV.
>> I guess it depends.  I was thinking that the reason we didn't do this
>> for HVM guests was that there was no guarantee what the stack looked
>> like -- it looks like another aspect is that for VMs that are not
>> current, whether the PTs are controlled by Xen or the guest is a factor.
>>
>> But if we make do_page_walk gated on is_pv_vcpu(), then it seems like it
>> should work when v==current, and fail gracefully when v!=current.
> No, the right (long term) thing would in fact be to do the right
> form of page walk in the PVH case here too. After all you're right
> that the original reason for filtering out HVM guests was that we
> can't make assumptions about their stacks (whether making any
> such assumptions of PV guests is really appropriate is another
> question). Hence I wouldn't want to see do_page_walk() to be
> fiddled with; suppressing the call in show_guest_stack() would be
> fine with me.

What I meant was, do_page_walk() already had a conditional at the top to 
return NULL if is_hvm_domain() was true, and callers should all be 
written to handle that case.  If we change that to if(!is_pv_domain()) 
(which should probably be done anyway), then for PVH domains we'll get a 
stack for v==current, but not for others. do_page_walk() can be extended 
for PVH domains at some point in the future.
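
I.e. just changing the guard (sketch only; the exact context in
do_page_walk() may differ):

    /* At the top of do_page_walk(): */
    if ( !is_pv_domain(v->domain) )    /* previously: is_hvm_domain() */
        return NULL;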

  -George

* Re: [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros
  2013-09-20  9:23         ` George Dunlap
@ 2013-09-20  9:44           ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-20  9:44 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 20.09.13 at 11:23, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> What I meant was, do_page_walk() already had a conditional at the top to 
> return NULL if is_hvm_domain() was true, and callers should all be 
> written to handle that case.  If we change that to if(!is_pv_domain()) 
> (which should probably be done anyway), then for PVH domains we'll get a 
> stack for v==current, but not for others. do_page_walk() can be extended 
> for PVH domains at some point in the future.

Oh, right, I should have checked... So after all I agree.

Jan

* Re: [PATCH RFC v12 06/21] pvh: Introduce PVH guest type
  2013-09-18 14:10   ` Jan Beulich
@ 2013-09-20 10:01     ` George Dunlap
  0 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-20 10:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 15:10, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> Introduce new PVH guest type, flags to create it, and ways to identify it.
>>
>> To begin with, it will inherit functionality marked hvm_container.
>>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> with one minor comment:
>
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>>   
>>       if ( is_hvm_domain(d) )
>>           info->flags |= XEN_DOMINF_hvm_guest;
>> +    else if ( is_pvh_domain(d) )
>> +        info->flags |= XEN_DOMINF_pvh_guest;
> I'd prefer constructs like this to be done as a switch statement.

Hmm, the resulting boilerplate takes up 11 lines rather than 4, but I 
guess that's OK with me...

Thanks,
  -George

* Re: [PATCH RFC v12 08/21] pvh: vmx-specific changes
  2013-09-18 14:25   ` Jan Beulich
@ 2013-09-20 13:07     ` George Dunlap
  0 siblings, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-20 13:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Eddie Dong, Jun Nakajima, Tim Deegan

On 18/09/13 15:25, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> +static int pvh_check_requirements(struct vcpu *v)
>> +{
>> +    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
>> +
>> +    if ( !paging_mode_hap(v->domain) )
>> +    {
>> +        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
>> +        return -EINVAL;
>> +    }
>> +    if ( !cpu_has_vmx_ept )
>> +    {
>> +        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
>> +        return -ENOSYS;
> While I realize that this was perhaps in what you took
> from Mukesh, I think this and similar uses of -ENOSYS are
> misguided; -EOPNOTSUPP would seem more appropriate.
>
>> +        /* PVH: I don't think these are necessary */
>> +        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
>> +        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
> Particularly these two would be nice to be confirmed by our VMX
> maintainers, who you should have Cc-ed.

Doing a bit more investigation, it looks like the NMI_PENDING thing was 
probably because Mukesh's duplicated vmexit handler didn't (yet) do the 
work of setting and clearing this when necessary.  Since we're using the 
shared vmexit code now, that's all sorted out.

In any case, I'm pretty sure it starts out cleared, and is set in 
vmx/intr.c:enable_intr_window().
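
Roughly (quoting from memory, so the exact code may differ):

    /* vmx/intr.c:enable_intr_window(), when the pending source is an NMI: */
    v->arch.hvm_vmx.exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
    vmx_update_cpu_exec_control(v);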

  -George

* Re: [PATCH RFC v12 13/21] pvh: Support read_segment_register for PVH
  2013-09-18 14:56   ` Jan Beulich
@ 2013-09-20 14:18     ` George Dunlap
  2013-09-20 14:56       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-20 14:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 15:56, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> This will be necessary to do PV-style emulated operations for PVH guests.
>>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan,

There are a couple of patches where I'm taking your advice, but also 
doing a handful of other minor clean-ups; for instance, in this patch 
I'm renaming the public function to "hvm_read_selector" rather than 
"pvh_get_selector" to make the naming consistent.  Would you like me to 
retain the Reviewed-by, to remind you that you've basically already 
approved the patch, and let you withdraw it if you don't like the minor 
changes (which will be listed in the description)?  Or shall I leave it 
off and let you review it again?

  -George

* Re: [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
  2013-09-18 15:17   ` Jan Beulich
@ 2013-09-20 14:50     ` George Dunlap
  2013-09-20 14:58       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-20 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 16:17, Jan Beulich wrote:
>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -737,7 +737,31 @@ int arch_set_info_guest(
>>       if ( has_hvm_container_vcpu(v) )
>>       {
>>           hvm_set_info_guest(v);
>> -        goto out;
>> +
>> +        if ( is_hvm_vcpu(v) || v->is_initialised )
>> +            goto out;
>> +
>> +        /* PVH */
>> +        if ( c.nat->ctrlreg[1] )
>> +            return -EINVAL;
>> +
>> +        ASSERT(!compat);
> This lacks a pvh-32bit-fixme annotation.

Ack.

>
>> +
>> +        cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]);
>> +        cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);
>> +
>> +        v->arch.cr3 = page_to_maddr(cr3_page);
>> +        v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
>> +
>> +        if ( paging_mode_enabled(d) )
> Is the opposite really possible? I think this ought to be an
> assertion.

No, it shouldn't be.  I thought about doing it, but at some point I had 
made so many substantive changes that I decided to be more 
conservative.  I'll change it to an assert.

>
>> +int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
>> +                               struct vcpu_guest_context *ctxtp)
>> +{
>> +    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
>> +         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
>> +         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
>> +         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
> Don't know why I didn't spot this earlier, but the gdt_frames check
> is pointless when gdt_ents is verified to be zero.

You know, looking at this again -- is there a reason we can't just put 
this in hvm_set_info_guest()?  It's already only called from that one 
place, only a few lines before.  There doesn't really seem to be a need 
to have yet another function for just a few lines.

>
>> --- a/xen/include/public/arch-x86/xen.h
>> +++ b/xen/include/public/arch-x86/xen.h
>> @@ -150,6 +150,10 @@ typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */
>>   /*
>>   * The following is all CPU context. Note that the fpu_ctxt block is filled
>>   * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used.
>> + *
>> + * PVH 64bit: In the vcpu boot path, for vmcs context, only gs_base_kernel
>> + *            is honored. Other fields like gdt, ldt, and selectors must be
>> + *            zeroed. See vmx_pvh_vcpu_boot_set_info.
> In the current form I hope the comment is misleading: Surely
> general purpose and FPU registers get honored too?
>
> And referring to an internal function from a public header is sort
> of bogus too.

In any case the comment about how a hypercall behaves should be near the 
hypercall.  I'll move this over.

  -George

* Re: [PATCH RFC v12 13/21] pvh: Support read_segment_register for PVH
  2013-09-20 14:18     ` George Dunlap
@ 2013-09-20 14:56       ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-20 14:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 20.09.13 at 16:18, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 15:56, Jan Beulich wrote:
>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> This will be necessary to do PV-style emulated operations for PVH guests.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> There are a couple of patches where I'm taking your advice, but also 
> doing a handful of other minor clean-ups; for instance, in this patch 
> I'm renaming the public function "hvm_read_selector" rather than 
> "pvh_get_selector" to make the naming consistent.  Would you like me to 
> retain the Reviewed-by, to remind you that you've basically already 
> approved the patch, and let you withdraw it if you don't like the minor 
> changes (which will be listed in the description)?  Or shall I leave it 
> off and let you review it again?

For minor changes, and especially for fallout from changes I had
asked for, retaining the tag is fine. Anything changing functionality,
or any significant re-arrangement should cause it to be dropped.

Jan

* Re: [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
  2013-09-20 14:50     ` George Dunlap
@ 2013-09-20 14:58       ` Jan Beulich
  2013-09-20 15:12         ` George Dunlap
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2013-09-20 14:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 20.09.13 at 16:50, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 16:17, Jan Beulich wrote:
>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> +int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
>>> +                               struct vcpu_guest_context *ctxtp)
>>> +{
>>> +    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
>>> +         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
>>> +         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
>>> +         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
>> Don't know why I didn't spot this earlier, but the gdt_frames check
>> is pointless when gdt_ents is verified to be zero.
> 
> You know, looking at this again -- is there a reason we can't just put 
> this in hvm_set_info_guest()?  It's already only called from that one 
> place, only a few lines before.  There doesn't really seem to be a need 
> to have yet another function for just a few lines.

Indeed - only the VMX specific piece belongs here; everything
else should be in the generic wrapper (which perhaps then
won't be a simple wrapper anymore).

Jan

* Re: [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
  2013-09-20 14:58       ` Jan Beulich
@ 2013-09-20 15:12         ` George Dunlap
  2013-09-20 15:26           ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-20 15:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 20/09/13 15:58, Jan Beulich wrote:
>>>> On 20.09.13 at 16:50, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> On 18/09/13 16:17, Jan Beulich wrote:
>>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> +int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
>>>> +                               struct vcpu_guest_context *ctxtp)
>>>> +{
>>>> +    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
>>>> +         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
>>>> +         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
>>>> +         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
>>> Don't know why I didn't spot this earlier, but the gdt_frames check
>>> is pointless when gdt_ents is verified to be zero.
>> You know, looking at this again -- is there a reason we can't just put
>> this in hvm_set_info_guest()?  It's already only called from that one
>> place, only a few lines before.  There doesn't really seem to be a need
>> to have yet another function for just a few lines.
> Indeed - only the VMX specific piece belongs here; everything
> else should be in the generic wrapper (which perhaps then
> won't be a simple wrapper anymore).

Or, we could do as is done for PV, and put the validity checks at the 
top of arch_set_info_guest().  I think I like that option the best.
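
I.e. something like this near the top (sketch only; the remaining
conditions would match those in vmx_pvh_vcpu_boot_set_info(), minus
the redundant gdt_frames check):

    if ( is_pvh_vcpu(v) &&
         (c.nat->ldt_base || c.nat->ldt_ents ||
          c.nat->user_regs.cs || c.nat->user_regs.ss ||
          c.nat->user_regs.es || c.nat->user_regs.ds ||
          c.nat->user_regs.fs || c.nat->user_regs.gs ||
          c.nat->gdt_ents) )
        return -EINVAL;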

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest
  2013-09-20 15:12         ` George Dunlap
@ 2013-09-20 15:26           ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-20 15:26 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 20.09.13 at 17:12, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 20/09/13 15:58, Jan Beulich wrote:
>>>>> On 20.09.13 at 16:50, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> On 18/09/13 16:17, Jan Beulich wrote:
>>>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>> +int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
>>>>> +                               struct vcpu_guest_context *ctxtp)
>>>>> +{
>>>>> +    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
>>>>> +         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
>>>>> +         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
>>>>> +         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
>>>> Don't know why I didn't spot this earlier, but the gdt_frames check
>>>> is pointless when gdt_ents is verified to be zero.
>>> You know, looking at this again -- is there a reason we can't just put
>>> this in hvm_set_info_guest()?  It's already only called from that one
>>> place, only a few lines before.  There doesn't really seem to be a need
>>> to have yet another function for just a few lines.
>> Indeed - only the VMX specific piece belongs here; everything
>> else should be in the generic wrapper (which perhaps then
>> won't be a simple wrapper anymore).
> 
> Or, we could do as is done for PV, and put the validity checks at the 
> top of arch_set_info_guest().  I think I like that option the best.

True.

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-19 10:09       ` Jan Beulich
@ 2013-09-20 17:03         ` George Dunlap
  2013-09-20 17:06           ` George Dunlap
  2013-09-23  6:49           ` Jan Beulich
  0 siblings, 2 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-20 17:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 19/09/13 11:09, Jan Beulich wrote:
>>>> On 19.09.13 at 03:02, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> On Wed, 18 Sep 2013 16:31:17 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> @@ -1624,6 +1631,13 @@ static int guest_io_okay(
>>>>       int user_mode = !(v->arch.flags & TF_kernel_mode);
>>>>   #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
>>>>   
>>>> +    /*
>>>> +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
>>>> +     * and so don't need to check again here.
>>>> +     */
>>>> +    if ( is_pvh_vcpu(v) )
>>>> +        return 1;
>>>> +
>>>>       if ( !vm86_mode(regs) &&
>>>>            (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
>>>>           return 1;
>>> Hmm, am I missing something here? The check in the VMEXIT
>>> handler is just a privilege level one - where's the bitmap being
>>> consulted? _If_ the bitmap is being maintained properly for the
>>> guest (which I don't recall having seen), anything leading here
>>> would be for ports the guest was not permitted access to. Yet
>>> we would happily emulate the access for it then.
>> Not sure I understand which bitmap needs to be consulted. The bitmap
>> hvm_io_bitmap is used to set the intercepts which PVH also uses, with
>> HVM defaults.
> Actually I think I got confused by the IOPL checking in your
> VM exit handling code: _If_ you need to check the IOPL, then
> you also need to check the bitmap hanging off of the TSS. But
> aiui all the privilege level checks get done inside the guest, and
> only if those pass would the I/O bitmap hanging off of the VMCS
> be consulted. Hence neither the IOPL check nor the TSS-based
> bitmap check ought to be necessary here.

I'm not quite following this.  Are you saying that the iopl checks are 
done by the hardware, and if they fail it will already have delivered a 
GPF; and if we've managed to get an IO_INSTRUCTION exit then they have 
passed, so we don't need the iopl check in vmx.c?

If not, what needs to change here?

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-20 17:03         ` George Dunlap
@ 2013-09-20 17:06           ` George Dunlap
  2013-09-23  6:49           ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: George Dunlap @ 2013-09-20 17:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 20/09/13 18:03, George Dunlap wrote:
> On 19/09/13 11:09, Jan Beulich wrote:
>>>>> On 19.09.13 at 03:02, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>>> On Wed, 18 Sep 2013 16:31:17 +0100 "Jan Beulich" <JBeulich@suse.com> 
>>> wrote:
>>>>>>> On 13.09.13 at 18:25, George Dunlap 
>>>>>>> <george.dunlap@eu.citrix.com> wrote:
>>>>> @@ -1624,6 +1631,13 @@ static int guest_io_okay(
>>>>>       int user_mode = !(v->arch.flags & TF_kernel_mode);
>>>>>   #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
>>>>>   
>>>>> +    /*
>>>>> +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
>>>>> +     * and so don't need to check again here.
>>>>> +     */
>>>>> +    if ( is_pvh_vcpu(v) )
>>>>> +        return 1;
>>>>> +
>>>>>       if ( !vm86_mode(regs) &&
>>>>>            (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
>>>>>           return 1;
>>>> Hmm, am I missing something here? The check in the VMEXIT
>>>> handler is just a privilege level one - where's the bitmap being
>>>> consulted? _If_ the bitmap is being maintained properly for the
>>>> guest (which I don't recall having seen), anything leading here
>>>> would be for ports the guest was not permitted access to. Yet
>>>> we would happily emulate the access for it then.
>>> Not sure I understand which bitmap needs to be consulted. The bitmap
>>> hvm_io_bitmap is used to set the intercepts which PVH also uses, with
>>> HVM defaults.
>> Actually I think I got confused by the IOPL checking in your
>> VM exit handling code: _If_ you need to check the IOPL, then
>> you also need to check the bitmap hanging off of the TSS. But
>> aiui all the privilege level checks get done inside the guest, and
>> only if those pass would the I/O bitmap hanging off of the VMCS
>> be consulted. Hence neither the IOPL check nor the TSS-based
>> bitmap check ought to be necessary here.
>
> I'm not quite following this.  Are you saying that the iopl checks are 
> done by the hardware, and if they fail it will already have delivered a 
> GPF; and if we've managed to get an IO_INSTRUCTION exit then they have 
> passed, so we don't need the iopl check in vmx.c?

It does look as if the HVM path doesn't do any checking at all.

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-20 17:03         ` George Dunlap
  2013-09-20 17:06           ` George Dunlap
@ 2013-09-23  6:49           ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-23  6:49 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 20.09.13 at 19:03, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 19/09/13 11:09, Jan Beulich wrote:
>>>>> On 19.09.13 at 03:02, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>>> On Wed, 18 Sep 2013 16:31:17 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>>> On 13.09.13 at 18:25, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>> @@ -1624,6 +1631,13 @@ static int guest_io_okay(
>>>>>       int user_mode = !(v->arch.flags & TF_kernel_mode);
>>>>>   #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
>>>>>   
>>>>> +    /*
>>>>> +     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
>>>>> +     * and so don't need to check again here.
>>>>> +     */
>>>>> +    if ( is_pvh_vcpu(v) )
>>>>> +        return 1;
>>>>> +
>>>>>       if ( !vm86_mode(regs) &&
>>>>>            (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
>>>>>           return 1;
>>>> Hmm, am I missing something here? The check in the VMEXIT
>>>> handler is just a privilege level one - where's the bitmap being
>>>> consulted? _If_ the bitmap is being maintained properly for the
>>>> guest (which I don't recall having seen), anything leading here
>>>> would be for ports the guest was not permitted access to. Yet
>>>> we would happily emulate the access for it then.
>>> Not sure I understand which bitmap needs to be consulted. The bitmap
>>> hvm_io_bitmap is used to set the intercepts which PVH also uses, with
>>> HVM defaults.
>> Actually I think I got confused by the IOPL checking in your
>> VM exit handling code: _If_ you need to check the IOPL, then
>> you also need to check the bitmap hanging off of the TSS. But
>> aiui all the privilege level checks get done inside the guest, and
>> only if those pass would the I/O bitmap hanging off of the VMCS
>> be consulted. Hence neither the IOPL check nor the TSS-based
>> bitmap check ought to be necessary here.
> 
> I'm not quite following this.  Are you saying that the iopl checks are 
> done by the hardware, and if they fail it will already have delivered a 
> GPF; and if we've managed to get an IO_INSTRUCTION exit then they have 
> passed, so we don't need the iopl check in vmx.c?

Yes, exactly.
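
To spell out the resulting picture (comment sketch only, summarising my
understanding above):

/*
 * I/O instruction executed by a PVH guest:
 *
 *  1. CPL/IOPL and, where applicable, the TSS I/O bitmap are checked
 *     by hardware in non-root mode; a failure raises #GP directly in
 *     the guest, with no VM exit.
 *  2. Only if those checks pass is the VMCS I/O bitmap (hvm_io_bitmap)
 *     consulted; an intercepted port then causes an
 *     EXIT_REASON_IO_INSTRUCTION exit.
 *
 * So by the time guest_io_okay() runs for a PVH vCPU the permission
 * checks have already happened, and the unconditional "return 1" in
 * the quoted hunk is fine; the separate IOPL check in the vmx.c exit
 * handler can simply be dropped.
 */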

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-18 15:31   ` Jan Beulich
  2013-09-19  1:02     ` Mukesh Rathor
@ 2013-09-23 13:48     ` George Dunlap
  2013-09-23 14:09       ` Jan Beulich
  1 sibling, 1 reply; 76+ messages in thread
From: George Dunlap @ 2013-09-23 13:48 UTC (permalink / raw)
  To: Jan Beulich, Mukesh Rathor; +Cc: xen-devel, Keir Fraser, Tim Deegan

On 18/09/13 16:31, Jan Beulich wrote:
>> +void propagate_page_fault(unsigned long addr, u16 error_code)
>> +{
>> +    is_pvh_vcpu(current)
>> +        ? hvm_inject_page_fault(error_code, addr)
>> +        : pv_inject_page_fault(addr, error_code);
> Even if not written down in CODINGSTYLE, the majority of other
> cases in the code has the operators last thing on a line rather than
> first.

In the case of boolean operators (&&, ||) I think it makes a lot more 
sense to put them at the beginning of the line, as it makes the code 
easier to read.

But with the ?: operator I guess it's about the same either way, so I'll 
just go ahead and change it.
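
E.g. (just the formatting change; behaviour identical to the quoted hunk):

void propagate_page_fault(unsigned long addr, u16 error_code)
{
    /* Operators at end-of-line, per the prevailing Xen style. */
    is_pvh_vcpu(current) ?
        hvm_inject_page_fault(error_code, addr) :
        pv_inject_page_fault(addr, error_code);
}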

  -George

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO
  2013-09-23 13:48     ` George Dunlap
@ 2013-09-23 14:09       ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2013-09-23 14:09 UTC (permalink / raw)
  To: George Dunlap, Mukesh Rathor; +Cc: xen-devel, Keir Fraser, Tim Deegan

>>> On 23.09.13 at 15:48, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 18/09/13 16:31, Jan Beulich wrote:
>>> +void propagate_page_fault(unsigned long addr, u16 error_code)
>>> +{
>>> +    is_pvh_vcpu(current)
>>> +        ? hvm_inject_page_fault(error_code, addr)
>>> +        : pv_inject_page_fault(addr, error_code);
>> Even if not written down in CODINGSTYLE, the majority of other
>> cases in the code has the operators last thing on a line rather than
>> first.
> 
> In the case of boolean operators (&&, ||) I think it makes a lot more 
> sense to put them at the beginning of the line, as it makes the code 
> easier to read.

As a matter of fact my personal preference is to have all
operators at the start of a line (thus making it immediately
obvious that the expression is being continued), but in Xen
I try to respect the different model (albeit occasionally I
forget).

Jan

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2013-09-23 14:09 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-13 16:25 Introduce PVH domU support George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 01/21] Fix failure path in hvm_vcpu_initialise George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 02/21] Fix failure path in construct_vmcs George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 03/21] Remove an unnecessary assert from vmx_update_debug_state George Dunlap
2013-09-16 21:09   ` Mukesh Rathor
2013-09-18 10:39     ` George Dunlap
2013-09-18 12:38       ` Jan Beulich
2013-09-18 12:53         ` George Dunlap
2013-09-18 13:51           ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 04/21] pvh prep: code motion George Dunlap
2013-09-18 12:59   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 05/21] Introduce pv guest type and has_hvm_container macros George Dunlap
2013-09-18 13:46   ` Jan Beulich
2013-09-19 16:27     ` George Dunlap
2013-09-20  8:11       ` Jan Beulich
2013-09-20  9:23         ` George Dunlap
2013-09-20  9:44           ` Jan Beulich
2013-09-19 16:58   ` George Dunlap
2013-09-20  8:38     ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 06/21] pvh: Introduce PVH guest type George Dunlap
2013-09-18 14:10   ` Jan Beulich
2013-09-20 10:01     ` George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 07/21] pvh: Disable unneeded features of HVM containers George Dunlap
2013-09-13 16:36   ` George Dunlap
     [not found]     ` <CAGU+aus16muryVYd-aOzv-CAXPk_xxVh_e-R7Ug1RxGRJ_MAfQ@mail.gmail.com>
2013-09-13 21:33       ` Aravindh Puthiyaparambil (aravindp)
2013-09-16 23:17     ` Mukesh Rathor
2013-09-18 10:50       ` George Dunlap
2013-09-18 14:18   ` Jan Beulich
2013-09-18 14:43     ` George Dunlap
2013-09-18 14:47       ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 08/21] pvh: vmx-specific changes George Dunlap
2013-09-13 16:38   ` George Dunlap
2013-09-16  7:37     ` Jan Beulich
2013-09-16  9:15       ` George Dunlap
2013-09-16 23:12     ` Mukesh Rathor
2013-09-17  8:48       ` George Dunlap
2013-09-18  0:13         ` Mukesh Rathor
2013-09-18 14:25   ` Jan Beulich
2013-09-20 13:07     ` George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 09/21] pvh: Do not allow PVH guests to change paging modes George Dunlap
2013-09-18 14:32   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 10/21] pvh: PVH access to hypercalls George Dunlap
2013-09-18 14:45   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 11/21] pvh: Use PV e820 George Dunlap
2013-09-18 14:48   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 12/21] pvh: Support guest_kernel_mode for PVH George Dunlap
2013-09-18 14:52   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 13/21] pvh: Support read_segment_register " George Dunlap
2013-09-18 14:56   ` Jan Beulich
2013-09-20 14:18     ` George Dunlap
2013-09-20 14:56       ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 14/21] pvh: read_descriptor for PVH guests George Dunlap
2013-09-13 16:40   ` George Dunlap
2013-09-18 15:00   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 15/21] pvh: Set up more PV stuff in set_info_guest George Dunlap
2013-09-18 15:17   ` Jan Beulich
2013-09-20 14:50     ` George Dunlap
2013-09-20 14:58       ` Jan Beulich
2013-09-20 15:12         ` George Dunlap
2013-09-20 15:26           ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 16/21] pvh: Use PV handlers for emulated forced invalid ops, cpuid, and IO George Dunlap
2013-09-18 15:31   ` Jan Beulich
2013-09-19  1:02     ` Mukesh Rathor
2013-09-19 10:09       ` Jan Beulich
2013-09-20 17:03         ` George Dunlap
2013-09-20 17:06           ` George Dunlap
2013-09-23  6:49           ` Jan Beulich
2013-09-23 13:48     ` George Dunlap
2013-09-23 14:09       ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 17/21] pvh: Disable 32-bit guest support for now George Dunlap
2013-09-18 15:36   ` Jan Beulich
2013-09-13 16:25 ` [PATCH RFC v12 18/21] pvh: Restrict tsc_mode to NEVER_EMULATE " George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 19/21] pvh: Disable debug traps when doing pv emulation for PVH domains George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 20/21] pvh: Disable memevents for PVH guests for now George Dunlap
2013-09-13 16:25 ` [PATCH RFC v12 21/21] pvh: Documentation George Dunlap
2013-09-13 16:41 ` Introduce PVH domU support George Dunlap
