* [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
@ 2013-08-23  1:18 Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 01/21] PVH xen: Add readme docs/misc/pvh-readme.txt Mukesh Rathor
                   ` (21 more replies)
  0 siblings, 22 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

Finally, here is the V11 set of patches.

V11:
   - The gdt union patch is no longer needed, so it has been dropped.
   - Patch 17 is now the last patch.
   - Merged patches 22 and 23.

Individual patches state the changes made to them in V11, if any.

Following V11, I have three immediate work items:
   - Investigate guest IRET fault handling in the vmexit handler.
   - PVH guest shutdown hangs intermittently in some handle_speaker_io code.
   - Submit a Linux patch reflecting the redone vcpu set-context function.

To repeat from before, these are Xen changes to support booting a 64-bit PVH
domU guest, built on top of unstable git c/s:
0c006b41a283a0a569c863d44abde5aa5750ae01

The public git tree for this:
   git clone -n git://oss.oracle.com/git/mrathor/xen.git .
   git checkout pvh.v11-final

Once this is done, two more patch series will follow:
    1) tools changes, and 2) dom0 changes.

Thanks for all the help,
Mukesh


* [V11 PATCH 01/21] PVH xen: Add readme docs/misc/pvh-readme.txt
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 02/21] PVH xen: add params to read_segment_register Mukesh Rathor
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
 docs/misc/pvh-readme.txt |   59 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100644 docs/misc/pvh-readme.txt

diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
new file mode 100644
index 0000000..0c00595
--- /dev/null
+++ b/docs/misc/pvh-readme.txt
@@ -0,0 +1,59 @@
+
+PVH : an x86 PV guest running in an HVM container. HAP is required for PVH.
+
+See: http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-part-1-the-ends-of-the-spectrum/
+
+At present the only PVH guest is a 64-bit x86 PV Linux. Patches are at:
+   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
+
+A PVH guest kernel must support the following features, as defined for Linux
+in arch/x86/xen/xen-head.S:
+
+   #define FEATURES_PVH "|writable_descriptor_tables" \
+                        "|auto_translated_physmap"    \
+                        "|supervisor_mode_kernel"     \
+                        "|hvm_callback_vector"
+
+In a nutshell: the guest is auto-translated, i.e. the p2m is managed by Xen;
+it uses the event callback mechanism and not vlapic emulation; the page tables
+are native, so the mmu_update hcall is N/A for a PVH guest. Moreover, the IDT
+is native, so the set_trap_table hcall is also N/A for a PVH guest. For a full
+list of hcalls supported for PVH, see pvh_hypercall64_table in
+arch/x86/hvm/hvm.c in xen. From the ABI perspective, it is mostly a PV guest
+with auto translate, although it does use hvm_op to set the callback vector.
+
+The initial phase targets booting a 64-bit UP/SMP Linux guest in PVH mode.
+This is done by adding pvh=1 to the guest config file; only xl (not xm) is
+supported. Phase I patches are broken into three parts:
+   - xen changes for booting of 64bit PVH guest
+   - tools changes for creating a PVH guest
+   - boot of 64bit dom0 in PVH mode.
+
+The following fixmes exist in the code:
+  - Add support for more memory types in arch/x86/hvm/mtrr.c.
+  - arch/x86/time.c: support more tsc modes.
+  - check_guest_io_breakpoint(): check/add support for IO breakpoints.
+  - Implement arch_get_info_guest() for PVH.
+  - vmxit_msr_read(): during the AMD port, go through hvm_msr_read_intercept() again.
+  - Verify that breakpoint matching on emulated instructions works the same as
+    for HVM guests; see instruction_done() and check_guest_io_breakpoint().
+
+The following remain to be done for PVH:
+   - Virtual machine introspection support: check_wakeup_from_wait() is being
+     skipped for PVH because hvm_do_resume is not called in vmx_do_resume.
+     Investigate what else needs to be done for VMI support.
+   - AMD port.
+   - 32-bit PVH guest support in both Linux and Xen. Xen changes are tagged
+     "32bitfixme".
+   - Add support for monitoring guest behavior; see the hvm_memory_event*
+     functions in hvm.c.
+   - vcpu hotplug support.
+   - Live migration of PVH guests.
+   - Make posted interrupts available to PVH dom0 (this will be a big win).
+
+
+Note: any emails to me must be cc'd to the xen-devel mailing list. Conversely,
+please cc me on PVH emails sent to the xen-devel mailing list.
+
+Mukesh Rathor
+mukesh.rathor [at] oracle [dot] com
-- 
1.7.2.3
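
To make the readme's "pvh=1" note concrete, here is a minimal illustrative xl
config fragment; everything other than pvh=1 is a placeholder, not something
defined by this series:

   # illustrative only; paths and sizes are placeholders
   kernel = "/boot/vmlinuz-with-pvh-support"
   memory = 1024
   vcpus  = 2
   pvh    = 1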


* [V11 PATCH 02/21] PVH xen: add params to read_segment_register
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 01/21] PVH xen: Add readme docs/misc/pvh-readme.txt Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 03/21] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

In this preparatory patch, the read_segment_register macro is changed to take
vcpu and regs parameters. There is no functional change.
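
As a rough sketch of where the extra parameters are headed (a later patch in
this series adds the non-PV case, and is_pv_vcpu() is only introduced further
on, so the exact form below is an assumption, not code from this patch):

   /*
    * Assumed eventual shape: PVH reads the selector saved in regs, while PV
    * still reads the live hardware register.
    */
   #define read_segment_register(vcpu, regs, name)                    \
   ({  u16 __sel;                                                     \
       if ( !is_pv_vcpu(vcpu) )                                       \
           __sel = (regs)->name;                                      \
       else                                                           \
           asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) ); \
       __sel;                                                         \
   })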

Changes in V2:  None
Changes in V3:
   - Replace read_sreg with read_segment_register

Changes in V7:
   - Don't make emulate_privileged_op() public here.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c        |    8 ++++----
 xen/arch/x86/traps.c         |   26 ++++++++++++--------------
 xen/arch/x86/x86_64/traps.c  |   16 ++++++++--------
 xen/include/asm-x86/system.h |    2 +-
 4 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 874742c..1ffdcfe 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1221,10 +1221,10 @@ static void save_segments(struct vcpu *v)
     struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned int dirty_segment_mask = 0;
 
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(v, regs, ds);
+    regs->es = read_segment_register(v, regs, es);
+    regs->fs = read_segment_register(v, regs, fs);
+    regs->gs = read_segment_register(v, regs, gs);
 
     if ( regs->ds )
         dirty_segment_mask |= DIRTY_DS;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b445b2f..0519836 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1832,8 +1832,6 @@ static inline uint64_t guest_misc_enable(uint64_t val)
     }                                                                       \
     (eip) += sizeof(_x); _x; })
 
-#define read_sreg(regs, sr) read_segment_register(sr)
-
 static int is_cpufreq_controller(struct domain *d)
 {
     return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
@@ -1878,7 +1876,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         goto fail;
 
     /* emulating only opcodes not allowing SS to be default */
-    data_sel = read_sreg(regs, ds);
+    data_sel = read_segment_register(v, regs, ds);
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1896,17 +1894,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             data_sel = regs->cs;
             continue;
         case 0x3e: /* DS override */
-            data_sel = read_sreg(regs, ds);
+            data_sel = read_segment_register(v, regs, ds);
             continue;
         case 0x26: /* ES override */
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             continue;
         case 0x64: /* FS override */
-            data_sel = read_sreg(regs, fs);
+            data_sel = read_segment_register(v, regs, fs);
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
-            data_sel = read_sreg(regs, gs);
+            data_sel = read_segment_register(v, regs, gs);
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
@@ -1953,7 +1951,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
         if ( !(opcode & 2) )
         {
-            data_sel = read_sreg(regs, es);
+            data_sel = read_segment_register(v, regs, es);
             lm_ovr = lm_seg_none;
         }
 
@@ -2686,22 +2684,22 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             ASSERT(opnd_sel);
             continue;
         case 0x3e: /* DS override */
-            opnd_sel = read_sreg(regs, ds);
+            opnd_sel = read_segment_register(v, regs, ds);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x26: /* ES override */
-            opnd_sel = read_sreg(regs, es);
+            opnd_sel = read_segment_register(v, regs, es);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x64: /* FS override */
-            opnd_sel = read_sreg(regs, fs);
+            opnd_sel = read_segment_register(v, regs, fs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
         case 0x65: /* GS override */
-            opnd_sel = read_sreg(regs, gs);
+            opnd_sel = read_segment_register(v, regs, gs);
             if ( !opnd_sel )
                 opnd_sel = dpl;
             continue;
@@ -2754,7 +2752,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             switch ( modrm & 7 )
                             {
                             default:
-                                opnd_sel = read_sreg(regs, ds);
+                                opnd_sel = read_segment_register(v, regs, ds);
                                 break;
                             case 4: case 5:
                                 opnd_sel = regs->ss;
@@ -2782,7 +2780,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                             break;
                         }
                         if ( !opnd_sel )
-                            opnd_sel = read_sreg(regs, ds);
+                            opnd_sel = read_segment_register(v, regs, ds);
                         switch ( modrm & 7 )
                         {
                         case 0: case 2: case 4:
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 1cc977c..90e07fd 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -123,10 +123,10 @@ void show_registers(struct cpu_user_regs *regs)
         fault_crs[0] = read_cr0();
         fault_crs[3] = read_cr3();
         fault_crs[4] = read_cr4();
-        fault_regs.ds = read_segment_register(ds);
-        fault_regs.es = read_segment_register(es);
-        fault_regs.fs = read_segment_register(fs);
-        fault_regs.gs = read_segment_register(gs);
+        fault_regs.ds = read_segment_register(v, regs, ds);
+        fault_regs.es = read_segment_register(v, regs, es);
+        fault_regs.fs = read_segment_register(v, regs, fs);
+        fault_regs.gs = read_segment_register(v, regs, gs);
     }
 
     print_xen_info();
@@ -239,10 +239,10 @@ void do_double_fault(struct cpu_user_regs *regs)
     crs[2] = read_cr2();
     crs[3] = read_cr3();
     crs[4] = read_cr4();
-    regs->ds = read_segment_register(ds);
-    regs->es = read_segment_register(es);
-    regs->fs = read_segment_register(fs);
-    regs->gs = read_segment_register(gs);
+    regs->ds = read_segment_register(current, regs, ds);
+    regs->es = read_segment_register(current, regs, es);
+    regs->fs = read_segment_register(current, regs, fs);
+    regs->gs = read_segment_register(current, regs, gs);
 
     printk("CPU:    %d\n", cpu);
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 6ab7d56..9bb22cb 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,7 +4,7 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(name)                             \
+#define read_segment_register(vcpu, regs, name)                 \
 ({  u16 __sel;                                                  \
     asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
     __sel;                                                      \
-- 
1.7.2.3


* [V11 PATCH 03/21] PVH xen: Move e820 fields out of pv_domain struct
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 01/21] PVH xen: Add readme docs/misc/pvh-readme.txt Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 02/21] PVH xen: add params to read_segment_register Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 04/21] PVH xen: hvm related preparatory changes for PVH Mukesh Rathor
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This patch moves the e820 fields out of the pv_domain struct, as they are
also used by PVH.
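
With the fields in struct arch_domain, code shared by all guest types can use
them directly. A minimal usage sketch of the locking pattern (process_entry()
is a hypothetical consumer, shown only for illustration):

   unsigned int i;

   spin_lock(&d->arch.e820_lock);
   for ( i = 0; i < d->arch.nr_e820; i++ )
       process_entry(&d->arch.e820[i]);    /* hypothetical consumer */
   spin_unlock(&d->arch.e820_lock);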

Changes in V6:
  - Don't base the initialization and cleanup on guest type.

Changes in V7:
  - If statement doesn't need to be split across lines anymore.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c        |    9 +++------
 xen/arch/x86/mm.c            |   26 ++++++++++++--------------
 xen/include/asm-x86/domain.h |   10 +++++-----
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1ffdcfe..d124507 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         if ( (rc = iommu_domain_init(d)) != 0 )
             goto fail;
     }
+    spin_lock_init(&d->arch.e820_lock);
 
     if ( is_hvm_domain(d) )
     {
@@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         }
     }
     else
-    {
         /* 64-bit PV guest by default. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
-        spin_lock_init(&d->arch.pv_domain.e820_lock);
-    }
-
     /* initialize default tsc behavior in case tools don't */
     tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
     spin_lock_init(&d->arch.vtsc_lock);
@@ -592,8 +589,8 @@ void arch_domain_destroy(struct domain *d)
 {
     if ( is_hvm_domain(d) )
         hvm_domain_destroy(d);
-    else
-        xfree(d->arch.pv_domain.e820);
+
+    xfree(d->arch.e820);
 
     free_domain_pirqs(d);
     if ( !is_idle_domain(d) )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7f0e13..8ee5488 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4760,11 +4760,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         }
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
-        xfree(d->arch.pv_domain.e820);
-        d->arch.pv_domain.e820 = e820;
-        d->arch.pv_domain.nr_e820 = fmap.map.nr_entries;
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
+        xfree(d->arch.e820);
+        d->arch.e820 = e820;
+        d->arch.nr_e820 = fmap.map.nr_entries;
+        spin_unlock(&d->arch.e820_lock);
 
         rcu_unlock_domain(d);
         return rc;
@@ -4778,26 +4778,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&map, arg, 1) )
             return -EFAULT;
 
-        spin_lock(&d->arch.pv_domain.e820_lock);
+        spin_lock(&d->arch.e820_lock);
 
         /* Backwards compatibility. */
-        if ( (d->arch.pv_domain.nr_e820 == 0) ||
-             (d->arch.pv_domain.e820 == NULL) )
+        if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -ENOSYS;
         }
 
-        map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820);
-        if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820,
-                           map.nr_entries) ||
+        map.nr_entries = min(map.nr_entries, d->arch.nr_e820);
+        if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) ||
              __copy_to_guest(arg, &map, 1) )
         {
-            spin_unlock(&d->arch.pv_domain.e820_lock);
+            spin_unlock(&d->arch.e820_lock);
             return -EFAULT;
         }
 
-        spin_unlock(&d->arch.pv_domain.e820_lock);
+        spin_unlock(&d->arch.e820_lock);
         return 0;
     }
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..c3f9f8e 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -234,11 +234,6 @@ struct pv_domain
 
     /* map_domain_page() mapping cache. */
     struct mapcache_domain mapcache;
-
-    /* Pseudophysical e820 map (XENMEM_memory_map).  */
-    spinlock_t e820_lock;
-    struct e820entry *e820;
-    unsigned int nr_e820;
 };
 
 struct arch_domain
@@ -313,6 +308,11 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Pseudophysical e820 map (XENMEM_memory_map).  */
+    spinlock_t e820_lock;
+    struct e820entry *e820;
+    unsigned int nr_e820;
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
-- 
1.7.2.3


* [V11 PATCH 04/21] PVH xen: hvm related preparatory changes for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (2 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 03/21] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 05/21] PVH xen: vmx " Mukesh Rathor
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This patch contains small changes to hvm.c because hvm_domain.params is
not set/used/supported for PVH in the present series.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1fcaed0..8284b3b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1070,10 +1070,13 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    domid_t dm_domid;
 
     hvm_asid_flush_vcpu(v);
 
+    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
+    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
+
     if ( (rc = vlapic_init(v)) != 0 )
         goto fail1;
 
@@ -1084,6 +1087,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) 
         goto fail3;
 
+    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+
     /* Create ioreq event channel. */
     rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
     if ( rc < 0 )
@@ -1106,9 +1111,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
     spin_unlock(&d->arch.hvm_domain.ioreq.lock);
 
-    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
-
     v->arch.hvm_vcpu.inject_trap.vector = -1;
 
     rc = setup_compat_arg_xlat(v);
-- 
1.7.2.3


* [V11 PATCH 05/21] PVH xen: vmx related preparatory changes for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (3 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 04/21] PVH xen: hvm related preparatory changes for PVH Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 06/21] PVH xen: vmcs " Mukesh Rathor
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This is another preparatory patch for PVH. In this patch, the following
functions are made available for general/public use:
    vmx_fpu_enter(), get_instruction_length(), update_guest_eip(),
    and vmx_dr_access().

There is no functionality change.
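
Purely as a hedged illustration of the intent, a future PVH exit handler (the
pvh.c file arrives later in this series) could reuse the now-public helpers;
the function name below is hypothetical:

   static void pvh_handle_hlt(struct cpu_user_regs *regs)
   {
       vmx_update_guest_eip();    /* Safe: HLT */
       hvm_hlt(regs->eflags);
   }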

Changes in V2:
  - prepend vmx_ to get_instruction_length and update_guest_eip.
  - Do not export/use vmr().

Changes in V3:
  - Do not change emulate_forced_invalid_op() in this patch.

Changes in V7:
  - Drop pv_cpuid going public here.

Changes in V8:
  - Move vmx_fpu_enter prototype from vmcs.h to vmx.h

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c        |   72 +++++++++++++++---------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |    2 +-
 xen/include/asm-x86/hvm/vmx/vmx.h |   17 ++++++++-
 3 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8ed7026..7292357 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -577,7 +577,7 @@ static int vmx_load_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
     return 0;
 }
 
-static void vmx_fpu_enter(struct vcpu *v)
+void vmx_fpu_enter(struct vcpu *v)
 {
     vcpu_restore_fpu_lazy(v);
     v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device);
@@ -1608,24 +1608,12 @@ const struct hvm_function_table * __init start_vmx(void)
     return &vmx_function_table;
 }
 
-/*
- * Not all cases receive valid value in the VM-exit instruction length field.
- * Callers must know what they're doing!
- */
-static int get_instruction_length(void)
-{
-    int len;
-    len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */
-    BUG_ON((len < 1) || (len > 15));
-    return len;
-}
-
-void update_guest_eip(void)
+void vmx_update_guest_eip(void)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     unsigned long x;
 
-    regs->eip += get_instruction_length(); /* Safe: callers audited */
+    regs->eip += vmx_get_instruction_length(); /* Safe: callers audited */
     regs->eflags &= ~X86_EFLAGS_RF;
 
     x = __vmread(GUEST_INTERRUPTIBILITY_INFO);
@@ -1698,8 +1686,8 @@ static void vmx_do_cpuid(struct cpu_user_regs *regs)
     regs->edx = edx;
 }
 
-static void vmx_dr_access(unsigned long exit_qualification,
-                          struct cpu_user_regs *regs)
+void vmx_dr_access(unsigned long exit_qualification,
+                   struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
 
@@ -2312,7 +2300,7 @@ static int vmx_handle_eoi_write(void)
     if ( (((exit_qualification >> 12) & 0xf) == 1) &&
          ((exit_qualification & 0xfff) == APIC_EOI) )
     {
-        update_guest_eip(); /* Safe: APIC data write */
+        vmx_update_guest_eip(); /* Safe: APIC data write */
         vlapic_EOI_set(vcpu_vlapic(current));
         HVMTRACE_0D(VLAPIC);
         return 1;
@@ -2525,7 +2513,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             HVMTRACE_1D(TRAP, vector);
             if ( v->domain->debugger_attached )
             {
-                update_guest_eip(); /* Safe: INT3 */            
+                vmx_update_guest_eip(); /* Safe: INT3 */
                 current->arch.gdbsx_vcpu_event = TRAP_int3;
                 domain_pause_for_debugger();
                 break;
@@ -2633,7 +2621,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
          */
         inst_len = ((source != 3) ||        /* CALL, IRET, or JMP? */
                     (idtv_info & (1u<<10))) /* IntrType > 3? */
-            ? get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0;
+            ? vmx_get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0;
         if ( (source == 3) && (idtv_info & INTR_INFO_DELIVER_CODE_MASK) )
             ecode = __vmread(IDT_VECTORING_ERROR_CODE);
         regs->eip += inst_len;
@@ -2641,15 +2629,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
     case EXIT_REASON_CPUID:
-        update_guest_eip(); /* Safe: CPUID */
+        vmx_update_guest_eip(); /* Safe: CPUID */
         vmx_do_cpuid(regs);
         break;
     case EXIT_REASON_HLT:
-        update_guest_eip(); /* Safe: HLT */
+        vmx_update_guest_eip(); /* Safe: HLT */
         hvm_hlt(regs->eflags);
         break;
     case EXIT_REASON_INVLPG:
-        update_guest_eip(); /* Safe: INVLPG */
+        vmx_update_guest_eip(); /* Safe: INVLPG */
         exit_qualification = __vmread(EXIT_QUALIFICATION);
         vmx_invlpg_intercept(exit_qualification);
         break;
@@ -2657,7 +2645,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         regs->ecx = hvm_msr_tsc_aux(v);
         /* fall through */
     case EXIT_REASON_RDTSC:
-        update_guest_eip(); /* Safe: RDTSC, RDTSCP */
+        vmx_update_guest_eip(); /* Safe: RDTSC, RDTSCP */
         hvm_rdtsc_intercept(regs);
         break;
     case EXIT_REASON_VMCALL:
@@ -2667,7 +2655,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         rc = hvm_do_hypercall(regs);
         if ( rc != HVM_HCALL_preempted )
         {
-            update_guest_eip(); /* Safe: VMCALL */
+            vmx_update_guest_eip(); /* Safe: VMCALL */
             if ( rc == HVM_HCALL_invalidate )
                 send_invalidate_req();
         }
@@ -2677,7 +2665,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     {
         exit_qualification = __vmread(EXIT_QUALIFICATION);
         if ( vmx_cr_access(exit_qualification) == X86EMUL_OKAY )
-            update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */
+            vmx_update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */
         break;
     }
     case EXIT_REASON_DR_ACCESS:
@@ -2691,7 +2679,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         {
             regs->eax = (uint32_t)msr_content;
             regs->edx = (uint32_t)(msr_content >> 32);
-            update_guest_eip(); /* Safe: RDMSR */
+            vmx_update_guest_eip(); /* Safe: RDMSR */
         }
         break;
     }
@@ -2700,63 +2688,63 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         uint64_t msr_content;
         msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax;
         if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY )
-            update_guest_eip(); /* Safe: WRMSR */
+            vmx_update_guest_eip(); /* Safe: WRMSR */
         break;
     }
 
     case EXIT_REASON_VMXOFF:
         if ( nvmx_handle_vmxoff(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMXON:
         if ( nvmx_handle_vmxon(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMCLEAR:
         if ( nvmx_handle_vmclear(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
  
     case EXIT_REASON_VMPTRLD:
         if ( nvmx_handle_vmptrld(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMPTRST:
         if ( nvmx_handle_vmptrst(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMREAD:
         if ( nvmx_handle_vmread(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
  
     case EXIT_REASON_VMWRITE:
         if ( nvmx_handle_vmwrite(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMLAUNCH:
         if ( nvmx_handle_vmlaunch(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_VMRESUME:
         if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_INVEPT:
         if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_INVVPID:
         if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
-            update_guest_eip();
+            vmx_update_guest_eip();
         break;
 
     case EXIT_REASON_MWAIT_INSTRUCTION:
@@ -2804,14 +2792,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             int bytes = (exit_qualification & 0x07) + 1;
             int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE;
             if ( handle_pio(port, bytes, dir) )
-                update_guest_eip(); /* Safe: IN, OUT */
+                vmx_update_guest_eip(); /* Safe: IN, OUT */
         }
         break;
 
     case EXIT_REASON_INVD:
     case EXIT_REASON_WBINVD:
     {
-        update_guest_eip(); /* Safe: INVD, WBINVD */
+        vmx_update_guest_eip(); /* Safe: INVD, WBINVD */
         vmx_wbinvd_intercept();
         break;
     }
@@ -2843,7 +2831,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     case EXIT_REASON_XSETBV:
         if ( hvm_handle_xsetbv(regs->ecx,
                                (regs->rdx << 32) | regs->_eax) == 0 )
-            update_guest_eip(); /* Safe: XSETBV */
+            vmx_update_guest_eip(); /* Safe: XSETBV */
         break;
 
     case EXIT_REASON_APIC_WRITE:
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 5dfbc54..82be4cc 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2139,7 +2139,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
             tsc += __get_vvmcs(nvcpu->nv_vvmcx, TSC_OFFSET);
             regs->eax = (uint32_t)tsc;
             regs->edx = (uint32_t)(tsc >> 32);
-            update_guest_eip();
+            vmx_update_guest_eip();
 
             return 1;
         }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c33b9f9..c21a303 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -446,6 +446,18 @@ static inline int __vmxon(u64 addr)
     return rc;
 }
 
+/*
+ * Not all cases receive valid value in the VM-exit instruction length field.
+ * Callers must know what they're doing!
+ */
+static inline int vmx_get_instruction_length(void)
+{
+    int len;
+    len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */
+    BUG_ON((len < 1) || (len > 15));
+    return len;
+}
+
 void vmx_get_segment_register(struct vcpu *, enum x86_segment,
                               struct segment_register *);
 void vmx_inject_extint(int trap);
@@ -457,7 +469,10 @@ void ept_p2m_uninit(struct p2m_domain *p2m);
 void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
-void update_guest_eip(void);
+void vmx_update_guest_eip(void);
+void vmx_dr_access(unsigned long exit_qualification,
+                   struct cpu_user_regs *regs);
+void vmx_fpu_enter(struct vcpu *v);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
-- 
1.7.2.3


* [V11 PATCH 06/21] PVH xen: vmcs related preparatory changes for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (4 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 05/21] PVH xen: vmx " Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 07/21] PVH xen: Introduce PVH guest type and some basic changes Mukesh Rathor
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

In this patch, some code is factored out of construct_vmcs() to create
vmx_set_host_vmcs_fields(). If we decide to create a separate vmcs function
for PVH again, both can call vmx_set_host_vmcs_fields().
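
A hedged sketch of that motivation, assuming a hypothetical PVH-specific
constructor:

   static int pvh_construct_vmcs(struct vcpu *v)      /* hypothetical */
   {
       vmx_set_host_vmcs_fields(v);  /* shared host selector/CR/SYSENTER setup */
       /* ... PVH-specific guest-state setup would go here ... */
       return 0;
   }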

Changes in V11:
  - Rename to vmx_set_host_vmcs_fields.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c |   58 +++++++++++++++++++++++-------------------
 1 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 79efb3f..36c4adf 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -828,11 +828,40 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
-static int construct_vmcs(struct vcpu *v)
+static void vmx_set_host_vmcs_fields(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     uint16_t sysenter_cs;
     unsigned long sysenter_eip;
+
+    /* Host data selectors. */
+    __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS);
+    __vmwrite(HOST_FS_SELECTOR, 0);
+    __vmwrite(HOST_GS_SELECTOR, 0);
+    __vmwrite(HOST_FS_BASE, 0);
+    __vmwrite(HOST_GS_BASE, 0);
+
+    /* Host control registers. */
+    v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
+    __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
+    __vmwrite(HOST_CR4,
+              mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0));
+
+    /* Host CS:RIP. */
+    __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
+    __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler);
+
+    /* Host SYSENTER CS:RIP. */
+    rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs);
+    __vmwrite(HOST_SYSENTER_CS, sysenter_cs);
+    rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip);
+    __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
+}
+
+static int construct_vmcs(struct vcpu *v)
+{
+    struct domain *d = v->domain;
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
 
@@ -935,30 +964,7 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
 
-    /* Host data selectors. */
-    __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS);
-    __vmwrite(HOST_FS_SELECTOR, 0);
-    __vmwrite(HOST_GS_SELECTOR, 0);
-    __vmwrite(HOST_FS_BASE, 0);
-    __vmwrite(HOST_GS_BASE, 0);
-
-    /* Host control registers. */
-    v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS;
-    __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
-    __vmwrite(HOST_CR4,
-              mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0));
-
-    /* Host CS:RIP. */
-    __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS);
-    __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler);
-
-    /* Host SYSENTER CS:RIP. */
-    rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs);
-    __vmwrite(HOST_SYSENTER_CS, sysenter_cs);
-    rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip);
-    __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
+    vmx_set_host_vmcs_fields(v);
 
     /* MSR intercepts. */
     __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);
-- 
1.7.2.3


* [V11 PATCH 07/21] PVH xen: Introduce PVH guest type and some basic changes.
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (5 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 06/21] PVH xen: vmcs " Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 08/21] PVH xen: introduce pvh_vcpu_boot_set_info() and vmx_pvh_vcpu_boot_set_info() Mukesh Rathor
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This patch introduces the concept of a PVH guest. There are other basic
changes, like creating macros to check for PV/PVH vcpus/domains, and also
modifying the copy macros to account for PVH. Finally, guest_kernel_mode is
changed so that a PVH guest doesn't need to check the TF_kernel_mode flag,
since its kernel runs in ring 0.

Note, we drop the const qualifier from vcpu_show_registers() to accommodate
the hvm function call chain in guest_kernel_mode(). The leaf function
svm_get_segment_register calls vcpu_runnable, which can't take a const *v. On
the VMX side, vmx_get_segment_register calls vmx_vmcs_enter, which may
involve vcpu_pause, and that can't be passed a const pointer either.
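
The net effect on call sites can be summed up with a short sketch: code that
really is PV-only should now test is_pv_*, because is_hvm_* remains false for
a PVH guest (the comments below are illustrative, not from this patch):

   if ( is_pv_domain(d) )
   {
       /* PV-only paths, e.g. writable page tables and set_trap_table. */
   }
   else
   {
       /* HVM and PVH share the HVM-container paths (VMCS, hvm_funcs, ...). */
   }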

Changes in V2:
  - make is_pvh/is_hvm an enum instead of adding is_pvh as a new flag.
  - fix indentation and spacing in the guest_kernel_mode macro.
  - add a debug-only BUG() in the GUEST_KERNEL_RPL macro, as it should no
    longer be called in any PVH paths.

Changes in V3:
  - Rename enum fields, and add is_pv to it.
  - Get rid of is_hvm_or_pvh_* macros.

Changes in V4:
  - Move e820 fields out of the pv_domain struct.

Changes in V5:
  - Move the e820 changes above in V4 to a separate patch.

Changes in V5:
  - Rename enum guest_type from is_pv, ... to guest_type_pv, ....

Changes in V8:
  - Go to the VMCS for the DPL check instead of checking the RPL in
    guest_kernel_mode.
  - Also, hvm_kernel_mode is placed in hvm.c because it is called from
    guest_kernel_mode in regs.h, which is a pretty early header include;
    hence we can't place it in hvm.h like other similar functions. The
    other alternative would be to put hvm_kernel_mode in regs.h itself,
    but it calls hvm_get_segment_register(), for which we would either
    need to include hvm.h in regs.h (not possible) or add a prototype
    for hvm_get_segment_register(); but then the arguments to
    hvm_get_segment_register() also need their headers. So, in the end,
    this seems to be the best/only way.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/debug.c               |    2 +-
 xen/arch/x86/hvm/hvm.c             |    8 ++++++++
 xen/arch/x86/x86_64/traps.c        |    2 +-
 xen/common/domain.c                |    2 +-
 xen/include/asm-x86/desc.h         |    4 +++-
 xen/include/asm-x86/domain.h       |    2 +-
 xen/include/asm-x86/guest_access.h |   12 ++++++------
 xen/include/asm-x86/x86_64/regs.h  |   11 +++++++----
 xen/include/public/domctl.h        |    3 +++
 xen/include/xen/sched.h            |   21 ++++++++++++++++++---
 10 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index e67473e..167421d 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp,
 
         pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len);
 
-        mfn = (dp->is_hvm
+        mfn = (!is_pv_domain(dp)
                ? dbg_hvm_va2mfn(addr, dp, toaddr, &gfn)
                : dbg_pv_va2mfn(addr, dp, pgd3));
         if ( mfn == INVALID_MFN ) 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8284b3b..bac4708 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4642,6 +4642,14 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
     return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+bool_t hvm_kernel_mode(struct vcpu *v)
+{
+    struct segment_register seg;
+
+    hvm_get_segment_register(v, x86_seg_ss, &seg);
+    return (seg.attr.fields.dpl == 0);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 90e07fd..2134cee 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -142,7 +142,7 @@ void show_registers(struct cpu_user_regs *regs)
     }
 }
 
-void vcpu_show_registers(const struct vcpu *v)
+void vcpu_show_registers(struct vcpu *v)
 {
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 6c264a5..38b1bad 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -236,7 +236,7 @@ struct domain *domain_create(
         goto fail;
 
     if ( domcr_flags & DOMCRF_hvm )
-        d->is_hvm = 1;
+        d->guest_type = guest_type_hvm;
 
     if ( domid == 0 )
     {
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 354b889..041e9d3 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -38,7 +38,9 @@
 
 #ifndef __ASSEMBLY__
 
-#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
+/* PVH 32bitfixme : see emulate_gate_op call from do_general_protection */
+#define GUEST_KERNEL_RPL(d) ({ ASSERT(is_pv_domain(d)); \
+                               is_pv_32bit_domain(d) ? 1 : 3; })
 
 /* Fix up the RPL of a guest segment selector. */
 #define __fixup_guest_selector(d, sel)                             \
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index c3f9f8e..22a72df 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -447,7 +447,7 @@ struct arch_vcpu
 #define hvm_svm         hvm_vcpu.u.svm
 
 void vcpu_show_execution_state(struct vcpu *);
-void vcpu_show_registers(const struct vcpu *);
+void vcpu_show_registers(struct vcpu *);
 
 /* Clean up CR4 bits that are not under guest control. */
 unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..675dda1 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -14,27 +14,27 @@
 
 /* Raw access functions: no type checking. */
 #define raw_copy_to_guest(dst, src, len)        \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      copy_to_user((dst), (src), (len)))
 #define raw_copy_from_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      copy_from_user((dst), (src), (len)))
 #define raw_clear_guest(dst,  len)              \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 #define __raw_copy_to_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_to_user_hvm((dst), (src), (len)) :    \
      __copy_to_user((dst), (src), (len)))
 #define __raw_copy_from_guest(dst, src, len)    \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      copy_from_user_hvm((dst), (src), (len)) :  \
      __copy_from_user((dst), (src), (len)))
 #define __raw_clear_guest(dst,  len)            \
-    (is_hvm_vcpu(current) ?                     \
+    (!is_pv_vcpu(current) ?                     \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 
diff --git a/xen/include/asm-x86/x86_64/regs.h b/xen/include/asm-x86/x86_64/regs.h
index 3cdc702..d91a84b 100644
--- a/xen/include/asm-x86/x86_64/regs.h
+++ b/xen/include/asm-x86/x86_64/regs.h
@@ -10,10 +10,13 @@
 #define ring_2(r)    (((r)->cs & 3) == 2)
 #define ring_3(r)    (((r)->cs & 3) == 3)
 
-#define guest_kernel_mode(v, r)                                 \
-    (!is_pv_32bit_vcpu(v) ?                                     \
-     (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :        \
-     (ring_1(r)))
+bool_t hvm_kernel_mode(struct vcpu *);
+
+#define guest_kernel_mode(v, r)                                   \
+    (is_pvh_vcpu(v) ? hvm_kernel_mode(v) :                        \
+     (!is_pv_32bit_vcpu(v) ?                                      \
+      (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :         \
+      (ring_1(r))))
 
 #define permit_softint(dpl, v, r) \
     ((dpl) >= (guest_kernel_mode(v, r) ? 1 : 3))
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 4c5b2bb..6b1aa11 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo {
  /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest   (1U<<_XEN_DOMINF_pvh_guest)
  /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ae6a3b8..2d48d22 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -238,6 +238,14 @@ struct mem_event_per_domain
     struct mem_event_domain access;
 };
 
+/*
+ * PVH is a PV guest running in an HVM container. While is_hvm_* checks are
+ * false for it, it uses many of the HVM data structs.
+ */
+enum guest_type {
+    guest_type_pv, guest_type_pvh, guest_type_hvm
+};
+
 struct domain
 {
     domid_t          domain_id;
@@ -285,8 +293,8 @@ struct domain
     struct rangeset *iomem_caps;
     struct rangeset *irq_caps;
 
-    /* Is this an HVM guest? */
-    bool_t           is_hvm;
+    enum guest_type guest_type;
+
 #ifdef HAS_PASSTHROUGH
     /* Does this guest need iommu mappings? */
     bool_t           need_iommu;
@@ -464,6 +472,9 @@ struct domain *domain_create(
  /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */
 #define _DOMCRF_oos_off         4
 #define DOMCRF_oos_off          (1U<<_DOMCRF_oos_off)
+ /* DOMCRF_pvh: Create PV domain in HVM container. */
+#define _DOMCRF_pvh            5
+#define DOMCRF_pvh             (1U<<_DOMCRF_pvh)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
@@ -732,8 +743,12 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
 
-#define is_hvm_domain(d) ((d)->is_hvm)
+#define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
+#define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
+#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
+#define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh)
+#define is_pvh_vcpu(v)   (is_pvh_domain((v)->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
                            cpumask_weight((v)->cpu_affinity) == 1)
 #ifdef HAS_PASSTHROUGH
-- 
1.7.2.3


* [V11 PATCH 08/21] PVH xen: introduce pvh_vcpu_boot_set_info() and vmx_pvh_vcpu_boot_set_info()
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (6 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 07/21] PVH xen: Introduce PVH guest type and some basic changes Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:18 ` [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes Mukesh Rathor
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

vmx_pvh_vcpu_boot_set_info() is added in a new file, pvh.c, to which more
changes (such as the PVH vmexit handler) will be added later.
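
A hedged sketch of how the new hook is expected to be reached from
arch_set_info_guest() (the actual wiring lands in a later patch; "c.nat"
follows the naming already used in that function, and details are
simplified):

   if ( is_pvh_vcpu(v) )
   {
       /* Validate and load the minimal PVH boot context into the VMCS. */
       if ( (rc = pvh_vcpu_boot_set_info(v, c.nat)) != 0 )
           return rc;
   }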

Changes in V11:
   - vmx_pvh_vcpu_boot_set_info pretty much redone to be minimal.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/Makefile     |    1 +
 xen/arch/x86/hvm/vmx/pvh.c        |   54 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c        |    1 +
 xen/include/asm-x86/hvm/hvm.h     |    9 ++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |    2 +
 xen/include/public/arch-x86/xen.h |    4 +++
 6 files changed, 71 insertions(+), 0 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmx/pvh.c

diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..59fb5d4 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -1,5 +1,6 @@
 obj-bin-y += entry.o
 obj-y += intr.o
+obj-y += pvh.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c
new file mode 100644
index 0000000..526ce2b
--- /dev/null
+++ b/xen/arch/x86/hvm/vmx/pvh.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2013, Mukesh Rathor, Oracle Corp.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
+#include <asm/p2m.h>
+#include <asm/traps.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <public/sched.h>
+#include <asm/hvm/nestedhvm.h>
+#include <asm/xstate.h>
+
+/*
+ * Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest.
+ *
+ * Boot vcpu call is from tools via:
+ *     do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest
+ *
+ * Secondary vcpu's are brought up by the guest itself via:
+ *     do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest
+ *     (In case of linux, the call comes from cpu_initialize_context()).
+ *
+ * Note, PVH save/restore is expected to happen the HVM way, ie,
+ *        do_domctl -> XEN_DOMCTL_sethvmcontext -> hvm_load/save
+ * and not get here.
+ *
+ * PVH 32bitfixme: this function needs to be modified for 32bit guest.
+ */
+int vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
+                               struct vcpu_guest_context *ctxtp)
+{
+    if ( ctxtp->ldt_base || ctxtp->ldt_ents ||
+         ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es ||
+         ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ||
+         *ctxtp->gdt_frames || ctxtp->gdt_ents ||
+         ctxtp->fs_base || ctxtp->gs_base_user )
+        return -EINVAL;
+
+    vmx_vmcs_enter(v);
+    __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel);
+    vmx_vmcs_exit(v);
+
+    return 0;
+}
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7292357..a778dca 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1562,6 +1562,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .sync_pir_to_irr      = vmx_sync_pir_to_irr,
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
+    .pvh_vcpu_boot_set_info = vmx_pvh_vcpu_boot_set_info,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 00489cf..1bd8fc9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -193,6 +193,9 @@ struct hvm_function_table {
                                 paddr_t *L1_gpa, unsigned int *page_order,
                                 uint8_t *p2m_acc, bool_t access_r,
                                 bool_t access_w, bool_t access_x);
+
+    int (*pvh_vcpu_boot_set_info)(struct vcpu *v,
+                                  struct vcpu_guest_context *ctxtp);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -326,6 +329,12 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
     return hvm_funcs.get_shadow_gs_base(v);
 }
 
+static inline int pvh_vcpu_boot_set_info(struct vcpu *v,
+                                         struct vcpu_guest_context *ctxtp)
+{
+    return hvm_funcs.pvh_vcpu_boot_set_info(v, ctxtp);
+}
+
 #define is_viridian_domain(_d)                                             \
  (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN]))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c21a303..3ad2188 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -473,6 +473,8 @@ void vmx_update_guest_eip(void);
 void vmx_dr_access(unsigned long exit_qualification,
                    struct cpu_user_regs *regs);
 void vmx_fpu_enter(struct vcpu *v);
+int  vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
+                                struct vcpu_guest_context *ctxtp);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index b7f6a51..4f12f50 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -150,6 +150,10 @@ typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */
 /*
  * The following is all CPU context. Note that the fpu_ctxt block is filled 
  * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used.
+ *
+ * PVH 64bit: In the vcpu boot path, for vmcs context, only gs_base_kernel
+ *            is honored. Other fields like gdt, ldt, and selectors must be
+ *            zeroed. See vmx_pvh_vcpu_boot_set_info.
  */
 struct vcpu_guest_context {
     /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. */
-- 
1.7.2.3


* [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (7 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 08/21] PVH xen: introduce pvh_vcpu_boot_set_info() and vmx_pvh_vcpu_boot_set_info() Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  8:12   ` Jan Beulich
  2013-08-23  1:18 ` [V11 PATCH 10/21] PVH xen: support invalid op emulation for PVH Mukesh Rathor
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This patch mostly contains changes to arch/x86/domain.c to allow for PVH
domain creation. The new function pvh_vcpu_boot_set_info(), introduced in
the previous patch, is called here to set some guest context in the VMCS.
This patch also changes the context_switch code in the same file to follow
HVM behaviour for PVH.
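
For readers following the is_hvm -> !is_pv/is_pvh conversions below, here is
a minimal sketch of the guest-type predicates this patch relies on. The real
definitions were added earlier in this series (in xen/include/xen/sched.h);
the exact wording there may differ slightly:

    enum guest_type {
        guest_type_pv, guest_type_pvh, guest_type_hvm
    };

    /*
     * A PVH domain is neither PV nor HVM by these tests, so code that must
     * cover both HVM and PVH checks !is_pv_domain(d) rather than
     * is_hvm_domain(d), and PVH-only special cases use is_pvh_domain(d).
     */
    #define is_pv_domain(d)   ((d)->guest_type == guest_type_pv)
    #define is_pvh_domain(d)  ((d)->guest_type == guest_type_pvh)
    #define is_hvm_domain(d)  ((d)->guest_type == guest_type_hvm)
    #define is_pv_vcpu(v)     (is_pv_domain((v)->domain))
    #define is_pvh_vcpu(v)    (is_pvh_domain((v)->domain))
    #define is_hvm_vcpu(v)    (is_hvm_domain((v)->domain))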

Changes in V2:
  - changes to read_segment_register() moved to this patch.

Changes in V3:
  - Fix read_segment_register() macro to make sure args are evaluated once,
    and use # instead of STR for name in the macro.

Changes in V4:
  - Remove pvh substruct in the hvm substruct, as the vcpu_info_mfn has been
    moved out of pv_vcpu struct.
  - rename hvm_pvh_* functions to hvm_*.

Changes in V5:
  - remove pvh_read_descriptor().

Changes in V7:
  - remove hap_update_cr3() and read_segment_register changes from here.

Changes in V11:
  - set cr3 to page_to_maddr and not page_to_mfn.
  - reject non-zero cr1 value for pvh.
  - Do not check for pvh in destroy_gdt, but put the check in callers.
  - Set _VPF_in_reset for PVH also.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c |   65 +++++++++++++++++++++++++++++++++----------------
 1 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d124507..917eb6a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v)
 
     vmce_init_vcpu(v);
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
         goto done;
@@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v)
 
     vcpu_destroy_fpu(v);
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
         xfree(v->arch.pv_vcpu.trap_ctxt);
@@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     int rc = -ENOMEM;
 
     d->arch.hvm_domain.hap_enabled =
-        is_hvm_domain(d) &&
+        !is_pv_domain(d) &&
         hvm_funcs.hap_supported &&
         (domcr_flags & DOMCRF_hap);
     d->arch.hvm_domain.mem_sharing_enabled = 0;
@@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     mapcache_domain_init(d);
 
     HYPERVISOR_COMPAT_VIRT_START(d) =
-        is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START;
+        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
 
     if ( (rc = paging_domain_init(d, domcr_flags)) != 0 )
         goto fail;
@@ -555,7 +555,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     }
     spin_lock_init(&d->arch.e820_lock);
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d)) != 0 )
         {
@@ -650,7 +650,7 @@ int arch_set_info_guest(
 #define c(fld) (compat ? (c.cmp->fld) : (c.nat->fld))
     flags = c(flags);
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         if ( !compat )
         {
@@ -703,7 +703,7 @@ int arch_set_info_guest(
     v->fpu_initialised = !!(flags & VGCF_I387_VALID);
 
     v->arch.flags &= ~TF_kernel_mode;
-    if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ )
+    if ( (flags & VGCF_in_kernel) || !is_pv_vcpu(v)/*???*/ )
         v->arch.flags |= TF_kernel_mode;
 
     v->arch.vgc_flags = flags;
@@ -718,7 +718,7 @@ int arch_set_info_guest(
     if ( !compat )
     {
         memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -734,10 +734,13 @@ int arch_set_info_guest(
 
     v->arch.user_regs.eflags |= 2;
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
     {
         hvm_set_info_guest(v);
-        goto out;
+        if ( is_hvm_vcpu(v) || v->is_initialised )
+            goto out;
+        else
+            goto pvh_skip_pv_stuff;
     }
 
     init_int80_direct_trap(v);
@@ -850,6 +853,7 @@ int arch_set_info_guest(
     if ( rc != 0 )
         return rc;
 
+ pvh_skip_pv_stuff:
     set_bit(_VPF_in_reset, &v->pause_flags);
 
     if ( !compat )
@@ -860,7 +864,7 @@ int arch_set_info_guest(
 
     if ( !cr3_page )
         rc = -EINVAL;
-    else if ( paging_mode_refcounts(d) )
+    else if ( paging_mode_refcounts(d) || is_pvh_vcpu(v) )
         /* nothing */;
     else if ( cr3_page == v->arch.old_guest_table )
     {
@@ -892,8 +896,19 @@ int arch_set_info_guest(
         /* handled below */;
     else if ( !compat )
     {
+        /* PVH 32bitfixme. */
+        if ( is_pvh_vcpu(v) )
+        {
+            v->arch.cr3 = page_to_maddr(cr3_page);
+            v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
+        }
+
         v->arch.guest_table = pagetable_from_page(cr3_page);
-        if ( c.nat->ctrlreg[1] )
+
+        if ( c.nat->ctrlreg[1] && is_pvh_vcpu(v) )
+            rc = -EINVAL;
+
+        if ( c.nat->ctrlreg[1] && is_pv_vcpu(v) )
         {
             cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]);
             cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);
@@ -936,7 +951,8 @@ int arch_set_info_guest(
     {
         if ( cr3_page )
             put_page(cr3_page);
-        destroy_gdt(v);
+        if ( !is_pvh_vcpu(v) )
+            destroy_gdt(v);
         return rc;
     }
 
@@ -953,6 +969,13 @@ int arch_set_info_guest(
 
     update_cr3(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        /* Set VMCS fields. */
+        if ( (rc = pvh_vcpu_boot_set_info(v, c.nat)) != 0 )
+            return rc;
+    }
+
  out:
     if ( flags & VGCF_online )
         clear_bit(_VPF_down, &v->pause_flags);
@@ -964,7 +987,7 @@ int arch_set_info_guest(
 
 int arch_vcpu_reset(struct vcpu *v)
 {
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         destroy_gdt(v);
         return vcpu_destroy_pagetables(v);
@@ -1314,7 +1337,7 @@ static void update_runstate_area(struct vcpu *v)
 
 static inline int need_full_gdt(struct vcpu *v)
 {
-    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
+    return (is_pv_vcpu(v) && !is_idle_vcpu(v));
 }
 
 static void __context_switch(void)
@@ -1449,7 +1472,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
-        if ( !is_hvm_vcpu(next) )
+        if ( is_pv_vcpu(next) )
         {
             load_LDT(next);
             load_segments(next);
@@ -1575,12 +1598,12 @@ unsigned long hypercall_create_continuation(
         regs->eax  = op;
 
         /* Ensure the hypercall trap instruction is re-executed. */
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip -= 2;  /* re-execute 'syscall' / 'int $xx' */
         else
             current->arch.hvm_vcpu.hcall_preempted = 1;
 
-        if ( !is_hvm_vcpu(current) ?
+        if ( is_pv_vcpu(current) ?
              !is_pv_32on64_vcpu(current) :
              (hvm_guest_x86_mode(current) == 8) )
         {
@@ -1848,7 +1871,7 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
         {
             for_each_vcpu ( d, v )
             {
@@ -1921,7 +1944,7 @@ int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
         hvm_domain_relinquish_resources(d);
 
     return 0;
@@ -2005,7 +2028,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
     if ( already_pending )
         return;
 
-    if ( is_hvm_vcpu(v) )
+    if ( !is_pv_vcpu(v) )
         hvm_assert_evtchn_irq(v);
     else
         vcpu_kick(v);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 10/21] PVH xen: support invalid op emulation for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (8 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes Mukesh Rathor
@ 2013-08-23  1:18 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 11/21] PVH xen: Support privileged " Mukesh Rathor
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:18 UTC (permalink / raw)
  To: Xen-devel

This patch supports invalid op emulation for PVH by calling the appropriate
copy macros and the HVM function to inject a page fault.
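
For illustration only (not part of this patch), the guest-side sequence that
reaches this emulation path is the forced-emulation signature checked in
emulate_forced_invalid_op(). A hedged sketch, roughly what a Linux guest
emits for a forced CPUID; the wrapper name and variables are placeholders:

    static inline void forced_cpuid(uint32_t leaf, uint32_t *eax,
                                    uint32_t *ebx, uint32_t *ecx,
                                    uint32_t *edx)
    {
        /*
         * "ud2 ; .ascii 'xen'" is the signature Xen looks for; the CPUID
         * that follows is then emulated by the hypervisor (now for PVH as
         * well as PV) instead of executing natively.
         */
        asm volatile ( "ud2a ; .ascii \"xen\" ; cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "0" (leaf), "2" (0) );
    }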

Changes in V11:
  - Break propagate_page_fault to create pv_inject_page_fault.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/traps.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 0519836..2e92a0e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -460,6 +460,11 @@ static void instruction_done(
     struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch)
 {
     regs->eip = eip;
+
+    /* PVH fixme: support guest io bp trap below */
+    if ( is_pvh_vcpu(current) )
+        return;
+
     regs->eflags &= ~X86_EFLAGS_RF;
     if ( bpmatch || (regs->eflags & X86_EFLAGS_TF) )
     {
@@ -922,7 +927,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip = regs->eip;
 
     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    if ( (rc = raw_copy_from_guest(sig, (char *)eip, sizeof(sig))) != 0 )
     {
         propagate_page_fault(eip + sizeof(sig) - rc, 0);
         return EXCRET_fault_fixed;
@@ -932,7 +937,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip += sizeof(sig);
 
     /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    if ( ( rc = raw_copy_from_guest(instr, (char *)eip, sizeof(instr))) != 0 )
     {
         propagate_page_fault(eip + sizeof(instr) - rc, 0);
         return EXCRET_fault_fixed;
@@ -1071,7 +1076,7 @@ static void reserved_bit_page_fault(
     show_execution_state(regs);
 }
 
-void propagate_page_fault(unsigned long addr, u16 error_code)
+static void pv_inject_page_fault(unsigned long addr, u16 error_code)
 {
     struct trap_info *ti;
     struct vcpu *v = current;
@@ -1105,6 +1110,16 @@ void propagate_page_fault(unsigned long addr, u16 error_code)
         reserved_bit_page_fault(addr, guest_cpu_user_regs());
 }
 
+void propagate_page_fault(unsigned long addr, u16 error_code)
+{
+    if ( is_pvh_vcpu(current) )
+        hvm_inject_page_fault(error_code, addr);
+    else if ( is_pv_vcpu(current) )
+        pv_inject_page_fault(addr, error_code);
+    else
+        BUG();
+}
+
 static int handle_gdt_ldt_mapping_fault(
     unsigned long offset, struct cpu_user_regs *regs)
 {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 11/21] PVH xen: Support privileged op emulation for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (9 preceding siblings ...)
  2013-08-23  1:18 ` [V11 PATCH 10/21] PVH xen: support invalid op emulation for PVH Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 12/21] PVH xen: interrupt/event-channel delivery to PVH Mukesh Rathor
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch mostly changes traps.c to support privileged op emulation for
PVH. A new function read_descriptor_sel() is introduced to read a descriptor
for PVH given a selector. Another new function, vmx_read_selector(), reads
a selector from the VMCS to support read_segment_register() for PVH.

We replace the copy_from_user() calls with raw_copy_from_guest(), as it is
not appropriate to access PVH/HVM guest memory directly. We discard the STR
macro in the read_segment_register macro because, in Jan's words:
  "In any event I think STR() should go away altogether (where
   necessary replaced by __stringify()), and was needlessly used
   in the original code here: The intended use is when you need
   the argument macro expanded before stringification, which is
   not the case here."
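
A small preprocessor illustration of the point quoted above (the macro names
here are illustrative only, not code from this series):

    #define STRINGIFY(x)   #x            /* stringifies x exactly as written */
    #define XSTRINGIFY(x)  STRINGIFY(x)  /* expands x first, then stringifies */

    #define SEG ds
    /* STRINGIFY(SEG)  expands to "SEG" */
    /* XSTRINGIFY(SEG) expands to "ds"  */

read_segment_register() is always passed a literal register name (ds, es,
fs, ...), never a macro that needs expanding first, so the plain '#' form is
sufficient.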

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c    |   40 +++++++++++++++++++
 xen/arch/x86/traps.c          |   86 +++++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/hvm.h |    7 +++
 xen/include/asm-x86/system.h  |   19 +++++++--
 4 files changed, 137 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a778dca..9056a3f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -664,6 +664,45 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
         .fields = { .type = 0xb, .s = 0, .dpl = 0, .p = 1, .avl = 0,    \
                     .l = 0, .db = 0, .g = 0, .pad = 0 } }).bytes)
 
+u16 vmx_read_selector(struct vcpu *v, enum x86_segment seg)
+{
+    u16 sel = 0;
+
+    vmx_vmcs_enter(v);
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        sel = __vmread(GUEST_CS_SELECTOR);
+        break;
+
+    case x86_seg_ss:
+        sel = __vmread(GUEST_SS_SELECTOR);
+        break;
+
+    case x86_seg_es:
+        sel = __vmread(GUEST_ES_SELECTOR);
+        break;
+
+    case x86_seg_ds:
+        sel = __vmread(GUEST_DS_SELECTOR);
+        break;
+
+    case x86_seg_fs:
+        sel = __vmread(GUEST_FS_SELECTOR);
+        break;
+
+    case x86_seg_gs:
+        sel = __vmread(GUEST_GS_SELECTOR);
+        break;
+
+    default:
+        BUG();
+    }
+    vmx_vmcs_exit(v);
+
+    return sel;
+}
+
 void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg,
                               struct segment_register *reg)
 {
@@ -1563,6 +1602,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .pvh_vcpu_boot_set_info = vmx_pvh_vcpu_boot_set_info,
+    .read_selector        = vmx_read_selector,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2e92a0e..2a3f517 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -481,6 +481,10 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
     unsigned int width, i, match = 0;
     unsigned long start;
 
+    /* PVH fixme: support io breakpoint. */
+    if ( is_pvh_vcpu(v) )
+        return 0;
+
     if ( !(v->arch.debugreg[5]) ||
          !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
         return 0;
@@ -1530,6 +1534,49 @@ static int read_descriptor(unsigned int sel,
     return 1;
 }
 
+static int read_descriptor_sel(unsigned int sel,
+                               enum x86_segment which_sel,
+                               struct vcpu *v,
+                               const struct cpu_user_regs *regs,
+                               unsigned long *base,
+                               unsigned long *limit,
+                               unsigned int *ar,
+                               unsigned int vm86attr)
+{
+    struct segment_register seg;
+    bool_t long_mode;
+
+    if ( !is_pvh_vcpu(v) )
+        return read_descriptor(sel, v, regs, base, limit, ar, vm86attr);
+
+    hvm_get_segment_register(v, x86_seg_cs, &seg);
+    long_mode = seg.attr.fields.l;
+
+    if ( which_sel != x86_seg_cs )
+        hvm_get_segment_register(v, which_sel, &seg);
+
+    /* "ar" is returned packed as in segment_attributes_t. Fix it up. */
+    *ar = seg.attr.bytes;
+    *ar = (*ar & 0xff ) | ((*ar & 0xf00) << 4);
+    *ar <<= 8;
+
+    if ( long_mode )
+    {
+        *limit = ~0UL;
+
+        if ( which_sel < x86_seg_fs )
+        {
+            *base = 0UL;
+            return 1;
+        }
+   }
+   else
+       *limit = seg.limit;
+
+   *base = seg.base;
+    return 1;
+}
+
 static int read_gate_descriptor(unsigned int gate_sel,
                                 const struct vcpu *v,
                                 unsigned int *sel,
@@ -1595,6 +1642,13 @@ static int guest_io_okay(
     int user_mode = !(v->arch.flags & TF_kernel_mode);
 #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
 
+    /*
+     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
+     * and so don't need to check again here.
+     */
+    if ( is_pvh_vcpu(v) )
+        return 1;
+
     if ( !vm86_mode(regs) &&
          (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
         return 1;
@@ -1840,7 +1894,7 @@ static inline uint64_t guest_misc_enable(uint64_t val)
         _ptr = (unsigned int)_ptr;                                          \
     if ( (limit) < sizeof(_x) - 1 || (eip) > (limit) - (sizeof(_x) - 1) )   \
         goto fail;                                                          \
-    if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
+    if ( (_rc = raw_copy_from_guest(&_x, (type *)_ptr, sizeof(_x))) != 0 )  \
     {                                                                       \
         propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
         goto skip;                                                          \
@@ -1857,6 +1911,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 static int emulate_privileged_op(struct cpu_user_regs *regs)
 {
+    enum x86_segment which_sel;
     struct vcpu *v = current;
     unsigned long *reg, eip = regs->eip;
     u8 opcode, modrm_reg = 0, modrm_rm = 0, rep_prefix = 0, lock = 0, rex = 0;
@@ -1879,9 +1934,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
 
-    if ( !read_descriptor(regs->cs, v, regs,
-                          &code_base, &code_limit, &ar,
-                          _SEGMENT_CODE|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P) )
+    if ( !read_descriptor_sel(regs->cs, x86_seg_cs, v, regs,
+                              &code_base, &code_limit, &ar,
+                              _SEGMENT_CODE|_SEGMENT_S|
+                              _SEGMENT_DPL|_SEGMENT_P) )
         goto fail;
     op_default = op_bytes = (ar & (_SEGMENT_L|_SEGMENT_DB)) ? 4 : 2;
     ad_default = ad_bytes = (ar & _SEGMENT_L) ? 8 : op_default;
@@ -1892,6 +1948,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
     /* emulating only opcodes not allowing SS to be default */
     data_sel = read_segment_register(v, regs, ds);
+    which_sel = x86_seg_ds;
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1907,23 +1964,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             continue;
         case 0x2e: /* CS override */
             data_sel = regs->cs;
+            which_sel = x86_seg_cs;
             continue;
         case 0x3e: /* DS override */
             data_sel = read_segment_register(v, regs, ds);
+            which_sel = x86_seg_ds;
             continue;
         case 0x26: /* ES override */
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             continue;
         case 0x64: /* FS override */
             data_sel = read_segment_register(v, regs, fs);
+            which_sel = x86_seg_fs;
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
             data_sel = read_segment_register(v, regs, gs);
+            which_sel = x86_seg_gs;
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
             data_sel = regs->ss;
+            which_sel = x86_seg_ss;
             continue;
         case 0xf0: /* LOCK */
             lock = 1;
@@ -1967,15 +2030,16 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         if ( !(opcode & 2) )
         {
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             lm_ovr = lm_seg_none;
         }
 
         if ( !(ar & _SEGMENT_L) )
         {
-            if ( !read_descriptor(data_sel, v, regs,
-                                  &data_base, &data_limit, &ar,
-                                  _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
-                                  _SEGMENT_P) )
+            if ( !read_descriptor_sel(data_sel, which_sel, v, regs,
+                                      &data_base, &data_limit, &ar,
+                                      _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
+                                      _SEGMENT_P) )
                 goto fail;
             if ( !(ar & _SEGMENT_S) ||
                  !(ar & _SEGMENT_P) ||
@@ -2005,9 +2069,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                 }
             }
             else
-                read_descriptor(data_sel, v, regs,
-                                &data_base, &data_limit, &ar,
-                                0);
+                read_descriptor_sel(data_sel, which_sel, v, regs,
+                                    &data_base, &data_limit, &ar,
+                                    0);
             data_limit = ~0UL;
             ar = _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P;
         }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1bd8fc9..96e32f2 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -196,6 +196,8 @@ struct hvm_function_table {
 
     int (*pvh_vcpu_boot_set_info)(struct vcpu *v,
                                   struct vcpu_guest_context *ctxtp);
+
+    u16 (*read_selector)(struct vcpu *v, enum x86_segment seg);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -335,6 +337,11 @@ static inline int pvh_vcpu_boot_set_info(struct vcpu *v,
     return hvm_funcs.pvh_vcpu_boot_set_info(v, ctxtp);
 }
 
+static inline u16 pvh_get_selector(struct vcpu *v, enum x86_segment seg)
+{
+    return hvm_funcs.read_selector(v, seg);
+}
+
 #define is_viridian_domain(_d)                                             \
  (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN]))
 
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 9bb22cb..1242657 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,10 +4,21 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(vcpu, regs, name)                 \
-({  u16 __sel;                                                  \
-    asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
-    __sel;                                                      \
+/*
+ * We need vcpu because during context switch, going from PV to PVH,
+ * in save_segments() current has been updated to next, and no longer pointing
+ * to the PV, but the intention is to get selector for the PV. Checking
+ * is_pvh_vcpu(current) will yield incorrect results in such a case.
+ */
+#define read_segment_register(vcpu, regs, name)                   \
+({  u16 __sel;                                                    \
+    struct cpu_user_regs *_regs = (regs);                         \
+                                                                  \
+    if ( is_pvh_vcpu(vcpu) && guest_mode(_regs) )                 \
+        __sel = pvh_get_selector(vcpu, x86_seg_##name);           \
+    else                                                          \
+        asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
+    __sel;                                                        \
 })
 
 #define wbinvd() \
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 12/21] PVH xen: interrupt/event-channel delivery to PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (10 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 11/21] PVH xen: Support privileged " Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 13/21] PVH xen: additional changes to support PVH guest creation and execution Mukesh Rathor
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

PVH uses HVMIRQ_callback_vector for interrupt delivery. Also, change
hvm_vcpu_has_pending_irq(): PVH doesn't use vlapic emulation, so we can
skip the vlapic checks in that function. Moreover, a PVH guest installs
its IDT natively and sets a callback vector for interrupt delivery during
boot. Once that is done, it receives interrupts via that callback.
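
For context, a hedged guest-side sketch of how that callback vector is
registered. The struct and HVMOP_set_param are the existing public HVM
interface, and the "2 << 56" encoding follows the HVM_PARAM_CALLBACK_IRQ
description in the public headers; CALLBACK_VECTOR and the hypercall
wrapper name are placeholders, and error handling is omitted:

    struct xen_hvm_param a = {
        .domid = DOMID_SELF,
        .index = HVM_PARAM_CALLBACK_IRQ,
        /* val[63:56] == 2 selects vector delivery; val[7:0] is the vector */
        .value = ((uint64_t)2 << 56) | CALLBACK_VECTOR,
    };

    /* After this, event-channel upcalls arrive on CALLBACK_VECTOR through
     * the guest's native IDT, which is all hvm_vcpu_has_pending_irq() now
     * needs to consider for a PVH vcpu. */
    HYPERVISOR_hvm_op(HVMOP_set_param, &a);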

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/irq.c       |    3 +++
 xen/arch/x86/hvm/vmx/intr.c  |    8 ++++++--
 xen/include/asm-x86/domain.h |    2 +-
 xen/include/asm-x86/event.h  |    2 +-
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 9eae5de..92fb245 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
          && vcpu_info(v, evtchn_upcall_pending) )
         return hvm_intack_vector(plat->irq.callback_via.vector);
 
+    if ( is_pvh_vcpu(v) )
+        return hvm_intack_none;
+
     if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
         return hvm_intack_pic(0);
 
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index e376f3c..ce42950 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -165,6 +165,9 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
 {
     u32 ctrl;
 
+    if ( is_pvh_vcpu(v) )
+        return 0;
+
     if ( nvmx_intr_blocked(v) != hvm_intblk_none )
     {
         enable_intr_window(v, intack);
@@ -219,8 +222,9 @@ void vmx_intr_assist(void)
         return;
     }
 
-    /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
+    if ( !is_pvh_vcpu(v) )
+        /* Crank the handle on interrupt state. */
+        pt_vector = pt_update_irq(v);
 
     do {
         intack = hvm_vcpu_has_pending_irq(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 22a72df..21a9954 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@
 #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) (!is_pv_domain(d) && \
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
 
diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index 06057c7..7ed5812 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return (!is_pv_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 13/21] PVH xen: additional changes to support PVH guest creation and execution.
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (11 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 12/21] PVH xen: interrupt/event-channel delivery to PVH Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 14/21] PVH xen: mapcache and show registers Mukesh Rathor
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

Fail creation of a 32bit PVH guest. Change hap_update_cr3() to return long
mode for PVH; this is called during domain creation from arch_set_info_guest().
Return the correct features to a PVH guest during its boot.
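
As a hedged guest-side sketch of the consumer of those feature bits (the
XENVER_get_features interface and XENFEAT_* names already exist; the
hypercall wrapper name and the error handling are illustrative):

    struct xen_feature_info fi = { .submap_idx = 0 };

    if ( HYPERVISOR_xen_version(XENVER_get_features, &fi) == 0 &&
         (fi.submap & (1U << XENFEAT_hvm_callback_vector)) )
    {
        /* The PVH submap advertised below includes hvm_callback_vector,
         * so the guest can go on to register a callback vector as in the
         * previous patch. */
    }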

Changes in V11:
  - Change the else from !is_hvm_vcpu to "else if ( is_pv_vcpu(current) )"
  - Drop the changes to hap_paging_get_mode, not needed as we always
    update guest_cr for PVH also.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c |    8 ++++++++
 xen/common/domain.c   |   10 ++++++++++
 xen/common/domctl.c   |    5 +++++
 xen/common/kernel.c   |    6 +++++-
 4 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 917eb6a..974afae 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,6 +339,14 @@ int switch_compat(struct domain *d)
 
     if ( d == NULL )
         return -EINVAL;
+
+    if ( is_pvh_domain(d) )
+    {
+        printk(XENLOG_INFO
+               "Xen currently does not support 32bit PVH guests\n");
+        return -EINVAL;
+    }
+
     if ( !may_switch_mode(d) )
         return -EACCES;
     if ( is_pv_32on64_domain(d) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 38b1bad..3b4af4b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -237,6 +237,16 @@ struct domain *domain_create(
 
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
+    else if ( domcr_flags & DOMCRF_pvh )
+    {
+        if ( !(domcr_flags & DOMCRF_hap) )
+        {
+            err = -EOPNOTSUPP;
+            printk(XENLOG_INFO "PVH guest must have HAP on\n");
+            goto fail;
+        }
+        d->guest_type = guest_type_pvh;
+    }
 
     if ( domid == 0 )
     {
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index c653efb..48e4c08 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     if ( is_hvm_domain(d) )
         info->flags |= XEN_DOMINF_hvm_guest;
+    else if ( is_pvh_domain(d) )
+        info->flags |= XEN_DOMINF_pvh_guest;
 
     xsm_security_domaininfo(d, info);
 
@@ -443,6 +445,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         domcr_flags = 0;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
+            domcr_flags |= DOMCRF_pvh;     /* PV with HAP is a PVH guest */
+
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
             domcr_flags |= DOMCRF_hap;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 72fb905..463983d 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -289,7 +289,11 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( current->domain == dom0 )
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_X86
-            if ( !is_hvm_vcpu(current) )
+            if ( is_pvh_vcpu(current) )
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_supervisor_mode_kernel) |
+                             (1U << XENFEAT_hvm_callback_vector);
+            else if ( is_pv_vcpu(current) )
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 14/21] PVH xen: mapcache and show registers
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (12 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 13/21] PVH xen: additional changes to support PVH guest creation and execution Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 15/21] PVH xen: mtrr, tsc, timers, grant changes Mukesh Rathor
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

PVH doesn't use the mapcache. show_registers() for PVH takes the HVM path.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain_page.c  |   10 +++++-----
 xen/arch/x86/x86_64/traps.c |    6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index bc18263..3903952 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
-    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
+    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || is_hvm_vcpu(v) )
+    if ( !v || !is_pv_vcpu(v) )
         return mfn_to_virt(mfn);
 
     dcache = &v->domain->arch.pv_domain.mapcache;
@@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && !is_hvm_vcpu(v));
+    ASSERT(v && is_pv_vcpu(v));
 
     dcache = &v->domain->arch.pv_domain.mapcache;
     ASSERT(dcache->inuse);
@@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d)
     struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
     unsigned int bitmap_pages;
 
-    if ( is_hvm_domain(d) || is_idle_domain(d) )
+    if ( !is_pv_domain(d) || is_idle_domain(d) )
         return 0;
 
 #ifdef NDEBUG
@@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( is_hvm_vcpu(v) || !dcache->inuse )
+    if ( !is_pv_vcpu(v) || !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 2134cee..bc65359 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -86,7 +86,7 @@ void show_registers(struct cpu_user_regs *regs)
     enum context context;
     struct vcpu *v = current;
 
-    if ( is_hvm_vcpu(v) && guest_mode(regs) )
+    if ( !is_pv_vcpu(v) && guest_mode(regs) )
     {
         struct segment_register sreg;
         context = CTXT_hvm_guest;
@@ -147,8 +147,8 @@ void vcpu_show_registers(struct vcpu *v)
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
 
-    /* No need to handle HVM for now. */
-    if ( is_hvm_vcpu(v) )
+    /* No need to handle HVM and PVH for now. */
+    if ( !is_pv_vcpu(v) )
         return;
 
     crs[0] = v->arch.pv_vcpu.ctrlreg[0];
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 15/21] PVH xen: mtrr, tsc, timers, grant changes...
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (13 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 14/21] PVH xen: mapcache and show registers Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 16/21] PVH xen: add hypercall support for PVH Mukesh Rathor
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

PVH only supports limited memory types in Phase I. The TSC is also limited
to native mode for the moment. Finally, grant mapping of iomem for PVH
hasn't been explored in Phase I.

Changes in V10:
  - don't migrate timers for PVH as it doesn't use rtc or emulated timers.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c   |    4 ++++
 xen/arch/x86/hvm/mtrr.c  |    8 ++++++++
 xen/arch/x86/time.c      |    8 ++++++++
 xen/common/grant_table.c |    4 ++--
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bac4708..93aa42c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -301,6 +301,10 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v)
 
 void hvm_migrate_timers(struct vcpu *v)
 {
+    /* PVH doesn't use rtc and emulated timers, it uses pvclock mechanism. */
+    if ( is_pvh_vcpu(v) )
+        return;
+
     rtc_migrate_timers(v);
     pt_migrate(v);
 }
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index ef51a8d..b9d6411 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,6 +693,14 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
          ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
         return MTRR_TYPE_WRBACK;
 
+    /* PVH fixme: Add support for more memory types. */
+    if ( is_pvh_domain(d) )
+    {
+        if ( direct_mmio )
+            return MTRR_TYPE_UNCACHABLE;
+        return MTRR_TYPE_WRBACK;
+    }
+
     if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
         return MTRR_TYPE_WRBACK;
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index f047cb3..4589d43 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1891,6 +1891,14 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc = 0;
         return;
     }
+    if ( is_pvh_domain(d) && tsc_mode != TSC_MODE_NEVER_EMULATE )
+    {
+        /* PVH fixme: support more tsc modes. */
+        printk(XENLOG_WARNING
+               "PVH currently does not support tsc emulation. Setting timer_mode = native\n");
+        d->arch.vtsc = 0;
+        return;
+    }
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index eb50288..c51da30 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
@@ -932,7 +932,7 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 16/21] PVH xen: add hypercall support for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (14 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 15/21] PVH xen: mtrr, tsc, timers, grant changes Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 17/21] PVH xen: vmcs related changes Mukesh Rathor
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch expands HVM hcall support to include PVH.

Changes in v8:
  - Carve out PVH support of hvm_op to a small function.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c      |   80 +++++++++++++++++++++++++++++++++++++------
 xen/arch/x86/x86_64/traps.c |    2 +-
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 93aa42c..2407396 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3192,6 +3192,17 @@ static long hvm_vcpu_op(
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
+
+    case VCPUOP_is_up:
+    case VCPUOP_up:
+    case VCPUOP_initialise:
+        /* PVH fixme: this white list should be removed eventually */
+        if ( is_pvh_vcpu(current) )
+            rc = do_vcpu_op(cmd, vcpuid, arg);
+        else
+            rc = -ENOSYS;
+        break;
+
     default:
         rc = -ENOSYS;
         break;
@@ -3312,6 +3323,24 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(tmem_op)
 };
 
+/* PVH 32bitfixme. */
+static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
+    HYPERCALL(platform_op),
+    HYPERCALL(memory_op),
+    HYPERCALL(xen_version),
+    HYPERCALL(console_io),
+    [ __HYPERVISOR_grant_table_op ]  = (hvm_hypercall_t *)hvm_grant_table_op,
+    [ __HYPERVISOR_vcpu_op ]         = (hvm_hypercall_t *)hvm_vcpu_op,
+    HYPERCALL(mmuext_op),
+    HYPERCALL(xsm_op),
+    HYPERCALL(sched_op),
+    HYPERCALL(event_channel_op),
+    [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
+    HYPERCALL(hvm_op),
+    HYPERCALL(sysctl),
+    HYPERCALL(domctl)
+};
+
 int hvm_do_hypercall(struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -3338,7 +3367,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
     if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
         return viridian_hypercall(regs);
 
-    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
+    if ( (eax >= NR_hypercalls) ||
+         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
+         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )
     {
         regs->eax = -ENOSYS;
         return HVM_HCALL_completed;
@@ -3353,16 +3384,20 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
                     regs->r10, regs->r8, regs->r9);
 
         curr->arch.hvm_vcpu.hcall_64bit = 1;
-        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
-                                               regs->rsi,
-                                               regs->rdx,
-                                               regs->r10,
-                                               regs->r8,
-                                               regs->r9); 
+        if ( is_pvh_vcpu(curr) )
+            regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
+        else
+            regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
         curr->arch.hvm_vcpu.hcall_64bit = 0;
     }
     else
     {
+        ASSERT(!is_pvh_vcpu(curr));   /* PVH 32bitfixme. */
+
         HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax,
                     (uint32_t)regs->ebx, (uint32_t)regs->ecx,
                     (uint32_t)regs->edx, (uint32_t)regs->esi,
@@ -3760,6 +3795,23 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
+static long pvh_hvm_op(unsigned long op, struct domain *d,
+                       struct xen_hvm_param *harg)
+{
+    long rc = -ENOSYS;
+
+    if ( op == HVMOP_set_param )
+    {
+        if ( harg->index == HVM_PARAM_CALLBACK_IRQ )
+        {
+            hvm_set_callback_via(d, harg->value);
+            hvm_latch_shinfo_size(d);
+            rc = 0;
+        }
+    }
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -3787,12 +3839,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ESRCH;
 
         rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            goto param_fail;
+        if ( is_pv_domain(d) )
+            goto param_done;
 
         rc = xsm_hvm_param(XSM_TARGET, d, op);
         if ( rc )
-            goto param_fail;
+            goto param_done;
+
+        if ( is_pvh_domain(d) )
+        {
+            rc = pvh_hvm_op(op, d, &a);
+            goto param_done;
+        }
 
         if ( op == HVMOP_set_param )
         {
@@ -4001,7 +4059,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                     op == HVMOP_set_param ? "set" : "get",
                     a.index, a.value);
 
-    param_fail:
+    param_done:
         rcu_unlock_domain(d);
         break;
     }
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index bc65359..b27d79b 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -622,7 +622,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (15 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 16/21] PVH xen: add hypercall support for PVH Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  8:41   ` Jan Beulich
  2013-08-23  1:19 ` [V11 PATCH 18/21] PVH xen: HVM support of PVH guest creation/destruction Mukesh Rathor
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch contains vmcs changes related to PVH, mainly creating a VMCS
for a PVH guest.

Changes in V11:
   - Remove pvh_construct_vmcs and make it part of construct_vmcs

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c |  188 ++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 167 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 36c4adf..89a1e9f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -637,7 +637,7 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( is_hvm_vcpu(current) )
+        if ( !is_pv_vcpu(current) )
             vmx_load_vmcs(current);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -828,6 +828,63 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+static int pvh_check_requirements(struct vcpu *v)
+{
+    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+
+    if ( !paging_mode_hap(v->domain) )
+    {
+        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
+        return -EINVAL;
+    }
+    if ( !cpu_has_vmx_ept )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have EPT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_pat )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_msr_bitmap )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_vpid )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU doesn't have VPID support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_secondary_exec_control )
+    {
+        printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n");
+        return -ENOSYS;
+    }
+
+    /*
+     * If rdtsc exiting is turned on and it goes thru emulate_privileged_op,
+     * then pv_vcpu.ctrlreg must be added to the pvh struct.
+     */
+    if ( v->domain->arch.vtsc )
+    {
+        printk(XENLOG_G_INFO
+               "At present PVH only supports the default timer mode\n");
+        return -ENOSYS;
+    }
+
+    required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR;
+    if ( (tmpval & required) != required )
+    {
+        printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n",
+               required);
+        return -ENOSYS;
+    }
+
+    return 0;
+}
+
 static void vmx_set_host_vmcs_fields(struct vcpu *v)
 {
     uint16_t sysenter_cs;
@@ -861,10 +918,14 @@ static void vmx_set_host_vmcs_fields(struct vcpu *v)
 
 static int construct_vmcs(struct vcpu *v)
 {
+    int rc;
     struct domain *d = v->domain;
     u32 vmexit_ctl = vmx_vmexit_control;
     u32 vmentry_ctl = vmx_vmentry_control;
 
+    if ( is_pvh_vcpu(v) && (rc = pvh_check_requirements(v)) )
+        return rc;
+
     vmx_vmcs_enter(v);
 
     /* VMCS controls. */
@@ -874,16 +935,37 @@ static int construct_vmcs(struct vcpu *v)
     if ( d->arch.vtsc )
         v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
 
-    v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
+    if ( is_pvh_vcpu(v) )
+    {
+        /* Phase I PVH: we run with minimal secondary exec features */
+        u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID;
 
-    /* Disable VPID for now: we decide when to enable it on VMENTER. */
-    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+        /* Intel SDM: resvd bits are 0 */
+        v->arch.hvm_vmx.secondary_exec_control = bits;
+    }
+    else
+    {
+        v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
+
+        /* Disable VPID for now: we decide when to enable it on VMENTER. */
+        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+    }
 
     if ( paging_mode_hap(d) )
     {
         v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
                                           CPU_BASED_CR3_LOAD_EXITING |
                                           CPU_BASED_CR3_STORE_EXITING);
+        if ( is_pvh_vcpu(v) )
+        {
+            u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
+                       CPU_BASED_ACTIVATE_MSR_BITMAP;
+            v->arch.hvm_vmx.exec_control |= bits;
+
+            bits = CPU_BASED_USE_TSC_OFFSETING | CPU_BASED_TPR_SHADOW |
+                   CPU_BASED_VIRTUAL_NMI_PENDING;
+            v->arch.hvm_vmx.exec_control &= ~bits;
+        }
     }
     else
     {
@@ -905,9 +987,22 @@ static int construct_vmcs(struct vcpu *v)
 
     vmx_update_cpu_exec_control(v);
     __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
+
+    if ( is_pvh_vcpu(v) )
+    {
+        /*
+         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
+         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and
+         * not use the host values. 0 would cause it to not use the VMCS values.
+         */
+        vmentry_ctl &= ~(VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_SMM |
+                         VM_ENTRY_DEACT_DUAL_MONITOR);
+        /* PVH 32bitfixme. */
+        vmentry_ctl |= VM_ENTRY_IA32E_MODE;   /* GUEST_EFER.LME/LMA ignored */
+    }
     __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);
 
-    if ( cpu_has_vmx_ple )
+    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
     {
         __vmwrite(PLE_GAP, ple_gap);
         __vmwrite(PLE_WINDOW, ple_window);
@@ -921,28 +1016,46 @@ static int construct_vmcs(struct vcpu *v)
     if ( cpu_has_vmx_msr_bitmap )
     {
         unsigned long *msr_bitmap = alloc_xenheap_page();
+        int msr_type = MSR_TYPE_R | MSR_TYPE_W;
 
         if ( msr_bitmap == NULL )
+        {
+            vmx_vmcs_exit(v);
             return -ENOMEM;
+        }
 
         memset(msr_bitmap, ~0, PAGE_SIZE);
         v->arch.hvm_vmx.msr_bitmap = msr_bitmap;
         __vmwrite(MSR_BITMAP, virt_to_maddr(msr_bitmap));
 
-        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, MSR_TYPE_R | MSR_TYPE_W);
-        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, MSR_TYPE_R | MSR_TYPE_W);
-        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, MSR_TYPE_R | MSR_TYPE_W);
-        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, MSR_TYPE_R | MSR_TYPE_W);
-        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, MSR_TYPE_R | MSR_TYPE_W);
+        /* Disable interecepts for MSRs that have corresponding VMCS fields. */
+        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type);
+        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type);
+        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type);
+
         if ( cpu_has_vmx_pat && paging_mode_hap(d) )
-            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, MSR_TYPE_R | MSR_TYPE_W);
+            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type);
+
+        if ( is_pvh_vcpu(v) )
+            vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE, msr_type);
+
+        /*
+         * PVH: We don't disable intercepts for MSRs: MSR_STAR, MSR_LSTAR,
+         *      MSR_CSTAR, and MSR_SYSCALL_MASK because we need to specify
+         *      save/restore area to save/restore at every VM exit and entry.
+         *      Instead, let the intercept functions save them into
+         *      vmx_msr_state fields. See comment in vmx_restore_host_msrs().
+         *      See also vmx_restore_guest_msrs().
+         */
     }
 
     /* I/O access bitmap. */
     __vmwrite(IO_BITMAP_A, virt_to_maddr((char *)hvm_io_bitmap + 0));
     __vmwrite(IO_BITMAP_B, virt_to_maddr((char *)hvm_io_bitmap + PAGE_SIZE));
 
-    if ( cpu_has_vmx_virtual_intr_delivery )
+    if ( cpu_has_vmx_virtual_intr_delivery && !is_pvh_vcpu(v) )
     {
         /* EOI-exit bitmap */
         v->arch.hvm_vmx.eoi_exit_bitmap[0] = (uint64_t)0;
@@ -958,7 +1071,7 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(GUEST_INTR_STATUS, 0);
     }
 
-    if ( cpu_has_vmx_posted_intr_processing )
+    if ( cpu_has_vmx_posted_intr_processing && !is_pvh_vcpu(v) )
     {
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
@@ -1005,7 +1118,12 @@ static int construct_vmcs(struct vcpu *v)
     __vmwrite(GUEST_DS_AR_BYTES, 0xc093);
     __vmwrite(GUEST_FS_AR_BYTES, 0xc093);
     __vmwrite(GUEST_GS_AR_BYTES, 0xc093);
-    __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */
+
+    if ( is_pvh_vcpu(v) )
+        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
+        __vmwrite(GUEST_CS_AR_BYTES, 0xa09b); /* exec/read, accessed */
+    else
+        __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */
 
     /* Guest IDT. */
     __vmwrite(GUEST_IDTR_BASE, 0);
@@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
 
     v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
               | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U << TRAP_debug) : 0)
               | (1U << TRAP_no_device);
     vmx_update_exception_bitmap(v);
 
     v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
-    hvm_update_guest_cr(v, 0);
+    if ( is_pvh_vcpu(v) )
+    {
+        u64 tmpval = v->arch.hvm_vcpu.guest_cr[0] | X86_CR0_PG | X86_CR0_NE;
+        v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval;
+        __vmwrite(GUEST_CR0, tmpval);
+        __vmwrite(CR0_READ_SHADOW, tmpval);
 
-    v->arch.hvm_vcpu.guest_cr[4] = 0;
-    hvm_update_guest_cr(v, 4);
+        v->arch.hvm_vmx.vmx_realmode = 0;
+    } else
+        hvm_update_guest_cr(v, 0);
 
-    if ( cpu_has_vmx_tpr_shadow )
+    if ( is_pvh_vcpu(v) )
+    {
+        u64 tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+        __vmwrite(GUEST_CR4, tmpval);
+        __vmwrite(CR4_READ_SHADOW, tmpval);
+        v->arch.hvm_vcpu.guest_cr[4] = tmpval;
+    }
+    else
+    {
+        v->arch.hvm_vcpu.guest_cr[4] = 0;
+        hvm_update_guest_cr(v, 4);
+    }
+
+    if ( cpu_has_vmx_tpr_shadow && !is_pvh_vcpu(v) )
     {
         __vmwrite(VIRTUAL_APIC_PAGE_ADDR,
                   page_to_maddr(vcpu_vlapic(v)->regs_page));
@@ -1070,9 +1208,14 @@ static int construct_vmcs(struct vcpu *v)
 
     vmx_vmcs_exit(v);
 
-    paging_update_paging_modes(v); /* will update HOST & GUEST_CR3 as reqd */
+    /* PVH: paging mode is updated by arch_set_info_guest(). */
+    if ( is_hvm_vcpu(v) )
+    {
+        /* will update HOST & GUEST_CR3 as reqd */
+        paging_update_paging_modes(v);
 
-    vmx_vlapic_msr_changed(v);
+        vmx_vlapic_msr_changed(v);
+    }
 
     return 0;
 }
@@ -1297,6 +1440,9 @@ void vmx_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
+    if ( is_pvh_vcpu(v) )
+        reset_stack_and_jump(vmx_asm_do_vmentry);
+
     debug_state = v->domain->debugger_attached
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];
@@ -1480,7 +1626,7 @@ static void vmcs_dump(unsigned char ch)
 
     for_each_domain ( d )
     {
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             continue;
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 18/21] PVH xen: HVM support of PVH guest creation/destruction
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (16 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 17/21] PVH xen: vmcs related changes Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  1:19 ` [V11 PATCH 19/21] PVH xen: VMX " Mukesh Rathor
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch implements the HVM portion of guest creation, i.e. vcpu and domain
initialization. It also contains some changes to support the destroy path.

Changes in V10:
    - Move hvm_vcpu.guest_efer setting to here from VMX.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2407396..4769420 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -514,6 +514,27 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int pvh_dom_initialise(struct domain *d)
+{
+    int rc;
+
+    if ( !d->arch.hvm_domain.hap_enabled )
+        return -EINVAL;
+
+    spin_lock_init(&d->arch.hvm_domain.irq_lock);
+
+    hvm_init_cacheattr_region_list(d);
+
+    if ( (rc = paging_enable(d, PG_refcounts|PG_translate|PG_external)) != 0 )
+        goto pvh_dominit_fail;
+
+    return 0;
+
+ pvh_dominit_fail:
+    hvm_destroy_cacheattr_region_list(d);
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -524,6 +545,8 @@ int hvm_domain_initialise(struct domain *d)
                  "on a non-VT/AMDV platform.\n");
         return -EINVAL;
     }
+    if ( is_pvh_domain(d) )
+        return pvh_dom_initialise(d);
 
     spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -588,6 +611,9 @@ int hvm_domain_initialise(struct domain *d)
 
 void hvm_domain_relinquish_resources(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+        return;
+
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
@@ -612,11 +638,15 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
 void hvm_domain_destroy(struct domain *d)
 {
+    hvm_destroy_cacheattr_region_list(d);
+
+    if ( is_pvh_domain(d) )
+        return;
+
     hvm_funcs.domain_destroy(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
-    hvm_destroy_cacheattr_region_list(d);
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1070,6 +1100,33 @@ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void)
 }
 __initcall(__hvm_register_CPU_XSAVE_save_and_restore);
 
+static int pvh_vcpu_initialise(struct vcpu *v)
+{
+    int rc;
+
+    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 )
+        return rc;
+
+    softirq_tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+                         (void(*)(unsigned long))hvm_assert_evtchn_irq,
+                         (unsigned long)v);
+
+    v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
+    v->arch.user_regs.eflags = 2;
+    v->arch.hvm_vcpu.inject_trap.vector = -1;
+
+    if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
+    {
+        hvm_funcs.vcpu_destroy(v);
+        return rc;
+    }
+
+    /* This is for hvm_long_mode_enabled(v). PVH 32bitfixme. */
+    v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
+
+    return 0;
+}
+
 int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
@@ -1081,6 +1138,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
     spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
     INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
 
+    if ( is_pvh_vcpu(v) )
+        return pvh_vcpu_initialise(v);
+
     if ( (rc = vlapic_init(v)) != 0 )
         goto fail1;
 
@@ -1169,7 +1229,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
     tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
     hvm_vcpu_cacheattr_destroy(v);
-    vlapic_destroy(v);
+
+    if ( !is_pvh_vcpu(v) )
+        vlapic_destroy(v);
+
     hvm_funcs.vcpu_destroy(v);
 
     /* Event channel is already freed by evtchn_destroy(). */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 19/21] PVH xen: VMX support of PVH guest creation/destruction
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (17 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 18/21] PVH xen: HVM support of PVH guest creation/destruction Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  9:14   ` Jan Beulich
  2013-08-23  1:19 ` [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH Mukesh Rathor
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch implements the VMX portion of guest creation, i.e. vcpu and domain
initialization. It also contains some changes to support the destroy path.

Change in V10:
  - Don't call vmx_domain_initialise / vmx_domain_destroy for PVH.
  - Do not set hvm_vcpu.guest_efer here in vmx.c.

Change in V11:
  - Remove vmx_update_pvh_cr() and make it part of vmx_update_guest_cr

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9056a3f..93775dd 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1095,6 +1095,18 @@ void vmx_update_debug_state(struct vcpu *v)
 
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
 {
+    /*
+     * PVH guest never causes CR3 write vmexit. This is called during the guest
+     * setup.
+     */
+    if ( is_pvh_vcpu(v) && cr != 3 )
+    {
+        printk(XENLOG_G_ERR
+               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
+               v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
+        return;
+    }
+
     vmx_vmcs_enter(v);
 
     switch ( cr )
@@ -1183,7 +1195,7 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         /* CR2 is updated in exit stub. */
         break;
     case 3:
-        if ( paging_mode_hap(v->domain) )
+        if ( paging_mode_hap(v->domain) && !is_pvh_vcpu(v) )
         {
             if ( !hvm_paging_enabled(v) )
                 v->arch.hvm_vcpu.hw_cr[3] =
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (18 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 19/21] PVH xen: VMX " Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  9:12   ` Jan Beulich
  2013-08-23  1:19 ` [V11 PATCH 21/21] PVH xen: Checks, asserts, and limitations " Mukesh Rathor
  2013-08-23  8:49 ` [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Jan Beulich
  21 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch contains the VMX exit handler for PVH guests. Note it contains
a macro dbgp1 to print vmexit reasons and a lot of other data to go with
it. It can be enabled by setting pvhdbg to 1. This can be very useful for
debugging during the first few months of testing, after which it can be
removed at the maintainer's discretion.

Changes in V2:
  - Move non VMX generic code to arch/x86/hvm/pvh.c
  - Remove get_gpr_ptr() and use existing decode_register() instead.
  - Defer call to pvh vmx exit handler until interrupts are enabled. So the
    caller vmx_pvh_vmexit_handler() handles the NMI/EXT-INT/TRIPLE_FAULT now.
  - Fix the CPUID (wrongly) clearing bit 24. No need to do this now, set
    the correct feature bits in CR4 during vmcs creation.
  - Fix few hard tabs.

Changes in V3:
  - Lot of cleanup and rework in PVH vm exit handler.
  - add parameter to emulate_forced_invalid_op().

Changes in V5:
  - Move pvh.c and emulate_forced_invalid_op related changes to another patch.
  - Formatting.
  - Remove vmx_pvh_read_descriptor().
  - Use SS DPL instead of CS.RPL for CPL.
  - Remove pvh_user_cpuid() and call pv_cpuid for user mode also.

Changes in V6:
  - Replace domain_crash_synchronous() with domain_crash().

Changes in V7:
  - Don't read all selectors on every vmexit. Do that only for the
    IO instruction vmexit.
  - Add couple checks and set guest_cr[4] in access_cr4().
  - Add period after all comments in case that's an issue.
  - Move making pv_cpuid and emulate_privileged_op public here.

Changes in V8:
  - Mainly, don't read selectors on vmexit. The macros now come to VMCS
    to read selectors on demand.

Changes in V11:
  - merge this with previous patch "prep changes".
  - allow invalid op emulation for kernel mode also.
  - Use CR0_READ_SHADOW instead of GUEST_CR0.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c            |    3 +-
 xen/arch/x86/hvm/vmx/pvh.c        |  468 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c        |    6 +
 xen/arch/x86/traps.c              |    6 +-
 xen/include/asm-x86/hvm/vmx/vmx.h |    1 +
 xen/include/asm-x86/processor.h   |    2 +
 xen/include/asm-x86/traps.h       |    3 +
 7 files changed, 485 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4769420..372c2db 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3038,7 +3038,8 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     hvm_cpuid(1, &cpuid[0], &cpuid[1], &cpuid[2], &cpuid[3]);
     mtrr = !!(cpuid[3] & cpufeat_mask(X86_FEATURE_MTRR));
 
-    hvm_memory_event_msr(msr, msr_content);
+    if ( !is_pvh_vcpu(v) )
+        hvm_memory_event_msr(msr, msr_content);
 
     switch ( msr )
     {
diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c
index 526ce2b..23e7d17 100644
--- a/xen/arch/x86/hvm/vmx/pvh.c
+++ b/xen/arch/x86/hvm/vmx/pvh.c
@@ -20,6 +20,474 @@
 #include <asm/hvm/nestedhvm.h>
 #include <asm/xstate.h>
 
+#ifndef NDEBUG
+static int pvhdbg = 0;
+#define dbgp1(...) do { (pvhdbg == 1) ? printk(__VA_ARGS__) : 0; } while ( 0 )
+#else
+#define dbgp1(...) ((void)0)
+#endif
+
+/* Returns : 0 == msr read successfully. */
+static int vmxit_msr_read(struct cpu_user_regs *regs)
+{
+    u64 msr_content = 0;
+
+    switch ( regs->ecx )
+    {
+    case MSR_IA32_MISC_ENABLE:
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
+                       MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
+        break;
+
+    default:
+        /* PVH fixme: see hvm_msr_read_intercept(). */
+        rdmsrl(regs->ecx, msr_content);
+        break;
+    }
+    regs->eax = (uint32_t)msr_content;
+    regs->edx = (uint32_t)(msr_content >> 32);
+    vmx_update_guest_eip();
+
+    dbgp1("msr read c:%#lx a:%#lx d:%#lx RIP:%#lx RSP:%#lx\n", regs->ecx,
+          regs->eax, regs->edx, regs->rip, regs->rsp);
+
+    return 0;
+}
+
+/* Returns : 0 == msr written successfully. */
+static int vmxit_msr_write(struct cpu_user_regs *regs)
+{
+    uint64_t msr_content = regs->eax | (regs->edx << 32);
+
+    dbgp1("PVH: msr write:%#lx. eax:%#lx edx:%#lx\n", regs->ecx,
+          regs->eax, regs->edx);
+
+    if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY )
+    {
+        vmx_update_guest_eip();
+        return 0;
+    }
+    return 1;
+}
+
+static int vmxit_debug(struct cpu_user_regs *regs)
+{
+    struct vcpu *vp = current;
+    unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
+
+    write_debugreg(6, exit_qualification | 0xffff0ff0);
+
+    /* gdbsx or another debugger. Never pause dom0. */
+    if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached )
+        domain_pause_for_debugger();
+    else
+        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the MTF vmexit. */
+static int vmxit_mtf(struct cpu_user_regs *regs)
+{
+    struct vcpu *vp = current;
+    int rc = -EINVAL, ss = vp->arch.hvm_vcpu.single_step;
+
+    vp->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, vp->arch.hvm_vmx.exec_control);
+    vp->arch.hvm_vcpu.single_step = 0;
+
+    if ( vp->domain->debugger_attached && ss )
+    {
+        domain_pause_for_debugger();
+        rc = 0;
+    }
+    return rc;
+}
+
+static int vmxit_int3(struct cpu_user_regs *regs)
+{
+    int ilen = vmx_get_instruction_length();
+    struct vcpu *vp = current;
+    struct hvm_trap trap_info = {
+        .vector = TRAP_int3,
+        .type = X86_EVENTTYPE_SW_EXCEPTION,
+        .error_code = HVM_DELIVER_NO_ERROR_CODE,
+        .insn_len = ilen
+    };
+
+    /* gdbsx or another debugger. Never pause dom0. */
+    if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached )
+    {
+        regs->eip += ilen;
+        dbgp1("[%d]PVH: domain pause for debugger\n", smp_processor_id());
+        current->arch.gdbsx_vcpu_event = TRAP_int3;
+        domain_pause_for_debugger();
+        return 0;
+    }
+    hvm_inject_trap(&trap_info);
+
+    return 0;
+}
+
+/*
+ * Just like HVM, PVH should be using "cpuid" from kernel mode.
+ * While it's allowed, it's deprecated.
+ */
+static int vmxit_invalid_op(struct cpu_user_regs *regs)
+{
+    if ( !emulate_forced_invalid_op(regs) )
+        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the exception. */
+static int vmxit_exception(struct cpu_user_regs *regs)
+{
+    int vector = (__vmread(VM_EXIT_INTR_INFO)) & INTR_INFO_VECTOR_MASK;
+    int rc = -ENOSYS;
+
+    dbgp1(" EXCPT: vec:%#x cs:%#lx rip:%#lx\n", vector,
+          __vmread(GUEST_CS_SELECTOR), regs->eip);
+
+    switch ( vector )
+    {
+    case TRAP_debug:
+        rc = vmxit_debug(regs);
+        break;
+
+    case TRAP_int3:
+        rc = vmxit_int3(regs);
+        break;
+
+    case TRAP_invalid_op:
+        rc = vmxit_invalid_op(regs);
+        break;
+
+    case TRAP_no_device:
+        hvm_funcs.fpu_dirty_intercept();
+        rc = 0;
+        break;
+
+    default:
+        printk(XENLOG_G_WARNING
+               "PVH: Unhandled trap:%#x RIP:%#lx\n", vector, regs->eip);
+    }
+    return rc;
+}
+
+static int vmxit_vmcall(struct cpu_user_regs *regs)
+{
+    if ( hvm_do_hypercall(regs) != HVM_HCALL_preempted )
+        vmx_update_guest_eip();
+    return 0;
+}
+
+/* Returns: rc == 0: success. */
+static int access_cr0(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    struct vcpu *vp = current;
+
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        unsigned long new_cr0 = *regp;
+        unsigned long old_cr0 = __vmread(GUEST_CR0);
+
+        dbgp1("PVH:writing to CR0. RIP:%#lx val:%#lx\n", regs->rip, *regp);
+        if ( (u32)new_cr0 != new_cr0 )
+        {
+            printk(XENLOG_G_WARNING
+                   "Guest setting upper 32 bits in CR0: %#lx", new_cr0);
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            return 0;
+        }
+
+        new_cr0 &= ~HVM_CR0_GUEST_RESERVED_BITS;
+        /* ET is reserved and should always be 1. */
+        new_cr0 |= X86_CR0_ET;
+
+        /* A PVH guest is not expected to change to real mode. */
+        if ( (new_cr0 & (X86_CR0_PE | X86_CR0_PG)) !=
+             (X86_CR0_PG | X86_CR0_PE) )
+        {
+            printk(XENLOG_G_WARNING
+                   "PVH attempting to turn off PE/PG. CR0:%#lx\n", new_cr0);
+            return -EPERM;
+        }
+        /* TS going from 1 to 0 */
+        if ( (old_cr0 & X86_CR0_TS) && ((new_cr0 & X86_CR0_TS) == 0) )
+            vmx_fpu_enter(vp);
+
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = new_cr0;
+        __vmwrite(GUEST_CR0, new_cr0);
+        __vmwrite(CR0_READ_SHADOW, new_cr0);
+    }
+    else
+        *regp = __vmread(CR0_READ_SHADOW);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success. */
+static int access_cr4(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        struct vcpu *vp = current;
+        u64 old_val = __vmread(GUEST_CR4);
+        u64 new = *regp;
+
+        if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) )
+        {
+            printk(XENLOG_G_WARNING
+                   "PVH guest attempts to set reserved bit in CR4: %#lx", new);
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            return 0;
+        }
+
+        if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) )
+        {
+            printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while "
+                   "EFER.LMA is set");
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            return 0;
+        }
+
+        /* hw_cr[4] is not used for PVH anywhere. */
+        vp->arch.hvm_vcpu.guest_cr[4] = new;
+
+        if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE) )
+            vpid_sync_all();
+
+        __vmwrite(CR4_READ_SHADOW, new);
+
+        new |= X86_CR4_VMXE | X86_CR4_MCE;
+        __vmwrite(GUEST_CR4, new);
+    }
+    else
+        *regp = __vmread(CR4_READ_SHADOW);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success, else -errno. */
+static int vmxit_cr_access(struct cpu_user_regs *regs)
+{
+    unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
+    uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification);
+    int cr, rc = -EINVAL;
+
+    switch ( acc_typ )
+    {
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR:
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR:
+    {
+        uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification);
+        uint64_t *regp = decode_register(gpr, regs, 0);
+        cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification);
+
+        if ( regp == NULL )
+            break;
+
+        switch ( cr )
+        {
+        case 0:
+            rc = access_cr0(regs, acc_typ, regp);
+            break;
+
+        case 3:
+            printk(XENLOG_G_ERR "PVH: unexpected cr3 vmexit. rip:%#lx\n",
+                   regs->rip);
+            domain_crash(current->domain);
+            break;
+
+        case 4:
+            rc = access_cr4(regs, acc_typ, regp);
+            break;
+        }
+        if ( rc == 0 )
+            vmx_update_guest_eip();
+        break;
+    }
+
+    case VMX_CONTROL_REG_ACCESS_TYPE_CLTS:
+    {
+        struct vcpu *vp = current;
+        unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] & ~X86_CR0_TS;
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = cr0;
+
+        vmx_fpu_enter(vp);
+        __vmwrite(GUEST_CR0, cr0);
+        __vmwrite(CR0_READ_SHADOW, cr0);
+        vmx_update_guest_eip();
+        rc = 0;
+        break;
+    }
+
+    case VMX_CONTROL_REG_ACCESS_TYPE_LMSW:
+    {
+        uint64_t value = current->arch.hvm_vcpu.guest_cr[0];
+        /* LMSW can: (1) set bits 0-3; (2) clear bits 1-3. */
+        value = (value & ~0xe) | ((exit_qualification >> 16) & 0xf);
+        rc = access_cr0(regs, VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR, &value);
+        break;
+    }
+    }
+    return rc;
+}
+
+/*
+ * Note: A PVH guest sets IOPL natively by setting bits in the eflags, and not
+ *       via hypercalls used by a PV.
+ */
+static int vmxit_io_instr(struct cpu_user_regs *regs)
+{
+    struct segment_register seg;
+    int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
+    int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
+
+    if ( curr_lvl == 0 )
+    {
+        hvm_get_segment_register(current, x86_seg_ss, &seg);
+        curr_lvl = seg.attr.fields.dpl;
+    }
+    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
+        return 0;
+
+    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+    return 0;
+}
+
+static int pvh_ept_handle_violation(unsigned long qualification,
+                                    paddr_t gpa, struct cpu_user_regs *regs)
+{
+    unsigned long gla, gfn = gpa >> PAGE_SHIFT;
+    p2m_type_t p2mt;
+    mfn_t mfn = get_gfn_query_unlocked(current->domain, gfn, &p2mt);
+
+    printk(XENLOG_G_ERR "EPT violation %#lx (%c%c%c/%c%c%c), "
+           "gpa %#"PRIpaddr", mfn %#lx, type %i. RIP:%#lx RSP:%#lx\n",
+           qualification,
+           (qualification & EPT_READ_VIOLATION) ? 'r' : '-',
+           (qualification & EPT_WRITE_VIOLATION) ? 'w' : '-',
+           (qualification & EPT_EXEC_VIOLATION) ? 'x' : '-',
+           (qualification & EPT_EFFECTIVE_READ) ? 'r' : '-',
+           (qualification & EPT_EFFECTIVE_WRITE) ? 'w' : '-',
+           (qualification & EPT_EFFECTIVE_EXEC) ? 'x' : '-',
+           gpa, mfn_x(mfn), p2mt, regs->rip, regs->rsp);
+
+    ept_walk_table(current->domain, gfn);
+
+    if ( qualification & EPT_GLA_VALID )
+    {
+        gla = __vmread(GUEST_LINEAR_ADDRESS);
+        printk(XENLOG_G_ERR " --- GLA %#lx\n", gla);
+    }
+    hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    return 0;
+}
+
+/*
+ * Main VM exit handler for PVH. Called from vmx_vmexit_handler().
+ * Note: vmx_asm_vmexit_handler updates rip/rsp/eflags in the regs{} struct.
+ */
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
+{
+    unsigned long exit_qualification;
+    unsigned int exit_reason = __vmread(VM_EXIT_REASON);
+    int rc=0, ccpu = smp_processor_id();
+    struct vcpu *v = current;
+
+    dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%#lx RSP:%#lx EFLAGS:%#lx "
+          "CR0:%#lx\n", ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
+          __vmread(GUEST_CR0));
+
+    switch ( (uint16_t)exit_reason )
+    {
+    /* NMI and machine_check are handled by the caller; we handle the rest. */
+    case EXIT_REASON_EXCEPTION_NMI:      /* 0 */
+        rc = vmxit_exception(regs);
+        break;
+
+    case EXIT_REASON_EXTERNAL_INTERRUPT: /* 1 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_PENDING_VIRT_INTR:  /* 7 */
+        /* Disable the interrupt window. */
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+        __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+        break;
+
+    case EXIT_REASON_CPUID:              /* 10 */
+        pv_cpuid(regs);
+        vmx_update_guest_eip();
+        break;
+
+    case EXIT_REASON_HLT:                /* 12 */
+        vmx_update_guest_eip();
+        hvm_hlt(regs->eflags);
+        break;
+
+    case EXIT_REASON_VMCALL:             /* 18 */
+        rc = vmxit_vmcall(regs);
+        break;
+
+    case EXIT_REASON_CR_ACCESS:          /* 28 */
+        rc = vmxit_cr_access(regs);
+        break;
+
+    case EXIT_REASON_DR_ACCESS:          /* 29 */
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        vmx_dr_access(exit_qualification, regs);
+        break;
+
+    case EXIT_REASON_IO_INSTRUCTION:     /* 30 */
+        vmxit_io_instr(regs);
+        break;
+
+    case EXIT_REASON_MSR_READ:           /* 31 */
+        rc = vmxit_msr_read(regs);
+        break;
+
+    case EXIT_REASON_MSR_WRITE:          /* 32 */
+        rc = vmxit_msr_write(regs);
+        break;
+
+    case EXIT_REASON_MONITOR_TRAP_FLAG:  /* 37 */
+        rc = vmxit_mtf(regs);
+        break;
+
+    case EXIT_REASON_MCE_DURING_VMENTRY: /* 41 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_EPT_VIOLATION:      /* 48 */
+    {
+        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        rc = pvh_ept_handle_violation(exit_qualification, gpa, regs);
+        break;
+    }
+
+    default:
+        rc = 1;
+        printk(XENLOG_G_ERR
+               "PVH: Unexpected exit reason:%#x\n", exit_reason);
+    }
+
+    if ( rc )
+    {
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        printk(XENLOG_G_WARNING
+               "PVH: [%d] exit_reas:%d %#x qual:%ld %#lx cr0:%#016lx\n",
+               ccpu, exit_reason, exit_reason, exit_qualification,
+               exit_qualification, __vmread(GUEST_CR0));
+        printk(XENLOG_G_WARNING "PVH: RIP:%#lx RSP:%#lx EFLGS:%#lx CR3:%#lx\n",
+               regs->rip, regs->rsp, regs->rflags, __vmread(GUEST_CR3));
+        domain_crash(v->domain);
+    }
+}
+
 /*
  * Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest.
  *
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 93775dd..54d0d14 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2474,6 +2474,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
         return vmx_failed_vmentry(exit_reason, regs);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        vmx_pvh_vmexit_handler(regs);
+        return;
+    }
+
     if ( v->arch.hvm_vmx.vmx_realmode )
     {
         /* Put RFLAGS back the way the guest wants it */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2a3f517..c0aafb4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -746,7 +746,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
     return 1;
 }
 
-static void pv_cpuid(struct cpu_user_regs *regs)
+void pv_cpuid(struct cpu_user_regs *regs)
 {
     uint32_t a, b, c, d;
 
@@ -923,7 +923,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     return EXCRET_fault_fixed;
 }
 
-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 {
     char sig[5], instr[2];
     unsigned long eip, rc;
@@ -1909,7 +1909,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 #include "x86_64/mmconfig.h"
 
-static int emulate_privileged_op(struct cpu_user_regs *regs)
+int emulate_privileged_op(struct cpu_user_regs *regs)
 {
     enum x86_segment which_sel;
     struct vcpu *v = current;
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3ad2188..78fb7fb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -475,6 +475,7 @@ void vmx_dr_access(unsigned long exit_qualification,
 void vmx_fpu_enter(struct vcpu *v);
 int  vmx_pvh_vcpu_boot_set_info(struct vcpu *v,
                                 struct vcpu_guest_context *ctxtp);
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5cdacc7..22a9653 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -566,6 +566,8 @@ void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(int cpu);
 
+void pv_cpuid(struct cpu_user_regs *regs);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 82cbcee..20c9151 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -49,4 +49,7 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
 				unsigned int trap_nr);
 
+int emulate_privileged_op(struct cpu_user_regs *regs);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);
+
 #endif /* ASM_TRAP_H */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [V11 PATCH 21/21] PVH xen: Checks, asserts, and limitations for PVH
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (19 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH Mukesh Rathor
@ 2013-08-23  1:19 ` Mukesh Rathor
  2013-08-23  8:49 ` [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Jan Beulich
  21 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-23  1:19 UTC (permalink / raw)
  To: Xen-devel

This patch adds some precautionary checks and debug asserts for PVH. Also,
PVH doesn't support any HVM-type guest monitoring at present.

Change in V9:
   - Remove ASSERTs from emulate_gate_op and do_device_not_available.

Change in V11:
   - make this last patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c  |   13 +++++++++++++
 xen/arch/x86/hvm/mtrr.c |    4 ++++
 xen/arch/x86/physdev.c  |   13 +++++++++++++
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 372c2db..ef72a55 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4652,8 +4652,11 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
     return 1;
 }
 
+/* PVH fixme: add guest behaviour monitoring support to the functions below. */
 void hvm_memory_event_cr0(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR0],
                            MEM_EVENT_REASON_CR0,
@@ -4662,6 +4665,8 @@ void hvm_memory_event_cr0(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr3(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR3],
                            MEM_EVENT_REASON_CR3,
@@ -4670,6 +4675,8 @@ void hvm_memory_event_cr3(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr4(unsigned long value, unsigned long old) 
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR4],
                            MEM_EVENT_REASON_CR4,
@@ -4678,6 +4685,8 @@ void hvm_memory_event_cr4(unsigned long value, unsigned long old)
 
 void hvm_memory_event_msr(unsigned long msr, unsigned long value)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_MSR],
                            MEM_EVENT_REASON_MSR,
@@ -4690,6 +4699,8 @@ int hvm_memory_event_int3(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
                                     .params[HVM_PARAM_MEMORY_EVENT_INT3],
                                   MEM_EVENT_REASON_INT3,
@@ -4702,6 +4713,8 @@ int hvm_memory_event_single_step(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
             .params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP],
             MEM_EVENT_REASON_SINGLESTEP,
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index b9d6411..6706af6 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -578,6 +578,10 @@ int32_t hvm_set_mem_pinned_cacheattr(
 {
     struct hvm_mem_pinned_cacheattr_range *range;
 
+    /* Side note: A PVH guest writes to MSR_IA32_CR_PAT natively. */
+    if ( is_pvh_domain(d) )
+        return -EOPNOTSUPP;
+
     if ( !((type == PAT_TYPE_UNCACHABLE) ||
            (type == PAT_TYPE_WRCOMB) ||
            (type == PAT_TYPE_WRTHROUGH) ||
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 4835ed7..a6e06a3 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -519,6 +519,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iopl: {
         struct physdev_set_iopl set_iopl;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EPERM;
+            break;
+        }
+
         ret = -EFAULT;
         if ( copy_from_guest(&set_iopl, arg, 1) != 0 )
             break;
@@ -532,6 +539,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iobitmap: {
         struct physdev_set_iobitmap set_iobitmap;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EPERM;
+            break;
+        }
         ret = -EFAULT;
         if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 )
             break;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes
  2013-08-23  1:18 ` [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes Mukesh Rathor
@ 2013-08-23  8:12   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2013-08-23  8:12 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Changes in V11:
>   - set cr3 to page_to_maddr and not page_to_mfn.
>   - reject non-zero cr1 value for pvh.
>   - Do not check for pvh in destroy_gdt, but put the check in callers.
>   - Set _VPF_in_reset for PVH also.

These are clearly too many changes to retain...

> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Keir Fraser <keir@xen.org>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Tim Deegan <tim@xen.org>
> PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

... all these tags. This practice of yours was already questionable
on one or two earlier patches, but I guess we can tolerate it there.

> @@ -892,8 +896,19 @@ int arch_set_info_guest(
>          /* handled below */;
>      else if ( !compat )
>      {
> +        /* PVH 32bitfixme. */
> +        if ( is_pvh_vcpu(v) )
> +        {
> +            v->arch.cr3 = page_to_maddr(cr3_page);
> +            v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
> +        }
> +
>          v->arch.guest_table = pagetable_from_page(cr3_page);
> -        if ( c.nat->ctrlreg[1] )
> +
> +        if ( c.nat->ctrlreg[1] && is_pvh_vcpu(v) )
> +            rc = -EINVAL;
> +
> +        if ( c.nat->ctrlreg[1] && is_pv_vcpu(v) )
>          {
>              cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]);
>              cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);

Too many changes to if() conditions. I think it could be brought down
to

    else if ( !compat )
    {
        v->arch.guest_table = pagetable_from_page(cr3_page);

        /* PVH 32bitfixme. */
        if ( is_pvh_vcpu(v) )
        {
            v->arch.cr3 = page_to_maddr(cr3_page);
            v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3];
            if ( c.nat->ctrlreg[1] )
                rc = -EINVAL;
        }
        else if ( c.nat->ctrlreg[1] )
        {
            cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]);
            cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC);

making the flow much more clear (at least to me).

> @@ -953,6 +969,13 @@ int arch_set_info_guest(
>  
>      update_cr3(v);
>  
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /* Set VMCS fields. */
> +        if ( (rc = pvh_vcpu_boot_set_info(v, c.nat)) != 0 )
> +            return rc;
> +    }

Combine the two if()-s and drop or generalize the VMX-specific
comment.
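
For illustration, the combined form could be as simple as this (just a
sketch, reusing the helper from your patch, with the comment generalized):

    /* Set any mode specific bits of the boot context (VMCS fields on VMX). */
    if ( is_pvh_vcpu(v) && (rc = pvh_vcpu_boot_set_info(v, c.nat)) != 0 )
        return rc;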

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-23  1:19 ` [V11 PATCH 17/21] PVH xen: vmcs related changes Mukesh Rathor
@ 2013-08-23  8:41   ` Jan Beulich
  2013-08-24  0:26     ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2013-08-23  8:41 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch contains vmcs changes related for PVH, mainly creating a VMCS
> for PVH guest.
> 
> Changes in V11:
>    - Remove pvh_construct_vmcs and make it part of construct_vmcs
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Keir Fraser <keir@xen.org>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

Same comment as for #9 - too many changes to retain the tags.

> @@ -874,16 +935,37 @@ static int construct_vmcs(struct vcpu *v)
>      if ( d->arch.vtsc )
>          v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
>  
> -    v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /* Phase I PVH: we run with minimal secondary exec features */
> +        u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID;
>  
> -    /* Disable VPID for now: we decide when to enable it on VMENTER. */
> -    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
> +        /* Intel SDM: resvd bits are 0 */
> +        v->arch.hvm_vmx.secondary_exec_control = bits;
> +    }
> +    else
> +    {
> +        v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
> +
> +        /* Disable VPID for now: we decide when to enable it on VMENTER. */
> +        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
> +    }

So this difference in VPID handling needs some explanation, as I
don't immediately see how you adjust the code referred to by the
non-PVH comment above.

>      if ( paging_mode_hap(d) )
>      {
>          v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
>                                            CPU_BASED_CR3_LOAD_EXITING |
>                                            CPU_BASED_CR3_STORE_EXITING);
> +        if ( is_pvh_vcpu(v) )
> +        {
> +            u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
> +                       CPU_BASED_ACTIVATE_MSR_BITMAP;
> +            v->arch.hvm_vmx.exec_control |= bits;
> +
> +            bits = CPU_BASED_USE_TSC_OFFSETING | CPU_BASED_TPR_SHADOW |
> +                   CPU_BASED_VIRTUAL_NMI_PENDING;
> +            v->arch.hvm_vmx.exec_control &= ~bits;
> +        }

Did you notice that the original adjustments were truly HAP-related?
Putting your PVH code here just because it takes HAP as a prereq is
not the way to go. Wherever the flags you want set get set for the
HVM case, you should add your adjustments (if any are necessary
at all).

> @@ -905,9 +987,22 @@ static int construct_vmcs(struct vcpu *v)
>  
>      vmx_update_cpu_exec_control(v);
>      __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
> +
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /*
> +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
> +         * upon vmentry the cpu loads VMCS.DR7 and VMCS.DEBUGCTLS rather than
> +         * the host values. 0 would cause it to not use the VMCS values.
> +         */
> +        vmentry_ctl &= ~(VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_SMM |
> +                         VM_ENTRY_DEACT_DUAL_MONITOR);
> +        /* PVH 32bitfixme. */
> +        vmentry_ctl |= VM_ENTRY_IA32E_MODE;   /* GUEST_EFER.LME/LMA ignored */
> +    }
>      __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);

This is misplaced too - we're past the point of determining the set of
flags, and already in the process of committing them. The code
again should go alongside where the corresponding HVM code sits.

> -    if ( cpu_has_vmx_ple )
> +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
>      {
>          __vmwrite(PLE_GAP, ple_gap);
>          __vmwrite(PLE_WINDOW, ple_window);

Why would this be conditional upon !PVH?

> @@ -921,28 +1016,46 @@ static int construct_vmcs(struct vcpu *v)
>      if ( cpu_has_vmx_msr_bitmap )
>      {
>          unsigned long *msr_bitmap = alloc_xenheap_page();
> +        int msr_type = MSR_TYPE_R | MSR_TYPE_W;
>  
>          if ( msr_bitmap == NULL )
> +        {
> +            vmx_vmcs_exit(v);
>              return -ENOMEM;
> +        }

This is a genuine bug fix, which should be contributed separately
(so it becomes backportable).
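
Extracted on its own, the backportable piece would boil down to a one-hunk
patch along these lines (sketch):

         if ( msr_bitmap == NULL )
+        {
+            vmx_vmcs_exit(v);
             return -ENOMEM;
+        }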

> -        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, MSR_TYPE_R | MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, MSR_TYPE_R | MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, MSR_TYPE_R | MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, MSR_TYPE_R | MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, MSR_TYPE_R | MSR_TYPE_W);
> +        /* Disable intercepts for MSRs that have corresponding VMCS fields. */
> +        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type);
> +
>          if ( cpu_has_vmx_pat && paging_mode_hap(d) )
> -            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, MSR_TYPE_R | MSR_TYPE_W);
> +            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type);

These changes all look pointless; if you really think they're worthwhile
cleanup, they don't belong here.

> -    if ( cpu_has_vmx_virtual_intr_delivery )
> +    if ( cpu_has_vmx_virtual_intr_delivery && !is_pvh_vcpu(v) )
>      {
>          /* EOI-exit bitmap */
>          v->arch.hvm_vmx.eoi_exit_bitmap[0] = (uint64_t)0;
> @@ -958,7 +1071,7 @@ static int construct_vmcs(struct vcpu *v)
>          __vmwrite(GUEST_INTR_STATUS, 0);
>      }
>  
> -    if ( cpu_has_vmx_posted_intr_processing )
> +    if ( cpu_has_vmx_posted_intr_processing && !is_pvh_vcpu(v) )
>      {
>          __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
>          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);

These are likely to be meant as temporary changes only? In which
case they ought to be marked as such.

> @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
>  
>      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
>                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
> +              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U << TRAP_debug) : 0)
>                | (1U << TRAP_no_device);

What's so special about PVH that it requires these extra intercepts?

>      v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
> -    hvm_update_guest_cr(v, 0);
> +    if ( is_pvh_vcpu(v) )

I dislike your attitude of considering PVH the most important (thus
handled first) mode in general, but here it becomes most obvious:
The pre-existing code should really come first, and the new mode
should be handled in the "else" path. Same further down for the
CR4 handling.

> +    {
> +        u64 tmpval = v->arch.hvm_vcpu.guest_cr[0] | X86_CR0_PG | X86_CR0_NE;
> +        v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval;

So you set hw_cr[0] while I saw you saying hw_cr[4] won't be
touched by PVH in a reply to the v10 series. Such
inconsistencies are just calling for subsequent bugs. Either PVH should
set all valid hw_cr[] fields, or it should clearly state somewhere that
none of them are used (and then the assignment above ought to
go away).

> @@ -1297,6 +1440,9 @@ void vmx_do_resume(struct vcpu *v)
>          hvm_asid_flush_vcpu(v);
>      }
>  
> +    if ( is_pvh_vcpu(v) )
> +        reset_stack_and_jump(vmx_asm_do_vmentry);
> +
>      debug_state = v->domain->debugger_attached
>                    || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
>                    || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];

Missing a "fixme" annotation?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
                   ` (20 preceding siblings ...)
  2013-08-23  1:19 ` [V11 PATCH 21/21] PVH xen: Checks, asserts, and limitations " Mukesh Rathor
@ 2013-08-23  8:49 ` Jan Beulich
  2013-08-23 11:15   ` George Dunlap
  21 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2013-08-23  8:49 UTC (permalink / raw)
  To: George Dunlap, Mukesh Rathor; +Cc: xen-devel

>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Finally, I've the V11 set of patches.
> 
> V11:
>    - gdt union patch not needed anymore, so dropped it.
>    - patch 17 made the last patch
>    - merged patch 22 and 23. 

So I'd be okay with applying 1...8 and 10...16, provided
- you, Mukesh, can confirm that 9 can safely be left out,
- you, George, don't object to that (considering your comments
  on v10).

Additionally, Mukesh, please finally get used to Cc-ing respective
maintainers. Considering the changes all have is_pvh_...() around
critical changes, and considering that Keir had ack-ed the whole
v10, I'm willing to ignore the general need for e.g. VMX maintainers
to ack VMX-specific changes. But now that you successfully ignored
this rule for the 11th time, I'm afraid you really got used to this and
won't change without being explicitly told.

And while at formal aspects, a minor other remark: The up-to-date
xen-devel address is as Cc-ed here, @lists.xensource.com is _two_
generations old at this point.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH
  2013-08-23  1:19 ` [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH Mukesh Rathor
@ 2013-08-23  9:12   ` Jan Beulich
  2013-08-24  0:35     ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2013-08-23  9:12 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Changes in V11:
>   - merge this with previous patch "prep changes".
>   - allow invalid op emulation for kernel mode also.
>   - Use CR0_READ_SHADOW instead of GUEST_CR0.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Keir Fraser <keir@xen.org>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

Again the changes above void the tags here.

> +static int vmxit_msr_read(struct cpu_user_regs *regs)
> +{
> +    u64 msr_content = 0;
> +
> +    switch ( regs->ecx )

Did you mean regs->_ecx?
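
(To spell out what I mean, if I'm reading the public register layout right:
the non-underscore 32-bit names alias the full 64-bit registers, and only
the underscore-prefixed fields are guaranteed to be 32 bits wide, i.e.

    /* regs->ecx aliases the whole of rcx; regs->_ecx is its low 32 bits. */
    switch ( regs->_ecx )

RDMSR/WRMSR architecturally only consume ECX, so the upper half of rcx must
not be allowed to leak into the MSR index.)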

> +    default:
> +        /* PVH fixme: see hvm_msr_read_intercept(). */
> +        rdmsrl(regs->ecx, msr_content);

So what does this comment refer to? There's no change to the
referred to function here. And it seems rather questionable that
reading the physical MSR values for everything but
MSR_IA32_MISC_ENABLE is correct/secure. I appreciate the
"fixme" annotation, but I'm afraid this is not sufficient here.

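To illustrate why the annotation alone isn't enough: a more defensive
default could refuse unknown MSRs instead of handing out host values.
Just a sketch, not a concrete demand:

    default:
        /* Sketch: fail safely rather than reading arbitrary host MSRs. */
        if ( rdmsr_safe(regs->ecx, msr_content) )
        {
            hvm_inject_hw_exception(TRAP_gp_fault, 0);
            return 0;
        }
        break;

Or, better, defer to hvm_msr_read_intercept() as the "fixme" already hints.
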
> +/* Returns : 0 == msr written successfully. */
> +static int vmxit_msr_write(struct cpu_user_regs *regs)
> +{
> +    uint64_t msr_content = regs->eax | (regs->edx << 32);

And similarly to the above regs->_eax?

> +static int vmxit_debug(struct cpu_user_regs *regs)
> +{
> +    struct vcpu *vp = current;

This variable is to be named either "v" or, preferably, "curr". Same
further down.

> +static int vmxit_exception(struct cpu_user_regs *regs)
> +{
> +    int vector = (__vmread(VM_EXIT_INTR_INFO)) & INTR_INFO_VECTOR_MASK;
> +    int rc = -ENOSYS;
> +
> +    dbgp1(" EXCPT: vec:%#x cs:%#lx rip:%#lx\n", vector,
> +          __vmread(GUEST_CS_SELECTOR), regs->eip);

Do you continue to have these funny dbgp constructs in here? Are
they supposed to go away before this gets committed? If not,
please use a model similar to HVM_DBG_LOG().
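
For example, something along these lines would fit the existing model
(sketch; the exact debug level is up to you):

    HVM_DBG_LOG(DBG_LEVEL_1, "vec %#x cs %#lx rip %#lx",
                vector, __vmread(GUEST_CS_SELECTOR), regs->eip);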

> +static int vmxit_io_instr(struct cpu_user_regs *regs)
> +{
> +    struct segment_register seg;
> +    int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
> +    int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
> +
> +    if ( curr_lvl == 0 )
> +    {
> +        hvm_get_segment_register(current, x86_seg_ss, &seg);
> +        curr_lvl = seg.attr.fields.dpl;
> +    }
> +    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
> +        return 0;
> +
> +    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);

I don't think regs->error_code is valid here; I think this needs to be
read from the VMCS.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 19/21] PVH xen: VMX support of PVH guest creation/destruction
  2013-08-23  1:19 ` [V11 PATCH 19/21] PVH xen: VMX " Mukesh Rathor
@ 2013-08-23  9:14   ` Jan Beulich
  2013-08-24  0:27     ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2013-08-23  9:14 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>  static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>  {
> +    /*
> +     * PVH guest never causes CR3 write vmexit. This is called during the guest
> +     * setup.
> +     */
> +    if ( is_pvh_vcpu(v) && cr != 3 )
> +    {
> +        printk(XENLOG_G_ERR
> +               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
> +               v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
> +        return;
> +    }
> +
>      vmx_vmcs_enter(v);
>  
>      switch ( cr )
> @@ -1183,7 +1195,7 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>          /* CR2 is updated in exit stub. */
>          break;
>      case 3:
> -        if ( paging_mode_hap(v->domain) )
> +        if ( paging_mode_hap(v->domain) && !is_pvh_vcpu(v) )

This seems redundant with the check above?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-23  8:49 ` [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Jan Beulich
@ 2013-08-23 11:15   ` George Dunlap
  2013-08-23 12:05     ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-23 11:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> Finally, I've the V11 set of patches.
>>
>> V11:
>>    - gdt union patch not needed anymore, so dropped it.
>>    - patch 17 made the last patch
>>    - merged patch 22 and 23.
>
> So I'd be okay with applying 1...8 and 10...16, provided
> - you, Mukesh, can confirm that 9 can safely be left out,
> - you, George, don't object to that (considering your comments
>   on v10).

1-8,10-16 I'm OK with the code for the most part, but the changesets
themselves leave something to be desired.

Many of the prep patches would be fine, and the e820 struct relocate
is OK as well (though the changelog entry isn't really good).

But the read_segment_register patch I think needs to be put in after
the is_pvh_*() patch, so the entire new bit of functionality comes in
one go.  And the guest_kernel_mode() change should be a separate
patch, since it performs a similar function to read_segment_register()
-- i.e., enabling the emulated PV ops.

In many cases, there are handfuls of other "!is_hvm" -> "is_pv" conversions
scattered randomly throughout other unrelated changes.  And some of
the changes from patches 15-16 I think should be grouped together with
later changesets (e.g., all the irq-related ones in a single
changeset).

Also, I think that having a separate set of nearly-identical exit
handlers for PVH is a really bad idea.  Without them, however, pvh.c
is only a single small function long -- so I think we shouldn't bother
with pvh.c, and should just put that function into vmx.c.

All in all, I would personally prefer if you hold off until my series
re-work; I should have something by the end of next week.

My basic outline for the re-worked patch series looks like the
following (NOT one patch per bullet):
- Prep patches
- Introduce pvh domain type
- Disable unused HVM functionality
- Enable used PV functionality

What do you think?

 -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-23 11:15   ` George Dunlap
@ 2013-08-23 12:05     ` Jan Beulich
  2013-08-24  0:40       ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2013-08-23 12:05 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

>>> On 23.08.13 at 13:15, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>>> Finally, I've the V11 set of patches.
>>>
>>> V11:
>>>    - gdt union patch not needed anymore, so dropped it.
>>>    - patch 17 made the last patch
>>>    - merged patch 22 and 23.
>>
>> So I'd be okay with applying 1...8 and 10...16, provided
>> - you, Mukesh, can confirm that 9 can safely be left out,
>> - you, George, don't object to that (considering your comments
>>   on v10).
> 
> 1-8,10-16 I'm OK with the code for the most part, but the changesets
> themselves leave something to be desired.
> 
> Many of the prep patches would be fine, and the e820 struct relocate
> is OK as well (though the changelog entry isn't really good).
> 
> But the read_segment_register patch I think needs to be put in after
> the is_pvh_*() patch, so the entire new bit of functionality comes in
> one go.  And the guest_kernel_mode() change should be a separate
> patch, since it performs a similar function to read_segment_register()
> -- i.e., enabling the emulated PV ops.
> 
> In many cases, there are handfuls of other "!is_hvm" -> "is_pv"
> scattered randomly throughout unrelated other changes.  And some of
> the changes from patches 15-16 I think should be grouped together with
> later changesets (e.g., all the irq-related ones in a single
> changeset).
> 
> Also, I think that having a separate set of nearly-identical exit
> handlers for PVH is a really bad idea.  Without them, however, pvh.c
> is only a single small function long -- so I think we shouldn't bother
> with pvh.c, and should just put that function into vmx.c.
> 
> All in all, I would personally prefer if you hold off until my series
> re-work; I should have something by the end of next week.
> 
> My basic outline for the re-worked patch series looks like the
> following (NOT one patch per bullet):
> - Prep patches
> - Introduce pvh domain type
> - Disable unused HVM functionality
> - Enable used PV functionality
> 
> What do you think?

Fine with me, but perhaps Mukesh won't be that happy...

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-23  8:41   ` Jan Beulich
@ 2013-08-24  0:26     ` Mukesh Rathor
  2013-08-26  8:15       ` Jan Beulich
  2013-08-27 17:00       ` George Dunlap
  0 siblings, 2 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-24  0:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 23 Aug 2013 09:41:55 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > This patch contains vmcs changes related for PVH, mainly creating a
> > VMCS for PVH guest.
> > 
> > Changes in V11:
> >    - Remove pvh_construct_vmcs and make it part of construct_vmcs
> > 
> > -    v->arch.hvm_vmx.secondary_exec_control &=
> > ~SECONDARY_EXEC_ENABLE_VPID;
> > +        /* Intel SDM: resvd bits are 0 */
> > +        v->arch.hvm_vmx.secondary_exec_control = bits;
> > +    }
> > +    else
> > +    {
> > +        v->arch.hvm_vmx.secondary_exec_control =
> > vmx_secondary_exec_control; +
> > +        /* Disable VPID for now: we decide when to enable it on
> > VMENTER. */
> > +        v->arch.hvm_vmx.secondary_exec_control &=
> > ~SECONDARY_EXEC_ENABLE_VPID;
> > +    }
> 
> So this difference in VPID handling needs some explanation, as I
> don't immediately see how you adjust the code referred to by the
> non-PVH comment above.

        /* Phase I PVH: we run with minimal secondary exec features */
        u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID;

Somehow I had concluded it was harmless to have it that way, but I
can't remember now. Since, we have common function now, it could just
do the HVM way too, ie, disabled here since we've already checked
cpu_has_vmx_vpid in pvh_check_requirements.
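
Roughly, the common path would then be (just a sketch of what I mean,
not a tested change; names as in the hunk above):

    v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;

    /* Disable VPID for now: we decide when to enable it on VMENTER. */
    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;

    if ( is_pvh_vcpu(v) )
        /* Phase I PVH: run with minimal secondary exec features. */
        v->arch.hvm_vmx.secondary_exec_control &=
            (SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID);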

> >      if ( paging_mode_hap(d) )
> >      {
> >          v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING
> > | CPU_BASED_CR3_LOAD_EXITING |
> >                                            CPU_BASED_CR3_STORE_EXITING);
> > +        if ( is_pvh_vcpu(v) )
> > +        {
> > +            u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
> > +                       CPU_BASED_ACTIVATE_MSR_BITMAP;
> > +            v->arch.hvm_vmx.exec_control |= bits;
> > +
> > +            bits = CPU_BASED_USE_TSC_OFFSETING |
> > CPU_BASED_TPR_SHADOW |
> > +                   CPU_BASED_VIRTUAL_NMI_PENDING;
> > +            v->arch.hvm_vmx.exec_control &= ~bits;
> > +        }
> 
> Did you notice that the original adjustments were truly HAP-related?
> Putting your PVH code here just because it takes HAP as a prereq is
> not the way to go. Wherever the flags you want set get set for the
> HVM case, you should add your adjustments (if any are necessary
> at all).

Ok. They, the adjustments, are necessary in Phase I.

> This is misplaced too - we're past the point of determining the set of
> flags, and already in the process of committing them. The code
> again should go alongside where the corresponding HVM code sits.
> 
> > -    if ( cpu_has_vmx_ple )
> > +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
> >      {
> >          __vmwrite(PLE_GAP, ple_gap);
> >          __vmwrite(PLE_WINDOW, ple_window);
> 
> Why would this be conditional upon !PVH?

We don't have intercept for PLE right now for PVH in phase I.
Easy to add tho.

> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS,
> > msr_type);
> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP,
> > msr_type);
> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP,
> > msr_type); +
> >          if ( cpu_has_vmx_pat && paging_mode_hap(d) )
> > -            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT,
> > MSR_TYPE_R | MSR_TYPE_W);
> > +            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT,
> > msr_type);
> 
> These changes all look pointless; if you really think they're
> worthwhile cleanup, they don't belong here.

They violate coding style (lines longer than 80), but anyways..

> > @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
> >  
> >      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
> >                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
> > +              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U <<
> > TRAP_debug) : 0) | (1U << TRAP_no_device);
> 
> What's so special about PVH that it requires these extra intercepts?

HVM does the same in vmx_update_debug_state() called from 
vmx_update_guest_cr() which we don't call for cr0. We need the
two for gdbsx and kdb, and they are harmless too as we inject back into
the guest if we don't own the exception, jfyi.

> So you set hw_cr[0] while I saw you saying hw_cr[4] won't be
> touched by PVH in a reply to the v10 series. Such
> inconsistencies are just calling for subsequent bugs. Either PVH
> set all valid hw_cr[] fields, or it clearly states somewhere that
> none of them are used (and then the assignment above ought to
> go away).

With commonising PVH with HVM, hw_cr gets used. Otherwise the intention
was not to use it. I don't see a huge problem at this point with this.

> > @@ -1297,6 +1440,9 @@ void vmx_do_resume(struct vcpu *v)
> >          hvm_asid_flush_vcpu(v);
> >      }
> >  
> > +    if ( is_pvh_vcpu(v) )
> > +        reset_stack_and_jump(vmx_asm_do_vmentry);
> > +
> >      debug_state = v->domain->debugger_attached
> >                    ||
> > v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3] ||
> > v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];
> 
> Missing a "fixme" annotation?

PVH supports gdbsx; I don't think we support PV for this external debugger,
and hence PVH support is not intended either.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 19/21] PVH xen: VMX support of PVH guest creation/destruction
  2013-08-23  9:14   ` Jan Beulich
@ 2013-08-24  0:27     ` Mukesh Rathor
  0 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-24  0:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 23 Aug 2013 10:14:11 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> >  static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
> >  {
> > +    /*
> > +     * PVH guest never causes CR3 write vmexit. This is called
> > during the guest
> > +     * setup.
> > +     */
> > +    if ( is_pvh_vcpu(v) && cr != 3 )
> > +    {
> > +        printk(XENLOG_G_ERR
> > +               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
> > +               v->domain->domain_id, v->vcpu_id, cr,
> > __vmread(GUEST_RIP));
> > +        return;
> > +    }
> > +
> >      vmx_vmcs_enter(v);
> >  
> >      switch ( cr )
> > @@ -1183,7 +1195,7 @@ static void vmx_update_guest_cr(struct vcpu
> > *v, unsigned int cr) /* CR2 is updated in exit stub. */
> >          break;
> >      case 3:
> > -        if ( paging_mode_hap(v->domain) )
> > +        if ( paging_mode_hap(v->domain) && !is_pvh_vcpu(v) )
> 
> This seems redundant with the check above?

We are trying to avoid an unnecessary call to vmx_load_pdptrs(v).
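
To spell it out, with both hunks applied the flow is roughly (sketch
from memory of the surrounding code):

    if ( is_pvh_vcpu(v) && cr != 3 )
        return;                         /* filters cr0/cr2/cr4 only */
    ...
    case 3:
        if ( paging_mode_hap(v->domain) && !is_pvh_vcpu(v) )
            vmx_load_pdptrs(v);         /* the call we want to skip */

so the early return never sees cr == 3, and the second check is what
keeps PVH off the PDPTR load on the cr3 path.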

-Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH
  2013-08-23  9:12   ` Jan Beulich
@ 2013-08-24  0:35     ` Mukesh Rathor
  2013-08-26  8:22       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-24  0:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 23 Aug 2013 10:12:16 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > Changes in V11:
> >   - merge this with previous patch "prep changes".
> >   - allow invalid op emulation for kernel mode also.
> >   - Use CR0_READ_SHADOW instead of GUEST_CR0.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Acked-by: Keir Fraser <keir@xen.org>
> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > PV-HVM-Regression-Tested-by: Andrew Cooper
> > <andrew.cooper3@citrix.com>
> 
> Again the changes above void the tags here.
> 
> > +static int vmxit_msr_read(struct cpu_user_regs *regs)
> > +{
> > +    u64 msr_content = 0;
> > +
> > +    switch ( regs->ecx )
> 
> Did you mean regs->_ecx?

Hmm.. I don't understand why? HVM uses ecx:

  hvm_msr_read_intercept(regs->ecx, &msr_content) == X86EMUL_OKAY )


> > +    default:
> > +        /* PVH fixme: see hvm_msr_read_intercept(). */
> > +        rdmsrl(regs->ecx, msr_content);
> 
> So what does this comment refer to? There's no change to the
> referred to function here. And it seems rather questionable that
> reading the physical MSR values for everything but
> MSR_IA32_MISC_ENABLE is correct/secure. I appreciate the
> "fixme" annotation, but I'm afraid this is not sufficient here.

Yes, it needs to be revisited, ideally together with the AMD port so that
a good solution can be contrived for PVH.

> > +{
> > +    int vector = (__vmread(VM_EXIT_INTR_INFO)) &
> > INTR_INFO_VECTOR_MASK;
> > +    int rc = -ENOSYS;
> > +
> > +    dbgp1(" EXCPT: vec:%#x cs:%#lx rip:%#lx\n", vector,
> > +          __vmread(GUEST_CS_SELECTOR), regs->eip);
> 
> Do you continue to have these funny dbgp constructs in here. Are
> they supposed to go away before this gets committed? If not,
> please use a model similar to HVM_DBG_LOG().

Like the commit log says, it helps with debugging, but it can be removed
anytime. I left it there thinking it might be useful for the first couple
of months while the code gets thoroughly tested.

> > +static int vmxit_io_instr(struct cpu_user_regs *regs)
> > +{
> > +    struct segment_register seg;
> > +    int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
> > +    int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
> > +
> > +    if ( curr_lvl == 0 )
> > +    {
> > +        hvm_get_segment_register(current, x86_seg_ss, &seg);
> > +        curr_lvl = seg.attr.fields.dpl;
> > +    }
> > +    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
> > +        return 0;
> > +
> > +    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
> 
> I don't think reg->error_code is valid here, I think this needs to be
> read from the VMCS.

Correct. That's a bug.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-23 12:05     ` Jan Beulich
@ 2013-08-24  0:40       ` Mukesh Rathor
  2013-08-27 17:05         ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-24  0:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel

On Fri, 23 Aug 2013 13:05:08 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 23.08.13 at 13:15, George Dunlap <George.Dunlap@eu.citrix.com>
> >>> wrote:
> > On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
> > wrote:
> >>>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>>>> wrote:
> >>> Finally, I've the V11 set of patches.
> >>>
> >>> V11:
> >>>    - gdt union patch not needed anymore, so dropped it.
> >>>    - patch 17 made the last patch
> >>>    - merged patch 22 and 23.
> >>
> >> So I'd be okay with applying 1...8 and 10...16, provided
> >> - you, Mukesh, can confirm that 9 can safely be left out,
> >> - you, George, don't object to that (considering your comments
> >>   on v10).
> > 
> > 1-8,10-16 I'm OK with the code for the most part, but the changesets
> > themselves leave something to be desired.
> > 
> > Many of the prep patches would be fine, and the e820 struct relocate
> > is OK as well (though the changelog entry isn't really good).
> > 
> > But the read_segment_register patch I think needs to be put in after
> > the is_pvh_*() patch, so the entire new bit of functionality comes
> > in one go.  And the guest_kernel_mode() change should be a separate
> > patch, since it performs a similar function to
> > read_segment_register() -- i.e., enabling the emulated PV ops.
> > 
> > In many cases, there are handfuls of other "!is_hvm" -> "is_pv"
> > scattered randomly throughout unrelated other changes.  And some of
> > the changes from patches 15-16 I think should be grouped together
> > with later changesets (e.g., all the irq-related ones in a single
> > changeset).
> > 
> > Also, I think that having a separate set of nearly-identical exit
> > handlers for PVH is a really bad idea.  Without them, however, pvh.c
> > is only a single small function long -- so I think we shouldn't
> > bother with pvh.c, and should just put that function into vmx.c.
> > 
> > All in all, I would personally prefer if you hold off until my
> > series re-work; I should have something by the end of next week.
> > 
> > My basic outline for the re-worked patch series looks like the
> > following (NOT one patch per bullet):
> > - Prep patches
> > - Introduce pvh domain type
> > - Disable unused HVM functionality
> > - Enable used PV functionality
> > 
> > What do you think?
> 
> Fine with me, but perhaps Mukesh won't be that happy...

It's OK. I'd like this to be merged in asap so I and others can start
working on the FIXME's right away...

thanks
mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-24  0:26     ` Mukesh Rathor
@ 2013-08-26  8:15       ` Jan Beulich
  2013-08-27 17:00       ` George Dunlap
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2013-08-26  8:15 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 24.08.13 at 02:26, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 23 Aug 2013 09:41:55 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> > -    if ( cpu_has_vmx_ple )
>> > +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
>> >      {
>> >          __vmwrite(PLE_GAP, ple_gap);
>> >          __vmwrite(PLE_WINDOW, ple_window);
>> 
>> Why would this be conditional upon !PVH?
> 
> We don't have intercept for PLE right now for PVH in phase I.
> Easy to add tho.

As pointed out before - any such temporary things need to be
properly annotated, so they can be understood by readers and
found when looking for remaining ones.
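
E.g. something along the lines of (wording is only an illustration)

    /* PVH FIXME (Phase I): no PLE intercept wired up yet, so leave
     * PLE_GAP/PLE_WINDOW unset for PVH until the exit handler copes
     * with PAUSE-loop exits. */
    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )

would at least make the omission grep-able.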

>> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS,
>> > msr_type);
>> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP,
>> > msr_type);
>> > +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP,
>> > msr_type); +
>> >          if ( cpu_has_vmx_pat && paging_mode_hap(d) )
>> > -            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT,
>> > MSR_TYPE_R | MSR_TYPE_W);
>> > +            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT,
>> > msr_type);
>> 
>> These changes all look pointless; if you really think they're
>> worthwhile cleanup, they don't belong here.
> 
> They violate coding style (lines longer than 80), but anyways..

I wouldn't mind such clean up if you needed to touch the lines
anyway. But in a series that's large and has been causing quite
a bit of discussion pure reformatting should be avoided unless
done in a separate patch.

>> So you set hw_cr[0] while I saw you saying hw_cr[4] won't be
>> touched by PVH in a reply to the v10 series. Such
>> inconsistencies are just calling for subsequent bugs. Either PVH
>> set all valid hw_cr[] fields, or it clearly states somewhere that
>> none of them are used (and then the assignment above ought to
>> go away).
> 
> With commonising PVH with HVM, hw_cr gets used. Otherwise the intention
> was not to use it. I don't see a huge problem at this point with this.

The problem is the inconsistency: Maintenance of this code will be
more difficult if things aren't done in a half way predictable way.
And inconsistency is never predictable.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH
  2013-08-24  0:35     ` Mukesh Rathor
@ 2013-08-26  8:22       ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2013-08-26  8:22 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel

>>> On 24.08.13 at 02:35, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 23 Aug 2013 10:12:16 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> > +static int vmxit_msr_read(struct cpu_user_regs *regs)
>> > +{
>> > +    u64 msr_content = 0;
>> > +
>> > +    switch ( regs->ecx )
>> 
>> Did you mean regs->_ecx?
> 
> Hmm.. don't understand why? HVM uses ecx:
> 
>   hvm_msr_read_intercept(regs->ecx, &msr_content) == X86EMUL_OKAY )

And the declaration of that function is

int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content);

i.e. the 64-bit regs->ecx correctly gets truncated to 32 bits while
passing arguments to the function.

We've had quite a few similar bugs in the code in the past, so I'd
really appreciate if you could avoid introducing similar ones again.
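
To illustrate (this assumes the usual __DECL_REG() layout, where ->ecx
aliases the full 64-bit rcx and ->_ecx is its low 32 bits):

    switch ( regs->_ecx )   /* 32-bit MSR index, as RDMSR defines it */

vs.

    switch ( regs->ecx )    /* full 64 bits - the upper half is whatever
                               the guest left in rcx, so a valid MSR
                               index may simply fail to match any case */

The function call you quote is fine only because the parameter type
does the truncation for you; an open-coded switch doesn't get that.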

>> > +    default:
>> > +        /* PVH fixme: see hvm_msr_read_intercept(). */
>> > +        rdmsrl(regs->ecx, msr_content);
>> 
>> So what does this comment refer to? There's no change to the
>> referred to function here. And it seems rather questionable that
>> reading the physical MSR values for everything but
>> MSR_IA32_MISC_ENABLE is correct/secure. I appreciate the
>> "fixme" annotation, but I'm afraid this is not sufficient here.
> 
> Yes, it needs to be revisited, best with AMD port so that a good
> solution can be contrived for PVH.

Nice that you say "yes" here, but the request was to make the
comment understandable to others than just you.
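
Something like this (wording just an example) would already help:

    default:
        /*
         * PVH FIXME: this reads the raw host MSR. That's neither
         * correct nor secure for anything the guest shouldn't see
         * directly; these reads want to be routed through
         * hvm_msr_read_intercept() (or an equivalent filter) before
         * this can be considered done.
         */
        rdmsrl(regs->_ecx, msr_content);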

>> > +{
>> > +    int vector = (__vmread(VM_EXIT_INTR_INFO)) &
>> > INTR_INFO_VECTOR_MASK;
>> > +    int rc = -ENOSYS;
>> > +
>> > +    dbgp1(" EXCPT: vec:%#x cs:%#lx rip:%#lx\n", vector,
>> > +          __vmread(GUEST_CS_SELECTOR), regs->eip);
>> 
>> Do you continue to have these funny dbgp constructs in here. Are
>> they supposed to go away before this gets committed? If not,
>> please use a model similar to HVM_DBG_LOG().
> 
> Like the commit log says, it helps debug, but can be removed anytime.
> I left it there thinking it might be useful for first couple months
> while it gets thoroughly tested.

And as said - I don't mind it left there if done properly.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-24  0:26     ` Mukesh Rathor
  2013-08-26  8:15       ` Jan Beulich
@ 2013-08-27 17:00       ` George Dunlap
  2013-08-27 22:43         ` Mukesh Rathor
  1 sibling, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-27 17:00 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Jan Beulich

On Sat, Aug 24, 2013 at 1:26 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 23 Aug 2013 09:41:55 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>> wrote:
>> > This patch contains vmcs changes related for PVH, mainly creating a
>> > VMCS for PVH guest.
>> >
>> > Changes in V11:
>> >    - Remove pvh_construct_vmcs and make it part of construct_vmcs
>> >
>> > -    v->arch.hvm_vmx.secondary_exec_control &=
>> > ~SECONDARY_EXEC_ENABLE_VPID;
>> > +        /* Intel SDM: resvd bits are 0 */
>> > +        v->arch.hvm_vmx.secondary_exec_control = bits;
>> > +    }
>> > +    else
>> > +    {
>> > +        v->arch.hvm_vmx.secondary_exec_control =
>> > vmx_secondary_exec_control; +
>> > +        /* Disable VPID for now: we decide when to enable it on
>> > VMENTER. */
>> > +        v->arch.hvm_vmx.secondary_exec_control &=
>> > ~SECONDARY_EXEC_ENABLE_VPID;
>> > +    }
>>
>> So this difference in VPID handling needs some explanation, as I
>> don't immediately see how you adjust the code referred to by the
>> non-PVH comment above.
>
>         /* Phase I PVH: we run with minimal secondary exec features */
>         u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID;
>
> Somehow I had concluded it was harmless to have it that way, but I
> can't remember now. Since, we have common function now, it could just
> do the HVM way too, ie, disabled here since we've already checked
> cpu_has_vmx_vpid in pvh_check_requirements.
>
>> >      if ( paging_mode_hap(d) )
>> >      {
>> >          v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING
>> > | CPU_BASED_CR3_LOAD_EXITING |
>> >                                            CPU_BASED_CR3_STORE_EXITING);
>> > +        if ( is_pvh_vcpu(v) )
>> > +        {
>> > +            u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
>> > +                       CPU_BASED_ACTIVATE_MSR_BITMAP;
>> > +            v->arch.hvm_vmx.exec_control |= bits;
>> > +
>> > +            bits = CPU_BASED_USE_TSC_OFFSETING |
>> > CPU_BASED_TPR_SHADOW |
>> > +                   CPU_BASED_VIRTUAL_NMI_PENDING;
>> > +            v->arch.hvm_vmx.exec_control &= ~bits;
>> > +        }
>>
>> Did you notice that the original adjustments were truly HAP-related?
>> Putting your PVH code here just because it takes HAP as a prereq is
>> not the way to go. Wherever the flags you want set get set for the
>> HVM case, you should add your adjustments (if any are necessary
>> at all).
>
> Ok. They, the adjustments, are necessary in Phase I.
>
>> This is misplaced too - we're past the point of determining the set of
>> flags, and already in the process of committing them. The code
>> again should go alongside where the corresponding HVM code sits.
>>
>> > -    if ( cpu_has_vmx_ple )
>> > +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
>> >      {
>> >          __vmwrite(PLE_GAP, ple_gap);
>> >          __vmwrite(PLE_WINDOW, ple_window);
>>
>> Why would this be conditional upon !PVH?
>
> We don't have intercept for PLE right now for PVH in phase I.
> Easy to add tho.

Or, you could just use the pre-existing VMX exits.  That's what I've
done in my port.

>> > @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
>> >
>> >      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
>> >                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
>> > +              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U <<
>> > TRAP_debug) : 0) | (1U << TRAP_no_device);
>>
>> What's so special about PVH that it requires these extra intercepts?
>
> HVM does the same in vmx_update_debug_state() called from
> vmx_update_guest_cr() which we don't call for cr0. We need the
> two for gdbsx and kdb, and they are harmless too as we inject back into
> the guest if we don't own the exception, jfyi.

What you mean is, when an HVM guest sets the cr0 to come out of real
mode, these will be set from vmx_update_debug_state().  Since PVH
never goes through that transition, we need to set them at
start-of-day.  Is that right?

It seems like it would be better to just call vmx_update_debug_state()
directly, with a comment about the lack of real-mode transition.
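
I.e. something like this at the end of construct_vmcs() (sketch only):

    /* PVH guests never make the real->protected transition which
     * normally pulls this in via vmx_update_guest_cr(0). */
    if ( is_pvh_vcpu(v) )
        vmx_update_debug_state(v);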

 -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-24  0:40       ` Mukesh Rathor
@ 2013-08-27 17:05         ` George Dunlap
  2013-08-27 19:18           ` Mukesh Rathor
  2013-08-28  0:37           ` Mukesh Rathor
  0 siblings, 2 replies; 49+ messages in thread
From: George Dunlap @ 2013-08-27 17:05 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Jan Beulich

On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Fri, 23 Aug 2013 13:05:08 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> >>> On 23.08.13 at 13:15, George Dunlap <George.Dunlap@eu.citrix.com>
>> >>> wrote:
>> > On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
>> > wrote:
>> >>>>> On 23.08.13 at 03:18, Mukesh Rathor <mukesh.rathor@oracle.com>
>> >>>>> wrote:
>> >>> Finally, I've the V11 set of patches.
>> >>>
>> >>> V11:
>> >>>    - gdt union patch not needed anymore, so dropped it.
>> >>>    - patch 17 made the last patch
>> >>>    - merged patch 22 and 23.
>> >>
>> >> So I'd be okay with applying 1...8 and 10...16, provided
>> >> - you, Mukesh, can confirm that 9 can safely be left out,
>> >> - you, George, don't object to that (considering your comments
>> >>   on v10).
>> >
>> > 1-8,10-16 I'm OK with the code for the most part, but the changesets
>> > themselves leave something to be desired.
>> >
>> > Many of the prep patches would be fine, and the e820 struct relocate
>> > is OK as well (though the changelog entry isn't really good).
>> >
>> > But the read_segment_register patch I think needs to be put in after
>> > the is_pvh_*() patch, so the entire new bit of functionality comes
>> > in one go.  And the guest_kernel_mode() change should be a separate
>> > patch, since it performs a similar function to
>> > read_segment_register() -- i.e., enabling the emulated PV ops.
>> >
>> > In many cases, there are handfuls of other "!is_hvm" -> "is_pv"
>> > scattered randomly throughout unrelated other changes.  And some of
>> > the changes from patches 15-16 I think should be grouped together
>> > with later changesets (e.g., all the irq-related ones in a single
>> > changeset).
>> >
>> > Also, I think that having a separate set of nearly-identical exit
>> > handlers for PVH is a really bad idea.  Without them, however, pvh.c
>> > is only a single small function long -- so I think we shouldn't
>> > bother with pvh.c, and should just put that function into vmx.c.
>> >
>> > All in all, I would personally prefer if you hold off until my
>> > series re-work; I should have something by the end of next week.
>> >
>> > My basic outline for the re-worked patch series looks like the
>> > following (NOT one patch per bullet):
>> > - Prep patches
>> > - Introduce pvh domain type
>> > - Disable unused HVM functionality
>> > - Enable used PV functionality
>> >
>> > What do you think?
>>
>> Fine with me, but perhaps Mukesh won't be that happy...
>
> It's OK. I'd like this to be merged in asap so I and others can start
> working on the FIXME's right away...

I'm still waiting on the toolstack changes that are needed to actually
put it in PVH mode before I can test it.

 -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-27 17:05         ` George Dunlap
@ 2013-08-27 19:18           ` Mukesh Rathor
  2013-08-28 11:20             ` George Dunlap
  2013-08-28  0:37           ` Mukesh Rathor
  1 sibling, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-27 19:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 16324 bytes --]

On Tue, 27 Aug 2013 18:05:00 +0100
George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
> > On Fri, 23 Aug 2013 13:05:08 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >
....
> >
> > It's OK. I'd like this to be merged in asap so I and others can start
> > working on the FIXME's right away...
> 
> I'm still waiting on the toolstack changes that are needed to actually
> put it in PVH mode before I can test it.
> 
>  -George

Oh, I thought I had sent it to you already. Anyways, here's the unofficial
version I have in my tree, c/s: 704302ce9404c73cfb687d31adcf67094ab5bb53

Both inlined and attached.
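
With that applied, a domU config only needs pvh=1 (plus hap=1, which
the xl change below insists on); paths here are just placeholders:

    # minimal PVH domU config
    name    = "pvh64"
    kernel  = "/boot/vmlinuz-pvh"
    ramdisk = "/boot/initrd-pvh.img"
    extra   = "root=/dev/xvda1 console=hvc0"
    memory  = 1024
    vcpus   = 2
    pvh     = 1
    hap     = 1
    disk    = [ 'phy:/dev/vg0/pvh64,xvda,w' ]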

thanks
Mukesh



diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 069b73f..ff48653 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -634,6 +634,9 @@ if your particular guest kernel does not require this behaviour then
 it is safe to allow this to be enabled but you may wish to disable it
 anyway.
 
+=item B<pvh=BOOLEAN>
+Selects whether to run this guest in an HVM container. Default is 0.
+
 =back
 
 =head2 Fully-virtualised (HVM) Guest Specific Options
diff --git a/tools/debugger/gdbsx/xg/xg_main.c b/tools/debugger/gdbsx/xg/xg_main.c
index 64c7484..5736b86 100644
--- a/tools/debugger/gdbsx/xg/xg_main.c
+++ b/tools/debugger/gdbsx/xg/xg_main.c
@@ -81,6 +81,7 @@ int xgtrc_on = 0;
 struct xen_domctl domctl;         /* just use a global domctl */
 
 static int     _hvm_guest;        /* hvm guest? 32bit HVMs have 64bit context */
+static int     _pvh_guest;        /* PV guest in HVM container */
 static domid_t _dom_id;           /* guest domid */
 static int     _max_vcpu_id;      /* thus max_vcpu_id+1 VCPUs */
 static int     _dom0_fd;          /* fd of /dev/privcmd */
@@ -309,6 +310,7 @@ xg_attach(int domid, int guest_bitness)
 
     _max_vcpu_id = domctl.u.getdomaininfo.max_vcpu_id;
     _hvm_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_hvm_guest);
+    _pvh_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_pvh_guest);
     return _max_vcpu_id;
 }
 
@@ -369,7 +371,7 @@ _change_TF(vcpuid_t which_vcpu, int guest_bitness, int setit)
     int sz = sizeof(anyc);
 
     /* first try the MTF for hvm guest. otherwise do manually */
-    if (_hvm_guest) {
+    if (_hvm_guest || _pvh_guest) {
         domctl.u.debug_op.vcpu = which_vcpu;
         domctl.u.debug_op.op = setit ? XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON :
                                        XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF;
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 512a994..a92e823 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -86,7 +86,7 @@ OSDEP_SRCS-y                 += xenctrl_osdep_ENOSYS.c
 -include $(XEN_TARGET_ARCH)/Makefile
 
 CFLAGS   += -Werror -Wmissing-prototypes
-CFLAGS   += -I. $(CFLAGS_xeninclude)
+CFLAGS   += -I. $(CFLAGS_xeninclude) -g -O0
 
 # Needed for posix_fadvise64() in xc_linux.c
 CFLAGS-$(CONFIG_Linux) += -D_GNU_SOURCE
diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index 86e23ee..5168bcd 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -130,6 +130,7 @@ struct xc_dom_image {
     domid_t console_domid;
     domid_t xenstore_domid;
     xen_pfn_t shared_info_mfn;
+    int pvh_enabled;
 
     xc_interface *xch;
     domid_t guest_domid;
diff --git a/tools/libxc/xc_dom_elfloader.c b/tools/libxc/xc_dom_elfloader.c
index 9843b1f..40b99f6 100644
--- a/tools/libxc/xc_dom_elfloader.c
+++ b/tools/libxc/xc_dom_elfloader.c
@@ -362,6 +362,18 @@ static elf_errorstatus xc_dom_parse_elf_kernel(struct xc_dom_image *dom)
         rc = -EINVAL;
         goto out;
     }
+    if ( dom->pvh_enabled )
+    {
+        if ( !elf_xen_feature_get(XENFEAT_auto_translated_physmap, 
+                                 dom->parms.f_supported) )
+        {
+            xc_dom_panic(dom->xch, XC_INVALID_KERNEL, "%s: Kernel does not"
+                         " support PVH mode. Supported: %x", __FUNCTION__,
+                         *dom->parms.f_supported);
+            rc = -EINVAL;
+            goto out;
+        }
+    }
 
     /* find kernel segment */
     dom->kernel_seg.vstart = dom->parms.virt_kstart;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 126c0f8..111f001 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -415,7 +415,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
         pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
         l1tab[l1off] =
             pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
-        if ( (addr >= dom->pgtables_seg.vstart) && 
+        if ( (!dom->pvh_enabled)                &&
+             (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
         if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
@@ -629,12 +630,6 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
     /* clear everything */
     memset(ctxt, 0, sizeof(*ctxt));
 
-    ctxt->user_regs.ds = FLAT_KERNEL_DS_X86_64;
-    ctxt->user_regs.es = FLAT_KERNEL_DS_X86_64;
-    ctxt->user_regs.fs = FLAT_KERNEL_DS_X86_64;
-    ctxt->user_regs.gs = FLAT_KERNEL_DS_X86_64;
-    ctxt->user_regs.ss = FLAT_KERNEL_SS_X86_64;
-    ctxt->user_regs.cs = FLAT_KERNEL_CS_X86_64;
     ctxt->user_regs.rip = dom->parms.virt_entry;
     ctxt->user_regs.rsp =
         dom->parms.virt_base + (dom->bootstack_pfn + 1) * PAGE_SIZE_X86;
@@ -642,15 +637,25 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr)
         dom->parms.virt_base + (dom->start_info_pfn) * PAGE_SIZE_X86;
     ctxt->user_regs.rflags = 1 << 9; /* Interrupt Enable */
 
-    ctxt->kernel_ss = ctxt->user_regs.ss;
-    ctxt->kernel_sp = ctxt->user_regs.esp;
-
     ctxt->flags = VGCF_in_kernel_X86_64 | VGCF_online_X86_64;
     cr3_pfn = xc_dom_p2m_guest(dom, dom->pgtables_seg.pfn);
     ctxt->ctrlreg[3] = xen_pfn_to_cr3_x86_64(cr3_pfn);
     DOMPRINTF("%s: cr3: pfn 0x%" PRIpfn " mfn 0x%" PRIpfn "",
               __FUNCTION__, dom->pgtables_seg.pfn, cr3_pfn);
 
+    if ( dom->pvh_enabled )
+        return 0;
+
+    ctxt->user_regs.ds = FLAT_KERNEL_DS_X86_64;
+    ctxt->user_regs.es = FLAT_KERNEL_DS_X86_64;
+    ctxt->user_regs.fs = FLAT_KERNEL_DS_X86_64;
+    ctxt->user_regs.gs = FLAT_KERNEL_DS_X86_64;
+    ctxt->user_regs.ss = FLAT_KERNEL_SS_X86_64;
+    ctxt->user_regs.cs = FLAT_KERNEL_CS_X86_64;
+
+    ctxt->kernel_ss = ctxt->user_regs.ss;
+    ctxt->kernel_sp = ctxt->user_regs.esp;
+
     return 0;
 }
 
@@ -751,7 +756,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
     rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
     if ( rc )
         return rc;
-    if ( xc_dom_feature_translated(dom) )
+    if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled )
     {
         dom->shadow_enabled = 1;
         rc = x86_shadow(dom->xch, dom->guest_domid);
@@ -827,6 +832,35 @@ int arch_setup_bootearly(struct xc_dom_image *dom)
     return 0;
 }
 
+/* Map grant table frames into guest physmap. */
+static int map_grant_table_frames(struct xc_dom_image *dom)
+{
+    int i, rc;
+
+    if ( dom->pvh_enabled )
+        return 0;
+
+    for ( i = 0; ; i++ )
+    {
+        rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
+                                      XENMAPSPACE_grant_table,
+                                      i, dom->total_pages + i);
+        if ( rc != 0 )
+        {
+            if ( (i > 0) && (errno == EINVAL) )
+            {
+                DOMPRINTF("%s: %d grant tables mapped", __FUNCTION__, i);
+                break;
+            }
+            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+                         "%s: mapping grant tables failed " "(pfn=0x%" PRIpfn
+                         ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc);
+            return rc;
+        }
+    }
+    return 0;
+}
+
 int arch_setup_bootlate(struct xc_dom_image *dom)
 {
     static const struct {
@@ -865,7 +899,6 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
     else
     {
         /* paravirtualized guest with auto-translation */
-        int i;
 
         /* Map shared info frame into guest physmap. */
         rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
@@ -879,25 +912,10 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
             return rc;
         }
 
-        /* Map grant table frames into guest physmap. */
-        for ( i = 0; ; i++ )
-        {
-            rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
-                                          XENMAPSPACE_grant_table,
-                                          i, dom->total_pages + i);
-            if ( rc != 0 )
-            {
-                if ( (i > 0) && (errno == EINVAL) )
-                {
-                    DOMPRINTF("%s: %d grant tables mapped", __FUNCTION__, i);
-                    break;
-                }
-                xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
-                             "%s: mapping grant tables failed " "(pfn=0x%"
-                             PRIpfn ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc);
-                return rc;
-            }
-        }
+        rc = map_grant_table_frames(dom);
+        if ( rc != 0 )
+            return rc;
+
         shinfo = dom->shared_info_pfn;
     }
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index cf214bb..941f37b 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -13,7 +13,7 @@ XLUMINOR = 0
 
 CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
 	-Wno-declaration-after-statement -Wformat-nonliteral
-CFLAGS += -I. -fPIC
+CFLAGS += -I. -fPIC -g -O0
 
 ifeq ($(CONFIG_Linux),y)
 LIBUUID_LIBS += -luuid
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 0c32d0b..77d1a34 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -405,6 +405,8 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
         flags |= XEN_DOMCTL_CDF_hvm_guest;
         flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
         flags |= libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+    } else if ( libxl_defbool_val(info->pvh) ) {
+        flags |= XEN_DOMCTL_CDF_hap;
     }
     *domid = -1;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index b38d0a7..cefbf76 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom;
     int ret;
     int flags = 0;
+    int is_pvh = libxl_defbool_val(info->pvh);
 
     xc_dom_loginit(ctx->xch);
 
+    if (is_pvh) {
+        char *pv_feats = "writable_descriptor_tables|auto_translated_physmap"
+                         "|supervisor_mode_kernel|hvm_callback_vector";
+
+        if (info->u.pv.features && info->u.pv.features[0] != '\0')
+        {
+            LOG(ERROR, "Didn't expect info->u.pv.features to contain string\n");
+            LOG(ERROR, "String: %s\n", info->u.pv.features);
+            return ERROR_FAIL;
+        }
+        info->u.pv.features = strdup(pv_feats);
+    }
+
     dom = xc_dom_allocate(ctx->xch, state->pv_cmdline, info->u.pv.features);
     if (!dom) {
         LOGE(ERROR, "xc_dom_allocate failed");
@@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     }
 
     dom->flags = flags;
+    dom->pvh_enabled = is_pvh;
     dom->console_evtchn = state->console_port;
     dom->console_domid = state->console_domid;
     dom->xenstore_evtchn = state->store_port;
@@ -400,7 +415,8 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
         LOGE(ERROR, "xc_dom_boot_image failed");
         goto out;
     }
-    if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) {
+    /* PVH sets up its own grant during boot via hvm mechanisms */
+    if ( !is_pvh && (ret = xc_dom_gnttab_init(dom)) != 0 ) {
         LOGE(ERROR, "xc_dom_gnttab_init failed");
         goto out;
     }
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index d218a2d..2e100f9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -244,6 +244,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
     ("platformdata", libxl_key_value_list),
     ("poolid",       uint32),
     ("run_hotplug_scripts",libxl_defbool),
+    ("pvh",          libxl_defbool),
     ], dir=DIR_IN)
 
 MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
@@ -345,6 +346,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ])),
                  ("invalid", Struct(None, [])),
                  ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
+    ("pvh",       libxl_defbool),
     ], dir=DIR_IN
 )
 
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index a78c91d..a136f70 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -290,7 +290,9 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     if (rtc_timeoffset)
         xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset);
 
-    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl_defbool_val(d_config->b_info.pvh)) {
+
         unsigned long shadow;
         shadow = (d_config->b_info.shadow_memkb + 1023) / 1024;
         xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5bef969..20b171d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -610,8 +610,18 @@ static void parse_config_data(const char *config_source,
         !strncmp(buf, "hvm", strlen(buf)))
         c_info->type = LIBXL_DOMAIN_TYPE_HVM;
 
+    libxl_defbool_setdefault(&c_info->pvh, false);
+    libxl_defbool_setdefault(&c_info->hap, false);
+    xlu_cfg_get_defbool(config, "pvh", &c_info->pvh, 0);
     xlu_cfg_get_defbool(config, "hap", &c_info->hap, 0);
 
+    if (libxl_defbool_val(c_info->pvh) &&
+        !libxl_defbool_val(c_info->hap)) {
+
+        fprintf(stderr, "hap is required for PVH domain\n");
+        exit(1);
+    }
+
     if (xlu_cfg_replace_string (config, "name", &c_info->name, 0)) {
         fprintf(stderr, "Domain name must be specified.\n");
         exit(1);
@@ -918,6 +928,7 @@ static void parse_config_data(const char *config_source,
 
         b_info->u.pv.cmdline = cmdline;
         xlu_cfg_replace_string (config, "ramdisk", &b_info->u.pv.ramdisk, 0);
+        libxl_defbool_set(&b_info->pvh, libxl_defbool_val(c_info->pvh));
         break;
     }
     default:
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index bf83d58..10c23a1 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -168,13 +168,15 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
 static void *map_interface(domid_t domid, unsigned long mfn)
 {
 	if (*xcg_handle != NULL) {
-		/* this is the preferred method */
-		return xc_gnttab_map_grant_ref(*xcg_handle, domid,
+                void *addr;
+                /* this is the preferred method */
+                addr = xc_gnttab_map_grant_ref(*xcg_handle, domid,
 			GNTTAB_RESERVED_XENSTORE, PROT_READ|PROT_WRITE);
-	} else {
-		return xc_map_foreign_range(*xc_handle, domid,
-			getpagesize(), PROT_READ|PROT_WRITE, mfn);
+                if (addr)
+                        return addr;
 	}
+	return xc_map_foreign_range(*xc_handle, domid,
+		        getpagesize(), PROT_READ|PROT_WRITE, mfn);
 }
 
 static void unmap_interface(void *interface)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 4c5b2bb..6b1aa11 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo {
  /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest   (1U<<_XEN_DOMINF_pvh_guest)
  /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16

[-- Attachment #2: pvh.tools.patch --]
[-- Type: text/x-patch, Size: 15628 bytes --]

-	} else {
-		return xc_map_foreign_range(*xc_handle, domid,
-			getpagesize(), PROT_READ|PROT_WRITE, mfn);
+                if (addr)
+                        return addr;
 	}
+	return xc_map_foreign_range(*xc_handle, domid,
+		        getpagesize(), PROT_READ|PROT_WRITE, mfn);
 }
 
 static void unmap_interface(void *interface)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 4c5b2bb..6b1aa11 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo {
  /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest   (1U<<_XEN_DOMINF_pvh_guest)
  /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16
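
For anyone who wants to try the above once it's applied, a minimal xl
guest config exercising the new knobs might look like the sketch below.
Only the "pvh" and "hap" keys come from this patch; the kernel/ramdisk
paths and the sizing are placeholders.

    # 64-bit PV kernel built with the PVH features enabled
    kernel  = "/path/to/pvh-capable-vmlinuz"
    ramdisk = "/path/to/initrd.img"
    memory  = 1024
    vcpus   = 1
    pvh     = 1
    hap     = 1    # xl exits with "hap is required for PVH domain" otherwise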

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 17/21] PVH xen: vmcs related changes
  2013-08-27 17:00       ` George Dunlap
@ 2013-08-27 22:43         ` Mukesh Rathor
  0 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-27 22:43 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich

On Tue, 27 Aug 2013 18:00:04 +0100
George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> On Sat, Aug 24, 2013 at 1:26 AM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
> > On Fri, 23 Aug 2013 09:41:55 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >
> >> >>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@oracle.com>
> >> >>> wrote:
> >> > This patch contains vmcs changes related for PVH, mainly
> >> > creating a VMCS for PVH guest.
> >> >
> >> > Changes in V11:
> >> >    - Remove pvh_construct_vmcs and make it part of construct_vmcs
> >> >
...
> > Ok. They, the adjustments, are necessary in Phase I.
> >
> >> This is misplaced too - we're past the point of determining the
> >> set of flags, and already in the process of committing them. The
> >> code again should go alongside where the corresponding HVM code
> >> sits.
> >>
> >> > -    if ( cpu_has_vmx_ple )
> >> > +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
> >> >      {
> >> >          __vmwrite(PLE_GAP, ple_gap);
> >> >          __vmwrite(PLE_WINDOW, ple_window);
> >>
> >> Why would this be conditional upon !PVH?
> >
> > We don't have intercept for PLE right now for PVH in phase I.
> > Easy to add tho.
> 
> Or, you could just use the pre-existing VMX exits.  That's what I've
> done in my port.

My intention was to get this in with some baseline working, then add
each of the bells and whistles one at a time, testing them as they go in.
If you are able to test it all, or are comfortable checking untested
code, go for it.

> >> > @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
> >> >
> >> >      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
> >> >                | (paging_mode_hap(d) ? 0 : (1U <<
> >> > TRAP_page_fault))
> >> > +              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U <<
> >> > TRAP_debug) : 0) | (1U << TRAP_no_device);
> >>
> >> What's so special about PVH that it requires these extra
> >> intercepts?
> >
> > HVM does the same in vmx_update_debug_state() called from
> > vmx_update_guest_cr() which we don't call for cr0. We need the
> > two for gdbsx and kdb, and they are harmless too as we inject back
> > into the guest if we don't own the exception, jfyi.
> 
> What you mean is, when an HVM guest sets the cr0 to come out of real
> mode, these will be set from vmx_update_debug_state().  Since PVH
> never goes through that transition, we need to set them at
> start-of-day.  Is that right?
> 
> It seems like it would be better to just call vmx_update_debug_state()
> directly, with a comment about the lack of real-mode transition.

That would work too.
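
A minimal sketch of what that could look like in construct_vmcs()
(placement and the final form are still to be decided, so treat this as
illustration only):

    v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
              | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
              | (1U << TRAP_no_device);

    /* PVH never takes the real-mode CR0 transition that normally sets
     * these intercepts for HVM, so establish the debug/int3 state at
     * start of day. */
    if ( is_pvh_vcpu(v) )
        vmx_update_debug_state(v);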

thx,
m-

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-27 17:05         ` George Dunlap
  2013-08-27 19:18           ` Mukesh Rathor
@ 2013-08-28  0:37           ` Mukesh Rathor
  2013-08-29 16:28             ` George Dunlap
  1 sibling, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-28  0:37 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich

On Tue, 27 Aug 2013 18:05:00 +0100
George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
> > On Fri, 23 Aug 2013 13:05:08 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >
> >> >>> On 23.08.13 at 13:15, George Dunlap
> >> >>> <George.Dunlap@eu.citrix.com> wrote:
> >> > On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
> >> > wrote:
> >> >>>>> On 23.08.13 at 03:18, Mukesh Rathor
.......
> >> Fine with me, but perhaps Mukesh won't be that happy...
> >
> > It's OK. I'd like this to be merged in asap so I and others can
> > working on the FIXME's right away...
> 
> I'm still waiting on the toolstack changes that are needed to actually
> put it in PVH mode before I can test it.

Also, for V11 you'd need the following patch for linux:

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7cd47a7..747057e 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -94,8 +94,33 @@ static void __cpuinit cpu_bringup(void)
 	wmb();			/* make sure everything is out */
 }
 
+static void pvh_set_bringup_context(void)
+{
+        int cpu = smp_processor_id();
+
+        load_percpu_segment(cpu);
+        switch_to_new_gdt(smp_processor_id());
+
+        /* loadsegment(es, 0); */
+
+        __asm__ __volatile__ (
+                "movl %0,%%ds\n"
+                "movl %0,%%ss\n"
+                "pushq %%rax\n"
+                "leaq 1f(%%rip),%%rax\n"
+                "pushq %%rax\n"
+                "retfq\n"
+                "1:\n"
+                : : "r" (__KERNEL_DS), "a" (__KERNEL_CS) : "memory");
+}
+
+
 static void __cpuinit cpu_bringup_and_idle(void)
 {
+        if (xen_feature(XENFEAT_auto_translated_physmap) &&
+            xen_feature(XENFEAT_supervisor_mode_kernel))
+                pvh_set_bringup_context();
+
 	cpu_bringup();
 	cpu_idle();
 }
@@ -302,38 +327,23 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 
 	gdt = get_cpu_gdt_table(cpu);
 
-	ctxt->flags = VGCF_IN_KERNEL;
-	ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
 	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #else
+	/* Note: PVH is not yet supported on x86_32. */
 	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	if (xen_feature(XENFEAT_auto_translated_physmap) &&
-	    xen_feature(XENFEAT_supervisor_mode_kernel)) {
-		/* Note: PVH is not supported on x86_32. */
-#ifdef CONFIG_X86_64
-		ctxt->user_regs.ds = __KERNEL_DS;
-		ctxt->user_regs.es = 0;
-		ctxt->user_regs.gs = 0;
-
-		/* GUEST_GDTR_BASE and */
-		ctxt->u.pvh.gdtaddr = (unsigned long)gdt;
-		/* GUEST_GDTR_LIMIT in the VMCS. */
-		ctxt->u.pvh.gdtsz = (unsigned long)(GDT_SIZE - 1);
-
-		ctxt->gs_base_user = (unsigned long)
-					per_cpu(irq_stack_union.gs_base, cpu);
-#endif
-	} else {
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		ctxt->flags = VGCF_IN_KERNEL;
 		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 		ctxt->user_regs.ds = __USER_DS;
 		ctxt->user_regs.es = __USER_DS;
+		ctxt->user_regs.ss = __KERNEL_DS;
 
 		xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -359,13 +369,12 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 					(unsigned long)xen_hypervisor_callback;
 		ctxt->failsafe_callback_eip =
 					(unsigned long)xen_failsafe_callback;
+		ctxt->user_regs.cs = __KERNEL_CS;
+		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 	}
-	ctxt->user_regs.cs = __KERNEL_CS;
-	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 
-	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
-
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
 		BUG();

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-27 19:18           ` Mukesh Rathor
@ 2013-08-28 11:20             ` George Dunlap
  2013-08-29  0:16               ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-28 11:20 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Jan Beulich

On 27/08/13 20:18, Mukesh Rathor wrote:
> On Tue, 27 Aug 2013 18:05:00 +0100
> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>
>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
>> <mukesh.rathor@oracle.com> wrote:
>>> On Fri, 23 Aug 2013 13:05:08 +0100
>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>
> ....
>>> It's OK. I'd like this to be merged in asap so I and others can
>>> working on the FIXME's right away...
>> I'm still waiting on the toolstack changes that are needed to actually
>> put it in PVH mode before I can test it.
>>
>>   -George
> Oh, I thought I had sent you already. Anyways, here's the unofficial
> version I've in my tree c/s: 704302ce9404c73cfb687d31adcf67094ab5bb53
>
> Both inlined and attachd.

Thanks -- but why is this not part of the series?

  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-28 11:20             ` George Dunlap
@ 2013-08-29  0:16               ` Mukesh Rathor
  0 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-29  0:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich

On Wed, 28 Aug 2013 12:20:58 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 27/08/13 20:18, Mukesh Rathor wrote:
> > On Tue, 27 Aug 2013 18:05:00 +0100
> > George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >
> >> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
> >> <mukesh.rathor@oracle.com> wrote:
> >>> On Fri, 23 Aug 2013 13:05:08 +0100
> >>> "Jan Beulich" <JBeulich@suse.com> wrote:
> >>>
> > ....
> >>> It's OK. I'd like this to be merged in asap so I and others can
> >>> working on the FIXME's right away...
> >> I'm still waiting on the toolstack changes that are needed to
> >> actually put it in PVH mode before I can test it.
> >>
> >>   -George
> > Oh, I thought I had sent you already. Anyways, here's the unofficial
> > version I've in my tree c/s:
> > 704302ce9404c73cfb687d31adcf67094ab5bb53
> >
> > Both inlined and attachd.
> 
> Thanks -- but why is this not part of the series?
> 

It was earlier, but Konrad suggested we break up the patches to
make them easier to review.

Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-28  0:37           ` Mukesh Rathor
@ 2013-08-29 16:28             ` George Dunlap
  2013-08-30  0:25               ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-29 16:28 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Jan Beulich

On 28/08/13 01:37, Mukesh Rathor wrote:
> On Tue, 27 Aug 2013 18:05:00 +0100
> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>
>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
>> <mukesh.rathor@oracle.com> wrote:
>>> On Fri, 23 Aug 2013 13:05:08 +0100
>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>
>>>>>>> On 23.08.13 at 13:15, George Dunlap
>>>>>>> <George.Dunlap@eu.citrix.com> wrote:
>>>>> On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
>>>>> wrote:
>>>>>>>>> On 23.08.13 at 03:18, Mukesh Rathor
> .......
>>>> Fine with me, but perhaps Mukesh won't be that happy...
>>> It's OK. I'd like this to be merged in asap so I and others can
>>> working on the FIXME's right away...
>> I'm still waiting on the toolstack changes that are needed to actually
>> put it in PVH mode before I can test it.
> Also, for V11 you'd need following patch for linux:

OK, so I've tried this with your Xen and Linux branches (i.e., without 
any of my changes).  Dom0 boots, and the kernel boots as PV, but crashes 
as PVH:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) grant_table.c:577:d0 remote grant table not yet set 
up[95984.867796] device vif19.0 entered promiscuous mode
[95984.882699] ADDRCONF(NETDEV_UP): vif19.0: link is not ready
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom19 callback via changed to Direct Vector 0xf3
(XEN) PVH: Unhandled trap:0x2 RIP:0xffffffff8101a503
(XEN) PVH: [15] exit_reas:0 0 qual:0 0 cr0:0x00000080000039
(XEN) PVH: RIP:0xffffffff8101a503 RSP:0xffff88003e1b5dd8 EFLGS:0x12 
CR3:0x1c0c000
(XEN) domain_crash called from pvh.c:487
(XEN) Domain 19 (vcpu#0) crashed on cpu#15:
(XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    15
(XEN) RIP:    0000:[<ffffffff8101a503>]
(XEN) RFLAGS: 0000000000000012   CONTEXT: hvm guest
(XEN) rax: ffffffffff493c7c   rbx: ffffffff81dc0d24   rcx: 00000000000000f0
(XEN) rdx: 0000000000000001   rsi: 0000000000000000   rdi: 0000000000000200
(XEN) rbp: ffff88003e1b5e18   rsp: ffff88003e1b5dd8   r8: 0000000000000000
(XEN) r9:  0000000000000063   r10: 0720072007200720   r11: 0720072007200720
(XEN) r12: ffffffff81dc5000   r13: ffff88003e005240   r14: ffffffff817d2b69
(XEN) r15: ffffffff81000000   cr0: 0000000080000039   cr4: 0000000000002660
(XEN) cr3: 0000000001c0c000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 0000
(XEN) Guest stack trace from rsp=ffff88003e1b5dd8:
(XEN)   Fault while accessing guest memory.
[95985.368360] device vif19.0 left promiscuous mode


  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-29 16:28             ` George Dunlap
@ 2013-08-30  0:25               ` Mukesh Rathor
  2013-08-30 11:02                 ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-30  0:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich

On Thu, 29 Aug 2013 17:28:57 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 28/08/13 01:37, Mukesh Rathor wrote:
> > On Tue, 27 Aug 2013 18:05:00 +0100
> > George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >
> >> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
> >> <mukesh.rathor@oracle.com> wrote:
> >>> On Fri, 23 Aug 2013 13:05:08 +0100
> >>> "Jan Beulich" <JBeulich@suse.com> wrote:
> >>>
> >>>>>>> On 23.08.13 at 13:15, George Dunlap
> >>>>>>> <George.Dunlap@eu.citrix.com> wrote:
> >>>>> On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
> >>>>> wrote:
> >>>>>>>>> On 23.08.13 at 03:18, Mukesh Rathor
> > .......
> >>>> Fine with me, but perhaps Mukesh won't be that happy...
> >>> It's OK. I'd like this to be merged in asap so I and others can
> >>> working on the FIXME's right away...
> >> I'm still waiting on the toolstack changes that are needed to
> >> actually put it in PVH mode before I can test it.
> > Also, for V11 you'd need following patch for linux:
> 
> OK, so I've tried this with your Xen and Linux branches (i.e.,
> without any of my changes).  Dom0 boots, and the kernel boots as PV,
> but crashes as PVH:
> 
> (XEN) PVH currently does not support tsc emulation. Setting
> timer_mode = native
> (XEN) PVH currently does not support tsc emulation. Setting
> timer_mode = native
> (XEN) grant_table.c:577:d0 remote grant table not yet set 
> up[95984.867796] device vif19.0 entered promiscuous mode
> [95984.882699] ADDRCONF(NETDEV_UP): vif19.0: link is not ready
> mapping kernel into physical memory
> about to get started...
> <G><2>irq.c:375: Dom19 callback via changed to Direct Vector 0xf3
> (XEN) PVH: Unhandled trap:0x2 RIP:0xffffffff8101a503
> (XEN) PVH: [15] exit_reas:0 0 qual:0 0 cr0:0x00000080000039
> (XEN) PVH: RIP:0xffffffff8101a503 RSP:0xffff88003e1b5dd8 EFLGS:0x12 
> CR3:0x1c0c000
> (XEN) domain_crash called from pvh.c:487
> (XEN) Domain 19 (vcpu#0) crashed on cpu#15:
> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    15
> (XEN) RIP:    0000:[<ffffffff8101a503>]
> (XEN) RFLAGS: 0000000000000012   CONTEXT: hvm guest
> (XEN) rax: ffffffffff493c7c   rbx: ffffffff81dc0d24   rcx:
> 00000000000000f0 (XEN) rdx: 0000000000000001   rsi:
> 0000000000000000   rdi: 0000000000000200 (XEN) rbp:
> ffff88003e1b5e18   rsp: ffff88003e1b5dd8   r8: 0000000000000000 (XEN)
> r9:  0000000000000063   r10: 0720072007200720   r11: 0720072007200720
> (XEN) r12: ffffffff81dc5000   r13: ffff88003e005240   r14:
> ffffffff817d2b69 (XEN) r15: ffffffff81000000   cr0:
> 0000000080000039   cr4: 0000000000002660 (XEN) cr3:
> 0000000001c0c000   cr2: 0000000000000000 (XEN) ds: 0000   es: 0000
> fs: 0000   gs: 0000   ss: 0000   cs: 0000 (XEN) Guest stack trace
> from rsp=ffff88003e1b5dd8: (XEN)   Fault while accessing guest
> memory. [95985.368360] device vif19.0 left promiscuous mode

You probably have the NMI watchdog running... you can just disable it
for now. The NMI is handled in the caller, so the pvh handler just needs
to ignore it. I'll make a note of that.
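
For example, assuming the guest kernel command line is passed through the
xl config, something like:

    extra = "nmi_watchdog=0"

should keep the watchdog out of the picture until the NMI path is handled.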

thanks
Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-30  0:25               ` Mukesh Rathor
@ 2013-08-30 11:02                 ` George Dunlap
  2013-08-30 17:21                   ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-30 11:02 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Jan Beulich

On 30/08/13 01:25, Mukesh Rathor wrote:
> On Thu, 29 Aug 2013 17:28:57 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 28/08/13 01:37, Mukesh Rathor wrote:
>>> On Tue, 27 Aug 2013 18:05:00 +0100
>>> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>
>>>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
>>>> <mukesh.rathor@oracle.com> wrote:
>>>>> On Fri, 23 Aug 2013 13:05:08 +0100
>>>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>
>>>>>>>>> On 23.08.13 at 13:15, George Dunlap
>>>>>>>>> <George.Dunlap@eu.citrix.com> wrote:
>>>>>>> On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
>>>>>>> wrote:
>>>>>>>>>>> On 23.08.13 at 03:18, Mukesh Rathor
>>> .......
>>>>>> Fine with me, but perhaps Mukesh won't be that happy...
>>>>> It's OK. I'd like this to be merged in asap so I and others can
>>>>> working on the FIXME's right away...
>>>> I'm still waiting on the toolstack changes that are needed to
>>>> actually put it in PVH mode before I can test it.
>>> Also, for V11 you'd need following patch for linux:
>> OK, so I've tried this with your Xen and Linux branches (i.e.,
>> without any of my changes).  Dom0 boots, and the kernel boots as PV,
>> but crashes as PVH:
>>
>> (XEN) PVH currently does not support tsc emulation. Setting
>> timer_mode = native
>> (XEN) PVH currently does not support tsc emulation. Setting
>> timer_mode = native
>> (XEN) grant_table.c:577:d0 remote grant table not yet set
>> up[95984.867796] device vif19.0 entered promiscuous mode
>> [95984.882699] ADDRCONF(NETDEV_UP): vif19.0: link is not ready
>> mapping kernel into physical memory
>> about to get started...
>> <G><2>irq.c:375: Dom19 callback via changed to Direct Vector 0xf3
>> (XEN) PVH: Unhandled trap:0x2 RIP:0xffffffff8101a503
>> (XEN) PVH: [15] exit_reas:0 0 qual:0 0 cr0:0x00000080000039
>> (XEN) PVH: RIP:0xffffffff8101a503 RSP:0xffff88003e1b5dd8 EFLGS:0x12
>> CR3:0x1c0c000
>> (XEN) domain_crash called from pvh.c:487
>> (XEN) Domain 19 (vcpu#0) crashed on cpu#15:
>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
>> (XEN) CPU:    15
>> (XEN) RIP:    0000:[<ffffffff8101a503>]
>> (XEN) RFLAGS: 0000000000000012   CONTEXT: hvm guest
>> (XEN) rax: ffffffffff493c7c   rbx: ffffffff81dc0d24   rcx:
>> 00000000000000f0 (XEN) rdx: 0000000000000001   rsi:
>> 0000000000000000   rdi: 0000000000000200 (XEN) rbp:
>> ffff88003e1b5e18   rsp: ffff88003e1b5dd8   r8: 0000000000000000 (XEN)
>> r9:  0000000000000063   r10: 0720072007200720   r11: 0720072007200720
>> (XEN) r12: ffffffff81dc5000   r13: ffff88003e005240   r14:
>> ffffffff817d2b69 (XEN) r15: ffffffff81000000   cr0:
>> 0000000080000039   cr4: 0000000000002660 (XEN) cr3:
>> 0000000001c0c000   cr2: 0000000000000000 (XEN) ds: 0000   es: 0000
>> fs: 0000   gs: 0000   ss: 0000   cs: 0000 (XEN) Guest stack trace
>> from rsp=ffff88003e1b5dd8: (XEN)   Fault while accessing guest
>> memory. [95985.368360] device vif19.0 left promiscuous mode
> You prob have nmi watchdog running... you can just disable it
> for now. The NMI is handled in the caller, so pvh handler needs to just
> ignore it. I'll make a note of that.

Now with multiple vcpus, the guest crashes without any error message:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) grant_table.c:577:d0 remote grant table not yet set up[ 
158.203543] device vif2.0 entered promiscuous mode^M
[  158.222642] ADDRCONF(NETDEV_UP): vif2.0: link is not ready^M
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom2 callback via changed to Direct Vector 0xf3
[  158.620609] device vif2.0 left promiscuous mode^M


And if I set it to only one vcpu, it gets stuck in an EPT violation loop:

(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) PVH currently does not support tsc emulation. Setting timer_mode = 
native
(XEN) grant_table.c:577:d0 remote grant table not yet set up[ 
283.823609] device vif3.0 entered promiscuous mode^M
[  283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready^M
mapping kernel into physical memory
about to get started...
<G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn 
0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df90
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df88
(XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
(XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
(XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
(XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
(XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
(XEN) p2m-ept.c:657:d3  epte 0
(XEN)  --- GLA 0xffff88003e22df88
[...]

  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-30 11:02                 ` George Dunlap
@ 2013-08-30 17:21                   ` George Dunlap
  2013-08-30 21:22                     ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-08-30 17:21 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Tim Deegan, Jan Beulich

On 30/08/13 12:02, George Dunlap wrote:
> On 30/08/13 01:25, Mukesh Rathor wrote:
>> On Thu, 29 Aug 2013 17:28:57 +0100
>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>>> On 28/08/13 01:37, Mukesh Rathor wrote:
>>>> On Tue, 27 Aug 2013 18:05:00 +0100
>>>> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>
>>>>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
>>>>> <mukesh.rathor@oracle.com> wrote:
>>>>>> On Fri, 23 Aug 2013 13:05:08 +0100
>>>>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>>
>>>>>>>>>> On 23.08.13 at 13:15, George Dunlap
>>>>>>>>>> <George.Dunlap@eu.citrix.com> wrote:
>>>>>>>> On Fri, Aug 23, 2013 at 9:49 AM, Jan Beulich <JBeulich@suse.com>
>>>>>>>> wrote:
>>>>>>>>>>>> On 23.08.13 at 03:18, Mukesh Rathor
>>>> .......
>>>>>>> Fine with me, but perhaps Mukesh won't be that happy...
>>>>>> It's OK. I'd like this to be merged in asap so I and others can
>>>>>> working on the FIXME's right away...
>>>>> I'm still waiting on the toolstack changes that are needed to
>>>>> actually put it in PVH mode before I can test it.
>>>> Also, for V11 you'd need following patch for linux:
>>> OK, so I've tried this with your Xen and Linux branches (i.e.,
>>> without any of my changes).  Dom0 boots, and the kernel boots as PV,
>>> but crashes as PVH:
>>>
>>> (XEN) PVH currently does not support tsc emulation. Setting
>>> timer_mode = native
>>> (XEN) PVH currently does not support tsc emulation. Setting
>>> timer_mode = native
>>> (XEN) grant_table.c:577:d0 remote grant table not yet set
>>> up[95984.867796] device vif19.0 entered promiscuous mode
>>> [95984.882699] ADDRCONF(NETDEV_UP): vif19.0: link is not ready
>>> mapping kernel into physical memory
>>> about to get started...
>>> <G><2>irq.c:375: Dom19 callback via changed to Direct Vector 0xf3
>>> (XEN) PVH: Unhandled trap:0x2 RIP:0xffffffff8101a503
>>> (XEN) PVH: [15] exit_reas:0 0 qual:0 0 cr0:0x00000080000039
>>> (XEN) PVH: RIP:0xffffffff8101a503 RSP:0xffff88003e1b5dd8 EFLGS:0x12
>>> CR3:0x1c0c000
>>> (XEN) domain_crash called from pvh.c:487
>>> (XEN) Domain 19 (vcpu#0) crashed on cpu#15:
>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    15
>>> (XEN) RIP:    0000:[<ffffffff8101a503>]
>>> (XEN) RFLAGS: 0000000000000012   CONTEXT: hvm guest
>>> (XEN) rax: ffffffffff493c7c   rbx: ffffffff81dc0d24   rcx:
>>> 00000000000000f0 (XEN) rdx: 0000000000000001   rsi:
>>> 0000000000000000   rdi: 0000000000000200 (XEN) rbp:
>>> ffff88003e1b5e18   rsp: ffff88003e1b5dd8   r8: 0000000000000000 (XEN)
>>> r9:  0000000000000063   r10: 0720072007200720   r11: 0720072007200720
>>> (XEN) r12: ffffffff81dc5000   r13: ffff88003e005240   r14:
>>> ffffffff817d2b69 (XEN) r15: ffffffff81000000   cr0:
>>> 0000000080000039   cr4: 0000000000002660 (XEN) cr3:
>>> 0000000001c0c000   cr2: 0000000000000000 (XEN) ds: 0000   es: 0000
>>> fs: 0000   gs: 0000   ss: 0000   cs: 0000 (XEN) Guest stack trace
>>> from rsp=ffff88003e1b5dd8: (XEN)   Fault while accessing guest
>>> memory. [95985.368360] device vif19.0 left promiscuous mode
>> You prob have nmi watchdog running... you can just disable it
>> for now. The NMI is handled in the caller, so pvh handler needs to just
>> ignore it. I'll make a note of that.
>
> Now with multiple vcpus, the guest crashes without any error message:
>
> (XEN) PVH currently does not support tsc emulation. Setting timer_mode 
> = native
> (XEN) PVH currently does not support tsc emulation. Setting timer_mode 
> = native
> (XEN) grant_table.c:577:d0 remote grant table not yet set up[ 
> 158.203543] device vif2.0 entered promiscuous mode^M
> [  158.222642] ADDRCONF(NETDEV_UP): vif2.0: link is not ready^M
> mapping kernel into physical memory
> about to get started...
> <G><2>irq.c:375: Dom2 callback via changed to Direct Vector 0xf3
> [  158.620609] device vif2.0 left promiscuous mode^M
>
>
> And if I set it to only one vcpu, it gets stuck in an EPT violation loop:
>
> (XEN) PVH currently does not support tsc emulation. Setting timer_mode 
> = native
> (XEN) PVH currently does not support tsc emulation. Setting timer_mode 
> = native
> (XEN) grant_table.c:577:d0 remote grant table not yet set up[ 
> 283.823609] device vif3.0 entered promiscuous mode^M
> [  283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready^M
> mapping kernel into physical memory
> about to get started...
> <G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn 
> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
> (XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> (XEN) p2m-ept.c:657:d3  epte 0
> (XEN)  --- GLA 0xffff88003e22df90
> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
> (XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> (XEN) p2m-ept.c:657:d3  epte 0
> (XEN)  --- GLA 0xffff88003e22df88
> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd RSP:0xffff88003e22df98
> (XEN) p2m-ept.c:638:d3 Walking EPT tables for domain 3 gfn 3e22d
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> (XEN) p2m-ept.c:657:d3  epte 0
> (XEN)  --- GLA 0xffff88003e22df88

I took a xentrace of this, and it looks like what happens is this:

]  9.403782967 --------x------- d3v0 vmexit exit_reason VMCALL eip 
ffffffff81001405
]  9.403784176 --------x------- d3v0 vmentry cycles 2903
]  9.403792751 --------x------- d3v0 vmexit exit_reason VMCALL eip 
ffffffff81001305
]  9.403794945 --------x------- d3v0 vmentry cycles 5263
]  9.404782907 --------x------- d3v0 vmexit exit_reason 
EXTERNAL_INTERRUPT eip ffffffff817c6ff0
    9.404782907 --------x------- d3v0 intr vec THERMAL_APIC(fa)
    9.404782907 --------x------- d3v0 intr_window vec 243 src 5(vector) 
intr #
]  9.404785283 --------x------- d3v0 vmentry cycles 5703
]  9.406630481 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI 
eip ffffffff817ca5a5
    9.406630481 --------x------- inj_exc trap Invalid Op ec ffffffff
    9.406630481 --------x------- d3v0 intr_window vec 243 src 5(vector) 
intr 6
]  9.406634957 --------x------- d3v0 vmentry cycles 10741 !
hvm_generic_postprocess: Strange, exit 0(EXCEPTION_NMI) missing a handler
]  9.406636249 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI 
eip ffffffff817ca655
    9.406636249 --------x------- inj_exc trap Invalid Op ec ffffffff
    9.406636249 --------x------- d3v0 intr_window vec 243 src 5(vector) 
intr 6
]  9.406637659 --------x------- d3v0 vmentry cycles 3382
]  9.406638483 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI 
eip ffffffff817ca655
    9.406638483 --------x------- inj_exc trap Invalid Op ec ffffffff
    9.406638483 --------x------- d3v0 intr_window vec 243 src 5(vector) 
intr 6
]  9.406639793 --------x------- d3v0 vmentry cycles 3143


Note the "Invalid Op" that's being delivered, at address 
ffffffff817ca5a5.  Here is a disassembly of that region:

ffffffff817ca5a0 <do_page_fault>:
ffffffff817ca5a0:       55                      push   %rbp
ffffffff817ca5a1:       48 89 e5                mov    %rsp,%rbp
ffffffff817ca5a4:       e8 47 fb ff ff          callq ffffffff817ca0f0 
<__do_page_fault>
ffffffff817ca5a9:       5d                      pop    %rbp
ffffffff817ca5aa:       c3                      retq
ffffffff817ca5ab:       90                      nop

If you'll notice, ffffffff817ca5a5 is actually in the middle of an 
instruction; it's no surprise that it's an invalid one.  The next two 
eips for illegal instructions are at ffffffff817ca655:

ffffffff817ca650 <notify_die>:
ffffffff817ca650:       55                      push   %rbp
ffffffff817ca651:       48 89 e5                mov    %rsp,%rbp
ffffffff817ca654:       48 83 ec 20             sub    $0x20,%rsp
ffffffff817ca658:       48 89 55 e0             mov %rdx,-0x20(%rbp)
ffffffff817ca65c:       48 8d 55 e0             lea -0x20(%rbp),%rdx
ffffffff817ca660:       48 89 75 e8             mov %rsi,-0x18(%rbp)
ffffffff817ca664:       89 fe                   mov    %edi,%esi
ffffffff817ca666:       48 c7 c7 10 55 e4 81    mov $0xffffffff81e45510,%rdi
ffffffff817ca66d:       48 89 4d f0             mov %rcx,-0x10(%rbp)
ffffffff817ca671:       44 89 45 f8             mov %r8d,-0x8(%rbp)
ffffffff817ca675:       44 89 4d fc             mov %r9d,-0x4(%rbp)
ffffffff817ca679:       e8 b2 ff ff ff          callq ffffffff817ca630 
<atomic_notifier_call_chain>
ffffffff817ca67e:       c9                      leaveq
ffffffff817ca67f:       c3                      retq

Again, in the middle of an instruction; and again 5 bytes after the 
beginning of a function.

It looks, from the rest of it, like it keeps looping on illegal op exits 
in the fault handlers until it runs out of stack space and hits an EPT 
fault.

The first question to ask, of course, is whether the disassembly is 
valid; I think it is, because I looked up the RIP of 5-6 vmexits before 
this one, and they seem to match (e.g., CPUID exits are at an RIP that 
the disassembly says is a cpuid instruction).
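
(A quick way to repeat that cross-check, assuming a vmlinux with symbols
for the exact guest kernel is to hand:

    objdump -d vmlinux | grep -A 8 '<do_page_fault>:'
    addr2line -f -e vmlinux ffffffff817ca5a5

both of which should line up with the listing above.)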

Any ideas what might be causing it to end up in the middle of 
instructions while handling exits?

I should repeat, this is your tree + the tools patch, without any 
changes.  (My port actually does the same thing, which is reassuring I 
guess...)

  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-30 17:21                   ` George Dunlap
@ 2013-08-30 21:22                     ` Mukesh Rathor
  2013-09-02 14:52                       ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: Mukesh Rathor @ 2013-08-30 21:22 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Tim Deegan, Jan Beulich

On Fri, 30 Aug 2013 18:21:52 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 30/08/13 12:02, George Dunlap wrote:
> > On 30/08/13 01:25, Mukesh Rathor wrote:
> >> On Thu, 29 Aug 2013 17:28:57 +0100
> >> George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >>
> >>> On 28/08/13 01:37, Mukesh Rathor wrote:
> >>>> On Tue, 27 Aug 2013 18:05:00 +0100
> >>>> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >>>>
> >>>>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
> >>>>> <mukesh.rathor@oracle.com> wrote:
> >>>>>> On Fri, 23 Aug 2013 13:05:08 +0100
......
> >
> > And if I set it to only one vcpu, it gets stuck in an EPT violation
> > loop:
> >
> > (XEN) PVH currently does not support tsc emulation. Setting
> > timer_mode = native
> > (XEN) PVH currently does not support tsc emulation. Setting
> > timer_mode = native
> > (XEN) grant_table.c:577:d0 remote grant table not yet set up[ 
> > 283.823609] device vif3.0 entered promiscuous mode^M
> > [  283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready^M
> > mapping kernel into physical memory
> > about to get started...
> > <G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
> > (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn 
> > 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
> > RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
> > for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> > (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> > (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> > (XEN) p2m-ept.c:657:d3  epte 0
> > (XEN)  --- GLA 0xffff88003e22df90
> > (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
> > 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
> > RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
> > for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> > (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> > (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> > (XEN) p2m-ept.c:657:d3  epte 0
> > (XEN)  --- GLA 0xffff88003e22df88
> > (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn 
> > 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
> > RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
> > for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
> > (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
> > (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
> > (XEN) p2m-ept.c:657:d3  epte 0
> > (XEN)  --- GLA 0xffff88003e22df88
> 
> I took a xentrace of this, and it looks like what happens is this:
> 
> ]  9.403782967 --------x------- d3v0 vmexit exit_reason VMCALL eip 
> ffffffff81001405
> ]  9.403784176 --------x------- d3v0 vmentry cycles 2903
> ]  9.403792751 --------x------- d3v0 vmexit exit_reason VMCALL eip 
> ffffffff81001305
> ]  9.403794945 --------x------- d3v0 vmentry cycles 5263
> ]  9.404782907 --------x------- d3v0 vmexit exit_reason 
> EXTERNAL_INTERRUPT eip ffffffff817c6ff0
>     9.404782907 --------x------- d3v0 intr vec THERMAL_APIC(fa)
>     9.404782907 --------x------- d3v0 intr_window vec 243 src
> 5(vector) intr #
> ]  9.404785283 --------x------- d3v0 vmentry cycles 5703
> ]  9.406630481 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI 
> eip ffffffff817ca5a5
>     9.406630481 --------x------- inj_exc trap Invalid Op ec ffffffff
>     9.406630481 --------x------- d3v0 intr_window vec 243 src
> 5(vector) intr 6
> ]  9.406634957 --------x------- d3v0 vmentry cycles 10741 !
> hvm_generic_postprocess: Strange, exit 0(EXCEPTION_NMI) missing a
> handler ]  9.406636249 --------x------- d3v0 vmexit exit_reason
> EXCEPTION_NMI eip ffffffff817ca655
>     9.406636249 --------x------- inj_exc trap Invalid Op ec ffffffff
>     9.406636249 --------x------- d3v0 intr_window vec 243 src
> 5(vector) intr 6
> ]  9.406637659 --------x------- d3v0 vmentry cycles 3382
> ]  9.406638483 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI 
> eip ffffffff817ca655
>     9.406638483 --------x------- inj_exc trap Invalid Op ec ffffffff
>     9.406638483 --------x------- d3v0 intr_window vec 243 src
> 5(vector) intr 6
> ]  9.406639793 --------x------- d3v0 vmentry cycles 3143
> 
> 
> Note the "Invalid Op" that's being delivered, at address 
> ffffffff817ca5a5.  Here is a disassembly of that region:
> 
> ffffffff817ca5a0 <do_page_fault>:
> ffffffff817ca5a0:       55                      push   %rbp
> ffffffff817ca5a1:       48 89 e5                mov    %rsp,%rbp
> ffffffff817ca5a4:       e8 47 fb ff ff          callq
> ffffffff817ca0f0 <__do_page_fault>
> ffffffff817ca5a9:       5d                      pop    %rbp
> ffffffff817ca5aa:       c3                      retq
> ffffffff817ca5ab:       90                      nop
> 
> If you'll notice, ffffffff817ca5a5 is actually in the middle of an 
> instruction; it's no surprise that it's an invalid one.  The next two 
> eips for illegal instructions are at ffffffff817ca655:
> 
> ffffffff817ca650 <notify_die>:
> ffffffff817ca650:       55                      push   %rbp
> ffffffff817ca651:       48 89 e5                mov    %rsp,%rbp
> ffffffff817ca654:       48 83 ec 20             sub    $0x20,%rsp
> ffffffff817ca658:       48 89 55 e0             mov %rdx,-0x20(%rbp)
> ffffffff817ca65c:       48 8d 55 e0             lea -0x20(%rbp),%rdx
> ffffffff817ca660:       48 89 75 e8             mov %rsi,-0x18(%rbp)
> ffffffff817ca664:       89 fe                   mov    %edi,%esi
> ffffffff817ca666:       48 c7 c7 10 55 e4 81    mov
> $0xffffffff81e45510,%rdi ffffffff817ca66d:       48 89 4d
> f0             mov %rcx,-0x10(%rbp) ffffffff817ca671:       44 89 45
> f8             mov %r8d,-0x8(%rbp) ffffffff817ca675:       44 89 4d
> fc             mov %r9d,-0x4(%rbp) ffffffff817ca679:       e8 b2 ff
> ff ff          callq ffffffff817ca630 <atomic_notifier_call_chain>
> ffffffff817ca67e:       c9                      leaveq
> ffffffff817ca67f:       c3                      retq
> 
> Again, in the middle of an instruction; and again 5 bytes after the 
> beginning of a function.
> 
> It looks, from the rest of it, like it keeps looping on illegal op
> exits in the fault handlers until it runs out of stack space and hits
> an EPT fault.
> 
> The first question to ask, of course, is whether the disassembly is 
> valid; I think it is, because I looked up the RIP of 5-6 vmexits
> before this one, and they seem to match (e.g., CPUID exits are at an
> RIP that the disassembly says is a cpuid instruction).
> 
> Any ideas what might be causing it to end up in the middle of 
> instructions while handling exits?
> 
> I should repeat, this is your tree + the tools patch, without any 
> changes.  (My port actually does the same thing, which is reassuring
> I guess...)

The RIP totally doesn't make sense, and 90% of the time I've found that
running 'make mrproper' to completely clean the tree and then rebuilding
will give you better info.

I think it might be better to have one tree. Konrad has refreshed the
pvh.v9 tree; I'm taking that, adding whatever patches are needed to make
it work, and then putting it up externally. You and I will then both be
looking at the exact same linux. Monday is a holiday here, so the
external tree will most likely be up Tues/Wed; I've got to go through
some admin hoops here to set it up.

thanks
mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-08-30 21:22                     ` Mukesh Rathor
@ 2013-09-02 14:52                       ` George Dunlap
  2013-09-06  1:07                         ` Mukesh Rathor
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2013-09-02 14:52 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: xen-devel, Tim Deegan, Jan Beulich

On 30/08/13 22:22, Mukesh Rathor wrote:
> On Fri, 30 Aug 2013 18:21:52 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 30/08/13 12:02, George Dunlap wrote:
>>> On 30/08/13 01:25, Mukesh Rathor wrote:
>>>> On Thu, 29 Aug 2013 17:28:57 +0100
>>>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>>
>>>>> On 28/08/13 01:37, Mukesh Rathor wrote:
>>>>>> On Tue, 27 Aug 2013 18:05:00 +0100
>>>>>> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>>>>>
>>>>>>> On Sat, Aug 24, 2013 at 1:40 AM, Mukesh Rathor
>>>>>>> <mukesh.rathor@oracle.com> wrote:
>>>>>>>> On Fri, 23 Aug 2013 13:05:08 +0100
> ......
>>> And if I set it to only one vcpu, it gets stuck in an EPT violation
>>> loop:
>>>
>>> (XEN) PVH currently does not support tsc emulation. Setting
>>> timer_mode = native
>>> (XEN) PVH currently does not support tsc emulation. Setting
>>> timer_mode = native
>>> (XEN) grant_table.c:577:d0 remote grant table not yet set up[
>>> 283.823609] device vif3.0 entered promiscuous mode^M
>>> [  283.843691] ADDRCONF(NETDEV_UP): vif3.0: link is not ready^M
>>> mapping kernel into physical memory
>>> about to get started...
>>> <G><2>irq.c:375: Dom3 callback via changed to Direct Vector 0xf3
>>> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df90, mfn
>>> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
>>> RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
>>> for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
>>> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
>>> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
>>> (XEN) p2m-ept.c:657:d3  epte 0
>>> (XEN)  --- GLA 0xffff88003e22df90
>>> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn
>>> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
>>> RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
>>> for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
>>> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
>>> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
>>> (XEN) p2m-ept.c:657:d3  epte 0
>>> (XEN)  --- GLA 0xffff88003e22df88
>>> (XEN) EPT violation 0x182 (-w-/---), gpa 0x0000003e22df88, mfn
>>> 0xffffffffffffffff, type 4. RIP:0xffffffff817c6ffd
>>> RSP:0xffff88003e22df98 (XEN) p2m-ept.c:638:d3 Walking EPT tables
>>> for domain 3 gfn 3e22d (XEN) p2m-ept.c:657:d3  epte 1c000008295c6007
>>> (XEN) p2m-ept.c:657:d3  epte 1c000008295c5007
>>> (XEN) p2m-ept.c:657:d3  epte 1c00000434c38007
>>> (XEN) p2m-ept.c:657:d3  epte 0
>>> (XEN)  --- GLA 0xffff88003e22df88
>> I took a xentrace of this, and it looks like what happens is this:
>>
>> ]  9.403782967 --------x------- d3v0 vmexit exit_reason VMCALL eip
>> ffffffff81001405
>> ]  9.403784176 --------x------- d3v0 vmentry cycles 2903
>> ]  9.403792751 --------x------- d3v0 vmexit exit_reason VMCALL eip
>> ffffffff81001305
>> ]  9.403794945 --------x------- d3v0 vmentry cycles 5263
>> ]  9.404782907 --------x------- d3v0 vmexit exit_reason
>> EXTERNAL_INTERRUPT eip ffffffff817c6ff0
>>      9.404782907 --------x------- d3v0 intr vec THERMAL_APIC(fa)
>>      9.404782907 --------x------- d3v0 intr_window vec 243 src
>> 5(vector) intr #
>> ]  9.404785283 --------x------- d3v0 vmentry cycles 5703
>> ]  9.406630481 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI
>> eip ffffffff817ca5a5
>>      9.406630481 --------x------- inj_exc trap Invalid Op ec ffffffff
>>      9.406630481 --------x------- d3v0 intr_window vec 243 src
>> 5(vector) intr 6
>> ]  9.406634957 --------x------- d3v0 vmentry cycles 10741 !
>> hvm_generic_postprocess: Strange, exit 0(EXCEPTION_NMI) missing a
>> handler ]  9.406636249 --------x------- d3v0 vmexit exit_reason
>> EXCEPTION_NMI eip ffffffff817ca655
>>      9.406636249 --------x------- inj_exc trap Invalid Op ec ffffffff
>>      9.406636249 --------x------- d3v0 intr_window vec 243 src
>> 5(vector) intr 6
>> ]  9.406637659 --------x------- d3v0 vmentry cycles 3382
>> ]  9.406638483 --------x------- d3v0 vmexit exit_reason EXCEPTION_NMI
>> eip ffffffff817ca655
>>      9.406638483 --------x------- inj_exc trap Invalid Op ec ffffffff
>>      9.406638483 --------x------- d3v0 intr_window vec 243 src
>> 5(vector) intr 6
>> ]  9.406639793 --------x------- d3v0 vmentry cycles 3143
>>
>>
>> Note the "Invalid Op" that's being delivered, at address
>> ffffffff817ca5a5.  Here is a disassembly of that region:
>>
>> ffffffff817ca5a0 <do_page_fault>:
>> ffffffff817ca5a0:       55                      push   %rbp
>> ffffffff817ca5a1:       48 89 e5                mov    %rsp,%rbp
>> ffffffff817ca5a4:       e8 47 fb ff ff          callq
>> ffffffff817ca0f0 <__do_page_fault>
>> ffffffff817ca5a9:       5d                      pop    %rbp
>> ffffffff817ca5aa:       c3                      retq
>> ffffffff817ca5ab:       90                      nop
>>
>> If you'll notice, ffffffff817ca5a5 is actually in the middle of an
>> instruction; it's no surprise that it's an invalid one.  The next two
>> eips for illegal instructions are at ffffffff817ca655:
>>
>> ffffffff817ca650 <notify_die>:
>> ffffffff817ca650:       55                      push   %rbp
>> ffffffff817ca651:       48 89 e5                mov    %rsp,%rbp
>> ffffffff817ca654:       48 83 ec 20             sub    $0x20,%rsp
>> ffffffff817ca658:       48 89 55 e0             mov %rdx,-0x20(%rbp)
>> ffffffff817ca65c:       48 8d 55 e0             lea -0x20(%rbp),%rdx
>> ffffffff817ca660:       48 89 75 e8             mov %rsi,-0x18(%rbp)
>> ffffffff817ca664:       89 fe                   mov    %edi,%esi
>> ffffffff817ca666:       48 c7 c7 10 55 e4 81    mov
>> $0xffffffff81e45510,%rdi ffffffff817ca66d:       48 89 4d
>> f0             mov %rcx,-0x10(%rbp) ffffffff817ca671:       44 89 45
>> f8             mov %r8d,-0x8(%rbp) ffffffff817ca675:       44 89 4d
>> fc             mov %r9d,-0x4(%rbp) ffffffff817ca679:       e8 b2 ff
>> ff ff          callq ffffffff817ca630 <atomic_notifier_call_chain>
>> ffffffff817ca67e:       c9                      leaveq
>> ffffffff817ca67f:       c3                      retq
>>
>> Again, in the middle of an instruction; and again 5 bytes after the
>> beginning of a function.
>>
>> It looks, from the rest of it, like it keeps looping on illegal op
>> exits in the fault handlers until it runs out of stack space and hits
>> an EPT fault.
>>
>> The first question to ask, of course, is whether the disassembly is
>> valid; I think it is, because I looked up the RIP of 5-6 vmexits
>> before this one, and they seem to match (e.g., CPUID exits are at an
>> RIP that the disassembly says is a cpuid instruction).
>>
>> Any ideas what might be causing it to end up in the middle of
>> instructions while handling exits?
>>
>> I should repeat, this is your tree + the tools patch, without any
>> changes.  (My port actually does the same thing, which is reassuring
>> I guess...)
> The RIP totally doesn't makes sense, and 90% of the time, I've found
> make mrproper to completely clean it up and starting again, will give
> you better info.

Just for good measure, I did a "git clean -ffdx", which gets rid of 
every file in the repo that git doesn't recognize, and re-built. Same 
thing: Invalid instruction traps, the first one being delivered in the 
middle of do_page_fault().

One thing I did forget to mention: this is with only one vcpu.  With 4 
vcpus, it crashes much sooner, but with no useful output.

> I think it might be better to have one tree. So, konrad has refreshed
> the tree pvh.v9, I'm taking that and adding whatever patches, make it
> work, and then put it externally. So you and I will then both be looking
> at exact same linux. Monday is holiday here, so most likely the external
> tree would be Tues/Wed, gotta go thru admin hoops here to set it up.

Sounds good -- it might be helpful to have your kernel config as well.

  -George

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches...
  2013-09-02 14:52                       ` George Dunlap
@ 2013-09-06  1:07                         ` Mukesh Rathor
  0 siblings, 0 replies; 49+ messages in thread
From: Mukesh Rathor @ 2013-09-06  1:07 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Tim Deegan, Jan Beulich

On Mon, 2 Sep 2013 15:52:18 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 30/08/13 22:22, Mukesh Rathor wrote:
> > On Fri, 30 Aug 2013 18:21:52 +0100
> > George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >
> >> On 30/08/13 12:02, George Dunlap wrote:
> >>> On 30/08/13 01:25, Mukesh Rathor wrote:
> >>>> On Thu, 29 Aug 2013 17:28:57 +0100
> >>>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
> >>>>
> >>>>> On 28/08/13 01:37, Mukesh Rathor wrote:
> >>>>>> On Tue, 27 Aug 2013 18:05:00 +0100
......

> > I think it might be better to have one tree. So, konrad has
> > refreshed the tree pvh.v9, I'm taking that and adding whatever
> > patches, make it work, and then put it externally. So you and I
> > will then both be looking at exact same linux. Monday is holiday
> > here, so most likely the external tree would be Tues/Wed, gotta go
> > thru admin hoops here to set it up.
> 
> Sounds good -- it might be helpful to have your kernel config as well.
> 

Ok, finally have the linux tree at:

git://oss.oracle.com/git/mrathor/linux.git

Branch is: pvh.v9-muk-1

I put the config in there too, called pvh-config-file.
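
For example, to get a tree matching mine (assuming pvh-config-file sits
at the top of the branch):

    git clone git://oss.oracle.com/git/mrathor/linux.git
    cd linux
    git checkout pvh.v9-muk-1
    cp pvh-config-file .config
    make oldconfig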

thanks
Mukesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread

Thread overview: 49+ messages
2013-08-23  1:18 [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 01/21] PVH xen: Add readme docs/misc/pvh-readme.txt Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 02/21] PVH xen: add params to read_segment_register Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 03/21] PVH xen: Move e820 fields out of pv_domain struct Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 04/21] PVH xen: hvm related preparatory changes for PVH Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 05/21] PVH xen: vmx " Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 06/21] PVH xen: vmcs " Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 07/21] PVH xen: Introduce PVH guest type and some basic changes Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 08/21] PVH xen: introduce pvh_vcpu_boot_set_info() and vmx_pvh_vcpu_boot_set_info() Mukesh Rathor
2013-08-23  1:18 ` [V11 PATCH 09/21] PVH xen: domain create, context switch related code changes Mukesh Rathor
2013-08-23  8:12   ` Jan Beulich
2013-08-23  1:18 ` [V11 PATCH 10/21] PVH xen: support invalid op emulation for PVH Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 11/21] PVH xen: Support privileged " Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 12/21] PVH xen: interrupt/event-channel delivery to PVH Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 13/21] PVH xen: additional changes to support PVH guest creation and execution Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 14/21] PVH xen: mapcache and show registers Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 15/21] PVH xen: mtrr, tsc, timers, grant changes Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 16/21] PVH xen: add hypercall support for PVH Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 17/21] PVH xen: vmcs related changes Mukesh Rathor
2013-08-23  8:41   ` Jan Beulich
2013-08-24  0:26     ` Mukesh Rathor
2013-08-26  8:15       ` Jan Beulich
2013-08-27 17:00       ` George Dunlap
2013-08-27 22:43         ` Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 18/21] PVH xen: HVM support of PVH guest creation/destruction Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 19/21] PVH xen: VMX " Mukesh Rathor
2013-08-23  9:14   ` Jan Beulich
2013-08-24  0:27     ` Mukesh Rathor
2013-08-23  1:19 ` [V11 PATCH 20/21] PVH xen: introduce vmexit handler for PVH Mukesh Rathor
2013-08-23  9:12   ` Jan Beulich
2013-08-24  0:35     ` Mukesh Rathor
2013-08-26  8:22       ` Jan Beulich
2013-08-23  1:19 ` [V11 PATCH 21/21] PVH xen: Checks, asserts, and limitations " Mukesh Rathor
2013-08-23  8:49 ` [V11 PATCH 00/21]PVH xen: Phase I, Version 11 patches Jan Beulich
2013-08-23 11:15   ` George Dunlap
2013-08-23 12:05     ` Jan Beulich
2013-08-24  0:40       ` Mukesh Rathor
2013-08-27 17:05         ` George Dunlap
2013-08-27 19:18           ` Mukesh Rathor
2013-08-28 11:20             ` George Dunlap
2013-08-29  0:16               ` Mukesh Rathor
2013-08-28  0:37           ` Mukesh Rathor
2013-08-29 16:28             ` George Dunlap
2013-08-30  0:25               ` Mukesh Rathor
2013-08-30 11:02                 ` George Dunlap
2013-08-30 17:21                   ` George Dunlap
2013-08-30 21:22                     ` Mukesh Rathor
2013-09-02 14:52                       ` George Dunlap
2013-09-06  1:07                         ` Mukesh Rathor
