* [PATCH v4 0/4] nested vmx: enable VMCS shadowing feature
@ 2013-01-22 12:00 Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM Dongxiao Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Dongxiao Xu @ 2013-01-22 12:00 UTC (permalink / raw)
  To: xen-devel; +Cc: JBeulich, eddie.dong, xiantao.zhang, jun.nakajima

Changes from v3 to v4:
 - Allocate a buffer for data exchange at initialization time instead of allocating it at runtime (reflects Jun's comment).
 - Move the vmread/vmwrite bitmap logic into nvmx_vcpu_initialise(), and track the MFN instead of the virtual address in the vcpu global data (reflects Jan's comment).

Changes from v2 to v3:
 - Use pfn_to_paddr() to get the address from the frame number instead of shifting directly.
 - Remove some unnecessary initialization code and add "static" to vmentry_fields and gpdptr_fields.
 - Enable the VMREAD/VMWRITE bitmap only if nested HVM is enabled.
 - Use clear_page() to zero the page instead of memset().
 - Use domheap to allocate the VMREAD/VMWRITE bitmap instead of xenheap.

Changes from v1 to v2:
 - Use domain_page_map_to_mfn() instead of virt_to_mfn() to get the machine frame number.
 - Address other comments from Jan, including coding style, return value types, etc.

The latest Intel SDM introduces a new feature, "VMCS shadowing", at bit 14 of
the Secondary Processor-Based VM-Execution Controls, intended for nested
virtualization.

The main purpose of this feature is to reduce or eliminate VM exits for
non-root VMREAD and VMWRITE. It provides the capability to link a
"virtual VMCS" with the currently running VMCS, so that after VM entry,
non-root VMREAD and VMWRITE can get/set the related data directly from/to the
"virtual VMCS" without trap-and-emulate.

A separate bitmap is introduced for VMREAD and VMWRITE, through which the
hypervisor can control whether a VMREAD/VMWRITE from/to a certain VMCS field
triggers a VM exit or is handled directly by hardware.

With the new feature, all fields in the "virtual VMCS" need to be operated on
by VMREAD and VMWRITE, because this VMCS will also be loaded into hardware.
This requires the capability to VMWRITE all VMCS fields, including the
read-only ones. The Intel SDM exposes this capability at bit 29 of the
IA32_VMX_MISC MSR.
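
As a rough, illustrative sketch (not part of this series), the two capabilities
described above could be probed roughly as follows. The MSR indices and bit
positions follow the SDM, while rdmsr() and the constant names here are
assumptions used only for this example:

    #include <stdint.h>

    #define MSR_IA32_VMX_PROCBASED_CTLS2          0x48b
    #define MSR_IA32_VMX_MISC                     0x485
    #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING  (1u << 14)
    #define VMX_MISC_VMWRITE_ALL                  (1u << 29)

    /* rdmsr() returning the full 64-bit MSR value is assumed to exist. */
    extern uint64_t rdmsr(uint32_t msr);

    /*
     * VMCS shadowing is usable only if the execution control may be set
     * (bit 14 of the allowed-1 half of IA32_VMX_PROCBASED_CTLS2) and all
     * fields, including read-only ones, are VMWRITE-able (bit 29 of
     * IA32_VMX_MISC).
     */
    static int vmcs_shadowing_usable(void)
    {
        uint64_t ctls2 = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
        uint64_t misc  = rdmsr(MSR_IA32_VMX_MISC);

        return ((ctls2 >> 32) & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING) &&
               (misc & VMX_MISC_VMWRITE_ALL);
    }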

For details, please refer to:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

Thanks,
Dongxiao

Dongxiao Xu (4):
  nested vmx: Use a list to store the launched vvmcs for L1 VMM
  nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS
    shadowing
  nested vmx: optimize for bulk access of virtual VMCS
  nested vmx: enable VMCS shadowing feature

 xen/arch/x86/hvm/vmx/vmcs.c        |   38 +++++
 xen/arch/x86/hvm/vmx/vvmx.c        |  300 ++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/hvm/vcpu.h     |    2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   23 +++-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   22 ++-
 5 files changed, 349 insertions(+), 36 deletions(-)


* [PATCH v4 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM
  2013-01-22 12:00 [PATCH v4 0/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
@ 2013-01-22 12:00 ` Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing Dongxiao Xu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Dongxiao Xu @ 2013-01-22 12:00 UTC (permalink / raw)
  To: xen-devel; +Cc: JBeulich, eddie.dong, xiantao.zhang, jun.nakajima

Originally we used a made-up virtual VMCS field to store the launch state of
a certain VMCS. However, once the VMCS shadowing feature is introduced, this
virtual VMCS must also be loadable into real hardware, and VMREAD/VMWRITE
would then operate on an invalid field.

The new approach is to store the launch state in a per-L1-VMM list.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c        |   96 ++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/hvm/vmx/vmcs.h |    2 -
 xen/include/asm-x86/hvm/vmx/vvmx.h |    6 ++
 3 files changed, 92 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index d4e9b02..a0e49c4 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -51,6 +51,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
     nvmx->iobitmap[0] = NULL;
     nvmx->iobitmap[1] = NULL;
     nvmx->msrbitmap = NULL;
+    INIT_LIST_HEAD(&nvmx->launched_list);
     return 0;
 out:
     return -ENOMEM;
@@ -58,7 +59,9 @@ out:
  
 void nvmx_vcpu_destroy(struct vcpu *v)
 {
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct vvmcs_list *item, *n;
 
     /* 
      * When destroying the vcpu, it may be running on behalf of L2 guest.
@@ -74,6 +77,12 @@ void nvmx_vcpu_destroy(struct vcpu *v)
         free_xenheap_page(nvcpu->nv_n2vmcx);
         nvcpu->nv_n2vmcx = NULL;
     }
+
+    list_for_each_entry_safe(item, n, &nvmx->launched_list, node)
+    {
+        list_del(&item->node);
+        xfree(item);
+    }
 }
  
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -1198,6 +1207,62 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
     return X86EMUL_OKAY;
 }
 
+static bool_t vvmcs_launched(struct list_head *launched_list,
+                             unsigned long vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+    struct list_head *pos;
+    bool_t launched = 0;
+
+    list_for_each(pos, launched_list)
+    {
+        vvmcs = list_entry(pos, struct vvmcs_list, node);
+        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
+        {
+            launched = 1;
+            break;
+        }
+    }
+
+    return launched;
+}
+
+static int set_vvmcs_launched(struct list_head *launched_list,
+                              unsigned long vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+
+    if ( vvmcs_launched(launched_list, vvmcs_mfn) )
+        return 0;
+
+    vvmcs = xzalloc(struct vvmcs_list);
+    if ( !vvmcs )
+        return -ENOMEM;
+
+    vvmcs->vvmcs_mfn = vvmcs_mfn;
+    list_add(&vvmcs->node, launched_list);
+
+    return 0;
+}
+
+static void clear_vvmcs_launched(struct list_head *launched_list,
+                                 paddr_t vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+    struct list_head *pos;
+
+    list_for_each(pos, launched_list)
+    {
+        vvmcs = list_entry(pos, struct vvmcs_list, node);
+        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
+        {
+            list_del(&vvmcs->node);
+            xfree(vvmcs);
+            break;
+        }
+    }
+}
+
 int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
@@ -1221,8 +1286,10 @@ int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
 
 int nvmx_handle_vmresume(struct cpu_user_regs *regs)
 {
-    int launched;
+    bool_t launched;
     struct vcpu *v = current;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1230,8 +1297,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
         return X86EMUL_OKAY;        
     }
 
-    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                           NVMX_LAUNCH_STATE);
+    launched = vvmcs_launched(&nvmx->launched_list,
+                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
     if ( !launched ) {
        vmreturn (regs, VMFAIL_VALID);
        return X86EMUL_OKAY;
@@ -1241,9 +1308,11 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
 
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
 {
-    int launched;
+    bool_t launched;
     int rc;
     struct vcpu *v = current;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1251,8 +1320,8 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
         return X86EMUL_OKAY;
     }
 
-    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                           NVMX_LAUNCH_STATE);
+    launched = vvmcs_launched(&nvmx->launched_list,
+                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
     if ( launched ) {
        vmreturn (regs, VMFAIL_VALID);
        return X86EMUL_OKAY;
@@ -1260,8 +1329,12 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
     else {
         rc = nvmx_vmresume(v,regs);
         if ( rc == X86EMUL_OKAY )
-            __set_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                        NVMX_LAUNCH_STATE, 1);
+        {
+            if ( set_vvmcs_launched(&nvmx->launched_list,
+                    domain_page_map_to_mfn(nvcpu->nv_vvmcx)) < 0 )
+                return X86EMUL_UNHANDLEABLE;
+        }
+          
     }
     return rc;
 }
@@ -1328,6 +1401,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     struct vmx_inst_decoded decode;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa = 0;
     void *vvmcs;
     int rc;
@@ -1344,7 +1418,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     
     if ( gpa == nvcpu->nv_vvmcxaddr ) 
     {
-        __set_vvmcs(nvcpu->nv_vvmcx, NVMX_LAUNCH_STATE, 0);
+        clear_vvmcs_launched(&nvmx->launched_list,
+            domain_page_map_to_mfn(nvcpu->nv_vvmcx));
         nvmx_purge_vvmcs(v);
     }
     else 
@@ -1352,7 +1427,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
         /* Even if this VMCS isn't the current one, we must clear it. */
         vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT);
         if ( vvmcs ) 
-            __set_vvmcs(vvmcs, NVMX_LAUNCH_STATE, 0);
+            clear_vvmcs_launched(&nvmx->launched_list,
+                domain_page_map_to_mfn(vvmcs));
         hvm_unmap_guest_frame(vvmcs);
     }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 51df81e..9ff741f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -421,8 +421,6 @@ enum vmcs_field {
     HOST_SYSENTER_EIP               = 0x00006c12,
     HOST_RSP                        = 0x00006c14,
     HOST_RIP                        = 0x00006c16,
-    /* A virtual VMCS field used for nestedvmx only */
-    NVMX_LAUNCH_STATE               = 0x00006c20,
 };
 
 #define VMCS_VPID_WIDTH 16
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 9e1dc77..89e839f 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -23,6 +23,11 @@
 #ifndef __ASM_X86_HVM_VVMX_H__
 #define __ASM_X86_HVM_VVMX_H__
 
+struct vvmcs_list {
+    unsigned long vvmcs_mfn;
+    struct list_head node;
+};
+
 struct nestedvmx {
     paddr_t    vmxon_region_pa;
     void       *iobitmap[2];		/* map (va) of L1 guest I/O bitmap */
@@ -38,6 +43,7 @@ struct nestedvmx {
         uint32_t exit_qual;
     } ept;
     uint32_t guest_vpid;
+    struct list_head launched_list;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
-- 
1.7.1


* [PATCH v4 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing
  2013-01-22 12:00 [PATCH v4 0/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM Dongxiao Xu
@ 2013-01-22 12:00 ` Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 4/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
  3 siblings, 0 replies; 9+ messages in thread
From: Dongxiao Xu @ 2013-01-22 12:00 UTC (permalink / raw)
  To: xen-devel; +Cc: JBeulich, eddie.dong, xiantao.zhang, jun.nakajima

Before the VMCS shadowing feature, we used memory operations to build up
the virtual VMCS. This works because that virtual VMCS is never loaded
into real hardware. However, once we introduce the VMCS shadowing
feature, this VMCS will be loaded into hardware, which requires all
fields in the VMCS to be accessed by VMREAD/VMWRITE.

Besides, the virtual VMCS revision identifier must also meet the
hardware's requirement, instead of using a made-up one.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |   29 +++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vvmx.c        |   20 ++++++++++++++++----
 xen/include/asm-x86/hvm/vmx/vmcs.h |    5 +++++
 xen/include/asm-x86/hvm/vmx/vvmx.h |   16 ++++++++++++----
 4 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index de22e03..82a8d91 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -725,6 +725,35 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to)
     spin_unlock(&vmx->vmcs_lock);
 }
 
+void virtual_vmcs_enter(void *vvmcs)
+{
+    __vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
+}
+
+void virtual_vmcs_exit(void *vvmcs)
+{
+    __vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
+    __vmptrld(virt_to_maddr(this_cpu(current_vmcs)));
+}
+
+u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding)
+{
+    u64 res;
+
+    virtual_vmcs_enter(vvmcs);
+    res = __vmread(vmcs_encoding);
+    virtual_vmcs_exit(vvmcs);
+
+    return res;
+}
+
+void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
+{
+    virtual_vmcs_enter(vvmcs);
+    __vmwrite(vmcs_encoding, val);
+    virtual_vmcs_exit(vvmcs);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index a0e49c4..3221dd2 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -175,7 +175,7 @@ static int vvmcs_offset(u32 width, u32 type, u32 index)
     return offset;
 }
 
-u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
+u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding)
 {
     union vmcs_encoding enc;
     u64 *content = (u64 *) vvmcs;
@@ -205,7 +205,12 @@ u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
     return res;
 }
 
-void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val)
+u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding)
+{
+    return virtual_vmcs_vmread(vvmcs, vmcs_encoding);
+}
+
+void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val)
 {
     union vmcs_encoding enc;
     u64 *content = (u64 *) vvmcs;
@@ -241,6 +246,11 @@ void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val)
     content[offset] = res;
 }
 
+void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val)
+{
+    virtual_vmcs_vmwrite(vvmcs, vmcs_encoding, val);
+}
+
 static unsigned long reg_read(struct cpu_user_regs *regs,
                               enum vmx_regs_enc index)
 {
@@ -1564,10 +1574,11 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
  */
 int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
 {
+    struct vcpu *v = current;
     u64 data = 0, host_data = 0;
     int r = 1;
 
-    if ( !nestedhvm_enabled(current->domain) )
+    if ( !nestedhvm_enabled(v->domain) )
         return 0;
 
     rdmsrl(msr, host_data);
@@ -1577,7 +1588,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
      */
     switch (msr) {
     case MSR_IA32_VMX_BASIC:
-        data = (host_data & (~0ul << 32)) | VVMCS_REVISION;
+        data = (host_data & (~0ul << 32)) |
+               ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id);
         break;
     case MSR_IA32_VMX_PINBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9ff741f..652dc21 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -244,6 +244,7 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
     (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
 #define cpu_has_vmx_virtual_intr_delivery \
     (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
+#define cpu_has_vmx_vmcs_shadowing 0
 
 /* GUEST_INTERRUPTIBILITY_INFO flags. */
 #define VMX_INTR_SHADOW_STI             0x00000001
@@ -436,6 +437,10 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 int vmx_check_msr_bitmap(unsigned long *msr_bitmap, u32 msr, int access_type);
+void virtual_vmcs_enter(void *vvmcs);
+void virtual_vmcs_exit(void *vvmcs);
+u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
+void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val);
 
 #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 89e839f..73a67cc 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -152,8 +152,6 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
  *
  */
 
-#define VVMCS_REVISION 0x40000001u
-
 struct vvmcs_header {
     u32 revision;
     u32 abort;
@@ -185,8 +183,18 @@ enum vvmcs_encoding_type {
     VVMCS_TYPE_HSTATE,
 };
 
-u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding);
-void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val);
+u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding);
+u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding);
+void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val);
+void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val);
+
+#define __get_vvmcs(_vvmcs, _vmcs_encoding) \
+  (cpu_has_vmx_vmcs_shadowing ? __get_vvmcs_real(_vvmcs, _vmcs_encoding) \
+                              : __get_vvmcs_virtual(_vvmcs, _vmcs_encoding))
+
+#define __set_vvmcs(_vvmcs, _vmcs_encoding, _val) \
+  (cpu_has_vmx_vmcs_shadowing ? __set_vvmcs_real(_vvmcs, _vmcs_encoding, _val) \
+                              : __set_vvmcs_virtual(_vvmcs, _vmcs_encoding, _val))
 
 uint64_t get_shadow_eptp(struct vcpu *v);
 
-- 
1.7.1


* [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS
  2013-01-22 12:00 [PATCH v4 0/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM Dongxiao Xu
  2013-01-22 12:00 ` [PATCH v4 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing Dongxiao Xu
@ 2013-01-22 12:00 ` Dongxiao Xu
  2013-01-22 13:12   ` Jan Beulich
  2013-01-22 12:00 ` [PATCH v4 4/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
  3 siblings, 1 reply; 9+ messages in thread
From: Dongxiao Xu @ 2013-01-22 12:00 UTC (permalink / raw)
  To: xen-devel; +Cc: JBeulich, eddie.dong, xiantao.zhang, jun.nakajima

Once we use VMREAD/VMWRITE to build up the virtual VMCS, each access to
the virtual VMCS needs two VMPTRLDs and one VMCLEAR to switch the
environment, which adds performance overhead.
This commit handles multiple virtual VMCS accesses together to improve
performance.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c    |  103 +++++++++++++++++++++++++++++++++------
 xen/include/asm-x86/hvm/vcpu.h |    2 +
 2 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3221dd2..e401b63 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -30,6 +30,7 @@
 
 static void nvmx_purge_vvmcs(struct vcpu *v);
 
+#define VMCS_BUF_SIZE 500
 int nvmx_vcpu_initialise(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
@@ -41,6 +42,12 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
 	goto out;
     }
+
+    nvcpu->vvmcx_buf = xzalloc_array(u64, VMCS_BUF_SIZE);
+    /* If vmcs buffer allocation failed, we will fall back to default mode. */
+    if ( !nvcpu->vvmcx_buf )
+        gdprintk(XENLOG_WARNING, "nest: allocating exchanging buffer failed\n");
+
     nvmx->ept.enabled = 0;
     nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
@@ -83,6 +90,9 @@ void nvmx_vcpu_destroy(struct vcpu *v)
         list_del(&item->node);
         xfree(item);
     }
+
+    if ( nvcpu->vvmcx_buf )
+        xfree(nvcpu->vvmcx_buf);
 }
  
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -830,6 +840,35 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned int field)
     __vmwrite(field, value);
 }
 
+static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
+                                 const u16 *field)
+{
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
+    u64 *value = nvcpu->vvmcx_buf;
+    unsigned int i;
+
+    if ( !cpu_has_vmx_vmcs_shadowing )
+        goto fallback;
+
+    if ( !value || n > VMCS_BUF_SIZE )
+        goto fallback;
+
+    virtual_vmcs_enter(vvmcs);
+    for ( i = 0; i < n; i++ )
+        value[i] = __vmread(field[i]);
+    virtual_vmcs_exit(vvmcs);
+
+    for ( i = 0; i < n; i++ )
+        __vmwrite(field[i], value[i]);
+
+    return;
+
+fallback:
+    for ( i = 0; i < n; i++ )
+        vvmcs_to_shadow(vvmcs, field[i]);
+}
+
 static void shadow_to_vvmcs(void *vvmcs, unsigned int field)
 {
     u64 value;
@@ -840,6 +879,35 @@ static void shadow_to_vvmcs(void *vvmcs, unsigned int field)
         __set_vvmcs(vvmcs, field, value);
 }
 
+static void shadow_to_vvmcs_bulk(struct vcpu *v, unsigned int n,
+                                 const u16 *field)
+{
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
+    u64 *value = nvcpu->vvmcx_buf;
+    unsigned int i;
+
+    if ( !cpu_has_vmx_vmcs_shadowing )
+        goto fallback;
+
+    if ( !value || n > VMCS_BUF_SIZE )
+        goto fallback;
+
+    for ( i = 0; i < n; i++ )
+        value[i] = __vmread(field[i]);
+
+    virtual_vmcs_enter(vvmcs);
+    for ( i = 0; i < n; i++ )
+        __vmwrite(field[i], value[i]);
+    virtual_vmcs_exit(vvmcs);
+
+    return;
+
+fallback:
+    for ( i = 0; i < n; i++ )
+        shadow_to_vvmcs(vvmcs, field[i]);
+}
+
 static void load_shadow_control(struct vcpu *v)
 {
     /*
@@ -863,13 +931,18 @@ static void load_shadow_guest_state(struct vcpu *v)
 {
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     void *vvmcs = nvcpu->nv_vvmcx;
-    int i;
     u32 control;
     u64 cr_gh_mask, cr_read_shadow;
 
+    static const u16 vmentry_fields[] = {
+        VM_ENTRY_INTR_INFO,
+        VM_ENTRY_EXCEPTION_ERROR_CODE,
+        VM_ENTRY_INSTRUCTION_LEN,
+    };
+
     /* vvmcs.gstate to shadow vmcs.gstate */
-    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
-        vvmcs_to_shadow(vvmcs, vmcs_gstate_field[i]);
+    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
+                         vmcs_gstate_field);
 
     hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
     hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
@@ -883,9 +956,7 @@ static void load_shadow_guest_state(struct vcpu *v)
 
     hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
 
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_INTR_INFO);
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_EXCEPTION_ERROR_CODE);
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_INSTRUCTION_LEN);
+    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmentry_fields), vmentry_fields);
 
     /*
      * While emulate CR0 and CR4 for nested virtualization, set the CR0/CR4
@@ -905,10 +976,13 @@ static void load_shadow_guest_state(struct vcpu *v)
     if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
          (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
     {
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+        static const u16 gpdptr_fields[] = {
+            GUEST_PDPTR0,
+            GUEST_PDPTR1,
+            GUEST_PDPTR2,
+            GUEST_PDPTR3,
+        };
+        vvmcs_to_shadow_bulk(v, ARRAY_SIZE(gpdptr_fields), gpdptr_fields);
     }
 
     /* TODO: CR3 target control */
@@ -999,13 +1073,12 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
 {
-    int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     void *vvmcs = nvcpu->nv_vvmcx;
 
     /* copy shadow vmcs.gstate back to vvmcs.gstate */
-    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
-        shadow_to_vvmcs(vvmcs, vmcs_gstate_field[i]);
+    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
+                         vmcs_gstate_field);
     /* RIP, RSP are in user regs */
     __set_vvmcs(vvmcs, GUEST_RIP, regs->eip);
     __set_vvmcs(vvmcs, GUEST_RSP, regs->esp);
@@ -1017,13 +1090,11 @@ static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
 
 static void sync_vvmcs_ro(struct vcpu *v)
 {
-    int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     void *vvmcs = nvcpu->nv_vvmcx;
 
-    for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
-        shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
+    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_ro_field), vmcs_ro_field);
 
     /* Adjust exit_reason/exit_qualifciation for violation case */
     if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION )
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index e8b8cd7..c9ffafc 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -100,6 +100,8 @@ struct nestedvcpu {
      */
     bool_t nv_ioport80;
     bool_t nv_ioportED;
+
+    u64 *vvmcx_buf; /* A temp buffer for data exchange */
 };
 
 #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
-- 
1.7.1


* [PATCH v4 4/4] nested vmx: enable VMCS shadowing feature
  2013-01-22 12:00 [PATCH v4 0/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
                   ` (2 preceding siblings ...)
  2013-01-22 12:00 ` [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS Dongxiao Xu
@ 2013-01-22 12:00 ` Dongxiao Xu
  2013-01-22 13:22   ` Jan Beulich
  3 siblings, 1 reply; 9+ messages in thread
From: Dongxiao Xu @ 2013-01-22 12:00 UTC (permalink / raw)
  To: xen-devel; +Cc: JBeulich, eddie.dong, xiantao.zhang, jun.nakajima

The current logic handles non-root VMREAD/VMWRITE by VM exit and
emulation, which brings certain overhead.

New Intel platforms introduce a feature called VMCS shadowing, with which
non-root VMREAD/VMWRITE does not trigger a VM exit; instead, the hardware
reads/writes the virtual VMCS directly.
This has been shown to improve performance.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |    9 ++++
 xen/arch/x86/hvm/vmx/vvmx.c        |   81 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   18 ++++++++-
 3 files changed, 107 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 82a8d91..14c5948 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -91,6 +91,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
     P(cpu_has_vmx_apic_reg_virt, "APIC Register Virtualization");
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
+    P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
 #undef P
 
     if ( !printed )
@@ -132,6 +133,7 @@ static int vmx_init_vmcs_config(void)
     u32 _vmx_cpu_based_exec_control;
     u32 _vmx_secondary_exec_control = 0;
     u64 _vmx_ept_vpid_cap = 0;
+    u64 _vmx_misc_cap = 0;
     u32 _vmx_vmexit_control;
     u32 _vmx_vmentry_control;
     bool_t mismatch = 0;
@@ -179,6 +181,9 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID);
+        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+        if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
+            opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
         if ( opt_vpid_enabled )
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
@@ -382,6 +387,8 @@ static void __vmx_clear_vmcs(void *info)
     if ( arch_vmx->active_cpu == smp_processor_id() )
     {
         __vmpclear(virt_to_maddr(arch_vmx->vmcs));
+        if ( arch_vmx->vmcs_shadow_maddr )
+            __vmpclear(arch_vmx->vmcs_shadow_maddr);
 
         arch_vmx->active_cpu = -1;
         arch_vmx->launched   = 0;
@@ -710,6 +717,8 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to)
     spin_lock(&vmx->vmcs_lock);
 
     __vmpclear(virt_to_maddr(from));
+    if ( vmx->vmcs_shadow_maddr )
+        __vmpclear(vmx->vmcs_shadow_maddr);
     __vmptrld(virt_to_maddr(to));
 
     vmx->vmcs = to;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index e401b63..98777b1 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -48,6 +48,47 @@ int nvmx_vcpu_initialise(struct vcpu *v)
     if ( !nvcpu->vvmcx_buf )
         gdprintk(XENLOG_WARNING, "nest: allocating exchanging buffer failed\n");
 
+    /* non-root VMREAD/VMWRITE bitmap. */
+    if ( cpu_has_vmx_vmcs_shadowing )
+    {
+        struct page_info *vmread_bitmap, *vmwrite_bitmap;
+        void *vr, *vw;
+
+        vmread_bitmap = alloc_domheap_page(NULL, 0);
+        if ( !vmread_bitmap )
+        {
+            gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap failed\n");
+            goto out1;
+        }
+        v->arch.hvm_vmx.vmread_bitmap = page_to_mfn(vmread_bitmap);
+
+        vmwrite_bitmap = alloc_domheap_page(NULL, 0);
+        if ( !vmwrite_bitmap )
+        {
+            gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap failed\n");
+            goto out2;
+        }
+        v->arch.hvm_vmx.vmwrite_bitmap = page_to_mfn(vmwrite_bitmap);
+
+        vr = map_domain_page(v->arch.hvm_vmx.vmread_bitmap);
+        vw = map_domain_page(v->arch.hvm_vmx.vmwrite_bitmap);
+
+        clear_page(vr);
+        clear_page(vw);
+
+        /* 
+         * For the following 4 encodings, we need to handle them in VMM.
+         * Let them vmexit as usual.
+         */
+        set_bit(IO_BITMAP_A, (u64 *)vw);
+        set_bit(IO_BITMAP_A_HIGH, (u64 *)vw);
+        set_bit(IO_BITMAP_B, (u64 *)vw);
+        set_bit(IO_BITMAP_B_HIGH, (u64 *)vw);
+
+        unmap_domain_page(vr);
+        unmap_domain_page(vw);
+    }
+
     nvmx->ept.enabled = 0;
     nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
@@ -60,6 +101,10 @@ int nvmx_vcpu_initialise(struct vcpu *v)
     nvmx->msrbitmap = NULL;
     INIT_LIST_HEAD(&nvmx->launched_list);
     return 0;
+out2:
+    free_domheap_page(mfn_to_page(v->arch.hvm_vmx.vmread_bitmap));
+out1:
+    free_xenheap_page(nvcpu->nv_n2vmcx);
 out:
     return -ENOMEM;
 }
@@ -93,6 +138,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
 
     if ( nvcpu->vvmcx_buf )
         xfree(nvcpu->vvmcx_buf);
+
+    if ( v->arch.hvm_vmx.vmread_bitmap )
+        free_domheap_page(mfn_to_page(v->arch.hvm_vmx.vmread_bitmap));
+    if ( v->arch.hvm_vmx.vmwrite_bitmap )
+        free_domheap_page(mfn_to_page(v->arch.hvm_vmx.vmwrite_bitmap));
 }
  
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -1008,6 +1058,32 @@ static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu)
     return 0;
 }
 
+static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs)
+{
+    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
+    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
+
+    __vmpclear(vvmcs_maddr);
+    vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
+    v->arch.hvm_vmx.vmcs_shadow_maddr = vvmcs_maddr;
+    __vmwrite(VMCS_LINK_POINTER, vvmcs_maddr);
+    __vmwrite(VMREAD_BITMAP, pfn_to_paddr(v->arch.hvm_vmx.vmread_bitmap));
+    __vmwrite(VMWRITE_BITMAP, pfn_to_paddr(v->arch.hvm_vmx.vmwrite_bitmap));
+}
+
+static void nvmx_clear_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs)
+{
+    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
+    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
+
+    __vmpclear(vvmcs_maddr);
+    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
+    v->arch.hvm_vmx.vmcs_shadow_maddr = 0;
+    __vmwrite(VMCS_LINK_POINTER, ~0ul);
+    __vmwrite(VMREAD_BITMAP, 0);
+    __vmwrite(VMWRITE_BITMAP, 0);
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -1449,6 +1525,9 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
         __map_msr_bitmap(v);
     }
 
+    if ( cpu_has_vmx_vmcs_shadowing )
+        nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
+
     vmreturn(regs, VMSUCCEED);
 
 out:
@@ -1499,6 +1578,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     
     if ( gpa == nvcpu->nv_vvmcxaddr ) 
     {
+        if ( cpu_has_vmx_vmcs_shadowing )
+            nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
         clear_vvmcs_launched(&nvmx->launched_list,
             domain_page_map_to_mfn(nvcpu->nv_vvmcx));
         nvmx_purge_vvmcs(v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 652dc21..5fc75ac 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -81,6 +81,8 @@ struct vmx_domain {
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
     struct vmcs_struct  *vmcs;
+    /* VMCS shadow machine address. */
+    paddr_t             vmcs_shadow_maddr;
 
     /* Protects remote usage of VMCS (VMPTRLD/VMCLEAR). */
     spinlock_t           vmcs_lock;
@@ -125,6 +127,10 @@ struct arch_vmx_struct {
     /* Remember EFLAGS while in virtual 8086 mode */
     uint32_t             vm86_saved_eflags;
     int                  hostenv_migrated;
+
+    /* Bitmap MFN to control vmexit policy for Non-root VMREAD/VMWRITE */
+    unsigned long       vmread_bitmap;
+    unsigned long       vmwrite_bitmap;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
@@ -191,6 +197,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
+#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 extern u32 vmx_secondary_exec_control;
 
 extern bool_t cpu_has_vmx_ins_outs_instr_info;
@@ -205,6 +212,8 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
 #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
 #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
 
+#define VMX_MISC_VMWRITE_ALL                    0x20000000
+
 #define VMX_VPID_INVVPID_INSTRUCTION                        0x100000000ULL
 #define VMX_VPID_INVVPID_INDIVIDUAL_ADDR                    0x10000000000ULL
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT                     0x20000000000ULL
@@ -244,7 +253,10 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
     (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
 #define cpu_has_vmx_virtual_intr_delivery \
     (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
-#define cpu_has_vmx_vmcs_shadowing 0
+#define cpu_has_vmx_vmcs_shadowing \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+
+#define VMCS_RID_TYPE_MASK              0x80000000
 
 /* GUEST_INTERRUPTIBILITY_INFO flags. */
 #define VMX_INTR_SHADOW_STI             0x00000001
@@ -305,6 +317,10 @@ enum vmcs_field {
     EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
     EOI_EXIT_BITMAP3                = 0x00002022,
     EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
+    VMREAD_BITMAP                   = 0x00002026,
+    VMREAD_BITMAP_HIGH              = 0x00002027,
+    VMWRITE_BITMAP                  = 0x00002028,
+    VMWRITE_BITMAP_HIGH             = 0x00002029,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
     VMCS_LINK_POINTER               = 0x00002800,
-- 
1.7.1


* Re: [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS
  2013-01-22 12:00 ` [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS Dongxiao Xu
@ 2013-01-22 13:12   ` Jan Beulich
  2013-01-22 14:16     ` Xu, Dongxiao
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-01-22 13:12 UTC (permalink / raw)
  To: Dongxiao Xu; +Cc: eddie.dong, xiantao.zhang, jun.nakajima, xen-devel

>>> On 22.01.13 at 13:00, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -30,6 +30,7 @@
>  
>  static void nvmx_purge_vvmcs(struct vcpu *v);
>  
> +#define VMCS_BUF_SIZE 500

The biggest batch I can spot is about 60 elements large, so
why 500?

> @@ -83,6 +90,9 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          list_del(&item->node);
>          xfree(item);
>      }
> +
> +    if ( nvcpu->vvmcx_buf )
> +        xfree(nvcpu->vvmcx_buf);

No need for the if() - xfree() copes quite well with NULL pointers.

> @@ -830,6 +840,35 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned int field)
>      __vmwrite(field, value);
>  }
>  
> +static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
> +                                 const u16 *field)
> +{
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    void *vvmcs = nvcpu->nv_vvmcx;
> +    u64 *value = nvcpu->vvmcx_buf;
> +    unsigned int i;
> +
> +    if ( !cpu_has_vmx_vmcs_shadowing )
> +        goto fallback;
> +
> +    if ( !value || n > VMCS_BUF_SIZE )

And then, if you lower that value, be verbose (at least in debugging
builds) about the buffer size being exceeded.

> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -100,6 +100,8 @@ struct nestedvcpu {
>       */
>      bool_t nv_ioport80;
>      bool_t nv_ioportED;
> +
> +    u64 *vvmcx_buf; /* A temp buffer for data exchange */

VMX-specific field in non-VMX structure? And wouldn't the buffer
anyway more efficiently be per-pCPU instead of per-vCPU?

Jan


* Re: [PATCH v4 4/4] nested vmx: enable VMCS shadowing feature
  2013-01-22 12:00 ` [PATCH v4 4/4] nested vmx: enable VMCS shadowing feature Dongxiao Xu
@ 2013-01-22 13:22   ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-01-22 13:22 UTC (permalink / raw)
  To: Dongxiao Xu; +Cc: eddie.dong, xiantao.zhang, jun.nakajima, xen-devel

>>> On 22.01.13 at 13:00, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -48,6 +48,47 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>      if ( !nvcpu->vvmcx_buf )
>          gdprintk(XENLOG_WARNING, "nest: allocating exchanging buffer failed\n");
>  
> +    /* non-root VMREAD/VMWRITE bitmap. */
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +    {
> +        struct page_info *vmread_bitmap, *vmwrite_bitmap;
> +        void *vr, *vw;
> +
> +        vmread_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmread_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap failed\n");
> +            goto out1;
> +        }
> +        v->arch.hvm_vmx.vmread_bitmap = page_to_mfn(vmread_bitmap);

Considering that you never really use the MFNs, why do you store
them instead of the struct page_info *?

> +
> +        vmwrite_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmwrite_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap failed\n");
> +            goto out2;
> +        }
> +        v->arch.hvm_vmx.vmwrite_bitmap = page_to_mfn(vmwrite_bitmap);
> +
> +        vr = map_domain_page(v->arch.hvm_vmx.vmread_bitmap);
> +        vw = map_domain_page(v->arch.hvm_vmx.vmwrite_bitmap);
> +
> +        clear_page(vr);
> +        clear_page(vw);
> +
> +        /* 
> +         * For the following 4 encodings, we need to handle them in VMM.
> +         * Let them vmexit as usual.
> +         */
> +        set_bit(IO_BITMAP_A, (u64 *)vw);
> +        set_bit(IO_BITMAP_A_HIGH, (u64 *)vw);
> +        set_bit(IO_BITMAP_B, (u64 *)vw);
> +        set_bit(IO_BITMAP_B_HIGH, (u64 *)vw);

Ugly casts - neither do you really need u64 here (unsigned int
or unsigned long would suffice), nor is there any point in having
these (dangerous) explicit casts. Just have the variables have
the right type, map_domain_page() returning void * allows you
to do so.

Jan
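
[For illustration only: after the cleanup Jan suggests for the cast issue, the
hunk in question might read roughly as follows. This is a sketch reusing the
identifiers from the patch above, not an authoritative rework.]

    /* Let the mappings have a suitable pointer type so no casts are
     * needed at the set_bit() call sites. */
    unsigned long *vr, *vw;

    vr = map_domain_page(v->arch.hvm_vmx.vmread_bitmap);
    vw = map_domain_page(v->arch.hvm_vmx.vmwrite_bitmap);

    clear_page(vr);
    clear_page(vw);

    /*
     * For the following 4 encodings, we need to handle them in VMM.
     * Let them vmexit as usual.
     */
    set_bit(IO_BITMAP_A, vw);
    set_bit(IO_BITMAP_A_HIGH, vw);
    set_bit(IO_BITMAP_B, vw);
    set_bit(IO_BITMAP_B_HIGH, vw);

    unmap_domain_page(vr);
    unmap_domain_page(vw);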


* Re: [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS
  2013-01-22 13:12   ` Jan Beulich
@ 2013-01-22 14:16     ` Xu, Dongxiao
  2013-01-22 14:56       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Xu, Dongxiao @ 2013-01-22 14:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Dong, Eddie, Zhang, Xiantao, Nakajima, Jun, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, January 22, 2013 9:13 PM
> To: Xu, Dongxiao
> Cc: Dong, Eddie; Nakajima, Jun; Zhang, Xiantao; xen-devel
> Subject: Re: [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual
> VMCS
> 
> >>> On 22.01.13 at 13:00, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -30,6 +30,7 @@
> >
> >  static void nvmx_purge_vvmcs(struct vcpu *v);
> >
> > +#define VMCS_BUF_SIZE 500
> 
> The biggest batch I can spot is about 60 elements large, so
> why 500?
> 
> > @@ -83,6 +90,9 @@ void nvmx_vcpu_destroy(struct vcpu *v)
> >          list_del(&item->node);
> >          xfree(item);
> >      }
> > +
> > +    if ( nvcpu->vvmcx_buf )
> > +        xfree(nvcpu->vvmcx_buf);
> 
> No need for the if() - xfree() copes quite well with NULL pointers.
> 
> > @@ -830,6 +840,35 @@ static void vvmcs_to_shadow(void *vvmcs,
> unsigned int field)
> >      __vmwrite(field, value);
> >  }
> >
> > +static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
> > +                                 const u16 *field)
> > +{
> > +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> > +    void *vvmcs = nvcpu->nv_vvmcx;
> > +    u64 *value = nvcpu->vvmcx_buf;
> > +    unsigned int i;
> > +
> > +    if ( !cpu_has_vmx_vmcs_shadowing )
> > +        goto fallback;
> > +
> > +    if ( !value || n > VMCS_BUF_SIZE )
> 
> And then, if you lower that value, be verbose (at least in debugging
> builds) about the buffer size being exceeded.
> 
> > --- a/xen/include/asm-x86/hvm/vcpu.h
> > +++ b/xen/include/asm-x86/hvm/vcpu.h
> > @@ -100,6 +100,8 @@ struct nestedvcpu {
> >       */
> >      bool_t nv_ioport80;
> >      bool_t nv_ioportED;
> > +
> > +    u64 *vvmcx_buf; /* A temp buffer for data exchange */
> 
> VMX-specific field in non-VMX structure? And wouldn't the buffer
> anyway more efficiently be per-pCPU instead of per-vCPU?

Yes, it should be VMX specific.
I also thought about making it per-pCPU; however, we need to find somewhere to initialize and finalize the pointer.
One possible place is vmx_cpu_up()/vmx_cpu_down(), but I don't think it is appropriate to put that code into those two functions, since they are not closely related. Therefore I put it in the per-vCPU structure.
Do you have any hint about it?

Thanks,
Dongxiao

> 
> Jan


* Re: [PATCH v4 3/4] nested vmx: optimize for bulk access of virtual VMCS
  2013-01-22 14:16     ` Xu, Dongxiao
@ 2013-01-22 14:56       ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-01-22 14:56 UTC (permalink / raw)
  To: Dongxiao Xu; +Cc: Eddie Dong, Xiantao Zhang, Jun Nakajima, xen-devel

>>> On 22.01.13 at 15:16, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 22.01.13 at 13:00, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
>> > --- a/xen/include/asm-x86/hvm/vcpu.h
>> > +++ b/xen/include/asm-x86/hvm/vcpu.h
>> > @@ -100,6 +100,8 @@ struct nestedvcpu {
>> >       */
>> >      bool_t nv_ioport80;
>> >      bool_t nv_ioportED;
>> > +
>> > +    u64 *vvmcx_buf; /* A temp buffer for data exchange */
>> 
>> VMX-specific field in non-VMX structure? And wouldn't the buffer
>> anyway more efficiently be per-pCPU instead of per-vCPU?
> 
> Yes, it should be VMX specific.
> I also thought about making it per-pCPU; however, we need to find somewhere 
> to initialize and finalize the pointer.
> One possible place is vmx_cpu_up()/vmx_cpu_down(), but I don't think it is 
> appropriate to put that code into those two functions, since they are not 
> closely related. Therefore I put it in the per-vCPU structure.
> Do you have any hint about it?

Calling new functions in vvmx.c from vmx_cpu_up_prepare() and
vmx_cpu_dead() would seem appropriate.

Jan
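
[A minimal sketch of what that per-pCPU arrangement could look like, assuming
hypothetical helper names nvmx_cpu_up_prepare()/nvmx_cpu_dead() called from the
hook points Jan names above; illustrative only, not part of the series.]

    /* In vvmx.c: one exchange buffer per physical CPU instead of per vCPU. */
    static DEFINE_PER_CPU(u64 *, vvmcx_buf);

    int nvmx_cpu_up_prepare(unsigned int cpu)
    {
        if ( per_cpu(vvmcx_buf, cpu) != NULL )
            return 0;

        per_cpu(vvmcx_buf, cpu) = xzalloc_array(u64, VMCS_BUF_SIZE);

        return per_cpu(vvmcx_buf, cpu) ? 0 : -ENOMEM;
    }

    void nvmx_cpu_dead(unsigned int cpu)
    {
        xfree(per_cpu(vvmcx_buf, cpu));
        per_cpu(vvmcx_buf, cpu) = NULL;
    }

    /*
     * vmx_cpu_up_prepare() and vmx_cpu_dead() in vmcs.c would then call these
     * helpers, and the bulk-access paths would use this_cpu(vvmcx_buf) instead
     * of nvcpu->vvmcx_buf.
     */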

