* [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5
@ 2013-04-18 11:34 Abel Gordon
  2013-04-18 11:34 ` [PATCH 01/11] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Abel Gordon @ 2013-04-18 11:34 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

This series of patches implements shadow-vmcs capability for nested VMX.

Shadow-vmcs - background and overview:

 In Intel VMX, the privileged vmread and vmwrite instructions are used by the
 hypervisor to read and modify the guest and host specifications (VMCS). In a
 nested virtualization environment, L1 executes multiple vmread and vmwrite
 instructions to handle a single L2 exit. Each vmread and vmwrite executed by L1
 traps (causes an exit) to the L0 hypervisor (KVM). L0 emulates the instruction
 behaviour and resumes L1 execution.

 Removing the need to trap and emulate these special instructions reduces the
 number of exits and improves nested virtualization performance. As first
 evaluated in [1], exit-less vmread and vmwrite can reduce nested virtualization
 overhead by up to 40%.
 
 Intel introduced a new processor feature called shadow-vmcs. Using
 shadow-vmcs, L0 can configure the processor to let L1, running in guest mode,
 access VMCS12 fields using vmread and vmwrite instructions without causing
 an exit to L0. The VMCS12 field data is stored in a shadow-vmcs controlled
 by L0.

Shadow-vmcs - design considerations: 

 A shadow-vmcs is processor-dependent and must be accessed by L0 or L1 using
 vmread and vmwrite instructions. With nested virtualization we aim to abstract
 the hardware from the L1 hypervisor. Thus, to avoid hardware dependencies we
 preferred to keep the software-defined VMCS12 format in L1 address space
 and hold the processor-specific shadow-vmcs format only in L0 address space.
 In other words, the shadow-vmcs is used by L0 as an accelerator, but its format
 and content are never exposed to L1 directly. L0 syncs the content of the
 processor-specific shadow vmcs with the content of the software-controlled
 VMCS12 format.

 We could have kept the processor-specific shadow-vmcs format in L1 address
 space to avoid using the software-defined VMCS12 format; however, this type of
 design/implementation would have created hardware dependencies and
 complicated other capabilities (e.g. live migration of L1).
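The sync scheme described above can be sketched in plain C. This is an illustrative model only: the struct layouts, field choices, and function names below are simplified stand-ins for the real KVM structures, chosen just to show the two copy directions.

```c
#include <assert.h>
#include <stdint.h>

/* sw_vmcs12 stands in for the software-defined VMCS12 (architected,
 * migratable format kept in L1 address space); hw_shadow stands in for
 * the opaque, processor-specific shadow vmcs kept only in L0. */
struct sw_vmcs12 {
	uint64_t guest_rip;
	uint32_t exception_bitmap;
};

struct hw_shadow {
	uint64_t guest_rip;
	uint32_t exception_bitmap;
};

/* vmcs12 -> shadow: refresh the shadow before resuming L1 */
static void copy_vmcs12_to_shadow(const struct sw_vmcs12 *v,
				  struct hw_shadow *s)
{
	s->guest_rip = v->guest_rip;
	s->exception_bitmap = v->exception_bitmap;
}

/* shadow -> vmcs12: write back fields L1 may have modified exit-less */
static void copy_shadow_to_vmcs12(const struct hw_shadow *s,
				  struct sw_vmcs12 *v)
{
	v->guest_rip = s->guest_rip;
	v->exception_bitmap = s->exception_bitmap;
}
```

Because only L0 ever touches the hw_shadow side, the architected sw_vmcs12 remains the single format L1 (and live migration) sees.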

Changes since v1:
 1) Added sync_shadow_vmcs flag used to indicate when the content of VMCS12
    must be copied to the shadow vmcs. The flag value is checked during 
    vmx_vcpu_run.
 2) Code quality improvements

Changes since v2:
 1) Allocate shadow vmcs only once per VCPU on handle_vmxon and re-use the 
    same instance for multiple VMCS12s
 2) More code quality improvements

Changes since v3:
 1) Fixed VMXON emulation (new patch).
    Previous nVMX code didn't verify if L1 is already in root mode (VMXON
    was previously called). Now we call nested_vmx_failValid if VMX is
    already ON. This is required to avoid host leaks (due to shadow vmcs
    allocation) if L1 repeatedly executes VMXON.
 2) Improved comment: clarified we do not shadow fields that are modified
    when L1 executes vmx instructions like the VM_INSTRUCTION_ERROR field.

Changes since v4:
 1) Fixed free_nested: we now free the shadow vmcs also 
    when there is no current vmcs.
 
Acknowledgments:

 Many thanks to
 "Natapov, Gleb" <gleb@redhat.com> 
 "Xu, Dongxiao" <dongxiao.xu@intel.com>
 "Nakajima, Jun" <jun.nakajima@intel.com>
 "Har'El, Nadav" <nadav@harel.org.il> 
  
 for the insightful discussions, comments and reviews.


 These patches were easily created and maintained using
     Patchouli -- patch creator
     http://patchouli.sourceforge.net/


[1] "The Turtles Project: Design and Implementation of Nested Virtualization",
    http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/11] KVM: nVMX: Shadow-vmcs control fields/bits
From: Abel Gordon @ 2013-04-18 11:34 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/include/asm/vmx.h            |    3 +++
 arch/x86/include/uapi/asm/msr-index.h |    2 ++
 2 files changed, 5 insertions(+)

--- .before/arch/x86/include/asm/vmx.h	2013-04-18 13:31:33.000000000 +0300
+++ .after/arch/x86/include/asm/vmx.h	2013-04-18 13:31:33.000000000 +0300
@@ -65,6 +65,7 @@
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
+#define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -150,6 +151,8 @@ enum vmcs_field {
 	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
 	EOI_EXIT_BITMAP3                = 0x00002022,
 	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
+	VMREAD_BITMAP                   = 0x00002026,
+	VMWRITE_BITMAP                  = 0x00002028,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
--- .before/arch/x86/include/uapi/asm/msr-index.h	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/include/uapi/asm/msr-index.h	2013-04-18 13:31:34.000000000 +0300
@@ -522,6 +522,8 @@
 #define VMX_BASIC_MEM_TYPE_WB	6LLU
 #define VMX_BASIC_INOUT		0x0040000000000000LLU
 
+/* MSR_IA32_VMX_MISC bits */
+#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
 /* AMD-V MSRs */
 
 #define MSR_VM_CR                       0xc0010114



* [PATCH 02/11] KVM: nVMX: Detect shadow-vmcs capability
From: Abel Gordon @ 2013-04-18 11:35 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Add the logic required to detect if shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify whether L0
should use shadow vmcs (or not) to run L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
@@ -87,6 +87,8 @@ module_param(fasteoi, bool, S_IRUGO);
 static bool __read_mostly enable_apicv = 1;
 module_param(enable_apicv, bool, S_IRUGO);
 
+static bool __read_mostly enable_shadow_vmcs = 1;
+module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -940,6 +942,18 @@ static inline bool cpu_has_vmx_wbinvd_ex
 		SECONDARY_EXEC_WBINVD_EXITING;
 }
 
+static inline bool cpu_has_vmx_shadow_vmcs(void)
+{
+	u64 vmx_msr;
+	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+	/* check if the cpu supports writing r/o exit information fields */
+	if (!(vmx_msr & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
+		return false;
+
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_SHADOW_VMCS;
+}
+
 static inline bool report_flexpriority(void)
 {
 	return flexpriority_enabled;
@@ -2632,7 +2646,8 @@ static __init int setup_vmcs_config(stru
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
 			SECONDARY_EXEC_APIC_REGISTER_VIRT |
-			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+			SECONDARY_EXEC_SHADOW_VMCS;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2833,6 +2848,8 @@ static __init int hardware_setup(void)
 
 	if (!cpu_has_vmx_vpid())
 		enable_vpid = 0;
+	if (!cpu_has_vmx_shadow_vmcs())
+		enable_shadow_vmcs = 0;
 
 	if (!cpu_has_vmx_ept() ||
 	    !cpu_has_vmx_ept_4levels()) {
@@ -4077,6 +4094,12 @@ static u32 vmx_secondary_exec_control(st
 		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+	/* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
+	   (handle_vmptrld).
+	   We can NOT enable shadow_vmcs here because we don't have yet
+	   a current VMCS12
+	*/
+	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 	return exec_control;
 }
 



* [PATCH 03/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps
From: Abel Gordon @ 2013-04-18 11:35 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specify the most frequently accessed fields so we
can minimize the number of fields that are copied from the software-controlled
VMCS12 format to the processor-specific shadow vmcs and back. The lists were
built by measuring the VMCS field access rate after an L2 Ubuntu 12.04 guest
booted while running on top of L1 KVM, also Ubuntu 12.04. Note that during boot
there were additional fields which were frequently modified, but they were not
added to these lists because after boot these fields were no longer accessed by
L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   90 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
@@ -484,6 +484,64 @@ static inline struct vcpu_vmx *to_vmx(st
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
 				[number##_HIGH] = VMCS12_OFFSET(name)+4
 
+
+static const unsigned long shadow_read_only_fields[] = {
+	/*
+	 * We do NOT shadow fields that are modified when L0
+	 * traps and emulates any vmx instruction (e.g. VMPTRLD,
+	 * VMXON...) executed by L1.
+	 * For example, VM_INSTRUCTION_ERROR is read
+	 * by L1 if a vmx instruction fails (part of the error path).
+	 * Note the code assumes this logic. If for some reason
+	 * we start shadowing these fields then we need to
+	 * force a shadow sync when L0 emulates vmx instructions
+	 * (e.g. force a sync if VM_INSTRUCTION_ERROR is modified
+	 * by nested_vmx_failValid)
+	 */
+	VM_EXIT_REASON,
+	VM_EXIT_INTR_INFO,
+	VM_EXIT_INSTRUCTION_LEN,
+	IDT_VECTORING_INFO_FIELD,
+	IDT_VECTORING_ERROR_CODE,
+	VM_EXIT_INTR_ERROR_CODE,
+	EXIT_QUALIFICATION,
+	GUEST_LINEAR_ADDRESS,
+	GUEST_PHYSICAL_ADDRESS
+};
+static const int max_shadow_read_only_fields =
+	ARRAY_SIZE(shadow_read_only_fields);
+
+static const unsigned long shadow_read_write_fields[] = {
+	GUEST_RIP,
+	GUEST_RSP,
+	GUEST_CR0,
+	GUEST_CR3,
+	GUEST_CR4,
+	GUEST_INTERRUPTIBILITY_INFO,
+	GUEST_RFLAGS,
+	GUEST_CS_SELECTOR,
+	GUEST_CS_AR_BYTES,
+	GUEST_CS_LIMIT,
+	GUEST_CS_BASE,
+	GUEST_ES_BASE,
+	CR0_GUEST_HOST_MASK,
+	CR0_READ_SHADOW,
+	CR4_READ_SHADOW,
+	TSC_OFFSET,
+	EXCEPTION_BITMAP,
+	CPU_BASED_VM_EXEC_CONTROL,
+	VM_ENTRY_EXCEPTION_ERROR_CODE,
+	VM_ENTRY_INTR_INFO_FIELD,
+	VM_ENTRY_INSTRUCTION_LEN,
+	VM_ENTRY_EXCEPTION_ERROR_CODE,
+	HOST_FS_BASE,
+	HOST_GS_BASE,
+	HOST_FS_SELECTOR,
+	HOST_GS_SELECTOR
+};
+static const int max_shadow_read_write_fields =
+	ARRAY_SIZE(shadow_read_write_fields);
+
 static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
 	FIELD(GUEST_ES_SELECTOR, guest_es_selector),
@@ -675,6 +733,8 @@ static unsigned long *vmx_msr_bitmap_leg
 static unsigned long *vmx_msr_bitmap_longmode;
 static unsigned long *vmx_msr_bitmap_legacy_x2apic;
 static unsigned long *vmx_msr_bitmap_longmode_x2apic;
+static unsigned long *vmx_vmread_bitmap;
+static unsigned long *vmx_vmwrite_bitmap;
 
 static bool cpu_has_load_ia32_efer;
 static bool cpu_has_load_perf_global_ctrl;
@@ -4128,6 +4188,10 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
 	vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
 
+	if (enable_shadow_vmcs) {
+		vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
+		vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
+	}
 	if (cpu_has_vmx_msr_bitmap())
 		vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
 
@@ -7941,6 +8005,24 @@ static int __init vmx_init(void)
 				(unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_msr_bitmap_longmode_x2apic)
 		goto out4;
+	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_vmread_bitmap)
+		goto out5;
+
+	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_vmwrite_bitmap)
+		goto out6;
+
+	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
+	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
+	/* shadowed read/write fields */
+	for (i = 0; i < max_shadow_read_write_fields; i++) {
+		clear_bit(shadow_read_write_fields[i], vmx_vmwrite_bitmap);
+		clear_bit(shadow_read_write_fields[i], vmx_vmread_bitmap);
+	}
+	/* shadowed read only fields */
+	for (i = 0; i < max_shadow_read_only_fields; i++)
+		clear_bit(shadow_read_only_fields[i], vmx_vmread_bitmap);
 
 	/*
 	 * Allow direct access to the PC debug port (it is often used for I/O
@@ -7959,7 +8041,7 @@ static int __init vmx_init(void)
 	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
 		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
-		goto out5;
+		goto out7;
 
 #ifdef CONFIG_KEXEC
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
@@ -8007,6 +8089,10 @@ static int __init vmx_init(void)
 
 	return 0;
 
+out7:
+	free_page((unsigned long)vmx_vmwrite_bitmap);
+out6:
+	free_page((unsigned long)vmx_vmread_bitmap);
 out5:
 	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
 out4:
@@ -8030,6 +8116,8 @@ static void __exit vmx_exit(void)
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
 	free_page((unsigned long)vmx_io_bitmap_a);
+	free_page((unsigned long)vmx_vmwrite_bitmap);
+	free_page((unsigned long)vmx_vmread_bitmap);
 
 #ifdef CONFIG_KEXEC
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss, NULL);



* [PATCH 04/11] KVM: nVMX: Refactor handle_vmwrite
From: Abel Gordon @ 2013-04-18 11:36 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Refactor the existing code so we reuse vmcs12_write_any to copy fields from the
shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.
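The switch in vmcs12_write_any dispatches on the field's access width, which is derivable from the VMCS field encoding itself. A hedged sketch of that dispatch (the constants mirror the VMCS_FIELD_TYPE_* values used in this code: 0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural width; per the Intel encoding, the width lives in bits 14:13):

```c
#include <assert.h>

/* illustrative names; mirrors the VMCS_FIELD_TYPE_* ordering */
enum { FIELD_U16 = 0, FIELD_U64 = 1, FIELD_U32 = 2, FIELD_NATURAL = 3 };

/* width of a VMCS field, taken from bits 14:13 of its encoding */
static int vmcs_field_type(unsigned long field)
{
	return (int)((field >> 13) & 0x3);
}
```

For example, GUEST_ES_SELECTOR (0x0800) is 16-bit, GUEST_PHYSICAL_ADDRESS (0x2400) is 64-bit, VM_EXIT_REASON (0x4402) is 32-bit, and GUEST_RIP (0x681E) is natural width.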

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   52 +++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
@@ -5842,6 +5842,33 @@ static inline bool vmcs12_read_any(struc
 	}
 }
 
+
+static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
+				    unsigned long field, u64 field_value){
+	short offset = vmcs_field_to_offset(field);
+	char *p = ((char *) get_vmcs12(vcpu)) + offset;
+	if (offset < 0)
+		return false;
+
+	switch (vmcs_field_type(field)) {
+	case VMCS_FIELD_TYPE_U16:
+		*(u16 *)p = field_value;
+		return true;
+	case VMCS_FIELD_TYPE_U32:
+		*(u32 *)p = field_value;
+		return true;
+	case VMCS_FIELD_TYPE_U64:
+		*(u64 *)p = field_value;
+		return true;
+	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+		*(natural_width *)p = field_value;
+		return true;
+	default:
+		return false; /* can never happen. */
+	}
+
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.
@@ -5906,8 +5933,6 @@ static int handle_vmwrite(struct kvm_vcp
 	gva_t gva;
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
-	char *p;
-	short offset;
 	/* The value to write might be 32 or 64 bits, depending on L1's long
 	 * mode, and eventually we need to write that into a field of several
 	 * possible lengths. The code below first zero-extends the value to 64
@@ -5944,28 +5969,7 @@ static int handle_vmwrite(struct kvm_vcp
 		return 1;
 	}
 
-	offset = vmcs_field_to_offset(field);
-	if (offset < 0) {
-		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
-		skip_emulated_instruction(vcpu);
-		return 1;
-	}
-	p = ((char *) get_vmcs12(vcpu)) + offset;
-
-	switch (vmcs_field_type(field)) {
-	case VMCS_FIELD_TYPE_U16:
-		*(u16 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_U32:
-		*(u32 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_U64:
-		*(u64 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
-		*(natural_width *)p = field_value;
-		break;
-	default:
+	if (!vmcs12_write_any(vcpu, field, field_value)) {
 		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 		skip_emulated_instruction(vcpu);
 		return 1;



* [PATCH 05/11] KVM: nVMX: Fix VMXON emulation
From: Abel Gordon @ 2013-04-18 11:36 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

handle_vmon doesn't check if L1 is already in root mode (VMXON
was previously called). This patch adds this missing check and calls
nested_vmx_failValid if VMX is already ON.
We need this check because L0 will allocate the shadow vmcs when L1
executes VMXON and we want to avoid host leaks (due to shadow vmcs
allocation) if L1 executes VMXON repeatedly.
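The leak scenario and the guard can be modeled in a few lines of C. This is a simplified stand-in (names and the malloc are illustrative, not the KVM code): without the vmxon check, a second VMXON would overwrite the shadow vmcs pointer and leak the first allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct nested_state {
	int vmxon;		/* has L1 already executed VMXON? */
	void *shadow_vmcs;	/* allocated once per VMXON */
};

/* returns 0 on success, -1 when the instruction must fail */
static int emulate_vmxon(struct nested_state *n)
{
	if (n->vmxon)
		return -1;	/* fail: VMXON in VMX root operation */
	n->shadow_vmcs = malloc(4096);	/* would leak if reached twice */
	if (!n->shadow_vmcs)
		return -1;
	n->vmxon = 1;
	return 0;
}
```

With the check in place, repeated VMXON fails cleanly and the original allocation survives untouched.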

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
@@ -5512,6 +5512,9 @@ static void nested_free_all_saved_vmcss(
 		free_loaded_vmcs(&vmx->vmcs01);
 }
 
+static void nested_vmx_failValid(struct kvm_vcpu *vcpu,
+				 u32 vm_instruction_error);
+
 /*
  * Emulate the VMXON instruction.
  * Currently, we just remember that VMX is active, and do not save or even
@@ -5547,6 +5550,11 @@ static int handle_vmon(struct kvm_vcpu *
 		kvm_inject_gp(vcpu, 0);
 		return 1;
 	}
+	if (vmx->nested.vmxon) {
+		nested_vmx_failValid(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
 
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;



* [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs
From: Abel Gordon @ 2013-04-18 11:37 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12.  Once a vmcs12
becomes non-current, its shadow vmcs is released.
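The patch tags the freshly allocated vmcs as a shadow by setting bit 31 of its revision identifier (the first dword of the region). A minimal sketch of just that bit manipulation, with an illustrative struct name:

```c
#include <assert.h>
#include <stdint.h>

/* only the first dword of the region matters for this sketch */
struct vmcs_region {
	uint32_t revision_id;
};

/* set the shadow-VMCS indicator, as in: revision_id |= (1u << 31) */
static void mark_as_shadow(struct vmcs_region *v)
{
	v->revision_id |= (1u << 31);
}

static int is_shadow(const struct vmcs_region *v)
{
	return (v->revision_id >> 31) & 1;
}
```

The low 31 bits still carry the processor's revision identifier; only the top bit distinguishes an ordinary vmcs from a shadow one.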


Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:34.000000000 +0300
@@ -355,6 +355,7 @@ struct nested_vmx {
 	/* The host-usable pointer to the above */
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
+	struct vmcs *current_shadow_vmcs;
 
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
@@ -5527,6 +5528,7 @@ static int handle_vmon(struct kvm_vcpu *
 {
 	struct kvm_segment cs;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct vmcs *shadow_vmcs;
 
 	/* The Intel VMX Instruction Reference lists a bunch of bits that
 	 * are prerequisite to running VMXON, most notably cr4.VMXE must be
@@ -5555,6 +5557,16 @@ static int handle_vmon(struct kvm_vcpu *
 		skip_emulated_instruction(vcpu);
 		return 1;
 	}
+	if (enable_shadow_vmcs) {
+		shadow_vmcs = alloc_vmcs();
+		if (!shadow_vmcs)
+			return -ENOMEM;
+		/* mark vmcs as shadow */
+		shadow_vmcs->revision_id |= (1u << 31);
+		/* init shadow vmcs */
+		vmcs_clear(shadow_vmcs);
+		vmx->nested.current_shadow_vmcs = shadow_vmcs;
+	}
 
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;



* [PATCH 07/11] KVM: nVMX: Release shadow vmcs
From: Abel Gordon @ 2013-04-18 11:37 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Unmap vmcs12 and release the corresponding shadow vmcs 

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -5607,6 +5607,12 @@ static int nested_vmx_check_permission(s
 	return 1;
 }
 
+static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
+{
+	kunmap(vmx->nested.current_vmcs12_page);
+	nested_release_page(vmx->nested.current_vmcs12_page);
+}
+
 /*
  * Free whatever needs to be freed from vmx->nested when L1 goes down, or
  * just stops using VMX.
@@ -5617,11 +5623,12 @@ static void free_nested(struct vcpu_vmx 
 		return;
 	vmx->nested.vmxon = false;
 	if (vmx->nested.current_vmptr != -1ull) {
-		kunmap(vmx->nested.current_vmcs12_page);
-		nested_release_page(vmx->nested.current_vmcs12_page);
+		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
+	if (enable_shadow_vmcs)
+		free_vmcs(vmx->nested.current_shadow_vmcs);
 	/* Unpin physical memory we referred to in current vmcs02 */
 	if (vmx->nested.apic_access_page) {
 		nested_release_page(vmx->nested.apic_access_page);
@@ -5762,8 +5769,7 @@ static int handle_vmclear(struct kvm_vcp
 	}
 
 	if (vmptr == vmx->nested.current_vmptr) {
-		kunmap(vmx->nested.current_vmcs12_page);
-		nested_release_page(vmx->nested.current_vmcs12_page);
+		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
@@ -6045,10 +6051,8 @@ static int handle_vmptrld(struct kvm_vcp
 			skip_emulated_instruction(vcpu);
 			return 1;
 		}
-		if (vmx->nested.current_vmptr != -1ull) {
-			kunmap(vmx->nested.current_vmcs12_page);
-			nested_release_page(vmx->nested.current_vmcs12_page);
-		}
+		if (vmx->nested.current_vmptr != -1ull)
+			nested_release_vmcs12(vmx);
 
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;



* [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12
From: Abel Gordon @ 2013-04-18 11:38 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -718,6 +718,7 @@ static void vmx_get_segment(struct kvm_v
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -5895,6 +5896,40 @@ static inline bool vmcs12_write_any(stru
 
 }
 
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
+{
+	int i;
+	unsigned long field;
+	u64 field_value;
+	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+	unsigned long *fields = (unsigned long *)shadow_read_write_fields;
+	int num_fields = max_shadow_read_write_fields;
+
+	vmcs_load(shadow_vmcs);
+
+	for (i = 0; i < num_fields; i++) {
+		field = fields[i];
+		switch (vmcs_field_type(field)) {
+		case VMCS_FIELD_TYPE_U16:
+			field_value = vmcs_read16(field);
+			break;
+		case VMCS_FIELD_TYPE_U32:
+			field_value = vmcs_read32(field);
+			break;
+		case VMCS_FIELD_TYPE_U64:
+			field_value = vmcs_read64(field);
+			break;
+		case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+			field_value = vmcs_readl(field);
+			break;
+		}
+		vmcs12_write_any(&vmx->vcpu, field, field_value);
+	}
+
+	vmcs_clear(shadow_vmcs);
+	vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.



* [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
From: Abel Gordon @ 2013-04-18 11:38 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   45 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -718,6 +718,7 @@ static void vmx_get_segment(struct kvm_v
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
@@ -5930,6 +5931,50 @@ static void copy_shadow_to_vmcs12(struct
 	vmcs_load(vmx->loaded_vmcs->vmcs);
 }
 
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
+{
+	unsigned long *fields[] = {
+		(unsigned long *)shadow_read_write_fields,
+		(unsigned long *)shadow_read_only_fields
+	};
+	int num_lists =  ARRAY_SIZE(fields);
+	int max_fields[] = {
+		max_shadow_read_write_fields,
+		max_shadow_read_only_fields
+	};
+	int i, q;
+	unsigned long field;
+	u64 field_value = 0;
+	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+
+	vmcs_load(shadow_vmcs);
+
+	for (q = 0; q < num_lists; q++) {
+		for (i = 0; i < max_fields[q]; i++) {
+			field = fields[q][i];
+			vmcs12_read_any(&vmx->vcpu, field, &field_value);
+
+			switch (vmcs_field_type(field)) {
+			case VMCS_FIELD_TYPE_U16:
+				vmcs_write16(field, (u16)field_value);
+				break;
+			case VMCS_FIELD_TYPE_U32:
+				vmcs_write32(field, (u32)field_value);
+				break;
+			case VMCS_FIELD_TYPE_U64:
+				vmcs_write64(field, (u64)field_value);
+				break;
+			case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+				vmcs_writel(field, (long)field_value);
+				break;
+			}
+		}
+	}
+
+	vmcs_clear(shadow_vmcs);
+	vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.



* [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
From: Abel Gordon @ 2013-04-18 11:39 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Synchronize between the VMCS12 software controlled structure and the
processor-specific shadow vmcs
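The deferred-sync protocol this patch implements can be sketched as a tiny state machine. Illustrative model only (plain ints stand in for the real vmcs12 and shadow structures): VMPTRLD marks the shadow stale, and the next vmx_vcpu_run refreshes it before entering the guest.

```c
#include <assert.h>

struct vcpu_model {
	int sync_shadow_vmcs;	/* shadow no longer matches vmcs12? */
	int vmcs12_field;	/* stands in for the whole vmcs12 */
	int shadow_field;	/* stands in for the shadow vmcs */
};

/* vmcs12 changed under L0's feet: defer the copy to vcpu_run */
static void model_vmptrld(struct vcpu_model *v, int value)
{
	v->vmcs12_field = value;
	v->sync_shadow_vmcs = 1;
}

/* on entry, refresh the shadow once if it is stale */
static void model_vcpu_run(struct vcpu_model *v)
{
	if (v->sync_shadow_vmcs) {
		v->shadow_field = v->vmcs12_field;	/* copy_vmcs12_to_shadow */
		v->sync_shadow_vmcs = 0;
	}
}
```

Checking the flag in vcpu_run means several vmcs12 updates between entries cost only one shadow refresh.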

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -356,6 +356,11 @@ struct nested_vmx {
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
 	struct vmcs *current_shadow_vmcs;
+	/*
+	 * Indicates if the shadow vmcs must be updated with the
+	 * data held by vmcs12
+	 */
+	bool sync_shadow_vmcs;
 
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
@@ -5611,6 +5616,14 @@ static int nested_vmx_check_permission(s
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
+	if (enable_shadow_vmcs) {
+		if (vmx->nested.current_vmcs12 != NULL) {
+			/* copy to memory all shadowed fields in case
+			   they were modified */
+			copy_shadow_to_vmcs12(vmx);
+			vmx->nested.sync_shadow_vmcs = false;
+		}
+	}
 	kunmap(vmx->nested.current_vmcs12_page);
 	nested_release_page(vmx->nested.current_vmcs12_page);
 }
@@ -5739,6 +5752,10 @@ static void nested_vmx_failValid(struct 
 			    X86_EFLAGS_SF | X86_EFLAGS_OF))
 			| X86_EFLAGS_ZF);
 	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
+	/*
+	 * We don't need to force a shadow sync because
+	 * VM_INSTRUCTION_ERROR is not shadowed
+	 */
 }
 
 /* Emulate the VMCLEAR instruction */
@@ -6137,6 +6154,9 @@ static int handle_vmptrld(struct kvm_vcp
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
+		if (enable_shadow_vmcs) {
+			vmx->nested.sync_shadow_vmcs = true;
+		}
 	}
 
 	nested_vmx_succeed(vcpu);
@@ -6895,6 +6915,11 @@ static void __noclone vmx_vcpu_run(struc
 	if (vmx->emulation_required)
 		return;
 
+	if (vmx->nested.sync_shadow_vmcs) {
+		copy_vmcs12_to_shadow(vmx);
+		vmx->nested.sync_shadow_vmcs = false;
+	}
+
 	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
 	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
@@ -7504,6 +7529,9 @@ static int nested_vmx_run(struct kvm_vcp
 	skip_emulated_instruction(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
+	if (enable_shadow_vmcs)
+		copy_shadow_to_vmcs12(vmx);
+
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when
@@ -7950,6 +7978,8 @@ static void nested_vmx_vmexit(struct kvm
 		nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
 	} else
 		nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		vmx->nested.sync_shadow_vmcs = true;
 }
 
 /*
@@ -7967,6 +7997,8 @@ static void nested_vmx_entry_failure(str
 	vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
 	vmcs12->exit_qualification = qualification;
 	nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
 }
 
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality
  2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
                   ` (9 preceding siblings ...)
  2013-04-18 11:39 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
@ 2013-04-18 11:39 ` Abel Gordon
  2013-04-18 12:42 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Gleb Natapov
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Abel Gordon @ 2013-04-18 11:39 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Once L1 loads a VMCS12, we enable the shadow-vmcs capability and copy all the
shadowed VMCS12 fields to the shadow vmcs.  When we release the VMCS12, we
also disable the shadow-vmcs capability.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -5616,12 +5616,17 @@ static int nested_vmx_check_permission(s
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
+	u32 exec_control;
 	if (enable_shadow_vmcs) {
 		if (vmx->nested.current_vmcs12 != NULL) {
 			/* copy to memory all shadowed fields in case
 			   they were modified */
 			copy_shadow_to_vmcs12(vmx);
 			vmx->nested.sync_shadow_vmcs = false;
+			exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+			exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+			vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+			vmcs_write64(VMCS_LINK_POINTER, -1ull);
 		}
 	}
 	kunmap(vmx->nested.current_vmcs12_page);
@@ -6110,6 +6115,7 @@ static int handle_vmptrld(struct kvm_vcp
 	gva_t gva;
 	gpa_t vmptr;
 	struct x86_exception e;
+	u32 exec_control;
 
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
@@ -6155,6 +6161,11 @@ static int handle_vmptrld(struct kvm_vcp
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
 		if (enable_shadow_vmcs) {
+			exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+			exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
+			vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+			vmcs_write64(VMCS_LINK_POINTER,
+				     __pa(vmx->nested.current_shadow_vmcs));
 			vmx->nested.sync_shadow_vmcs = true;
 		}
 	}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5
  2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
                   ` (10 preceding siblings ...)
  2013-04-18 11:39 ` [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
@ 2013-04-18 12:42 ` Gleb Natapov
  2013-04-18 12:48 ` Orit Wasserman
  2013-04-22  7:55 ` Gleb Natapov
  13 siblings, 0 replies; 36+ messages in thread
From: Gleb Natapov @ 2013-04-18 12:42 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Thu, Apr 18, 2013 at 02:34:25PM +0300, Abel Gordon wrote:
> This series of patches implements shadow-vmcs capability for nested VMX.
> 
Reviewed-by: Gleb Natapov <gleb@redhat.com>


--
			Gleb.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5
  2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
                   ` (11 preceding siblings ...)
  2013-04-18 12:42 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Gleb Natapov
@ 2013-04-18 12:48 ` Orit Wasserman
  2013-04-22  7:55 ` Gleb Natapov
  13 siblings, 0 replies; 36+ messages in thread
From: Orit Wasserman @ 2013-04-18 12:48 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, nadav, jun.nakajima, dongxiao.xu

On 04/18/2013 02:34 PM, Abel Gordon wrote:
> This series of patches implements shadow-vmcs capability for nested VMX.
> 

Hard to keep up :)

Reviewed-by: Orit Wasserman <owasserm@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5
  2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
                   ` (12 preceding siblings ...)
  2013-04-18 12:48 ` Orit Wasserman
@ 2013-04-22  7:55 ` Gleb Natapov
  13 siblings, 0 replies; 36+ messages in thread
From: Gleb Natapov @ 2013-04-22  7:55 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Thu, Apr 18, 2013 at 02:34:25PM +0300, Abel Gordon wrote:
> This series of patches implements shadow-vmcs capability for nested VMX.
> 
Applied, thanks.


--
			Gleb.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-18  8:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v4 Abel Gordon
@ 2013-04-18  8:39 ` Abel Gordon
  0 siblings, 0 replies; 36+ messages in thread
From: Abel Gordon @ 2013-04-18  8:39 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Synchronize the software-controlled VMCS12 structure with the
processor-specific shadow vmcs.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 11:28:23.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 11:28:23.000000000 +0300
@@ -356,6 +356,11 @@ struct nested_vmx {
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
 	struct vmcs *current_shadow_vmcs;
+	/*
+	 * Indicates if the shadow vmcs must be updated with the
+	 * data held by vmcs12
+	 */
+	bool sync_shadow_vmcs;
 
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
@@ -5611,6 +5616,14 @@ static int nested_vmx_check_permission(s
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
+	if (enable_shadow_vmcs) {
+		if (vmx->nested.current_vmcs12 != NULL) {
+			/* copy to memory all shadowed fields in case
+			   they were modified */
+			copy_shadow_to_vmcs12(vmx);
+			vmx->nested.sync_shadow_vmcs = false;
+		}
+	}
 	kunmap(vmx->nested.current_vmcs12_page);
 	nested_release_page(vmx->nested.current_vmcs12_page);
 }
@@ -5739,6 +5752,10 @@ static void nested_vmx_failValid(struct 
 			    X86_EFLAGS_SF | X86_EFLAGS_OF))
 			| X86_EFLAGS_ZF);
 	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
+	/*
+	 * We don't need to force a shadow sync because
+	 * VM_INSTRUCTION_ERROR is not shadowed
+	 */
 }
 
 /* Emulate the VMCLEAR instruction */
@@ -6137,6 +6154,9 @@ static int handle_vmptrld(struct kvm_vcp
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
+		if (enable_shadow_vmcs) {
+			vmx->nested.sync_shadow_vmcs = true;
+		}
 	}
 
 	nested_vmx_succeed(vcpu);
@@ -6895,6 +6915,11 @@ static void __noclone vmx_vcpu_run(struc
 	if (vmx->emulation_required)
 		return;
 
+	if (vmx->nested.sync_shadow_vmcs) {
+		copy_vmcs12_to_shadow(vmx);
+		vmx->nested.sync_shadow_vmcs = false;
+	}
+
 	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
 	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
@@ -7504,6 +7529,9 @@ static int nested_vmx_run(struct kvm_vcp
 	skip_emulated_instruction(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
+	if (enable_shadow_vmcs)
+		copy_shadow_to_vmcs12(vmx);
+
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when
@@ -7950,6 +7978,8 @@ static void nested_vmx_vmexit(struct kvm
 		nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
 	} else
 		nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		vmx->nested.sync_shadow_vmcs = true;
 }
 
 /*
@@ -7967,6 +7997,8 @@ static void nested_vmx_entry_failure(str
 	vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
 	vmcs12->exit_qualification = qualification;
 	nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
 }
 
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 13:47                                   ` Abel Gordon
@ 2013-04-14 14:41                                     ` Gleb Natapov
  0 siblings, 0 replies; 36+ messages in thread
From: Gleb Natapov @ 2013-04-14 14:41 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 04:47:48PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 02:16:02 PM:
> 
> > On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> > >
> > > > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > > > >>>
> > > > > >>>
> > > > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > > > >>>
> > > > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > > > >>>>>
> > > > > >>>>> Ok, so then you prefer to add the inline functions to
> > > > > >>>>> read/write to the vmcs12 fields, (to set the request bit
> > > > > >>>>> if shadowed field changed) and you are not concerned about
> > > > > >>>>> any merge/rebase mess. I will work on this direction.
> > > > > >>>>> I'll first send an independent patch to introduce the
> > > > > >>>>> accessors. Once you apply this patch, I'll continue and
> > > > > >>>>> send you v2 patches for shadow vmcs.
> > > > > >>>>>
> > > > > >>>>> Do you agree ?
> > > > > >>>> Yes.
> > > > > >>>
> > > > > >>> Looking again at the code it seems like we could avoid
> > > > > >>> adding the accessors.
> > > > > >>> We could just set a flag in nested_vmx_vmexit and
> > > > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we
> > > > > >>> check/reset the flag and call copy_vmcs12_to_shadow (if
> > > > > >>> required).
> > > > > >>>
> > > > > >>> What do you think ?
> > > > > >> Good idea! With accessors we can do further optimization by
> > > > > >> copying only things that changed, but it will be premature
> > > > > >> optimization at this point.
> > > > > >>
> > > > > > Actually this is good idea only if we know for sure that VMX
> > > > > > emulation changes vmcs12 only during guest entry/exit. Is this
> > > > > > the case? I think so.
> > > > >
> > > > > Some vmcs12 fields that are exposed to L1 are changed outside
> > > > > L2<->L1 transitions. What comes to my mind: L0 emulates some
> > > > > change that L1 does not trap, e.g. CRx accesses. Or what do you
> > > > > mean?
> > > > >
> > > > If vmcs12 is changed by L0 while L2 is running this is OK. If L0
> > > > changes shadowed vmcs12 field while L1 is running this is not OK.
> > > > So for instance if field XXX is R/W but we allow only read to be
> > > > shadowed then write emulation in L0 has to sync new value back to
> > > > shadow before going back to L1.
> > >
> > > Exactly.
> > >
> > > While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> > > executes vmwrite).
> > > VMCS12 fields are changed once L1 launches/resumes L2 and there is a
> > > L2 exit.
> > >
> > > L0 can change VMCS12 while it handles a L2 exit directly which is
> > > not forwarded to L1. But that's OK because L1 will eventually see
> > > the change once we switch to L1 due to other exit that L0 let L1
> > > handle.
> > >
> > > L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> > > execute any vmlaunch, vmresume or vmwrite instruction.
> > >
> > The question is: is there a case like I described above when we shadow
> > only reads from R/W field and handle vmwrites in L0?
> 
> No, there is no such thing with the patches I submitted.
> The shadow VMCS patches assumes that only R/O fields can be shadowed
> for read and not for write (the shadow_read_only_fields array). R/O
> fields can't be written by L1 thus L0 will trap vmwrite
> to any r/o field and emulate a fail
> (VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT).
> 
> Writable fields are shadowed for both read and write
> (the shadow_read_write_fields array).
> 
> There are no writable fields shadowed for write and not for read.
> There are no writable fields shadowed for read and not for write
> 
Got it, thanks. It may make sense for some fields to be shadowed for
read but not for write; those can be individually synced with the
shadow during vmwrite emulation.

--
			Gleb.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 11:16                                 ` Gleb Natapov
@ 2013-04-14 13:47                                   ` Abel Gordon
  2013-04-14 14:41                                     ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-14 13:47 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 02:16:02 PM:

> On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> >
> > > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > > >>>
> > > > >>>
> > > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > > >>>
> > > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > > >>>>>
> > > > >>>>> Ok, so then you prefer to add the inline functions to
> > > > >>>>> read/write to the vmcs12 fields, (to set the request bit if
> > > > >>>>> shadowed field changed) and you are not concerned about any
> > > > >>>>> merge/rebase mess. I will work on this direction.
> > > > >>>>> I'll first send an independent patch to introduce the
> > > > >>>>> accessors. Once you apply this patch, I'll continue and send
> > > > >>>>> you v2 patches for shadow vmcs.
> > > > >>>>>
> > > > >>>>> Do you agree ?
> > > > >>>> Yes.
> > > > >>>
> > > > >>> Looking again at the code it seems like we could avoid adding
> > > > >>> the accessors.
> > > > >>> We could just set a flag in nested_vmx_vmexit and
> > > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> > > > >>> the flag and call copy_vmcs12_to_shadow (if required).
> > > > >>>
> > > > >>> What do you think ?
> > > > >> Good idea! With accessors we can do further optimization by
> > > > >> copying only things that changed, but it will be premature
> > > > >> optimization at this point.
> > > > >>
> > > > > Actually this is good idea only if we know for sure that VMX
> > > > > emulation changes vmcs12 only during guest entry/exit. Is this
> > > > > the case? I think so.
> > > >
> > > > Some vmcs12 fields that are exposed to L1 are changed outside
> > > > L2<->L1 transitions. What comes to my mind: L0 emulates some
> > > > change that L1 does not trap, e.g. CRx accesses. Or what do you
> > > > mean?
> > > >
> > > If vmcs12 is changed by L0 while L2 is running this is OK. If L0
> > > changes shadowed vmcs12 field while L1 is running this is not OK.
> > > So for instance if field XXX is R/W but we allow only read to be
> > > shadowed then write emulation in L0 has to sync new value back to
> > > shadow before going back to L1.
> >
> > Exactly.
> >
> > While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> > executes vmwrite).
> > VMCS12 fields are changed once L1 launches/resumes L2 and there is a
> > L2 exit.
> >
> > L0 can change VMCS12 while it handles a L2 exit directly which is
> > not forwarded to L1. But that's OK because L1 will eventually see
> > the change once we switch to L1 due to other exit that L0 let L1
> > handle.
> >
> > L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> > execute any vmlaunch, vmresume or vmwrite instruction.
> >
> The question is: is there a case like I described above when we shadow
> only reads from R/W field and handle vmwrites in L0?

No, there is no such thing with the patches I submitted.
The shadow VMCS patches assume that only R/O fields can be shadowed
for read and not for write (the shadow_read_only_fields array). R/O
fields can't be written by L1, thus L0 will trap any vmwrite to a
r/o field and emulate a failure
(VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT).

Writable fields are shadowed for both read and write
(the shadow_read_write_fields array).

There are no writable fields shadowed for write and not for read.
There are no writable fields shadowed for read and not for write.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:49                               ` Abel Gordon
@ 2013-04-14 11:16                                 ` Gleb Natapov
  2013-04-14 13:47                                   ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-14 11:16 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> 
> > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > >>>
> > > >>>
> > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > >>>
> > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > >>>>>
> > > >>>>> Ok, so then you prefer to add the inline functions to read/
> > write to the
> > > >>>>> vmcs12
> > > >>>>> fields, (to set the request bit if shadowed field changed) and
> you are
> > > >>> not
> > > >>>>> concerned
> > > >>>>> about any merge/rebase mess. I will work on this direction.
> > > >>>>> I'll first send an independent patch to introduce the accessors.
> Once
> > > >>> you
> > > >>>>> apply this patch, I'll continue and send you v2 patches for
> shadow
> > > >>> vmcs.
> > > >>>>>
> > > >>>>> Do you agree ?
> > > >>>> Yes.
> > > >>>
> > > >>> Looking again at the code it seems like we could avoid adding the
> > > >>> accessors.
> > > >>> We could just set a flag in nested_vmx_vmexit and
> > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> > the flag and
> > > >>> call copy_vmcs12_to_shadow (if required).
> > > >>>
> > > >>> What do you think ?
> > > >> Good idea! With accessors we can do further optimization by copying
> only
> > > >> things that changed, but it will be premature optimization at this
> > > >> point.
> > > >>
> > > > Actually this is good idea only if we know for sure that VMX
> emulation
> > > > changes vmcs12 only during guest entry/exit. Is this the case? I
> think
> > > > so.
> > >
> > > Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> > > transitions. What comes to my mind: L0 emulates some change that L1
> does
> > > not trap, e.g. CRx accesses. Or what do you mean?
> > >
> > If vmcs12 is changed by L0 while L2 is running this is OK. If L0 changes
> > shadowed vmcs12 field while L1 is running this is not OK. So for
> > instance if field XXX is R/W but we allow only read to be shadowed then
> > write emulation in L0 has to sync new value back to shadow before going
> > back to L1.
> 
> Exactly.
> 
> While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> executes vmwrite).
> VMCS12 fields are changed once L1 launches/resumes L2 and there is a
> L2 exit.
> 
> L0 can change VMCS12 while it handles a L2 exit directly which is not
> forwarded to L1. But that's OK because L1 will eventually see the change
> once we switch to L1 due to other exit that L0 let L1 handle.
> 
> L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> execute any vmlaunch, vmresume or vmwrite instruction.
> 
The question is: is there a case like I described above when we shadow
only reads from R/W field and handle vmwrites in L0?

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:34                             ` Gleb Natapov
@ 2013-04-14 10:49                               ` Abel Gordon
  2013-04-14 11:16                                 ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-14 10:49 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:

> On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > >>>
> > >>>
> > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > >>>
> > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > >>>>>
> > >>>>> Ok, so then you prefer to add the inline functions to read/write to the
> > >>>>> vmcs12
> > >>>>> fields, (to set the request bit if shadowed field changed) and you are not
> > >>>>> concerned
> > >>>>> about any merge/rebase mess. I will work on this direction.
> > >>>>> I'll first send an independent patch to introduce the accessors. Once you
> > >>>>> apply this patch, I'll continue and send you v2 patches for shadow vmcs.
> > >>>>>
> > >>>>> Do you agree ?
> > >>>> Yes.
> > >>>
> > >>> Looking again at the code it seems like we could avoid adding the
> > >>> accessors.
> > >>> We could just set a flag in nested_vmx_vmexit and
> > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> the flag and
> > >>> call copy_vmcs12_to_shadow (if required).
> > >>>
> > >>> What do you think ?
> > >> Good idea! With accessors we can do further optimization by copying only
> > >> things that changed, but it will be premature optimization at this
> > >> point.
> > >>
> > > Actually this is good idea only if we know for sure that VMX emulation
> > > changes vmcs12 only during guest entry/exit. Is this the case? I think
> > > so.
> >
> > Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> > transitions. What comes to my mind: L0 emulates some change that L1 does
> > not trap, e.g. CRx accesses. Or what do you mean?
> >
> If vmcs12 is changed by L0 while L2 is running this is OK. If L0 changes
> shadowed vmcs12 field while L1 is running this is not OK. So for
> instance if field XXX is R/W but we allow only read to be shadowed then
> write emulation in L0 has to sync new value back to shadow before going
> back to L1.

Exactly.

While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
executes vmwrite).
VMCS12 fields are changed once L1 launches/resumes L2 and there is a
L2 exit.

L0 can change VMCS12 while it handles an L2 exit directly, one which is not
forwarded to L1. But that's OK because L1 will eventually see the change
once we switch to L1 due to another exit that L0 lets L1 handle.

L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
execute any vmlaunch, vmresume or vmwrite instruction.




* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:27                           ` Jan Kiszka
  2013-04-14 10:34                             ` Abel Gordon
@ 2013-04-14 10:34                             ` Gleb Natapov
  2013-04-14 10:49                               ` Abel Gordon
  1 sibling, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:34 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Abel Gordon, dongxiao.xu, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> On 2013-04-14 12:07, Gleb Natapov wrote:
> > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> >>>
> >>>
> >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> >>>
> >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >>>>>
> >>>>> Ok, so then you prefer to add the inline functions to read/write to the
> >>>>> vmcs12
> >>>>> fields, (to set the request bit if shadowed field changed) and you are
> >>> not
> >>>>> concerned
> >>>>> about any merge/rebase mess. I will work on this direction.
> >>>>> I'll first send an independent patch to introduce the accessors. Once
> >>> you
> >>>>> apply this patch, I'll continue and send you v2 patches for shadow
> >>> vmcs.
> >>>>>
> >>>>> Do you agree ?
> >>>> Yes.
> >>>
> >>> Looking again at the code it seems like we could avoid adding the
> >>> accessors.
> >>> We could just set a flag in nested_vmx_vmexit and
> >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> >>> call copy_vmcs12_to_shadow (if required).
> >>>
> >>> What do you think ?
> >> Good idea! With accessors we can do further optimization by copying only
> >> things that changed, but it will be premature optimization at this
> >> point.
> >>
> > Actually this is good idea only if we know for sure that VMX emulation
> > changes vmcs12 only during guest entry/exit. Is this the case? I think
> > so.
> 
> Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> transitions. What comes to my mind: L0 emulates some change that L1 does
> not trap, e.g. CRx accesses. Or what do you mean?
> 
If vmcs12 is changed by L0 while L2 is running this is OK. If L0 changes
shadowed vmcs12 field while L1 is running this is not OK. So for
instance if field XXX is R/W but we allow only read to be shadowed then
write emulation in L0 has to sync new value back to shadow before going
back to L1.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:27                           ` Jan Kiszka
@ 2013-04-14 10:34                             ` Abel Gordon
  2013-04-14 10:34                             ` Gleb Natapov
  1 sibling, 0 replies; 36+ messages in thread
From: Abel Gordon @ 2013-04-14 10:34 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: dongxiao.xu, Gleb Natapov, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Jan Kiszka <jan.kiszka@web.de> wrote on 14/04/2013 01:27:10 PM:


> On 2013-04-14 12:07, Gleb Natapov wrote:
> > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> >>>
> >>>
> >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> >>>
> >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >>>>>
> >>>>> Ok, so then you prefer to add the inline functions to read/write to the
> >>>>> vmcs12
> >>>>> fields, (to set the request bit if shadowed field changed) and you are not
> >>>>> concerned
> >>>>> about any merge/rebase mess. I will work on this direction.
> >>>>> I'll first send an independent patch to introduce the accessors. Once you
> >>>>> apply this patch, I'll continue and send you v2 patches for shadow vmcs.
> >>>>>
> >>>>> Do you agree ?
> >>>> Yes.
> >>>
> >>> Looking again at the code it seems like we could avoid adding the
> >>> accessors.
> >>> We could just set a flag in nested_vmx_vmexit and
> >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> the flag and
> >>> call copy_vmcs12_to_shadow (if required).
> >>>
> >>> What do you think ?
> >> Good idea! With accessors we can do further optimization by copying only
> >> things that changed, but it will be premature optimization at this
> >> point.
> >>
> > Actually this is good idea only if we know for sure that VMX emulation
> > changes vmcs12 only during guest entry/exit. Is this the case? I think
> > so.
>
> Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> transitions. What comes to my mind: L0 emulates some change that L1 does
> not trap, e.g. CRx accesses. Or what do you mean?

If L0 emulates some change that L1 does not trap, L1 will not read (vmread)
the change until we switch to L1 (on some other exit).
So, later we will do nested_vmx_vmexit and sync the shadowed field.




* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:07                         ` Gleb Natapov
@ 2013-04-14 10:27                           ` Jan Kiszka
  2013-04-14 10:34                             ` Abel Gordon
  2013-04-14 10:34                             ` Gleb Natapov
  0 siblings, 2 replies; 36+ messages in thread
From: Jan Kiszka @ 2013-04-14 10:27 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Abel Gordon, dongxiao.xu, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On 2013-04-14 12:07, Gleb Natapov wrote:
> On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
>> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
>>>
>>>
>>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
>>>
>>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
>>>>>
>>>>> Ok, so then you prefer to add the inline functions to read/write to the
>>>>> vmcs12
>>>>> fields, (to set the request bit if shadowed field changed) and you are
>>> not
>>>>> concerned
>>>>> about any merge/rebase mess. I will work on this direction.
>>>>> I'll first send an independent patch to introduce the accessors. Once
>>> you
>>>>> apply this patch, I'll continue and send you v2 patches for shadow
>>> vmcs.
>>>>>
>>>>> Do you agree ?
>>>> Yes.
>>>
>>> Looking again at the code it seems like we could avoid adding the
>>> accessors.
>>> We could just set a flag in nested_vmx_vmexit and
>>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
>>> call copy_vmcs12_to_shadow (if required).
>>>
>>> What do you think ?
>> Good idea! With accessors we can do further optimization by copying only
>> things that changed, but it will be premature optimization at this
>> point.
>>
> Actually this is good idea only if we know for sure that VMX emulation
> changes vmcs12 only during guest entry/exit. Is this the case? I think
> so.

Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
transitions. What comes to my mind: L0 emulates some change that L1 does
not trap, e.g. CRx accesses. Or what do you mean?

Jan



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:00                       ` Gleb Natapov
@ 2013-04-14 10:07                         ` Gleb Natapov
  2013-04-14 10:27                           ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:07 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > 
> > 
> > Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > 
> > > On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > >
> > > > Ok, so then you prefer to add the inline functions to read/write to the
> > > > vmcs12
> > > > fields, (to set the request bit if shadowed field changed) and you are
> > not
> > > > concerned
> > > > about any merge/rebase mess. I will work on this direction.
> > > > I'll first send an independent patch to introduce the accessors. Once
> > you
> > > > apply this patch, I'll continue and send you v2 patches for shadow
> > vmcs.
> > > >
> > > > Do you agree ?
> > > Yes.
> > 
> > Looking again at the code it seems like we could avoid adding the
> > accessors.
> > We could just set a flag in nested_vmx_vmexit and
> > nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> > call copy_vmcs12_to_shadow (if required).
> > 
> > What do you think ?
> Good idea! With accessors we can do further optimization by copying only
> things that changed, but it will be premature optimization at this
> point.
> 
Actually this is good idea only if we know for sure that VMX emulation
changes vmcs12 only during guest entry/exit. Is this the case? I think
so.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14  9:51                     ` Abel Gordon
@ 2013-04-14 10:00                       ` Gleb Natapov
  2013-04-14 10:07                         ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:00 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> 
> > On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > >
> > > Ok, so then you prefer to add the inline functions to read/write to the
> > > vmcs12
> > > fields, (to set the request bit if shadowed field changed) and you are
> not
> > > concerned
> > > about any merge/rebase mess. I will work on this direction.
> > > I'll first send an independent patch to introduce the accessors. Once
> you
> > > apply this patch, I'll continue and send you v2 patches for shadow
> vmcs.
> > >
> > > Do you agree ?
> > Yes.
> 
> Looking again at the code it seems like we could avoid adding the
> accessors.
> We could just set a flag in nested_vmx_vmexit and
> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> call copy_vmcs12_to_shadow (if required).
> 
> What do you think ?
Good idea! With accessors we can do further optimization by copying only
things that changed, but it will be premature optimization at this
point.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:48                   ` Gleb Natapov
@ 2013-04-14  9:51                     ` Abel Gordon
  2013-04-14 10:00                       ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-14  9:51 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:

> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >
> > Ok, so then you prefer to add the inline functions to read/write to the
> > vmcs12
> > fields, (to set the request bit if shadowed field changed) and you are not
> > concerned
> > about any merge/rebase mess. I will work on this direction.
> > I'll first send an independent patch to introduce the accessors. Once you
> > apply this patch, I'll continue and send you v2 patches for shadow vmcs.
> >
> > Do you agree ?
> Yes.

Looking again at the code it seems like we could avoid adding the
accessors.
We could just set a flag in nested_vmx_vmexit and
nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
call copy_vmcs12_to_shadow (if required).

What do you think ?
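Sketched in C, the flag scheme proposed above could look like the following. The struct layout, function bodies and the instrumentation counter are placeholders for illustration, not the real KVM implementation; only the function names follow the discussion.

```c
#include <stdbool.h>

struct vcpu_vmx {
	bool sync_shadow_vmcs;	/* VMCS12 changed since last copy? */
	int copies_done;	/* instrumentation for this sketch only */
};

static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
{
	vmx->copies_done++;	/* stand-in for the real field-by-field copy */
}

void nested_vmx_vmexit(struct vcpu_vmx *vmx)
{
	/* ... write exit reason, qualification, guest state to VMCS12 ... */
	vmx->sync_shadow_vmcs = true;
}

void nested_vmx_entry_failure(struct vcpu_vmx *vmx)
{
	/* ... write vm_exit_reason / exit_qualification to VMCS12 ... */
	vmx->sync_shadow_vmcs = true;
}

void vmx_vcpu_run(struct vcpu_vmx *vmx)
{
	if (vmx->sync_shadow_vmcs) {
		copy_vmcs12_to_shadow(vmx);
		vmx->sync_shadow_vmcs = false;
	}
	/* ... actual vmlaunch/vmresume ... */
}
```

The appeal of this variant is that L0 pays at most one VMCS12-to-shadow copy per guest entry, and only when one of the two nested-exit paths actually touched VMCS12.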



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:44                 ` Abel Gordon
@ 2013-04-12 10:48                   ` Gleb Natapov
  2013-04-14  9:51                     ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-12 10:48 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> 
> 
> kvm-owner@vger.kernel.org wrote on 12/04/2013 01:31:17 PM:
> 
> > From: Gleb Natapov <gleb@redhat.com>
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: dongxiao.xu@intel.com, jun.nakajima@intel.com,
> > kvm@vger.kernel.org, "Nadav Har'El" <nyh@math.technion.ac.il>,
> > owasserm@redhat.com
> > Date: 12/04/2013 01:31 PM
> > Subject: Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content
> > with the shadow vmcs
> > Sent by: kvm-owner@vger.kernel.org
> >
> > On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> > >
> > > > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > > > I think the patch already miss some fields. What if
> nested_vmx_run()
> > > > > > fails and calls nested_vmx_entry_failure().
> nested_vmx_entry_failure
> > > ()
> > > > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but
> where
> > > do
> > > > > > we copy them back to shadow before going back to L1?
> > > > >
> > > > > Good catch! :)
> > > > >
> > > > > Note that the entry path is easy to handle because we copy the
> fields
> > > > > as part of nested_vmx_entry. This is not like exit path where
> > > > > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> > > > >
> > > > > So here, we could simple call copy_vmcs12_to_shadow if the entry
> fails
> > > > > (as part of nested_vmx_entry_failure or nested_vmx). We could
> optimize
> > > > > the code by updating these specific fields directly, but I don't
> think
> > > > > we really need to optimize code that is part of the error path.
> > > > >
> > > > We needn't.
> > > >
> > > > > > May be we need to introduce vmcs12 accessors to track what is
> changes
> > > > > > and if something need to be copied to shadow before going back to
> L1.
> > > > >
> > > > > That means we will need to modify all the lines of code that uses
> > > > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write
> > > function.
> > > > > Inside these inline functions we could access the shadow vmcs
> directly.
> > > > > However, to access the shadow vmcs we need to vmptrld first and
> this
> > > will
> > > > > force
> > > > > unnecessary vmptrlds (between shadow vmcs 12 and current vmcs 01)
> each
> > > time
> > > > > the code accesses a vmcs12 field. Alternatively, if we want to
> avoid
> > > > > unnecessary vmptrlds each time we access vmcs12 we could simple set
> a
> > > > > flag that indicates when a shadow field was changed. In this case,
> we
> > > will
> > > > > need to find all the places to check the flag and copy the fields,
> > > > > considering both success and error paths.
> > > > That's not how I see it. nested_vmcs_write() will set a request bit
> in
> > > > vcpu (this is the flag you mention above). The bit will be checked
> during
> > > > a guest entry and vmcs12 will be synced to shadow at this point.
> Later
> > > > we can track what fields were written and sync only them.
> > > >
> > XXX
> >
> > > > > Finally, I am afraid that these directions will introduce new
> issues,
> > > > > will force us to modify too many lines and they may create a
> > > merge/rebase
> > > > > mess...
> > > > Conflicts should be trivial and I do not expect many iterations for
> the
> > > > patch series. I like the approach you take to use vmcs shadowing and
> most
> > > > comment are nitpicks.  I promise to review the next version as soon
> as
> > > > it is posted :)
> > > >
> > > > >
> > > > > Maybe we should simple fix nested_vmx_entry_failure (as well as the
> > > > > other fixes you suggested in other patches) and apply the code.
> > > > > Do you agree ?
> > > > >
> > > > The approach is error prone. Even if we will fix all bugs in the
> current
> > > > code each new nested vmx modification will have to be reviewed with
> > > shadowing
> > > > in mind and sometimes it is hard to see global picture just from a
> > > > patch. It is better to hide shadow details from occasional nested vmx
> > > hacker.
> > >
> > > But how ? I suggested 2 alternatives with pros/cons.
> > I outlined how above (search for XXX). Use accessors, set request bit,
> > sync during guest entry if request bit is set.
> 
> Ok, so then you prefer to add the inline functions to read/write to the
> vmcs12
> fields, (to set the request bit if shadowed field changed) and you are not
> concerned
> about any merge/rebase mess. I will work on this direction.
> I'll first send an independent patch to introduce the accessors. Once you
> apply this patch, I'll continue and send you v2 patches for shadow vmcs.
> 
> Do you agree ?
Yes.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:31               ` Gleb Natapov
@ 2013-04-12 10:44                 ` Abel Gordon
  2013-04-12 10:48                   ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-12 10:44 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm



kvm-owner@vger.kernel.org wrote on 12/04/2013 01:31:17 PM:

> From: Gleb Natapov <gleb@redhat.com>
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: dongxiao.xu@intel.com, jun.nakajima@intel.com,
> kvm@vger.kernel.org, "Nadav Har'El" <nyh@math.technion.ac.il>,
> owasserm@redhat.com
> Date: 12/04/2013 01:31 PM
> Subject: Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content
> with the shadow vmcs
> Sent by: kvm-owner@vger.kernel.org
>
> On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> >
> > > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > > >
> > > >
> > > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > > I think the patch already miss some fields. What if nested_vmx_run()
> > > > > fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
> > > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> > > > > we copy them back to shadow before going back to L1?
> > > >
> > > > Good catch! :)
> > > >
> > > > Note that the entry path is easy to handle because we copy the fields
> > > > as part of nested_vmx_entry. This is not like exit path where
> > > > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> > > >
> > > > So here, we could simple call copy_vmcs12_to_shadow if the entry fails
> > > > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > > > the code by updating these specific fields directly, but I don't think
> > > > we really need to optimize code that is part of the error path.
> > > >
> > > We needn't.
> > >
> > > > > May be we need to introduce vmcs12 accessors to track what is changes
> > > > > and if something need to be copied to shadow before going back to L1.
> > > >
> > > > That means we will need to modify all the lines of code that uses
> > > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write
> > > > function.
> > > > Inside these inline functions we could access the shadow vmcs directly.
> > > > However, to access the shadow vmcs we need to vmptrld first and this
> > > > will force unnecessary vmptrlds (between shadow vmcs 12 and current
> > > > vmcs 01) each time the code accesses a vmcs12 field. Alternatively, if
> > > > we want to avoid unnecessary vmptrlds each time we access vmcs12 we
> > > > could simple set a flag that indicates when a shadow field was changed.
> > > > In this case, we will need to find all the places to check the flag and
> > > > copy the fields, considering both success and error paths.
> > > That's not how I see it. nested_vmcs_write() will set a request bit in
> > > vcpu (this is the flag you mention above). The bit will be checked during
> > > a guest entry and vmcs12 will be synced to shadow at this point. Later
> > > we can track what fields were written and sync only them.
> > >
> XXX
>
> > > > Finally, I am afraid that these directions will introduce new issues,
> > > > will force us to modify too many lines and they may create a
> > > > merge/rebase mess...
> > > Conflicts should be trivial and I do not expect many iterations for the
> > > patch series. I like the approach you take to use vmcs shadowing and most
> > > comment are nitpicks.  I promise to review the next version as soon as
> > > it is posted :)
> > >
> > > >
> > > > Maybe we should simple fix nested_vmx_entry_failure (as well as the
> > > > other fixes you suggested in other patches) and apply the code.
> > > > Do you agree ?
> > > >
> > > The approach is error prone. Even if we will fix all bugs in the current
> > > code each new nested vmx modification will have to be reviewed with
> > > shadowing
> > > in mind and sometimes it is hard to see global picture just from a
> > > patch. It is better to hide shadow details from occasional nested vmx
> > > hacker.
> >
> > But how ? I suggested 2 alternatives with pros/cons.
> I outlined how above (search for XXX). Use accessors, set request bit,
> sync during guest entry if request bit is set.

Ok, so then you prefer to add the inline functions to read/write to the
vmcs12
fields, (to set the request bit if shadowed field changed) and you are not
concerned
about any merge/rebase mess. I will work on this direction.
I'll first send an independent patch to introduce the accessors. Once you
apply this patch, I'll continue and send you v2 patches for shadow vmcs.

Do you agree ?



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:26             ` Abel Gordon
@ 2013-04-12 10:31               ` Gleb Natapov
  2013-04-12 10:44                 ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-12 10:31 UTC (permalink / raw)
  To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm

On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> 
> > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > I think the patch already misses some fields. What if nested_vmx_run()
> > > > fails and calls nested_vmx_entry_failure()? nested_vmx_entry_failure()
> > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> > > > we copy them back to shadow before going back to L1?
> > >
> > > Good catch! :)
> > >
> > > Note that the entry path is easy to handle because we copy the fields
> > > as part of nested_vmx_entry. This is not like the exit path, where
> > > KVM (L0) code can modify fields after nested_vmx_vmexit is called.
> > >
> > > So here, we could simply call copy_vmcs12_to_shadow if the entry fails
> > > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > > the code by updating these specific fields directly, but I don't think
> > > we really need to optimize code that is part of the error path.
> > >
> > We needn't.
> >
> > > > Maybe we need to introduce vmcs12 accessors to track what has changed
> > > > and whether something needs to be copied to shadow before going back to L1.
> > >
> > > That means we will need to replace every line of code that uses
> > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> > > Inside these inline functions we could access the shadow vmcs directly.
> > > However, to access the shadow vmcs we need to vmptrld it first, and this
> > > will force unnecessary vmptrlds (between the shadow vmcs12 and the current
> > > vmcs01) each time the code accesses a vmcs12 field. Alternatively, if we
> > > want to avoid unnecessary vmptrlds each time we access vmcs12, we could
> > > simply set a flag that indicates when a shadow field was changed. In this
> > > case, we will need to find all the places to check the flag and copy the
> > > fields, considering both success and error paths.
> > That's not how I see it. nested_vmcs_write() will set a request bit in
> > the vcpu (this is the flag you mention above). The bit will be checked during
> > guest entry and vmcs12 will be synced to shadow at that point. Later
> > we can track which fields were written and sync only them.
> >
XXX

> > > Finally, I am afraid that these directions will introduce new issues,
> > > will force us to modify too many lines and may create a merge/rebase
> > > mess...
> > Conflicts should be trivial and I do not expect many iterations for the
> > patch series. I like the approach you take to use vmcs shadowing and most
> > comments are nitpicks.  I promise to review the next version as soon as
> > it is posted :)
> >
> > >
> > > Maybe we should simply fix nested_vmx_entry_failure (and apply the
> > > other fixes you suggested in other patches), and apply the code.
> > > Do you agree ?
> > >
> > The approach is error prone. Even if we fix all the bugs in the current
> > code, each new nested vmx modification will have to be reviewed with
> > shadowing in mind, and sometimes it is hard to see the global picture just
> > from a patch. It is better to hide shadow details from the occasional
> > nested vmx hacker.
> 
> But how ? I suggested 2 alternatives with pros/cons.
I outlined how above (search for XXX). Use accessors, set request bit,
sync during guest entry if request bit is set.

> Do you have any other alternative in mind ?
> Three more alternatives:
> (3) set vmcs12 to null once the shadow is synched (so inappropriate
> accesses due to buggy code will access a null pointer and crash kvm).
> (4) like -1-, but using the accessors and WARN/PANIC.
> (5) change nested_vmx_vmexit so it can update additional fields passed
> by the caller. Additional vmcs12 fields should NOT be updated directly
> (as they are today). They should be updated only when calling
> nested_vmx_vmexit.
> 
> 
> > Are you expecting EPT patches to make it right for instance?
> 
> Yep, I actually added shadow vmcs support also to a KVM version
> that includes the nEPT patches Nadav Har'El shared. It required adding
> one call to copy_vmcs12_to_shadow after a nested-exit path which
> updates vmcs12 fields.

--
			Gleb.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-11  6:54           ` Gleb Natapov
@ 2013-04-12 10:26             ` Abel Gordon
  2013-04-12 10:31               ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-12 10:26 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:

> On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > I think the patch already misses some fields. What if nested_vmx_run()
> > > fails and calls nested_vmx_entry_failure()? nested_vmx_entry_failure()
> > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> > > we copy them back to shadow before going back to L1?
> >
> > Good catch! :)
> >
> > Note that the entry path is easy to handle because we copy the fields
> > as part of nested_vmx_entry. This is not like the exit path, where
> > KVM (L0) code can modify fields after nested_vmx_vmexit is called.
> >
> > So here, we could simply call copy_vmcs12_to_shadow if the entry fails
> > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > the code by updating these specific fields directly, but I don't think
> > we really need to optimize code that is part of the error path.
> >
> We needn't.
>
> > > Maybe we need to introduce vmcs12 accessors to track what has changed
> > > and whether something needs to be copied to shadow before going back to L1.
> >
> > That means we will need to replace every line of code that uses
> > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> > Inside these inline functions we could access the shadow vmcs directly.
> > However, to access the shadow vmcs we need to vmptrld it first, and this
> > will force unnecessary vmptrlds (between the shadow vmcs12 and the current
> > vmcs01) each time the code accesses a vmcs12 field. Alternatively, if we
> > want to avoid unnecessary vmptrlds each time we access vmcs12, we could
> > simply set a flag that indicates when a shadow field was changed. In this
> > case, we will need to find all the places to check the flag and copy the
> > fields, considering both success and error paths.
> That's not how I see it. nested_vmcs_write() will set a request bit in
> the vcpu (this is the flag you mention above). The bit will be checked during
> guest entry and vmcs12 will be synced to shadow at that point. Later
> we can track which fields were written and sync only them.
>
> > Finally, I am afraid that these directions will introduce new issues,
> > will force us to modify too many lines and may create a merge/rebase
> > mess...
> Conflicts should be trivial and I do not expect many iterations for the
> patch series. I like the approach you take to use vmcs shadowing and most
> comments are nitpicks.  I promise to review the next version as soon as
> it is posted :)
>
> >
> > Maybe we should simply fix nested_vmx_entry_failure (and apply the
> > other fixes you suggested in other patches), and apply the code.
> > Do you agree ?
> >
> The approach is error prone. Even if we fix all the bugs in the current
> code, each new nested vmx modification will have to be reviewed with
> shadowing in mind, and sometimes it is hard to see the global picture just
> from a patch. It is better to hide shadow details from the occasional
> nested vmx hacker.

But how ? I suggested 2 alternatives with pros/cons.
Do you have any other alternative in mind ?
Three more alternatives:
(3) set vmcs12 to null once the shadow is synched (so inappropriate
accesses due to buggy code will access a null pointer and crash kvm).
(4) like -1-, but using the accessors and WARN/PANIC.
(5) change nested_vmx_vmexit so it can update additional fields passed
by the caller. Additional vmcs12 fields should NOT be updated directly
(as they are today). They should be updated only when calling
nested_vmx_vmexit.
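
For illustration, alternative (5) could look roughly like the sketch below. This is hypothetical, heavily simplified C, not the actual KVM code: the real nested_vmx_vmexit operates on struct kvm_vcpu and many more fields, and the structure/function shapes here are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the real KVM structures (illustration only). */
struct vmcs12 {
	uint32_t vm_exit_reason;
	uint32_t vm_exit_intr_info;
};

static struct vmcs12 vmcs12;
static int shadow_sync_count;

static void copy_vmcs12_to_shadow(void)
{
	shadow_sync_count++;	/* would vmwrite the fields into the shadow vmcs */
}

/*
 * Alternative (5): callers hand the extra field values to the exit path
 * instead of poking vmcs12 after nested_vmx_vmexit() returns, so the
 * single copy_vmcs12_to_shadow() call here can never miss an update.
 */
static void nested_vmx_vmexit(uint32_t exit_reason, uint32_t exit_intr_info)
{
	/* ... prepare_vmcs12(), switch back to vmcs01, etc. ... */
	vmcs12.vm_exit_reason = exit_reason;
	vmcs12.vm_exit_intr_info = exit_intr_info;
	copy_vmcs12_to_shadow();
}
```

A caller such as vmx_interrupt_allowed() would then invoke nested_vmx_vmexit(EXIT_REASON_EXTERNAL_INTERRUPT, 0) rather than assigning the two vmcs12 fields after the exit, which is exactly the pattern the explicit-copy version has to chase.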


> Are you expecting EPT patches to make it right for instance?

Yep, I actually added shadow vmcs support also to a KVM version
that includes the nEPT patches Nadav Har'El shared. It required adding
one call to copy_vmcs12_to_shadow after a nested-exit path which
updates vmcs12 fields.



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-10 19:15         ` Abel Gordon
@ 2013-04-11  6:54           ` Gleb Natapov
  2013-04-12 10:26             ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-11  6:54 UTC (permalink / raw)
  To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm

On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> 
> > On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
> > > "Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> > >
> > >
> > > > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > > > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > > > >     nested_vmx_vmexit(vcpu);
> > > > > +   if (enable_shadow_vmcs)
> > > > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> > > >
> > > > I was curious why your patch adds this call to copy_vmcs12_to_shadow after
> > > > every nested_vmx_vmexit (3 times), instead of making this call inside
> > > > nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> > >
> 
> > > Because nested code sometimes modifies vmcs fields "after"
> > > nested_vmx_vmexit (see below). I was afraid nested logic
> > > may be changed in the future and some field may become out-of-sync.
> > >
> > > If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> > > more difficult to miss some field.
> > >
> 
> 
> > I think the patch already misses some fields. What if nested_vmx_run()
> > fails and calls nested_vmx_entry_failure()? nested_vmx_entry_failure()
> > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> > we copy them back to shadow before going back to L1?
> 
> Good catch! :)
> 
> Note that the entry path is easy to handle because we copy the fields
> as part of nested_vmx_entry. This is not like the exit path, where
> KVM (L0) code can modify fields after nested_vmx_vmexit is called.
> 
> So here, we could simply call copy_vmcs12_to_shadow if the entry fails
> (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> the code by updating these specific fields directly, but I don't think
> we really need to optimize code that is part of the error path.
> 
We needn't.

> > Maybe we need to introduce vmcs12 accessors to track what has changed
> > and whether something needs to be copied to shadow before going back to L1.
> 
> That means we will need to replace every line of code that uses
> "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> Inside these inline functions we could access the shadow vmcs directly.
> However, to access the shadow vmcs we need to vmptrld it first, and this
> will force unnecessary vmptrlds (between the shadow vmcs12 and the current
> vmcs01) each time the code accesses a vmcs12 field. Alternatively, if we
> want to avoid unnecessary vmptrlds each time we access vmcs12, we could
> simply set a flag that indicates when a shadow field was changed. In this
> case, we will need to find all the places to check the flag and copy the
> fields, considering both success and error paths.
That's not how I see it. nested_vmcs_write() will set a request bit in
the vcpu (this is the flag you mention above). The bit will be checked during
guest entry and vmcs12 will be synced to shadow at that point. Later
we can track which fields were written and sync only them.
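
A minimal sketch of that scheme, with hypothetical names and structures (the real KVM uses kvm_make_request()/kvm_check_request() on struct kvm_vcpu; REQ_SYNC_SHADOW_VMCS and the toy types below are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the real KVM structures (illustration only). */
struct vmcs12 {
	uint32_t vm_exit_reason;
	uint64_t exit_qualification;
};

struct vcpu {
	unsigned long requests;
	struct vmcs12 vmcs12;
};

#define REQ_SYNC_SHADOW_VMCS	(1UL << 0)

static int shadow_sync_count;

/* Accessor: every vmcs12 write also raises the sync request bit. */
static void nested_vmcs_write32(struct vcpu *vcpu, uint32_t *field,
				uint32_t value)
{
	*field = value;
	vcpu->requests |= REQ_SYNC_SHADOW_VMCS;
}

/* Checked once per guest entry: sync to the shadow only when dirty. */
static void sync_shadow_on_entry(struct vcpu *vcpu)
{
	if (vcpu->requests & REQ_SYNC_SHADOW_VMCS) {
		shadow_sync_count++;	/* copy_vmcs12_to_shadow(vcpu) */
		vcpu->requests &= ~REQ_SYNC_SHADOW_VMCS;
	}
}
```

With this shape, an error path like nested_vmx_entry_failure() needs no explicit copy: its writes go through the accessor, and the pending request bit guarantees a sync before resuming L1.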

> Finally, I am afraid that these directions will introduce new issues,
> will force us to modify too many lines and may create a merge/rebase
> mess...
Conflicts should be trivial and I do not expect many iterations for the
patch series. I like the approach you take to use vmcs shadowing and most
comments are nitpicks.  I promise to review the next version as soon as
it is posted :)

> 
> Maybe we should simply fix nested_vmx_entry_failure (and apply the
> other fixes you suggested in other patches), and apply the code.
> Do you agree ?
> 
The approach is error prone. Even if we fix all the bugs in the current
code, each new nested vmx modification will have to be reviewed with shadowing
in mind, and sometimes it is hard to see the global picture just from a
patch. It is better to hide shadow details from the occasional nested vmx hacker.
Are you expecting EPT patches to make it right for instance?

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-09 13:14       ` Gleb Natapov
@ 2013-04-10 19:15         ` Abel Gordon
  2013-04-11  6:54           ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-04-10 19:15 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:

> On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
"Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> >
> >
> > > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > > >     nested_vmx_vmexit(vcpu);
> > > > +   if (enable_shadow_vmcs)
> > > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> > >
> > > I was curious why your patch adds this call to copy_vmcs12_to_shadow
> > after
> > > every nested_vmx_vmexit (3 times), instead of making this call inside
> > > nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> >

> > Because nested code sometimes modifies vmcs fields "after"
> > nested_vmx_vmexit (see below). I was afraid nested logic
> > may be changed in the future and some field may become out-of-sync.
> >
> > If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> > more difficult to miss some field.
> >


> I think the patch already misses some fields. What if nested_vmx_run()
> fails and calls nested_vmx_entry_failure()? nested_vmx_entry_failure()
> sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> we copy them back to shadow before going back to L1?

Good catch! :)

Note that the entry path is easy to handle because we copy the fields
as part of nested_vmx_entry. This is not like the exit path, where
KVM (L0) code can modify fields after nested_vmx_vmexit is called.

So here, we could simply call copy_vmcs12_to_shadow if the entry fails
(as part of nested_vmx_entry_failure or nested_vmx). We could optimize
the code by updating these specific fields directly, but I don't think
we really need to optimize code that is part of the error path.

> Maybe we need to introduce vmcs12 accessors to track what has changed
> and whether something needs to be copied to shadow before going back to L1.

That means we will need to replace every line of code that uses
"vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
Inside these inline functions we could access the shadow vmcs directly.
However, to access the shadow vmcs we need to vmptrld it first, and this
will force unnecessary vmptrlds (between the shadow vmcs12 and the current
vmcs01) each time the code accesses a vmcs12 field. Alternatively, if we
want to avoid unnecessary vmptrlds each time we access vmcs12, we could
simply set a flag that indicates when a shadow field was changed. In this
case, we will need to find all the places to check the flag and copy the
fields, considering both success and error paths.
Finally, I am afraid that these directions will introduce new issues,
will force us to modify too many lines and may create a merge/rebase
mess...

Maybe we should simply fix nested_vmx_entry_failure (and apply the
other fixes you suggested in other patches), and apply the code.
Do you agree ?





* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-11  7:54     ` Abel Gordon
@ 2013-04-09 13:14       ` Gleb Natapov
  2013-04-10 19:15         ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Gleb Natapov @ 2013-04-09 13:14 UTC (permalink / raw)
  To: Abel Gordon; +Cc: Nadav Har'El, dongxiao.xu, jun.nakajima, kvm, owasserm

On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
> "Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> 
> 
> > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > >     nested_vmx_vmexit(vcpu);
> > > +   if (enable_shadow_vmcs)
> > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> >
> > I was curious why your patch adds this call to copy_vmcs12_to_shadow after
> > every nested_vmx_vmexit (3 times), instead of making this call inside
> > nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> 
> Because nested code sometimes modifies vmcs fields "after"
> nested_vmx_vmexit (see below). I was afraid nested logic
> may be changed in the future and some field may become out-of-sync.
> 
> If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> more difficult to miss some field.
> 
I think the patch already misses some fields. What if nested_vmx_run()
fails and calls nested_vmx_entry_failure()? nested_vmx_entry_failure()
sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
we copy them back to shadow before going back to L1?

Maybe we need to introduce vmcs12 accessors to track what has changed
and whether something needs to be copied to shadow before going back to L1.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 22:43   ` Nadav Har'El
@ 2013-03-11  7:54     ` Abel Gordon
  2013-04-09 13:14       ` Gleb Natapov
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-03-11  7:54 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: dongxiao.xu, jun.nakajima, kvm, owasserm

"Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:


> On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> nVMX: Synchronize VMCS12 content with the shadow vmcs":
> >     nested_vmx_vmexit(vcpu);
> > +   if (enable_shadow_vmcs)
> > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
>
> I was curious why your patch adds this call to copy_vmcs12_to_shadow after
> every nested_vmx_vmexit (3 times), instead of making this call inside
> nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:

Because nested code sometimes modifies vmcs fields "after"
nested_vmx_vmexit (see below). I was afraid nested logic
may be changed in the future and some field may become out-of-sync.

If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
more difficult to miss some field.

>
> >        nested_vmx_vmexit(vcpu);
> >        vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
> >        vmcs12->vm_exit_intr_info = 0;
> > +      if (enable_shadow_vmcs)
> > +         copy_vmcs12_to_shadow(to_vmx(vcpu));
>
> where you need to copy this exit-reason override as well...
Exactly ;)

Also, if I remember correctly, the nested EPT patches modify vmcs12 fields
outside the nested_vmx_vmexit context as well.

> I wonder if there isn't a less ugly and repetitive way to do this :(

We can change nested_vmx_vmexit to take additional parameters so
we can set other fields (e.g. exit_reason and exit_intr_info) as
part of this function; however, I am not sure this will be
"less ugly"... do you remember the old/original nested code
in which nested_vmx_vmexit had an additional "external_interrupt"
parameter ? :)



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Abel Gordon
@ 2013-03-10 22:43   ` Nadav Har'El
  2013-03-11  7:54     ` Abel Gordon
  0 siblings, 1 reply; 36+ messages in thread
From: Nadav Har'El @ 2013-03-10 22:43 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, jun.nakajima, dongxiao.xu

Hi Abel, very nice patches.

On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs":
>  	nested_vmx_vmexit(vcpu);
> +	if (enable_shadow_vmcs)
> +		copy_vmcs12_to_shadow(to_vmx(vcpu));

I was curious why your patch adds this call to copy_vmcs12_to_shadow after
every nested_vmx_vmexit (3 times), instead of making this call inside
nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:

>  		nested_vmx_vmexit(vcpu);
>  		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
>  		vmcs12->vm_exit_intr_info = 0;
> +		if (enable_shadow_vmcs)
> +			copy_vmcs12_to_shadow(to_vmx(vcpu));

where you need to copy this exit-reason override as well...
I wonder if there isn't a less ugly and repetitive way to do this :(

-- 
Nadav Har'El                        |        Monday, Mar 11 2013, 29 Adar 5773
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Fame: when your name is in everything but
http://nadav.harel.org.il           |the phone book.


* [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
@ 2013-03-10 16:08 ` Abel Gordon
  2013-03-10 22:43   ` Nadav Har'El
  0 siblings, 1 reply; 36+ messages in thread
From: Abel Gordon @ 2013-03-10 16:08 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Synchronize between the VMCS12 software-controlled structure and the
processor-specific shadow vmcs.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -1843,6 +1843,8 @@ static int nested_pf_handled(struct kvm_
 		return 0;
 
 	nested_vmx_vmexit(vcpu);
+	if (enable_shadow_vmcs)
+		copy_vmcs12_to_shadow(to_vmx(vcpu));
 	return 1;
 }
 
@@ -4430,6 +4432,8 @@ static int vmx_interrupt_allowed(struct 
 		nested_vmx_vmexit(vcpu);
 		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
 		vmcs12->vm_exit_intr_info = 0;
+		if (enable_shadow_vmcs)
+			copy_vmcs12_to_shadow(to_vmx(vcpu));
 		/* fall through to normal code, but now in L1, not L2 */
 	}
 
@@ -5490,6 +5494,9 @@ static inline void nested_release_vmcs12
 {
 	if (enable_shadow_vmcs) {
 		if (vmx->nested.current_vmcs12 != NULL) {
+			/* copy to memory all shadowed fields in case
+			   they were modified */
+			copy_shadow_to_vmcs12(vmx);
 			free_vmcs(vmx->nested.current_shadow_vmcs);
 		}
 	}
@@ -6037,6 +6044,7 @@ static int handle_vmptrld(struct kvm_vcp
 			/* init shadow vmcs */
 			vmcs_clear(shadow_vmcs);
 			vmx->nested.current_shadow_vmcs = shadow_vmcs;
+			copy_vmcs12_to_shadow(vmx);
 		}
 	}
 
@@ -6425,6 +6433,8 @@ static int vmx_handle_exit(struct kvm_vc
 
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 		nested_vmx_vmexit(vcpu);
+		if (enable_shadow_vmcs)
+			copy_vmcs12_to_shadow(vmx);
 		return 1;
 	}
 
@@ -7384,6 +7394,8 @@ static int nested_vmx_run(struct kvm_vcp
 	skip_emulated_instruction(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
+	if (enable_shadow_vmcs)
+		copy_shadow_to_vmcs12(vmx);
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when



end of thread, other threads:[~2013-04-22  7:55 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
2013-04-18 11:34 ` [PATCH 01/11] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
2013-04-18 11:35 ` [PATCH 02/11] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
2013-04-18 11:35 ` [PATCH 03/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
2013-04-18 11:36 ` [PATCH 04/11] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
2013-04-18 11:36 ` [PATCH 05/11] KVM: nVMX: Fix VMXON emulation Abel Gordon
2013-04-18 11:37 ` [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs Abel Gordon
2013-04-18 11:37 ` [PATCH 07/11] KVM: nVMX: Release " Abel Gordon
2013-04-18 11:38 ` [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
2013-04-18 11:38 ` [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
2013-04-18 11:39 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
2013-04-18 11:39 ` [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
2013-04-18 12:42 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Gleb Natapov
2013-04-18 12:48 ` Orit Wasserman
2013-04-22  7:55 ` Gleb Natapov
  -- strict thread matches above, loose matches on Subject: below --
2013-04-18  8:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v4 Abel Gordon
2013-04-18  8:39 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Abel Gordon
2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Abel Gordon
2013-03-10 22:43   ` Nadav Har'El
2013-03-11  7:54     ` Abel Gordon
2013-04-09 13:14       ` Gleb Natapov
2013-04-10 19:15         ` Abel Gordon
2013-04-11  6:54           ` Gleb Natapov
2013-04-12 10:26             ` Abel Gordon
2013-04-12 10:31               ` Gleb Natapov
2013-04-12 10:44                 ` Abel Gordon
2013-04-12 10:48                   ` Gleb Natapov
2013-04-14  9:51                     ` Abel Gordon
2013-04-14 10:00                       ` Gleb Natapov
2013-04-14 10:07                         ` Gleb Natapov
2013-04-14 10:27                           ` Jan Kiszka
2013-04-14 10:34                             ` Abel Gordon
2013-04-14 10:34                             ` Gleb Natapov
2013-04-14 10:49                               ` Abel Gordon
2013-04-14 11:16                                 ` Gleb Natapov
2013-04-14 13:47                                   ` Abel Gordon
2013-04-14 14:41                                     ` Gleb Natapov
