* [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1
@ 2013-03-10 16:03 Abel Gordon
  2013-03-10 16:03 ` [PATCH 01/11] KVM: nVMX: Stats counters for nVMX Abel Gordon
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:03 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

This series of patches implements shadow-vmcs capability for nested VMX.

Shadow-vmcs - background and overview:

 In Intel VMX, the privileged vmread and vmwrite instructions are used by the
 hypervisor to read and modify the guest and host specifications (VMCS). In a
 nested virtualization environment, L1 executes multiple vmread and vmwrite
 instructions to handle a single L2 exit. Each vmread and vmwrite executed by L1
 traps (causes an exit) to the L0 hypervisor (KVM). L0 emulates the instruction
 behaviour and resumes L1 execution.

 Removing the need to trap and emulate these special instructions reduces the
 number of exits and improves nested virtualization performance. As first
 evaluated in [1], exit-less vmread and vmwrite can reduce nested virtualization
 overhead by up to 40%.
 
 Intel introduced a new processor feature called shadow-vmcs. Using
 shadow-vmcs, L0 can configure the processor to let L1, running in guest mode,
 access VMCS12 fields using vmread and vmwrite instructions without causing
 an exit to L0. The VMCS12 field data is stored in a shadow-vmcs controlled
 by L0.

Shadow-vmcs - design considerations: 

 A shadow-vmcs is processor-dependent and must be accessed by L0 or L1 using
 vmread and vmwrite instructions. With nested virtualization we aim to abstract
 the hardware from the L1 hypervisor. Thus, to avoid hardware dependencies we
 preferred to keep the software-defined VMCS12 format as part of L1 address space
 and hold the processor-specific shadow-vmcs format only in L0 address space.
 In other words, the shadow-vmcs is used by L0 as an accelerator, but its format
 and content are never exposed to L1 directly. L0 syncs the content of the
 processor-specific shadow vmcs with the content of the software-controlled
 VMCS12 format.

 We could have kept the processor-specific shadow-vmcs format in L1 address
 space to avoid using the software-defined VMCS12 format; however, that type of
 design/implementation would have created hardware dependencies and
 would complicate other capabilities (e.g. live migration of L1).
 
Acknowledgments:

 Many thanks to
 "Xu, Dongxiao" <dongxiao.xu@intel.com>
 "Nakajima, Jun" <jun.nakajima@intel.com>
 "Har'El, Nadav" <nadav@harel.org.il> 
  
 for the insightful discussions, comments and reviews.


 These patches were easily created and maintained using
     Patchouli -- patch creator
     http://patchouli.sourceforge.net/


[1] "The Turtles Project: Design and Implementation of Nested Virtualization",
    http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf



* [PATCH 01/11] KVM: nVMX: Stats counters for nVMX
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
@ 2013-03-10 16:03 ` Abel Gordon
  2013-04-08 10:27   ` Gleb Natapov
  2013-03-10 16:04 ` [PATCH 02/11] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:03 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Add new counters to measure how many vmread/vmwrite/vmptrld/vmlaunch/vmresume/
vmclear instructions were trapped and emulated by L0.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    6 ++++++
 arch/x86/kvm/vmx.c              |    7 +++++++
 arch/x86/kvm/x86.c              |    6 ++++++
 3 files changed, 19 insertions(+)

--- .before/arch/x86/include/asm/kvm_host.h	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/include/asm/kvm_host.h	2013-03-10 18:00:54.000000000 +0200
@@ -619,6 +619,12 @@ struct kvm_vcpu_stat {
 	u32 hypercalls;
 	u32 irq_injections;
 	u32 nmi_injections;
+	u32 nvmx_vmreads;
+	u32 nvmx_vmwrites;
+	u32 nvmx_vmptrlds;
+	u32 nvmx_vmlaunchs;
+	u32 nvmx_vmresumes;
+	u32 nvmx_vmclears;
 };
 
 struct x86_instruction_info;
--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
@@ -5545,6 +5545,7 @@ static int handle_vmclear(struct kvm_vcp
 	struct vmcs12 *vmcs12;
 	struct page *page;
 	struct x86_exception e;
+	++vcpu->stat.nvmx_vmclears;
 
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
@@ -5601,12 +5602,14 @@ static int nested_vmx_run(struct kvm_vcp
 /* Emulate the VMLAUNCH instruction */
 static int handle_vmlaunch(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.nvmx_vmlaunchs;
 	return nested_vmx_run(vcpu, true);
 }
 
 /* Emulate the VMRESUME instruction */
 static int handle_vmresume(struct kvm_vcpu *vcpu)
 {
+	++vcpu->stat.nvmx_vmresumes;
 
 	return nested_vmx_run(vcpu, false);
 }
@@ -5689,6 +5692,7 @@ static int handle_vmread(struct kvm_vcpu
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	gva_t gva = 0;
 
+	++vcpu->stat.nvmx_vmreads;
 	if (!nested_vmx_check_permission(vcpu) ||
 	    !nested_vmx_check_vmcs12(vcpu))
 		return 1;
@@ -5741,6 +5745,8 @@ static int handle_vmwrite(struct kvm_vcp
 	u64 field_value = 0;
 	struct x86_exception e;
 
+	++vcpu->stat.nvmx_vmwrites;
+
 	if (!nested_vmx_check_permission(vcpu) ||
 	    !nested_vmx_check_vmcs12(vcpu))
 		return 1;
@@ -5807,6 +5813,7 @@ static int handle_vmptrld(struct kvm_vcp
 	gva_t gva;
 	gpa_t vmptr;
 	struct x86_exception e;
+	++vcpu->stat.nvmx_vmptrlds;
 
 	if (!nested_vmx_check_permission(vcpu))
 		return 1;
--- .before/arch/x86/kvm/x86.c	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/kvm/x86.c	2013-03-10 18:00:54.000000000 +0200
@@ -145,6 +145,12 @@ struct kvm_stats_debugfs_item debugfs_en
 	{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
 	{ "irq_injections", VCPU_STAT(irq_injections) },
 	{ "nmi_injections", VCPU_STAT(nmi_injections) },
+	{ "nvmx_vmreads", VCPU_STAT(nvmx_vmreads) },
+	{ "nvmx_vmwrites", VCPU_STAT(nvmx_vmwrites) },
+	{ "nvmx_vmptrlds", VCPU_STAT(nvmx_vmptrlds) },
+	{ "nvmx_vmlaunchs", VCPU_STAT(nvmx_vmlaunchs) },
+	{ "nvmx_vmresumes", VCPU_STAT(nvmx_vmresumes) },
+	{ "nvmx_vmclears", VCPU_STAT(nvmx_vmclears) },
 	{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
 	{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
 	{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },



* [PATCH 02/11] KVM: nVMX: Shadow-vmcs control fields/bits
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
  2013-03-10 16:03 ` [PATCH 01/11] KVM: nVMX: Stats counters for nVMX Abel Gordon
@ 2013-03-10 16:04 ` Abel Gordon
  2013-03-10 16:04 ` [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:04 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/include/asm/vmx.h |    3 +++
 1 file changed, 3 insertions(+)

--- .before/arch/x86/include/asm/vmx.h	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/include/asm/vmx.h	2013-03-10 18:00:54.000000000 +0200
@@ -65,6 +65,7 @@
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
+#define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -140,6 +141,8 @@ enum vmcs_field {
 	EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
 	EOI_EXIT_BITMAP3                = 0x00002022,
 	EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
+	VMREAD_BITMAP                   = 0x00002026,
+	VMWRITE_BITMAP                  = 0x00002028,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,



* [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
  2013-03-10 16:03 ` [PATCH 01/11] KVM: nVMX: Stats counters for nVMX Abel Gordon
  2013-03-10 16:04 ` [PATCH 02/11] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
@ 2013-03-10 16:04 ` Abel Gordon
  2013-04-08 11:12   ` Gleb Natapov
  2013-03-10 16:05 ` [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:04 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Add the logic required to detect whether shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify whether L0 should
use a shadow vmcs (or not) to run L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
@@ -86,6 +86,8 @@ module_param(fasteoi, bool, S_IRUGO);
 
 static bool __read_mostly enable_apicv_reg_vid;
 
+static bool __read_mostly enable_shadow_vmcs = 1;
+module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -895,6 +897,18 @@ static inline bool cpu_has_vmx_wbinvd_ex
 		SECONDARY_EXEC_WBINVD_EXITING;
 }
 
+static inline bool cpu_has_vmx_shadow_vmcs(void)
+{
+	u64 vmx_msr;
+	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+	/* check if the cpu supports writing r/o exit information fields */
+	if (!(vmx_msr & (1u << 29)))
+		return false;
+
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_SHADOW_VMCS;
+}
+
 static inline bool report_flexpriority(void)
 {
 	return flexpriority_enabled;
@@ -2582,7 +2596,8 @@ static __init int setup_vmcs_config(stru
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
 			SECONDARY_EXEC_APIC_REGISTER_VIRT |
-			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+			SECONDARY_EXEC_SHADOW_VMCS;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2771,6 +2786,8 @@ static __init int hardware_setup(void)
 
 	if (!cpu_has_vmx_vpid())
 		enable_vpid = 0;
+	if (!cpu_has_vmx_shadow_vmcs())
+		enable_shadow_vmcs = 0;
 
 	if (!cpu_has_vmx_ept() ||
 	    !cpu_has_vmx_ept_4levels()) {
@@ -3982,6 +3999,12 @@ static u32 vmx_secondary_exec_control(st
 		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+	/* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
+	   (handle_vmptrld).
+	   We can NOT enable shadow_vmcs here because we don't have yet
+	   a current VMCS12
+	*/
+	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 	return exec_control;
 }
 



* [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (2 preceding siblings ...)
  2013-03-10 16:04 ` [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
@ 2013-03-10 16:05 ` Abel Gordon
  2013-04-08 11:50   ` Gleb Natapov
  2013-03-10 16:05 ` [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:05 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specify the most frequently accessed fields so we
can minimize the number of fields that are copied between the software-
controlled VMCS12 format and the processor-specific shadow vmcs. The lists were
built by measuring the VMCS field access rate after an L2 Ubuntu 12.04 guest
booted, running on top of an L1 KVM that was also Ubuntu 12.04. Note that
during boot there were additional fields which were frequently modified, but
they were not added to these lists because after boot these fields were no
longer accessed by L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
@@ -453,6 +453,51 @@ static inline struct vcpu_vmx *to_vmx(st
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
 				[number##_HIGH] = VMCS12_OFFSET(name)+4
 
+
+static const unsigned long shadow_read_only_fields[] = {
+	VM_EXIT_REASON,
+	VM_EXIT_INTR_INFO,
+	VM_EXIT_INSTRUCTION_LEN,
+	IDT_VECTORING_INFO_FIELD,
+	IDT_VECTORING_ERROR_CODE,
+	VM_EXIT_INTR_ERROR_CODE,
+	EXIT_QUALIFICATION,
+	GUEST_LINEAR_ADDRESS,
+	GUEST_PHYSICAL_ADDRESS
+};
+static const int max_shadow_read_only_fields = ARRAY_SIZE(shadow_read_only_fields);
+
+static const unsigned long shadow_read_write_fields[] = {
+	GUEST_RIP,
+	GUEST_RSP,
+	GUEST_CR0,
+	GUEST_CR3,
+	GUEST_CR4,
+	GUEST_INTERRUPTIBILITY_INFO,
+	GUEST_RFLAGS,
+	GUEST_CS_SELECTOR,
+	GUEST_CS_AR_BYTES,
+	GUEST_CS_LIMIT,
+	GUEST_CS_BASE,
+	GUEST_ES_BASE,
+	CR0_GUEST_HOST_MASK,
+	CR0_READ_SHADOW,
+	CR4_READ_SHADOW,
+	TSC_OFFSET,
+	EXCEPTION_BITMAP,
+	CPU_BASED_VM_EXEC_CONTROL,
+	VM_ENTRY_EXCEPTION_ERROR_CODE,
+	VM_ENTRY_INTR_INFO_FIELD,
+	VM_ENTRY_INSTRUCTION_LEN,
+	VM_ENTRY_EXCEPTION_ERROR_CODE,
+	HOST_FS_BASE,
+	HOST_GS_BASE,
+	HOST_FS_SELECTOR,
+	HOST_GS_SELECTOR
+};
+static const int max_shadow_read_write_fields =
+	ARRAY_SIZE(shadow_read_write_fields);
+
 static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
 	FIELD(GUEST_ES_SELECTOR, guest_es_selector),
@@ -642,6 +687,8 @@ static unsigned long *vmx_msr_bitmap_leg
 static unsigned long *vmx_msr_bitmap_longmode;
 static unsigned long *vmx_msr_bitmap_legacy_x2apic;
 static unsigned long *vmx_msr_bitmap_longmode_x2apic;
+static unsigned long *vmx_vmread_bitmap;
+static unsigned long *vmx_vmwrite_bitmap;
 
 static bool cpu_has_load_ia32_efer;
 static bool cpu_has_load_perf_global_ctrl;
@@ -4033,6 +4080,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
 	vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
 
+	vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
+	vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
 	if (cpu_has_vmx_msr_bitmap())
 		vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
 
@@ -7764,6 +7813,24 @@ static int __init vmx_init(void)
 				(unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_msr_bitmap_longmode_x2apic)
 		goto out4;
+	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_vmread_bitmap)
+		goto out4;
+
+	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+	if (!vmx_vmwrite_bitmap)
+		goto out5;
+
+	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
+	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
+	/* shadowed read/write fields */
+	for (i = 0; i < max_shadow_read_write_fields; i++) {
+		clear_bit(shadow_read_write_fields[i], vmx_vmwrite_bitmap);
+		clear_bit(shadow_read_write_fields[i], vmx_vmread_bitmap);
+	}
+	/* shadowed read only fields */
+	for (i = 0; i < max_shadow_read_only_fields; i++)
+		clear_bit(shadow_read_only_fields[i], vmx_vmread_bitmap);
 
 	/*
 	 * Allow direct access to the PC debug port (it is often used for I/O
@@ -7782,7 +7849,7 @@ static int __init vmx_init(void)
 	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
 		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
-		goto out3;
+		goto out6;
 
 #ifdef CONFIG_KEXEC
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
@@ -7830,6 +7897,10 @@ static int __init vmx_init(void)
 
 	return 0;
 
+out6:
+	free_page((unsigned long)vmx_vmwrite_bitmap);
+out5:
+	free_page((unsigned long)vmx_vmread_bitmap);
 out4:
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 out3:
@@ -7851,6 +7922,8 @@ static void __exit vmx_exit(void)
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
 	free_page((unsigned long)vmx_io_bitmap_a);
+	free_page((unsigned long)vmx_vmwrite_bitmap);
+	free_page((unsigned long)vmx_vmread_bitmap);
 
 #ifdef CONFIG_KEXEC
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss, NULL);



* [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (3 preceding siblings ...)
  2013-03-10 16:05 ` [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
@ 2013-03-10 16:05 ` Abel Gordon
  2013-04-09 11:05   ` Gleb Natapov
  2013-03-10 16:06 ` [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs Abel Gordon
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:05 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Refactor the existing code so we can reuse vmcs12_write_any to copy fields from
the shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   52 +++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
@@ -5741,6 +5741,33 @@ static inline bool vmcs12_read_any(struc
 	}
 }
 
+
+static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
+				    unsigned long field, u64 field_value){
+	short offset = vmcs_field_to_offset(field);
+	char *p = ((char *) get_vmcs12(vcpu)) + offset;
+	if (offset < 0)
+		return 0;
+
+	switch (vmcs_field_type(field)) {
+	case VMCS_FIELD_TYPE_U16:
+		*(u16 *)p = field_value;
+		return 1;
+	case VMCS_FIELD_TYPE_U32:
+		*(u32 *)p = field_value;
+		return 1;
+	case VMCS_FIELD_TYPE_U64:
+		*(u64 *)p = field_value;
+		return 1;
+	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+		*(natural_width *)p = field_value;
+		return 1;
+	default:
+		return 0; /* can never happen. */
+	}
+
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.
@@ -5806,8 +5833,6 @@ static int handle_vmwrite(struct kvm_vcp
 	gva_t gva;
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
-	char *p;
-	short offset;
 	/* The value to write might be 32 or 64 bits, depending on L1's long
 	 * mode, and eventually we need to write that into a field of several
 	 * possible lengths. The code below first zero-extends the value to 64
@@ -5846,28 +5871,7 @@ static int handle_vmwrite(struct kvm_vcp
 		return 1;
 	}
 
-	offset = vmcs_field_to_offset(field);
-	if (offset < 0) {
-		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
-		skip_emulated_instruction(vcpu);
-		return 1;
-	}
-	p = ((char *) get_vmcs12(vcpu)) + offset;
-
-	switch (vmcs_field_type(field)) {
-	case VMCS_FIELD_TYPE_U16:
-		*(u16 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_U32:
-		*(u32 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_U64:
-		*(u64 *)p = field_value;
-		break;
-	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
-		*(natural_width *)p = field_value;
-		break;
-	default:
+	if (!vmcs12_write_any(vcpu, field, field_value)) {
 		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
 		skip_emulated_instruction(vcpu);
 		return 1;



* [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (4 preceding siblings ...)
  2013-03-10 16:05 ` [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
@ 2013-03-10 16:06 ` Abel Gordon
  2013-03-10 16:06 ` [PATCH 07/11] KVM: nVMX: Release " Abel Gordon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:06 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software-defined VMCS12 (letting L1 access those fields without
causing exits). Note we keep a shadow vmcs only for the current vmcs12. Once a
vmcs12 becomes non-current, its shadow vmcs is released.


Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -353,6 +353,7 @@ struct nested_vmx {
 	/* The host-usable pointer to the above */
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
+	struct vmcs *current_shadow_vmcs;
 
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
@@ -5889,6 +5890,7 @@ static int handle_vmptrld(struct kvm_vcp
 	gva_t gva;
 	gpa_t vmptr;
 	struct x86_exception e;
+	struct vmcs *shadow_vmcs;
 	++vcpu->stat.nvmx_vmptrlds;
 
 	if (!nested_vmx_check_permission(vcpu))
@@ -5936,6 +5938,19 @@ static int handle_vmptrld(struct kvm_vcp
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
+		if (enable_shadow_vmcs) {
+			shadow_vmcs = alloc_vmcs();
+			if (!shadow_vmcs) {
+				nested_vmx_failInvalid(vcpu);
+				skip_emulated_instruction(vcpu);
+				return 1;
+			}
+			/* mark vmcs as shadow */
+			shadow_vmcs->revision_id |= (1u << 31);
+			/* init shadow vmcs */
+			vmcs_clear(shadow_vmcs);
+			vmx->nested.current_shadow_vmcs = shadow_vmcs;
+		}
 	}
 
 	nested_vmx_succeed(vcpu);



* [PATCH 07/11] KVM: nVMX: Release shadow vmcs
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (5 preceding siblings ...)
  2013-03-10 16:06 ` [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs Abel Gordon
@ 2013-03-10 16:06 ` Abel Gordon
  2013-03-10 16:07 ` [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:06 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Unmap vmcs12 and release the corresponding shadow vmcs 

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -5484,6 +5484,17 @@ static int nested_vmx_check_permission(s
 	return 1;
 }
 
+static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
+{
+	if (enable_shadow_vmcs) {
+		if (vmx->nested.current_vmcs12 != NULL) {
+			free_vmcs(vmx->nested.current_shadow_vmcs);
+		}
+	}
+	kunmap(vmx->nested.current_vmcs12_page);
+	nested_release_page(vmx->nested.current_vmcs12_page);
+}
+
 /*
  * Free whatever needs to be freed from vmx->nested when L1 goes down, or
  * just stops using VMX.
@@ -5494,8 +5505,7 @@ static void free_nested(struct vcpu_vmx 
 		return;
 	vmx->nested.vmxon = false;
 	if (vmx->nested.current_vmptr != -1ull) {
-		kunmap(vmx->nested.current_vmcs12_page);
-		nested_release_page(vmx->nested.current_vmcs12_page);
+		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
@@ -5640,8 +5650,7 @@ static int handle_vmclear(struct kvm_vcp
 	}
 
 	if (vmptr == vmx->nested.current_vmptr) {
-		kunmap(vmx->nested.current_vmcs12_page);
-		nested_release_page(vmx->nested.current_vmcs12_page);
+		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
@@ -5930,10 +5939,8 @@ static int handle_vmptrld(struct kvm_vcp
 			skip_emulated_instruction(vcpu);
 			return 1;
 		}
-		if (vmx->nested.current_vmptr != -1ull) {
-			kunmap(vmx->nested.current_vmcs12_page);
-			nested_release_page(vmx->nested.current_vmcs12_page);
-		}
+		if (vmx->nested.current_vmptr != -1ull)
+			nested_release_vmcs12(vmx);
 
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;



* [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (6 preceding siblings ...)
  2013-03-10 16:06 ` [PATCH 07/11] KVM: nVMX: Release " Abel Gordon
@ 2013-03-10 16:07 ` Abel Gordon
  2013-03-10 16:07 ` [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:07 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -672,6 +672,7 @@ static void vmx_get_segment(struct kvm_v
 			    struct kvm_segment *var, int seg);
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -5778,6 +5779,40 @@ static inline bool vmcs12_write_any(stru
 
 }
 
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
+{
+	int i;
+	unsigned long field;
+	u64 field_value;
+	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+	unsigned long *fields = (unsigned long *)shadow_read_write_fields;
+	int num_fields = max_shadow_read_write_fields;
+
+	vmcs_load(shadow_vmcs);
+
+	for (i = 0; i < num_fields; i++) {
+		field = fields[i];
+		switch (vmcs_field_type(field)) {
+		case VMCS_FIELD_TYPE_U16:
+			field_value = vmcs_read16(field);
+			break;
+		case VMCS_FIELD_TYPE_U32:
+			field_value = vmcs_read32(field);
+			break;
+		case VMCS_FIELD_TYPE_U64:
+			field_value = vmcs_read64(field);
+			break;
+		case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+			field_value = vmcs_readl(field);
+			break;
+		}
+		vmcs12_write_any(&vmx->vcpu, field, field_value);
+	}
+
+	vmcs_clear(shadow_vmcs);
+	vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.



* [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (7 preceding siblings ...)
  2013-03-10 16:07 ` [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
@ 2013-03-10 16:07 ` Abel Gordon
  2013-04-09 12:47   ` Gleb Natapov
  2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:07 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   45 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -672,6 +672,7 @@ static void vmx_get_segment(struct kvm_v
 			    struct kvm_segment *var, int seg);
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
@@ -5813,6 +5814,50 @@ static void copy_shadow_to_vmcs12(struct
 	vmcs_load(vmx->loaded_vmcs->vmcs);
 }
 
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
+{
+	int num_lists = 2;
+	unsigned long *fields[] = {
+		(unsigned long *)shadow_read_write_fields,
+		(unsigned long *)shadow_read_only_fields
+	};
+	int max_fields[] = {
+		max_shadow_read_write_fields,
+		max_shadow_read_only_fields
+	};
+	int i, q;
+	unsigned long field;
+	u64 field_value = 0;
+	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+
+	vmcs_load(shadow_vmcs);
+
+	for (q = 0; q < num_lists; q++) {
+		for (i = 0; i < max_fields[q]; i++) {
+			field = fields[q][i];
+			vmcs12_read_any(&vmx->vcpu, field, &field_value);
+
+			switch (vmcs_field_type(field)) {
+			case VMCS_FIELD_TYPE_U16:
+				vmcs_write16(field, (u16)field_value);
+				break;
+			case VMCS_FIELD_TYPE_U32:
+				vmcs_write32(field, (u32)field_value);
+				break;
+			case VMCS_FIELD_TYPE_U64:
+				vmcs_write64(field, (u64)field_value);
+				break;
+			case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+				vmcs_writel(field, (long)field_value);
+				break;
+			}
+		}
+	}
+
+	vmcs_clear(shadow_vmcs);
+	vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
 /*
  * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
  * used before) all generate the same failure when it is missing.



* [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (8 preceding siblings ...)
  2013-03-10 16:07 ` [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
@ 2013-03-10 16:08 ` Abel Gordon
  2013-03-10 22:43   ` Nadav Har'El
  2013-03-10 16:08 ` [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
  2013-03-21 12:22 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Orit Wasserman
  11 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:08 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Synchronize between the VMCS12 software-controlled structure and the
processor-specific shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
@@ -1843,6 +1843,8 @@ static int nested_pf_handled(struct kvm_
 		return 0;
 
 	nested_vmx_vmexit(vcpu);
+	if (enable_shadow_vmcs)
+		copy_vmcs12_to_shadow(to_vmx(vcpu));
 	return 1;
 }
 
@@ -4430,6 +4432,8 @@ static int vmx_interrupt_allowed(struct 
 		nested_vmx_vmexit(vcpu);
 		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
 		vmcs12->vm_exit_intr_info = 0;
+		if (enable_shadow_vmcs)
+			copy_vmcs12_to_shadow(to_vmx(vcpu));
 		/* fall through to normal code, but now in L1, not L2 */
 	}
 
@@ -5490,6 +5494,9 @@ static inline void nested_release_vmcs12
 {
 	if (enable_shadow_vmcs) {
 		if (vmx->nested.current_vmcs12 != NULL) {
+			/* copy to memory all shadowed fields in case
+			   they were modified */
+			copy_shadow_to_vmcs12(vmx);
 			free_vmcs(vmx->nested.current_shadow_vmcs);
 		}
 	}
@@ -6037,6 +6044,7 @@ static int handle_vmptrld(struct kvm_vcp
 			/* init shadow vmcs */
 			vmcs_clear(shadow_vmcs);
 			vmx->nested.current_shadow_vmcs = shadow_vmcs;
+			copy_vmcs12_to_shadow(vmx);
 		}
 	}
 
@@ -6425,6 +6433,8 @@ static int vmx_handle_exit(struct kvm_vc
 
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 		nested_vmx_vmexit(vcpu);
+		if (enable_shadow_vmcs)
+			copy_vmcs12_to_shadow(vmx);
 		return 1;
 	}
 
@@ -7384,6 +7394,8 @@ static int nested_vmx_run(struct kvm_vcp
 	skip_emulated_instruction(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
+	if (enable_shadow_vmcs)
+		copy_shadow_to_vmcs12(vmx);
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when



* [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (9 preceding siblings ...)
  2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
@ 2013-03-10 16:08 ` Abel Gordon
  2013-03-21 12:22 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Orit Wasserman
  11 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-10 16:08 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Once L1 loads a VMCS12, we enable the shadow-vmcs capability and copy all the
shadowed VMCS12 fields to the shadow vmcs. When we release the VMCS12, we
disable the shadow-vmcs capability.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:56.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:56.000000000 +0200
@@ -5492,11 +5492,16 @@ static int nested_vmx_check_permission(s
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
+	u32 exec_control;
 	if (enable_shadow_vmcs) {
 		if (vmx->nested.current_vmcs12 != NULL) {
 			/* copy to memory all shadowed fields in case
 			   they were modified */
 			copy_shadow_to_vmcs12(vmx);
+			exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+			exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+			vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+			vmcs_write64(VMCS_LINK_POINTER, -1ull);
 			free_vmcs(vmx->nested.current_shadow_vmcs);
 		}
 	}
@@ -5987,6 +5992,7 @@ static int handle_vmptrld(struct kvm_vcp
 	gpa_t vmptr;
 	struct x86_exception e;
 	struct vmcs *shadow_vmcs;
+	u32 exec_control;
 	++vcpu->stat.nvmx_vmptrlds;
 
 	if (!nested_vmx_check_permission(vcpu))
@@ -6044,6 +6050,11 @@ static int handle_vmptrld(struct kvm_vcp
 			/* init shadow vmcs */
 			vmcs_clear(shadow_vmcs);
 			vmx->nested.current_shadow_vmcs = shadow_vmcs;
+			exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+			exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
+			vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+			vmcs_write64(VMCS_LINK_POINTER,
+				     __pa(shadow_vmcs));
 			copy_vmcs12_to_shadow(vmx);
 		}
 	}



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
@ 2013-03-10 22:43   ` Nadav Har'El
  2013-03-11  7:54     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Nadav Har'El @ 2013-03-10 22:43 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, jun.nakajima, dongxiao.xu

Hi Abel, very nice patches.

On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs":
>  	nested_vmx_vmexit(vcpu);
> +	if (enable_shadow_vmcs)
> +		copy_vmcs12_to_shadow(to_vmx(vcpu));

I was curious why your patch adds this call to copy_vmcs12_to_shadow after
every nested_vmx_vmexit (3 times), instead of making this call inside
nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:

>  		nested_vmx_vmexit(vcpu);
>  		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
>  		vmcs12->vm_exit_intr_info = 0;
> +		if (enable_shadow_vmcs)
> +			copy_vmcs12_to_shadow(to_vmx(vcpu));

where you need to copy this exit-reason override as well...
I wonder if there isn't a less ugly and repetitive way to do this :(

-- 
Nadav Har'El                        |        Monday, Mar 11 2013, 29 Adar 5773
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Fame: when your name is in everything but
http://nadav.harel.org.il           |the phone book.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-10 22:43   ` Nadav Har'El
@ 2013-03-11  7:54     ` Abel Gordon
  2013-04-09 13:14       ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-03-11  7:54 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: dongxiao.xu, jun.nakajima, kvm, owasserm

"Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:


> On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> nVMX: Synchronize VMCS12 content with the shadow vmcs":
> >     nested_vmx_vmexit(vcpu);
> > +   if (enable_shadow_vmcs)
> > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
>
> I was curious why your patch adds this call to copy_vmcs12_to_shadow after
> every nested_vmx_vmexit (3 times), instead of making this call inside
> nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:

Because nested code sometimes modifies vmcs12 fields "after"
nested_vmx_vmexit (see below). I was afraid the nested logic
may be changed in the future and some field may become out-of-sync.

If we have to call copy_vmcs12_to_shadow explicitly, it will be
harder to miss a field.

>
> >        nested_vmx_vmexit(vcpu);
> >        vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
> >        vmcs12->vm_exit_intr_info = 0;
> > +      if (enable_shadow_vmcs)
> > +         copy_vmcs12_to_shadow(to_vmx(vcpu));
>
> where you need to copy this exit-reason override as well...
Exactly ;)

Also, if I remember properly, nested EPT patches also modify vmcs12 fields
outside the nested_vmx_vmexit context.

> I wonder if there isn't a less ugly and repetitive way to do this :(

We can change nested_vmx_vmexit to take additional parameters so
we can set other fields (e.g. exit_reason and exit_intr_info) as
part of this function, however, I am not sure this will be
"less ugly"... do you remember the old/original nested code
in which nested_vmx_vmexit had an additional "external_interrupt"
parameter? :)
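For what it's worth, the refactoring being debated here (a single sync point
inside nested_vmx_vmexit, with the overrides passed in) can be sketched as a
toy userspace model. Everything below is illustrative: the structures, helper
bodies and the extended signature are stand-ins I made up, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the software-controlled vmcs12 and the shadow vmcs. */
struct toy_vmcs12 {
	uint32_t vm_exit_reason;
	uint32_t vm_exit_intr_info;
};

static struct toy_vmcs12 vmcs12;	/* software-controlled format */
static struct toy_vmcs12 shadow;	/* models the shadow vmcs     */

static void prepare_vmcs12(void)
{
	/* Pretend the natural exit being reflected was an EPT violation. */
	vmcs12.vm_exit_reason = 48;
	vmcs12.vm_exit_intr_info = 0x80000b0e;
}

static void copy_vmcs12_to_shadow(void)
{
	shadow = vmcs12;
}

/*
 * Hypothetical extended signature: a caller that needs to override the
 * exit-information fields (the external-interrupt case above) passes the
 * values in, so the single sync point still sees them and no caller can
 * leave the shadow vmcs stale.
 */
static void nested_vmx_vmexit_sketch(int override_exit,
				     uint32_t reason, uint32_t intr_info)
{
	prepare_vmcs12();
	if (override_exit) {
		vmcs12.vm_exit_reason = reason;
		vmcs12.vm_exit_intr_info = intr_info;
	}
	copy_vmcs12_to_shadow();	/* one sync point instead of three */
}
```

With this shape, the vmx_interrupt_allowed() caller would pass
EXIT_REASON_EXTERNAL_INTERRUPT and 0 instead of patching vmcs12 after the
call returns.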



* Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1
  2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
                   ` (10 preceding siblings ...)
  2013-03-10 16:08 ` [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
@ 2013-03-21 12:22 ` Orit Wasserman
  2013-03-21 13:56   ` Abel Gordon
  11 siblings, 1 reply; 46+ messages in thread
From: Orit Wasserman @ 2013-03-21 12:22 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, nadav, jun.nakajima, dongxiao.xu

On 03/10/2013 06:03 PM, Abel Gordon wrote:
> This series of patches implements shadow-vmcs capability for nested VMX.
> 
> Shadow-vmcs - background and overview:
> 
>  In Intel VMX, vmread and vmwrite privileged instructions are used by the
>  hypervisor to read and modify the guest and host specifications (VMCS). In a
>  nested virtualization environment, L1 executes multiple vmread and vmwrite
>  instructions to handle a single L2 exit. Each vmread and vmwrite executed by L1
>  traps (causes an exit) to the L0 hypervisor (KVM). L0 emulates the instruction
>  behaviour and resumes L1 execution.
> 
>  Removing the need to trap and emulate these special instructions reduces the
>  number of exits and improves nested virtualization performance. As first
>  evaluated in [1], exit-less vmread and vmwrite can reduce nested virtualization
>  overhead by up to 40%.
>  
>  Intel introduced a new feature to their processors called shadow-vmcs.  Using
>  shadow-vmcs, L0 can configure the processor to let L1 running in guest-mode
>  access VMCS12 fields using vmread and vmwrite instructions but without causing
>  an exit to L0. The VMCS12 fields' data is stored in a shadow-vmcs controlled
>  by L0.
> 
> Shadow-vmcs - design considerations: 
> 
>  A shadow-vmcs is processor-dependent and must be accessed by L0 or L1 using
>  vmread and vmwrite instructions. With nested virtualization we aim to abstract
>  the hardware from the L1 hypervisor. Thus, to avoid hardware dependencies we
>  preferred to keep the software-defined VMCS12 format as part of L1 address space
>  and hold the processor-specific shadow-vmcs format only in L0 address space.
>  In other words, the shadow-vmcs is used by L0 as an accelerator but the format
>  and content is never exposed to L1 directly. L0 syncs the content of the
>  processor-specific shadow vmcs with the content of the software-controlled
>  VMCS12 format.
> 
>  We could have kept the processor-specific shadow-vmcs format in L1 address
>  space to avoid using the software-defined VMCS12 format; however, that type of
>  design/implementation would have created hardware dependencies and
>  would have complicated other capabilities (e.g. live migration of L1).
>  
> Acknowledgments:
> 
>  Many thanks to
>  "Xu, Dongxiao" <dongxiao.xu@intel.com>
>  "Nakajima, Jun" <jun.nakajima@intel.com>
>  "Har'El, Nadav" <nadav@harel.org.il> 
>   
>  for the insightful discussions, comments and reviews.
> 
> 
>  These patches were easily created and maintained using
>      Patchouli -- patch creator
>      http://patchouli.sourceforge.net/
> 
> 
> [1] "The Turtles Project: Design and Implementation of Nested Virtualization",
>     http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Reviewed-by: Orit Wasserman <owasserm@redhat.com>

By the way, do you have some performance results? How much does it improve nested?

Orit


* Re: [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1
  2013-03-21 12:22 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Orit Wasserman
@ 2013-03-21 13:56   ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-03-21 13:56 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav



Orit Wasserman <owasserm@redhat.com> wrote on 21/03/2013 02:22:44 PM:

> By the way do you have some performance results, how does it improve nested ?

Only the old numbers we obtained by "emulating" this type of feature on
Nehalem processors, which we --including you :)-- published in the Turtles
paper: http://static.usenix.org/event/osdi10/tech/slides/ben-yehuda.pdf
The results showed that up to 40% of the nested overhead was caused
by L0 trapping and emulating L1 vmread/vmwrite instructions.

To handle a single L2 exit, L1 performs around 10 vmread/vmwrite
instructions (10 exits + 10 entries), so this feature
should reduce the virtual exit/entry cost (L2->L1->L2)
by at least an order of magnitude. Instead of doing a long chain of
entries/exits (L2->L0->L1->L0->L1....L0->L1->L0->L2) we will have
a simple and short chain (L2->L0->L1->L0->L2).
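A back-of-the-envelope model of those transition counts (the ~10
vmread/vmwrite figure is the one quoted above; nothing here is measured):

```c
#include <assert.h>

/*
 * World switches needed for one L2 exit handled by L1, assuming each of
 * the `vmrw` vmread/vmwrite instructions costs one exit to L0 plus one
 * re-entry when shadow-vmcs is not available.
 */
static int switches_without_shadow(int vmrw)
{
	/* L2->L0, L0->L1, vmrw x (L1->L0, L0->L1), L1->L0, L0->L2 */
	return 4 + 2 * vmrw;
}

static int switches_with_shadow(void)
{
	/* L2->L0, L0->L1, L1->L0, L0->L2: vmread/vmwrite no longer exit */
	return 4;
}
```

So the short chain replaces roughly 24 world switches with 4 for a typical
L2 exit, before counting the per-switch emulation work saved.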



* Re: [PATCH 01/11] KVM: nVMX: Stats counters for nVMX
  2013-03-10 16:03 ` [PATCH 01/11] KVM: nVMX: Stats counters for nVMX Abel Gordon
@ 2013-04-08 10:27   ` Gleb Natapov
  2013-04-10 19:08     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-08 10:27 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Sun, Mar 10, 2013 at 06:03:55PM +0200, Abel Gordon wrote:
> Add new counters to measure how many vmread/vmwrite/vmlaunch/vmresume/vmclear
> instructions were trapped and emulated by L0
> 
stat counters are deprecated in favor of trace points. Adding a kvm-nested
trace system is very welcome though.

> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    6 ++++++
>  arch/x86/kvm/vmx.c              |    7 +++++++
>  arch/x86/kvm/x86.c              |    6 ++++++
>  3 files changed, 19 insertions(+)
> 
> --- .before/arch/x86/include/asm/kvm_host.h	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/include/asm/kvm_host.h	2013-03-10 18:00:54.000000000 +0200
> @@ -619,6 +619,12 @@ struct kvm_vcpu_stat {
>  	u32 hypercalls;
>  	u32 irq_injections;
>  	u32 nmi_injections;
> +	u32 nvmx_vmreads;
> +	u32 nvmx_vmwrites;
> +	u32 nvmx_vmptrlds;
> +	u32 nvmx_vmlaunchs;
> +	u32 nvmx_vmresumes;
> +	u32 nvmx_vmclears;
>  };
>  
>  struct x86_instruction_info;
> --- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> @@ -5545,6 +5545,7 @@ static int handle_vmclear(struct kvm_vcp
>  	struct vmcs12 *vmcs12;
>  	struct page *page;
>  	struct x86_exception e;
> +	++vcpu->stat.nvmx_vmclears;
>  
>  	if (!nested_vmx_check_permission(vcpu))
>  		return 1;
> @@ -5601,12 +5602,14 @@ static int nested_vmx_run(struct kvm_vcp
>  /* Emulate the VMLAUNCH instruction */
>  static int handle_vmlaunch(struct kvm_vcpu *vcpu)
>  {
> +	++vcpu->stat.nvmx_vmlaunchs;
>  	return nested_vmx_run(vcpu, true);
>  }
>  
>  /* Emulate the VMRESUME instruction */
>  static int handle_vmresume(struct kvm_vcpu *vcpu)
>  {
> +	++vcpu->stat.nvmx_vmresumes;
>  
>  	return nested_vmx_run(vcpu, false);
>  }
> @@ -5689,6 +5692,7 @@ static int handle_vmread(struct kvm_vcpu
>  	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>  	gva_t gva = 0;
>  
> +	++vcpu->stat.nvmx_vmreads;
>  	if (!nested_vmx_check_permission(vcpu) ||
>  	    !nested_vmx_check_vmcs12(vcpu))
>  		return 1;
> @@ -5741,6 +5745,8 @@ static int handle_vmwrite(struct kvm_vcp
>  	u64 field_value = 0;
>  	struct x86_exception e;
>  
> +	++vcpu->stat.nvmx_vmwrites;
> +
>  	if (!nested_vmx_check_permission(vcpu) ||
>  	    !nested_vmx_check_vmcs12(vcpu))
>  		return 1;
> @@ -5807,6 +5813,7 @@ static int handle_vmptrld(struct kvm_vcp
>  	gva_t gva;
>  	gpa_t vmptr;
>  	struct x86_exception e;
> +	++vcpu->stat.nvmx_vmptrlds;
>  
>  	if (!nested_vmx_check_permission(vcpu))
>  		return 1;
> --- .before/arch/x86/kvm/x86.c	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/kvm/x86.c	2013-03-10 18:00:54.000000000 +0200
> @@ -145,6 +145,12 @@ struct kvm_stats_debugfs_item debugfs_en
>  	{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
>  	{ "irq_injections", VCPU_STAT(irq_injections) },
>  	{ "nmi_injections", VCPU_STAT(nmi_injections) },
> +	{ "nvmx_vmreads", VCPU_STAT(nvmx_vmreads) },
> +	{ "nvmx_vmwrites", VCPU_STAT(nvmx_vmwrites) },
> +	{ "nvmx_vmptrlds", VCPU_STAT(nvmx_vmptrlds) },
> +	{ "nvmx_vmlaunchs", VCPU_STAT(nvmx_vmlaunchs) },
> +	{ "nvmx_vmresumes", VCPU_STAT(nvmx_vmresumes) },
> +	{ "nvmx_vmclears", VCPU_STAT(nvmx_vmclears) },
>  	{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
>  	{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
>  	{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
> 

--
			Gleb.


* Re: [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability
  2013-03-10 16:04 ` [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
@ 2013-04-08 11:12   ` Gleb Natapov
  2013-04-10 19:14     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-08 11:12 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Sun, Mar 10, 2013 at 06:04:55PM +0200, Abel Gordon wrote:
> Add logic required to detect if shadow-vmcs is supported by the
> processor. Introduce a new kernel module parameter to specify if L0 should use
> shadow vmcs (or not) to run L1.
> 
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |   25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> --- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> @@ -86,6 +86,8 @@ module_param(fasteoi, bool, S_IRUGO);
>  
>  static bool __read_mostly enable_apicv_reg_vid;
>  
> +static bool __read_mostly enable_shadow_vmcs = 1;
> +module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
>  /*
>   * If nested=1, nested virtualization is supported, i.e., guests may use
>   * VMX and be a hypervisor for its own guests. If nested=0, guests may not
> @@ -895,6 +897,18 @@ static inline bool cpu_has_vmx_wbinvd_ex
>  		SECONDARY_EXEC_WBINVD_EXITING;
>  }
>  
> +static inline bool cpu_has_vmx_shadow_vmcs(void)
> +{
> +	u64 vmx_msr;
> +	rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> +	/* check if the cpu supports writing r/o exit information fields */
> +	if (!(vmx_msr & (1u << 29)))
Define please.
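Something along these lines, presumably; the macro name below is made up for
illustration, not necessarily the one that ended up in the tree:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit 29 of the IA32_VMX_MISC MSR: the processor allows vmwrite to any
 * supported VMCS field, including the normally read-only exit-information
 * fields -- which L0 needs in order to populate a shadow vmcs.
 * The macro name is hypothetical.
 */
#define VMX_MISC_VMWRITE_SHADOW_RO_FIELDS	(1ULL << 29)

static inline int cpu_can_write_ro_fields(uint64_t vmx_misc)
{
	return (vmx_misc & VMX_MISC_VMWRITE_SHADOW_RO_FIELDS) != 0;
}
```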

> +		return false;
> +
> +	return vmcs_config.cpu_based_2nd_exec_ctrl &
> +		SECONDARY_EXEC_SHADOW_VMCS;
> +}
> +
>  static inline bool report_flexpriority(void)
>  {
>  	return flexpriority_enabled;
> @@ -2582,7 +2596,8 @@ static __init int setup_vmcs_config(stru
>  			SECONDARY_EXEC_RDTSCP |
>  			SECONDARY_EXEC_ENABLE_INVPCID |
>  			SECONDARY_EXEC_APIC_REGISTER_VIRT |
> -			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> +			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +			SECONDARY_EXEC_SHADOW_VMCS;
>  		if (adjust_vmx_controls(min2, opt2,
>  					MSR_IA32_VMX_PROCBASED_CTLS2,
>  					&_cpu_based_2nd_exec_control) < 0)
> @@ -2771,6 +2786,8 @@ static __init int hardware_setup(void)
>  
>  	if (!cpu_has_vmx_vpid())
>  		enable_vpid = 0;
> +	if (!cpu_has_vmx_shadow_vmcs())
> +		enable_shadow_vmcs = 0;
>  
>  	if (!cpu_has_vmx_ept() ||
>  	    !cpu_has_vmx_ept_4levels()) {
> @@ -3982,6 +3999,12 @@ static u32 vmx_secondary_exec_control(st
>  		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
>  				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>  	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> +	/* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
> +	   (handle_vmptrld).
> +	   We can NOT enable shadow_vmcs here because we don't have yet
> +	   a current VMCS12
> +	*/
> +	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
>  	return exec_control;
>  }
>  
> 

--
			Gleb.


* Re: [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps
  2013-03-10 16:05 ` [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
@ 2013-04-08 11:50   ` Gleb Natapov
  2013-04-10 19:14     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-08 11:50 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Sun, Mar 10, 2013 at 06:05:25PM +0200, Abel Gordon wrote:
> Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
> These lists are intended to specify the most frequently accessed fields so we can
> minimize the number of fields that are copied between the software-controlled
> VMCS12 format and the processor-specific shadow vmcs. The lists were built by
> measuring the VMCS field access rate after L2 Ubuntu 12.04 booted when it was
> running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were
> additional fields which were frequently modified, but they were not added to
> these lists because after boot they were no longer accessed by L1.
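To make the mechanics concrete: each bitmap is one 4KB page indexed by the
VMCS field encoding, and a set bit means the access still exits to L0. Below
is a toy byte-indexed userspace model of the init done in this patch; the
real code uses the kernel's clear_bit(), whose bit ordering differs, so this
is illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMCS_BITMAP_BYTES 4096		/* one page = 32768 bits */

static uint8_t vmread_bitmap[VMCS_BITMAP_BYTES];

static void bitmap_clear(uint8_t *bm, unsigned int field)
{
	bm[field / 8] &= (uint8_t)~(1u << (field % 8));
}

static int bitmap_test(const uint8_t *bm, unsigned int field)
{
	return (bm[field / 8] >> (field % 8)) & 1;
}

/* Default: every vmread exits; then punch holes for the shadowed fields. */
static void init_vmread_bitmap(const unsigned int *fields, int n)
{
	int i;

	memset(vmread_bitmap, 0xff, sizeof(vmread_bitmap));
	for (i = 0; i < n; i++)
		bitmap_clear(vmread_bitmap, fields[i]);
}
```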
> 
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 74 insertions(+), 1 deletion(-)
> 
> --- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> @@ -453,6 +453,51 @@ static inline struct vcpu_vmx *to_vmx(st
>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
>  				[number##_HIGH] = VMCS12_OFFSET(name)+4
>  
> +
> +static const unsigned long shadow_read_only_fields[] = {
> +	VM_EXIT_REASON,
> +	VM_EXIT_INTR_INFO,
> +	VM_EXIT_INSTRUCTION_LEN,
> +	IDT_VECTORING_INFO_FIELD,
> +	IDT_VECTORING_ERROR_CODE,
> +	VM_EXIT_INTR_ERROR_CODE,
> +	EXIT_QUALIFICATION,
> +	GUEST_LINEAR_ADDRESS,
> +	GUEST_PHYSICAL_ADDRESS
> +};
> +static const int max_shadow_read_only_fields = ARRAY_SIZE(shadow_read_only_fields);
> +
> +static const unsigned long shadow_read_write_fields[] = {
> +	GUEST_RIP,
> +	GUEST_RSP,
> +	GUEST_CR0,
> +	GUEST_CR3,
> +	GUEST_CR4,
> +	GUEST_INTERRUPTIBILITY_INFO,
> +	GUEST_RFLAGS,
> +	GUEST_CS_SELECTOR,
> +	GUEST_CS_AR_BYTES,
> +	GUEST_CS_LIMIT,
> +	GUEST_CS_BASE,
> +	GUEST_ES_BASE,
> +	CR0_GUEST_HOST_MASK,
> +	CR0_READ_SHADOW,
> +	CR4_READ_SHADOW,
> +	TSC_OFFSET,
> +	EXCEPTION_BITMAP,
> +	CPU_BASED_VM_EXEC_CONTROL,
> +	VM_ENTRY_EXCEPTION_ERROR_CODE,
> +	VM_ENTRY_INTR_INFO_FIELD,
> +	VM_ENTRY_INSTRUCTION_LEN,
> +	VM_ENTRY_EXCEPTION_ERROR_CODE,
> +	HOST_FS_BASE,
> +	HOST_GS_BASE,
> +	HOST_FS_SELECTOR,
> +	HOST_GS_SELECTOR
> +};
> +static const int max_shadow_read_write_fields =
> +	ARRAY_SIZE(shadow_read_write_fields);
> +
>  static const unsigned short vmcs_field_to_offset_table[] = {
>  	FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
>  	FIELD(GUEST_ES_SELECTOR, guest_es_selector),
> @@ -642,6 +687,8 @@ static unsigned long *vmx_msr_bitmap_leg
>  static unsigned long *vmx_msr_bitmap_longmode;
>  static unsigned long *vmx_msr_bitmap_legacy_x2apic;
>  static unsigned long *vmx_msr_bitmap_longmode_x2apic;
> +static unsigned long *vmx_vmread_bitmap;
> +static unsigned long *vmx_vmwrite_bitmap;
>  
>  static bool cpu_has_load_ia32_efer;
>  static bool cpu_has_load_perf_global_ctrl;
> @@ -4033,6 +4080,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
>  	vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
>  	vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
>  
> +	vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
> +	vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
Why are you doing it without checking that shadow vmcs is supported and
enabled?

>  	if (cpu_has_vmx_msr_bitmap())
>  		vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
>  
> @@ -7764,6 +7813,24 @@ static int __init vmx_init(void)
>  				(unsigned long *)__get_free_page(GFP_KERNEL);
>  	if (!vmx_msr_bitmap_longmode_x2apic)
>  		goto out4;
> +	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
> +	if (!vmx_vmread_bitmap)
> +		goto out4;
> +
> +	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
> +	if (!vmx_vmwrite_bitmap)
> +		goto out5;
> +
We need to clean up these bitmap allocations some day, to allocate them only
when the feature is supported and used.

> +	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
> +	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
> +	/* shadowed read/write fields */
> +	for (i = 0; i < max_shadow_read_write_fields; i++) {
> +		clear_bit(shadow_read_write_fields[i], vmx_vmwrite_bitmap);
> +		clear_bit(shadow_read_write_fields[i], vmx_vmread_bitmap);
> +	}
> +	/* shadowed read only fields */
> +	for (i = 0; i < max_shadow_read_only_fields; i++)
> +		clear_bit(shadow_read_only_fields[i], vmx_vmread_bitmap);
>  
>  	/*
>  	 * Allow direct access to the PC debug port (it is often used for I/O
> @@ -7782,7 +7849,7 @@ static int __init vmx_init(void)
>  	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
>  		     __alignof__(struct vcpu_vmx), THIS_MODULE);
>  	if (r)
> -		goto out3;
> +		goto out6;
>  
>  #ifdef CONFIG_KEXEC
>  	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
> @@ -7830,6 +7897,10 @@ static int __init vmx_init(void)
>  
>  	return 0;
>  
> +out6:
> +	free_page((unsigned long)vmx_vmwrite_bitmap);
> +out5:
> +	free_page((unsigned long)vmx_vmread_bitmap);
>  out4:
>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
>  out3:
> @@ -7851,6 +7922,8 @@ static void __exit vmx_exit(void)
>  	free_page((unsigned long)vmx_msr_bitmap_longmode);
>  	free_page((unsigned long)vmx_io_bitmap_b);
>  	free_page((unsigned long)vmx_io_bitmap_a);
> +	free_page((unsigned long)vmx_vmwrite_bitmap);
> +	free_page((unsigned long)vmx_vmread_bitmap);
>  
>  #ifdef CONFIG_KEXEC
>  	rcu_assign_pointer(crash_vmclear_loaded_vmcss, NULL);
> 

--
			Gleb.


* Re: [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite
  2013-03-10 16:05 ` [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
@ 2013-04-09 11:05   ` Gleb Natapov
  2013-04-10 20:35     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-09 11:05 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Sun, Mar 10, 2013 at 06:05:55PM +0200, Abel Gordon wrote:
> Refactor existing code so we can reuse vmcs12_write_any to copy fields from the
> shadow vmcs specified by the link pointer (used by the processor,
> implementation-specific) to the VMCS12 software format used by L0 to hold
> the fields in L1 memory address space.
> 
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |   52 +++++++++++++++++++++++--------------------
>  1 file changed, 28 insertions(+), 24 deletions(-)
> 
> --- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:54.000000000 +0200
> @@ -5741,6 +5741,33 @@ static inline bool vmcs12_read_any(struc
>  	}
>  }
>  
> +
> +static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
> +				    unsigned long field, u64 field_value){
> +	short offset = vmcs_field_to_offset(field);
> +	char *p = ((char *) get_vmcs12(vcpu)) + offset;
> +	if (offset < 0)
> +		return 0;
The function returns bool, so use true/false please.

> +
> +	switch (vmcs_field_type(field)) {
> +	case VMCS_FIELD_TYPE_U16:
> +		*(u16 *)p = field_value;
> +		return 1;
> +	case VMCS_FIELD_TYPE_U32:
> +		*(u32 *)p = field_value;
> +		return 1;
> +	case VMCS_FIELD_TYPE_U64:
> +		*(u64 *)p = field_value;
> +		return 1;
> +	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> +		*(natural_width *)p = field_value;
> +		return 1;
> +	default:
> +		return 0; /* can never happen. */
> +	}
> +
> +}
> +
>  /*
>   * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
>   * used before) all generate the same failure when it is missing.
> @@ -5806,8 +5833,6 @@ static int handle_vmwrite(struct kvm_vcp
>  	gva_t gva;
>  	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>  	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> -	char *p;
> -	short offset;
>  	/* The value to write might be 32 or 64 bits, depending on L1's long
>  	 * mode, and eventually we need to write that into a field of several
>  	 * possible lengths. The code below first zero-extends the value to 64
> @@ -5846,28 +5871,7 @@ static int handle_vmwrite(struct kvm_vcp
>  		return 1;
>  	}
>  
> -	offset = vmcs_field_to_offset(field);
> -	if (offset < 0) {
> -		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
> -		skip_emulated_instruction(vcpu);
> -		return 1;
> -	}
> -	p = ((char *) get_vmcs12(vcpu)) + offset;
> -
> -	switch (vmcs_field_type(field)) {
> -	case VMCS_FIELD_TYPE_U16:
> -		*(u16 *)p = field_value;
> -		break;
> -	case VMCS_FIELD_TYPE_U32:
> -		*(u32 *)p = field_value;
> -		break;
> -	case VMCS_FIELD_TYPE_U64:
> -		*(u64 *)p = field_value;
> -		break;
> -	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> -		*(natural_width *)p = field_value;
> -		break;
> -	default:
> +	if (!vmcs12_write_any(vcpu, field, field_value)) {
>  		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
>  		skip_emulated_instruction(vcpu);
>  		return 1;
> 

--
			Gleb.


* Re: [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
  2013-03-10 16:07 ` [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
@ 2013-04-09 12:47   ` Gleb Natapov
  2013-04-10 19:15     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-09 12:47 UTC (permalink / raw)
  To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu

On Sun, Mar 10, 2013 at 06:07:56PM +0200, Abel Gordon wrote:
> Introduce a function used to copy fields from the software controlled VMCS12
> to the processor-specific shadow vmcs
> 
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |   45 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> --- .before/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2013-03-10 18:00:55.000000000 +0200
> @@ -672,6 +672,7 @@ static void vmx_get_segment(struct kvm_v
>  			    struct kvm_segment *var, int seg);
>  static bool guest_state_valid(struct kvm_vcpu *vcpu);
>  static u32 vmx_segment_access_rights(struct kvm_segment *var);
> +static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
>  static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
>  
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> @@ -5813,6 +5814,50 @@ static void copy_shadow_to_vmcs12(struct
>  	vmcs_load(vmx->loaded_vmcs->vmcs);
>  }
>  
> +static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
> +{
> +	int num_lists = 2;
        int num_lists = ARRAY_SIZE(fields)

> +	unsigned long *fields[] = {
> +		(unsigned long *)shadow_read_write_fields,
> +		(unsigned long *)shadow_read_only_fields
> +	};
> +	int max_fields[] = {
> +		max_shadow_read_write_fields,
> +		max_shadow_read_only_fields
> +	};
> +	int i, q;
> +	unsigned long field;
> +	u64 field_value = 0;
> +	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
> +
> +	vmcs_load(shadow_vmcs);
> +
> +	for (q = 0; q < num_lists; q++) {
> +		for (i = 0; i < max_fields[q]; i++) {
> +			field = fields[q][i];
> +			vmcs12_read_any(&vmx->vcpu, field, &field_value);
> +
> +			switch (vmcs_field_type(field)) {
> +			case VMCS_FIELD_TYPE_U16:
> +				vmcs_write16(field, (u16)field_value);
> +				break;
> +			case VMCS_FIELD_TYPE_U32:
> +				vmcs_write32(field, (u32)field_value);
> +				break;
> +			case VMCS_FIELD_TYPE_U64:
> +				vmcs_write64(field, (u64)field_value);
> +				break;
> +			case VMCS_FIELD_TYPE_NATURAL_WIDTH:
> +				vmcs_writel(field, (long)field_value);
> +				break;
> +			}
> +		}
> +	}
> +
> +	vmcs_clear(shadow_vmcs);
> +	vmcs_load(vmx->loaded_vmcs->vmcs);
> +}
> +
>  /*
>   * VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
>   * used before) all generate the same failure when it is missing.
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-03-11  7:54     ` Abel Gordon
@ 2013-04-09 13:14       ` Gleb Natapov
  2013-04-10 19:15         ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-09 13:14 UTC (permalink / raw)
  To: Abel Gordon; +Cc: Nadav Har'El, dongxiao.xu, jun.nakajima, kvm, owasserm

On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
> "Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> 
> 
> > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > >     nested_vmx_vmexit(vcpu);
> > > +   if (enable_shadow_vmcs)
> > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> >
> > I was curious why your patch adds this call to copy_vmcs12_to_shadow
> after
> > every nested_vmx_vmexit (3 times), instead of making this call inside
> > nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> 
> Because nested code sometimes modifies vmcs fields "after"
> nested_vmx_vmexit (see below). I was afraid nested logic
> may be changed in the future and some field may become out-of-sync.
> 
> If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> more difficult to miss some field.
> 
I think the patch already misses some fields. What if nested_vmx_run()
fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
we copy them back to the shadow before going back to L1?

Maybe we need to introduce vmcs12 accessors to track what changes and
whether something needs to be copied to the shadow before going back to L1.

--
			Gleb.


* Re: [PATCH 01/11] KVM: nVMX: Stats counters for nVMX
  2013-04-08 10:27   ` Gleb Natapov
@ 2013-04-10 19:08     ` Abel Gordon
  2013-04-11  6:10       ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 19:08 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 08/04/2013 01:27:28 PM:


> stat counters are deprecated in favor of trace points. Adding kvmnested
> trace system is very welcome though.

So, should I keep or remove this patch?
If a kvmnested trace system is added, then I'll add shadow-vmcs related
events.



* Re: [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability
  2013-04-08 11:12   ` Gleb Natapov
@ 2013-04-10 19:14     ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 19:14 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 08/04/2013 02:12:07 PM:


> > +static inline bool cpu_has_vmx_shadow_vmcs(void)
> > +{
> > +   u64 vmx_msr;
> > +   rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> > +   /* check if the cpu supports writing r/o exit information fields */
> > +   if (!(vmx_msr & (1u << 29)))
> Define please.

I will add #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS






* Re: [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps
  2013-04-08 11:50   ` Gleb Natapov
@ 2013-04-10 19:14     ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 19:14 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 08/04/2013 02:50:54 PM:


> >  static bool cpu_has_load_ia32_efer;
> >  static bool cpu_has_load_perf_global_ctrl;
> > @@ -4033,6 +4080,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
> >     vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
> >     vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
> >
> > +   vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
> > +   vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
> Why are you doing it without checking that shadow vmcs is supported and
> enabled?

I will add these lines under "if (enable_shadow_vmcs)"



* Re: [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
  2013-04-09 12:47   ` Gleb Natapov
@ 2013-04-10 19:15     ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 19:15 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 03:47:50 PM:

> >
> > +static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
> > +{
> > +   int num_lists = 2;
>         int num_lists = ARRAY_SIZE(fields)

I will fix this.



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-09 13:14       ` Gleb Natapov
@ 2013-04-10 19:15         ` Abel Gordon
  2013-04-11  6:54           ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 19:15 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:

> On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
> > "Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> >
> >
> > > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > > >     nested_vmx_vmexit(vcpu);
> > > > +   if (enable_shadow_vmcs)
> > > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> > >
> > > I was curious why your patch adds this call to copy_vmcs12_to_shadow
> > > after every nested_vmx_vmexit (3 times), instead of making this call
> > > inside nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> >
> > Because nested code sometimes modifies vmcs fields "after"
> > nested_vmx_vmexit (see below). I was afraid nested logic
> > may be changed in the future and some field may become out-of-sync.
> >
> > If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> > more difficult to miss some field.
> >


> I think the patch already miss some fields. What if nested_vmx_run()
> fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
> sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> we copy them back to shadow before going back to L1?

Good catch! :)

Note that the entry path is easy to handle because we copy the fields
as part of nested_vmx_entry. This is not like the exit path, where
KVM(L0) code can modify fields after nested_vmx_vmexit is called.

So here, we could simply call copy_vmcs12_to_shadow if the entry fails
(as part of nested_vmx_entry_failure or nested_vmx). We could optimize
the code by updating these specific fields directly, but I don't think
we really need to optimize code that is part of the error path.

> May be we need to introduce vmcs12 accessors to track what is changes
> and if something need to be copied to shadow before going back to L1.

That means we will need to modify all the lines of code that use
"vmcs12->" to go through an inline nested_vmcs_read or nested_vmcs_write
function. Inside these inline functions we could access the shadow vmcs
directly. However, to access the shadow vmcs we need to vmptrld it first,
and this will force unnecessary vmptrlds (between the shadow vmcs 12 and
the current vmcs 01) each time the code accesses a vmcs12 field.
Alternatively, if we want to avoid unnecessary vmptrlds each time we
access vmcs12, we could simply set a flag that indicates when a shadow
field was changed. In this case, we will need to find all the places to
check the flag and copy the fields, considering both success and error
paths.
Finally, I am afraid that these directions will introduce new issues,
will force us to modify too many lines, and may create a merge/rebase
mess...

Maybe we should simply fix nested_vmx_entry_failure (as well as the
other fixes you suggested in other patches) and apply the code.
Do you agree?





* Re: [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite
  2013-04-09 11:05   ` Gleb Natapov
@ 2013-04-10 20:35     ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-10 20:35 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 02:05:07 PM:

> > +static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
> > +                unsigned long field, u64 field_value){
> > +   short offset = vmcs_field_to_offset(field);
> > +   char *p = ((char *) get_vmcs12(vcpu)) + offset;
> > +   if (offset < 0)
> > +      return 0;
> The function returns bool, so use true/false please.

Will change to return true/false.



* Re: [PATCH 01/11] KVM: nVMX: Stats counters for nVMX
  2013-04-10 19:08     ` Abel Gordon
@ 2013-04-11  6:10       ` Gleb Natapov
  0 siblings, 0 replies; 46+ messages in thread
From: Gleb Natapov @ 2013-04-11  6:10 UTC (permalink / raw)
  To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm

On Wed, Apr 10, 2013 at 10:08:56PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 08/04/2013 01:27:28 PM:
> 
> 
> > stat counters are deprecated in favor of trace points. Adding kvmnested
> > trace system is very welcome though.
> 
> So, should I keep or remove this patch ?
Remove.

> If a kvmnested trace system is added then I'll add shadow-vmcs related
> events
Adding it is as simple as adding an arch/x86/kvm/nestedtrace.h file similar
to arch/x86/kvm/mmutrace.h, doing #define TRACE_SYSTEM kvmnested there,
and adding the trace events that you want to trace. The advantage is that
you can provide additional information with each trace point, for
example the name and value of a field that was vmread.
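A minimal sketch of what such a file could contain, patterned after mmutrace.h; the event name and fields below are illustrative, not from any posted patch:

```c
/* Hypothetical arch/x86/kvm/nestedtrace.h fragment. */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvmnested

TRACE_EVENT(kvm_nested_vmwrite,
	TP_PROTO(unsigned long field, u64 value),
	TP_ARGS(field, value),
	TP_STRUCT__entry(
		__field(unsigned long, field)
		__field(u64, value)
	),
	TP_fast_assign(
		__entry->field = field;
		__entry->value = value;
	),
	TP_printk("field 0x%lx value 0x%llx", __entry->field, __entry->value)
);
```

Being a declarative TRACE_EVENT fragment, this only takes effect when included through the kernel's define_trace.h machinery.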

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-10 19:15         ` Abel Gordon
@ 2013-04-11  6:54           ` Gleb Natapov
  2013-04-12 10:26             ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-11  6:54 UTC (permalink / raw)
  To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm

On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> 
> > On Mon, Mar 11, 2013 at 09:54:47AM +0200, Abel Gordon wrote:
> > > "Nadav Har'El" <nyh@math.technion.ac.il> wrote on 11/03/2013 12:43:35 AM:
> > >
> > >
> > > > On Sun, Mar 10, 2013, Abel Gordon wrote about "[PATCH 10/11] KVM:
> > > > nVMX: Synchronize VMCS12 content with the shadow vmcs":
> > > > >     nested_vmx_vmexit(vcpu);
> > > > > +   if (enable_shadow_vmcs)
> > > > > +      copy_vmcs12_to_shadow(to_vmx(vcpu));
> > > >
> > > > I was curious why your patch adds this call to copy_vmcs12_to_shadow
> > > > after every nested_vmx_vmexit (3 times), instead of making this call
> > > > inside nested_vmx_vmexit(), say right after prepare_vmcs12(). Until I saw:
> > >
> 
> > > Because nested code sometimes modifies vmcs fields "after"
> > > nested_vmx_vmexit (see below). I was afraid nested logic
> > > may be changed in the future and some field may become out-of-sync.
> > >
> > > If we do have to call copy_vmcs12_to_shadow explicitly, then it will be
> > > more difficult to miss some field.
> > >
> 
> 
> > I think the patch already misses some fields. What if nested_vmx_run()
> > fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
> > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where do
> > we copy them back to shadow before going back to L1?
> 
> Good catch! :)
> 
> Note that the entry path is easy to handle because we copy the fields
> as part of nested_vmx_entry. This is not like exit path where
> KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> 
> So here, we could simple call copy_vmcs12_to_shadow if the entry fails
> (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> the code by updating these specific fields directly, but I don't think
> we really need to optimize code that is part of the error path.
> 
We needn't.

> > May be we need to introduce vmcs12 accessors to track what is changes
> > and if something need to be copied to shadow before going back to L1.
> 
> That means we will need to modify all the lines of code that uses
> "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> Inside these inline functions we could access the shadow vmcs directly.
> However, to access the shadow vmcs we need to vmptrld first and this will
> force
> unnecessary vmptrlds (between shadow vmcs 12 and current vmcs 01) each time
> the code accesses a vmcs12 field. Alternatively, if we want to avoid
> unnecessary vmptrlds each time we access vmcs12 we could simple set a
> flag that indicates when a shadow field was changed. In this case, we will
> need to find all the places to check the flag and copy the fields,
> considering both success and error paths.
That's not how I see it. nested_vmcs_write() will set a request bit in
vcpu (this is the flag you mention above). The bit will be checked during
a guest entry and vmcs12 will be synced to shadow at this point. Later
we can track what fields were written and sync only them.

> Finally, I am afraid that these directions will introduce new issues,
> will force us to modify too many lines and they may create a merge/rebase
> mess...
Conflicts should be trivial and I do not expect many iterations for the
patch series. I like the approach you take to use vmcs shadowing, and most
comments are nitpicks.  I promise to review the next version as soon as
it is posted :)

> 
> Maybe we should simple fix nested_vmx_entry_failure (as well as the
> other fixes you suggested in other patches) and apply the code.
> Do you agree ?
> 
The approach is error prone. Even if we fix all the bugs in the current
code, each new nested vmx modification will have to be reviewed with
shadowing in mind, and sometimes it is hard to see the global picture just
from a patch. It is better to hide the shadow details from the occasional
nested vmx hacker. Are you expecting the EPT patches to get it right, for
instance?

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-11  6:54           ` Gleb Natapov
@ 2013-04-12 10:26             ` Abel Gordon
  2013-04-12 10:31               ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-12 10:26 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:

> On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > I think the patch already misses some fields. What if nested_vmx_run()
> > > fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
> > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where
> > > do we copy them back to shadow before going back to L1?
> >
> > Good catch! :)
> >
> > Note that the entry path is easy to handle because we copy the fields
> > as part of nested_vmx_entry. This is not like exit path where
> > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> >
> > So here, we could simply call copy_vmcs12_to_shadow if the entry fails
> > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > the code by updating these specific fields directly, but I don't think
> > we really need to optimize code that is part of the error path.
> >
> We needn't.
>
> > > Maybe we need to introduce vmcs12 accessors to track what changes
> > > and whether something needs to be copied to shadow before going back to L1.
> >
> > That means we will need to modify all the lines of code that uses
> > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> > Inside these inline functions we could access the shadow vmcs directly.
> > However, to access the shadow vmcs we need to vmptrld first and this
> > will force unnecessary vmptrlds (between shadow vmcs 12 and current
> > vmcs 01) each time the code accesses a vmcs12 field. Alternatively, if
> > we want to avoid unnecessary vmptrlds each time we access vmcs12 we
> > could simply set a flag that indicates when a shadow field was changed.
> > In this case, we will need to find all the places to check the flag and
> > copy the fields, considering both success and error paths.
> That's not how I see it. nested_vmcs_write() will set a request bit in
> vcpu (this is the flag you mention above). The bit will be checked during
> a guest entry and vmcs12 will be synced to shadow at this point. Later
> we can track what fields were written and sync only them.
>
> > Finally, I am afraid that these directions will introduce new issues,
> > will force us to modify too many lines and they may create a merge/rebase
> > mess...
> Conflicts should be trivial and I do not expect many iterations for the
> patch series. I like the approach you take to use vmcs shadowing and most
> comments are nitpicks.  I promise to review the next version as soon as
> it is posted :)
>
> >
> > Maybe we should simply fix nested_vmx_entry_failure (as well as the
> > other fixes you suggested in other patches) and apply the code.
> > Do you agree?
> >
> The approach is error prone. Even if we will fix all bugs in the current
> code each new nested vmx modification will have to be reviewed with
> shadowing in mind and sometimes it is hard to see global picture just
> from a patch. It is better to hide shadow details from occasional nested
> vmx hacker.

But how? I suggested 2 alternatives with pros/cons.
Do you have any other alternative in mind?
Three more alternatives:
(3) set vmcs12 to null once the shadow is synched (so inappropriate
accesses due to buggy code will access a null pointer and crash kvm).
(4) like -1-, but using the accessors and WARN/PANIC.
(5) change nested_vmx_vmexit so it can update additional fields passed
by the caller. Additional vmcs12 fields should NOT be updated directly
(as it is today). They should be updated only when calling
nested_vmx_vmexit.


> Are you expecting EPT patches to make it right for instance?

Yep, I actually added shadow vmcs also to a KVM version
that includes the nEPT patches Nadav Har'El shared. It required adding
one call to copy_vmcs12_to_shadow after a nested exit that
updates vmcs12 fields.



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:26             ` Abel Gordon
@ 2013-04-12 10:31               ` Gleb Natapov
  2013-04-12 10:44                 ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-12 10:31 UTC (permalink / raw)
  To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, Nadav Har'El, owasserm

On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> 
> > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > I think the patch already miss some fields. What if nested_vmx_run()
> > > > fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure
> ()
> > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where
> do
> > > > we copy them back to shadow before going back to L1?
> > >
> > > Good catch! :)
> > >
> > > Note that the entry path is easy to handle because we copy the fields
> > > as part of nested_vmx_entry. This is not like exit path where
> > > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> > >
> > > So here, we could simple call copy_vmcs12_to_shadow if the entry fails
> > > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > > the code by updating these specific fields directly, but I don't think
> > > we really need to optimize code that is part of the error path.
> > >
> > We needn't.
> >
> > > > May be we need to introduce vmcs12 accessors to track what is changes
> > > > and if something need to be copied to shadow before going back to L1.
> > >
> > > That means we will need to modify all the lines of code that uses
> > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write
> function.
> > > Inside these inline functions we could access the shadow vmcs directly.
> > > However, to access the shadow vmcs we need to vmptrld first and this
> will
> > > force
> > > unnecessary vmptrlds (between shadow vmcs 12 and current vmcs 01) each
> time
> > > the code accesses a vmcs12 field. Alternatively, if we want to avoid
> > > unnecessary vmptrlds each time we access vmcs12 we could simple set a
> > > flag that indicates when a shadow field was changed. In this case, we
> will
> > > need to find all the places to check the flag and copy the fields,
> > > considering both success and error paths.
> > That's not how I see it. nested_vmcs_write() will set a request bit in
> > vcpu (this is the flag you mention above). The bit will be checked during
> > a guest entry and vmcs12 will be synced to shadow at this point. Later
> > we can track what fields were written and sync only them.
> >
XXX

> > > Finally, I am afraid that these directions will introduce new issues,
> > > will force us to modify too many lines and they may create a
> merge/rebase
> > > mess...
> > Conflicts should be trivial and I do not expect many iterations for the
> > patch series. I like the approach you take to use vmcs shadowing and most
> > comment are nitpicks.  I promise to review the next version as soon as
> > it is posted :)
> >
> > >
> > > Maybe we should simple fix nested_vmx_entry_failure (as well as the
> > > other fixes you suggested in other patches) and apply the code.
> > > Do you agree ?
> > >
> > The approach is error prone. Even if we will fix all bugs in the current
> > code each new nested vmx modification will have to be reviewed with
> shadowing
> > in mind and sometimes it is hard to see global picture just from a
> > patch. It is better to hide shadow details from occasional nested vmx
> hacker.
> 
> But how? I suggested 2 alternatives with pros/cons.
I outlined how above (search for XXX). Use accessors, set request bit,
sync during guest entry if request bit is set.

> Do you have any other alternative in mid ?
> Three more alternatives:
> (3) set vmcs12 to null once the shadow is synched (so inappropriate
> accesses
> due to buggy code will access a null pointer and crash kvm).
> (4) like -1-, but using the accessors and WARN/PANIC.
> (5) change nested_vmx_vmexit so it can update additional fields passed
> by the caller. Additional vmcs12 fields should NOT be updated directly
> (as it is today). They should be updated only when calling
> nested_vmx_vmexit.
> 
> 
> > Are you expecting EPT patches to make it right for instance?
> 
> Yep, I actually added shadow vmcs also to a KVM version
> that includes nEPT patches Nadav Har'El shared. It required adding
> 1 call to copy_vmcs12_to_shadow after a nested exit related to which
> updates vmcs12 fields.

--
			Gleb.


* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:31               ` Gleb Natapov
@ 2013-04-12 10:44                 ` Abel Gordon
  2013-04-12 10:48                   ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-12 10:44 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm



kvm-owner@vger.kernel.org wrote on 12/04/2013 01:31:17 PM:

> From: Gleb Natapov <gleb@redhat.com>
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: dongxiao.xu@intel.com, jun.nakajima@intel.com,
> kvm@vger.kernel.org, "Nadav Har'El" <nyh@math.technion.ac.il>,
> owasserm@redhat.com
> Date: 12/04/2013 01:31 PM
> Subject: Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content
> with the shadow vmcs
> Sent by: kvm-owner@vger.kernel.org
>
> On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> >
> > > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > > >
> > > >
> > > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > > I think the patch already misses some fields. What if nested_vmx_run()
> > > > > fails and calls nested_vmx_entry_failure(). nested_vmx_entry_failure()
> > > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but where
> > > > > do we copy them back to shadow before going back to L1?
> > > >
> > > > Good catch! :)
> > > >
> > > > Note that the entry path is easy to handle because we copy the fields
> > > > as part of nested_vmx_entry. This is not like exit path where
> > > > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> > > >
> > > > So here, we could simply call copy_vmcs12_to_shadow if the entry fails
> > > > (as part of nested_vmx_entry_failure or nested_vmx). We could optimize
> > > > the code by updating these specific fields directly, but I don't think
> > > > we really need to optimize code that is part of the error path.
> > > >
> > > We needn't.
> > >
> > > > > Maybe we need to introduce vmcs12 accessors to track what changes
> > > > > and whether something needs to be copied to shadow before going back
> > > > > to L1.
> > > >
> > > > That means we will need to modify all the lines of code that uses
> > > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write function.
> > > > Inside these inline functions we could access the shadow vmcs directly.
> > > > However, to access the shadow vmcs we need to vmptrld first and this
> > > > will force unnecessary vmptrlds (between shadow vmcs 12 and current
> > > > vmcs 01) each time the code accesses a vmcs12 field. Alternatively, if
> > > > we want to avoid unnecessary vmptrlds each time we access vmcs12 we
> > > > could simply set a flag that indicates when a shadow field was changed.
> > > > In this case, we will need to find all the places to check the flag and
> > > > copy the fields, considering both success and error paths.
> > > That's not how I see it. nested_vmcs_write() will set a request bit in
> > > vcpu (this is the flag you mention above). The bit will be checked during
> > > a guest entry and vmcs12 will be synced to shadow at this point. Later
> > > we can track what fields were written and sync only them.
> > >
> XXX
>
> > > > Finally, I am afraid that these directions will introduce new issues,
> > > > will force us to modify too many lines and they may create a merge/rebase
> > > > mess...
> > > Conflicts should be trivial and I do not expect many iterations for the
> > > patch series. I like the approach you take to use vmcs shadowing and most
> > > comments are nitpicks.  I promise to review the next version as soon as
> > > it is posted :)
> > >
> > > >
> > > > Maybe we should simply fix nested_vmx_entry_failure (as well as the
> > > > other fixes you suggested in other patches) and apply the code.
> > > > Do you agree?
> > > >
> > > The approach is error prone. Even if we will fix all bugs in the current
> > > code each new nested vmx modification will have to be reviewed with
> > > shadowing in mind and sometimes it is hard to see global picture just
> > > from a patch. It is better to hide shadow details from occasional nested
> > > vmx hacker.
> >
> > But how? I suggested 2 alternatives with pros/cons.
> I outlined how above (search for XXX). Use accessors, set request bit,
> sync during guest entry if request bit is set.

Ok, so then you prefer to add the inline functions to read/write the
vmcs12 fields (to set the request bit if a shadowed field changed), and
you are not concerned about any merge/rebase mess. I will work in this
direction.
I'll first send an independent patch to introduce the accessors. Once you
apply this patch, I'll continue and send you v2 patches for shadow vmcs.

Do you agree?



* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:44                 ` Abel Gordon
@ 2013-04-12 10:48                   ` Gleb Natapov
  2013-04-14  9:51                     ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-12 10:48 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> 
> 
> kvm-owner@vger.kernel.org wrote on 12/04/2013 01:31:17 PM:
> 
> > From: Gleb Natapov <gleb@redhat.com>
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: dongxiao.xu@intel.com, jun.nakajima@intel.com,
> > kvm@vger.kernel.org, "Nadav Har'El" <nyh@math.technion.ac.il>,
> > owasserm@redhat.com
> > Date: 12/04/2013 01:31 PM
> > Subject: Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content
> > with the shadow vmcs
> > Sent by: kvm-owner@vger.kernel.org
> >
> > On Fri, Apr 12, 2013 at 01:26:32PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 11/04/2013 09:54:11 AM:
> > >
> > > > On Wed, Apr 10, 2013 at 10:15:37PM +0300, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Gleb Natapov <gleb@redhat.com> wrote on 09/04/2013 04:14:35 PM:
> > > > > > I think the patch already miss some fields. What if
> nested_vmx_run()
> > > > > > fails and calls nested_vmx_entry_failure().
> nested_vmx_entry_failure
> > > ()
> > > > > > sets vmcs12->vm_exit_reason and vmcs12->exit_qualification, but
> where
> > > do
> > > > > > we copy them back to shadow before going back to L1?
> > > > >
> > > > > Good catch! :)
> > > > >
> > > > > Note that the entry path is easy to handle because we copy the
> fields
> > > > > as part of nested_vmx_entry. This is not like exit path where
> > > > > KVM(L0) code can modify fields after nested_vmx_vmexit is called.
> > > > >
> > > > > So here, we could simple call copy_vmcs12_to_shadow if the entry
> fails
> > > > > (as part of nested_vmx_entry_failure or nested_vmx). We could
> optimize
> > > > > the code by updating these specific fields directly, but I don't
> think
> > > > > we really need to optimize code that is part of the error path.
> > > > >
> > > > We needn't.
> > > >
> > > > > > Maybe we need to introduce vmcs12 accessors to track what
> > > > > > changes and whether something needs to be copied to shadow
> > > > > > before going back to L1.
> > > > >
> > > > > That means we will need to modify all the lines of code that use
> > > > > "vmcs12->" with an inline nested_vmcs_read or nested_vmcs_write
> > > > > function. Inside these inline functions we could access the shadow
> > > > > vmcs directly. However, to access the shadow vmcs we need to
> > > > > vmptrld first, and this will force unnecessary vmptrlds (between
> > > > > the shadow vmcs12 and the current vmcs01) each time the code
> > > > > accesses a vmcs12 field. Alternatively, if we want to avoid
> > > > > unnecessary vmptrlds each time we access vmcs12, we could simply
> > > > > set a flag that indicates when a shadow field was changed. In this
> > > > > case, we will need to find all the places to check the flag and
> > > > > copy the fields, considering both success and error paths.
> > > > That's not how I see it. nested_vmcs_write() will set a request bit
> > > > in vcpu (this is the flag you mention above). The bit will be
> > > > checked during a guest entry and vmcs12 will be synced to shadow at
> > > > this point. Later we can track what fields were written and sync
> > > > only them.
> > > >
> > XXX
> >
> > > > > Finally, I am afraid that these directions will introduce new
> > > > > issues, will force us to modify too many lines, and may create a
> > > > > merge/rebase mess...
> > > > Conflicts should be trivial and I do not expect many iterations for
> > > > the patch series. I like the approach you take to use vmcs shadowing
> > > > and most comments are nitpicks.  I promise to review the next version
> > > > as soon as it is posted :)
> > > >
> > > > >
> > > > > Maybe we should simply fix nested_vmx_entry_failure (as well as
> > > > > the other fixes you suggested in other patches) and apply the code.
> > > > > Do you agree ?
> > > > >
> > > > The approach is error prone. Even if we fix all the bugs in the
> > > > current code, each new nested vmx modification will have to be
> > > > reviewed with shadowing in mind, and sometimes it is hard to see the
> > > > global picture just from a patch. It is better to hide shadow details
> > > > from the occasional nested vmx hacker.
> > >
> > > But how ? I suggested 2 alternatives with pros/cons.
> > I outlined how above (search for XXX). Use accessors, set request bit,
> > sync during guest entry if request bit is set.
> 
> Ok, so then you prefer to add the inline functions to read/write to the
> vmcs12 fields, (to set the request bit if shadowed field changed) and
> you are not concerned about any merge/rebase mess. I will work on this
> direction.
> I'll first send an independent patch to introduce the accessors. Once
> you apply this patch, I'll continue and send you v2 patches for shadow
> vmcs.
> 
> Do you agree ?
Yes.

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-12 10:48                   ` Gleb Natapov
@ 2013-04-14  9:51                     ` Abel Gordon
  2013-04-14 10:00                       ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-14  9:51 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:

> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >
> > Ok, so then you prefer to add the inline functions to read/write to the
> > vmcs12 fields, (to set the request bit if shadowed field changed) and
> > you are not concerned about any merge/rebase mess. I will work on this
> > direction.
> > I'll first send an independent patch to introduce the accessors. Once
> > you apply this patch, I'll continue and send you v2 patches for shadow
> > vmcs.
> >
> > Do you agree ?
> Yes.

Looking again at the code it seems like we could avoid adding the
accessors. We could just set a flag in nested_vmx_vmexit and
nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag
and call copy_vmcs12_to_shadow (if required).

What do you think ?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14  9:51                     ` Abel Gordon
@ 2013-04-14 10:00                       ` Gleb Natapov
  2013-04-14 10:07                         ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:00 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> 
> > On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > >
> > > Ok, so then you prefer to add the inline functions to read/write to
> > > the vmcs12 fields, (to set the request bit if shadowed field changed)
> > > and you are not concerned about any merge/rebase mess. I will work on
> > > this direction.
> > > I'll first send an independent patch to introduce the accessors. Once
> > > you apply this patch, I'll continue and send you v2 patches for shadow
> > > vmcs.
> > >
> > > Do you agree ?
> > Yes.
> 
> Looking again at the code it seems like we could avoid adding the
> accessors.
> We could just set a flag in nested_vmx_vmexit and
> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> call copy_vmcs12_to_shadow (if required).
> 
> What do you think ?
Good idea! With accessors we can do further optimization by copying only
things that changed, but it will be premature optimization at this
point.

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:00                       ` Gleb Natapov
@ 2013-04-14 10:07                         ` Gleb Natapov
  2013-04-14 10:27                           ` Jan Kiszka
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:07 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, jun.nakajima, kvm, kvm-owner, Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > 
> > 
> > Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > 
> > > On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > >
> > > > Ok, so then you prefer to add the inline functions to read/write to
> > > > the vmcs12 fields, (to set the request bit if shadowed field changed)
> > > > and you are not concerned about any merge/rebase mess. I will work on
> > > > this direction.
> > > > I'll first send an independent patch to introduce the accessors. Once
> > > > you apply this patch, I'll continue and send you v2 patches for
> > > > shadow vmcs.
> > > >
> > > > Do you agree ?
> > > Yes.
> > 
> > Looking again at the code it seems like we could avoid adding the
> > accessors.
> > We could just set a flag in nested_vmx_vmexit and
> > nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> > call copy_vmcs12_to_shadow (if required).
> > 
> > What do you think ?
> Good idea! With accessors we can do further optimization by copying only
> things that changed, but it will be premature optimization at this
> point.
> 
Actually this is a good idea only if we know for sure that VMX emulation
changes vmcs12 only during guest entry/exit. Is this the case? I think
so.

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:07                         ` Gleb Natapov
@ 2013-04-14 10:27                           ` Jan Kiszka
  2013-04-14 10:34                             ` Abel Gordon
  2013-04-14 10:34                             ` Gleb Natapov
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Kiszka @ 2013-04-14 10:27 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Abel Gordon, dongxiao.xu, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

On 2013-04-14 12:07, Gleb Natapov wrote:
> On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
>> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
>>>
>>>
>>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
>>>
>>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
>>>>>
>>>>> Ok, so then you prefer to add the inline functions to read/write to
>>>>> the vmcs12 fields, (to set the request bit if shadowed field changed)
>>>>> and you are not concerned about any merge/rebase mess. I will work on
>>>>> this direction.
>>>>> I'll first send an independent patch to introduce the accessors. Once
>>>>> you apply this patch, I'll continue and send you v2 patches for shadow
>>>>> vmcs.
>>>>>
>>>>> Do you agree ?
>>>> Yes.
>>>
>>> Looking again at the code it seems like we could avoid adding the
>>> accessors.
>>> We could just set a flag in nested_vmx_vmexit and
>>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
>>> call copy_vmcs12_to_shadow (if required).
>>>
>>> What do you think ?
>> Good idea! With accessors we can do further optimization by copying only
>> things that changed, but it will be premature optimization at this
>> point.
>>
> Actually this is a good idea only if we know for sure that VMX emulation
> changes vmcs12 only during guest entry/exit. Is this the case? I think
> so.

Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
transitions. What comes to my mind: L0 emulates some change that L1 does
not trap, e.g. CRx accesses. Or what do you mean?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:27                           ` Jan Kiszka
@ 2013-04-14 10:34                             ` Abel Gordon
  2013-04-14 10:34                             ` Gleb Natapov
  1 sibling, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-14 10:34 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: dongxiao.xu, Gleb Natapov, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Jan Kiszka <jan.kiszka@web.de> wrote on 14/04/2013 01:27:10 PM:


> On 2013-04-14 12:07, Gleb Natapov wrote:
> > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> >>>
> >>>
> >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> >>>
> >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >>>>>
> >>>>> Ok, so then you prefer to add the inline functions to read/write to
> >>>>> the vmcs12 fields, (to set the request bit if shadowed field
> >>>>> changed) and you are not concerned about any merge/rebase mess. I
> >>>>> will work on this direction.
> >>>>> I'll first send an independent patch to introduce the accessors.
> >>>>> Once you apply this patch, I'll continue and send you v2 patches for
> >>>>> shadow vmcs.
> >>>>>
> >>>>> Do you agree ?
> >>>> Yes.
> >>>
> >>> Looking again at the code it seems like we could avoid adding the
> >>> accessors.
> >>> We could just set a flag in nested_vmx_vmexit and
> >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> the flag and
> >>> call copy_vmcs12_to_shadow (if required).
> >>>
> >>> What do you think ?
> >> Good idea! With accessors we can do further optimization by copying
> >> only things that changed, but it will be premature optimization at
> >> this point.
> >>
> > Actually this is a good idea only if we know for sure that VMX emulation
> > changes vmcs12 only during guest entry/exit. Is this the case? I think
> > so.
>
> Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> transitions. What comes to my mind: L0 emulates some change that L1 does
> not trap, e.g. CRx accesses. Or what do you mean?

If L0 emulates some change that L1 does not trap, L1 will not read
(vmread) the change until we switch to L1 (on another exit).
So, later we will do nested_vmx_vmexit and sync the shadowed field.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:27                           ` Jan Kiszka
  2013-04-14 10:34                             ` Abel Gordon
@ 2013-04-14 10:34                             ` Gleb Natapov
  2013-04-14 10:49                               ` Abel Gordon
  1 sibling, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-14 10:34 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Abel Gordon, dongxiao.xu, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> On 2013-04-14 12:07, Gleb Natapov wrote:
> > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> >>>
> >>>
> >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> >>>
> >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> >>>>>
> >>>>> Ok, so then you prefer to add the inline functions to read/write to
> >>>>> the vmcs12 fields, (to set the request bit if shadowed field
> >>>>> changed) and you are not concerned about any merge/rebase mess. I
> >>>>> will work on this direction.
> >>>>> I'll first send an independent patch to introduce the accessors.
> >>>>> Once you apply this patch, I'll continue and send you v2 patches for
> >>>>> shadow vmcs.
> >>>>>
> >>>>> Do you agree ?
> >>>> Yes.
> >>>
> >>> Looking again at the code it seems like we could avoid adding the
> >>> accessors.
> >>> We could just set a flag in nested_vmx_vmexit and
> >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset the flag and
> >>> call copy_vmcs12_to_shadow (if required).
> >>>
> >>> What do you think ?
> >> Good idea! With accessors we can do further optimization by copying only
> >> things that changed, but it will be premature optimization at this
> >> point.
> >>
> > Actually this is good idea only if we know for sure that VMX emulation
> > changes vmcs12 only during guest entry/exit. Is this the case? I think
> > so.
> 
> Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> transitions. What comes to my mind: L0 emulates some change that L1 does
> not trap, e.g. CRx accesses. Or what do you mean?
> 
If vmcs12 is changed by L0 while L2 is running, this is OK. If L0 changes
a shadowed vmcs12 field while L1 is running, this is not OK. So, for
instance, if field XXX is R/W but we allow only reads to be shadowed, then
write emulation in L0 has to sync the new value back to the shadow before
going back to L1.

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:34                             ` Gleb Natapov
@ 2013-04-14 10:49                               ` Abel Gordon
  2013-04-14 11:16                                 ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-14 10:49 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:

> On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > >>>
> > >>>
> > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > >>>
> > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > >>>>>
> > >>>>> Ok, so then you prefer to add the inline functions to read/write
> > >>>>> to the vmcs12 fields, (to set the request bit if shadowed field
> > >>>>> changed) and you are not concerned about any merge/rebase mess. I
> > >>>>> will work on this direction.
> > >>>>> I'll first send an independent patch to introduce the accessors.
> > >>>>> Once you apply this patch, I'll continue and send you v2 patches
> > >>>>> for shadow vmcs.
> > >>>>>
> > >>>>> Do you agree ?
> > >>>> Yes.
> > >>>
> > >>> Looking again at the code it seems like we could avoid adding the
> > >>> accessors.
> > >>> We could just set a flag in nested_vmx_vmexit and
> > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> the flag and
> > >>> call copy_vmcs12_to_shadow (if required).
> > >>>
> > >>> What do you think ?
> > >> Good idea! With accessors we can do further optimization by copying
> > >> only things that changed, but it will be premature optimization at
> > >> this point.
> > >>
> > > Actually this is a good idea only if we know for sure that VMX
> > > emulation changes vmcs12 only during guest entry/exit. Is this the
> > > case? I think so.
> >
> > Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> > transitions. What comes to my mind: L0 emulates some change that L1
> > does not trap, e.g. CRx accesses. Or what do you mean?
> >
> If vmcs12 is changed by L0 while L2 is running this is OK. If L0 changes
> shadowed vmcs12 field while L1 is running this is not OK. So for
> instance if field XXX is R/W but we allow only read to be shadowed then
> write emulation in L0 has to sync new value back to shadow before going
> back to L1.

Exactly.

While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
executes vmwrite).
VMCS12 fields are changed once L1 launches/resumes L2 and there is an
L2 exit.

L0 can change VMCS12 while it handles an L2 exit directly that is not
forwarded to L1. But that's OK because L1 will eventually see the change
once we switch to L1 due to another exit that L0 lets L1 handle.

L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
execute any vmlaunch, vmresume or vmwrite instruction.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 10:49                               ` Abel Gordon
@ 2013-04-14 11:16                                 ` Gleb Natapov
  2013-04-14 13:47                                   ` Abel Gordon
  0 siblings, 1 reply; 46+ messages in thread
From: Gleb Natapov @ 2013-04-14 11:16 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> 
> > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > >>>
> > > >>>
> > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > >>>
> > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > >>>>>
> > > >>>>> Ok, so then you prefer to add the inline functions to
> > > >>>>> read/write to the vmcs12 fields, (to set the request bit if
> > > >>>>> shadowed field changed) and you are not concerned about any
> > > >>>>> merge/rebase mess. I will work on this direction.
> > > >>>>> I'll first send an independent patch to introduce the accessors.
> > > >>>>> Once you apply this patch, I'll continue and send you v2 patches
> > > >>>>> for shadow vmcs.
> > > >>>>>
> > > >>>>> Do you agree ?
> > > >>>> Yes.
> > > >>>
> > > >>> Looking again at the code it seems like we could avoid adding the
> > > >>> accessors.
> > > >>> We could just set a flag in nested_vmx_vmexit and
> > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> > the flag and
> > > >>> call copy_vmcs12_to_shadow (if required).
> > > >>>
> > > >>> What do you think ?
> > > >> Good idea! With accessors we can do further optimization by
> > > >> copying only things that changed, but it will be premature
> > > >> optimization at this point.
> > > >>
> > > > Actually this is a good idea only if we know for sure that VMX
> > > > emulation changes vmcs12 only during guest entry/exit. Is this the
> > > > case? I think so.
> > >
> > > Some vmcs12 fields that are exposed to L1 are changed outside L2<->L1
> > > transitions. What comes to my mind: L0 emulates some change that L1
> > > does not trap, e.g. CRx accesses. Or what do you mean?
> > >
> > If vmcs12 is changed by L0 while L2 is running, this is OK. If L0
> > changes a shadowed vmcs12 field while L1 is running, this is not OK.
> > So, for instance, if field XXX is R/W but we allow only reads to be
> > shadowed, then write emulation in L0 has to sync the new value back to
> > the shadow before going back to L1.
> 
> Exactly.
> 
> While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> executes vmwrite).
> VMCS12 fields are changed once L1 launches/resumes L2 and there is an
> L2 exit.
> 
> L0 can change VMCS12 while it handles an L2 exit directly that is not
> forwarded to L1. But that's OK because L1 will eventually see the change
> once we switch to L1 due to another exit that L0 lets L1 handle.
> 
> L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> execute any vmlaunch, vmresume or vmwrite instruction.
> 
The question is: is there a case like I described above when we shadow
only reads from R/W field and handle vmwrites in L0?

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 11:16                                 ` Gleb Natapov
@ 2013-04-14 13:47                                   ` Abel Gordon
  2013-04-14 14:41                                     ` Gleb Natapov
  0 siblings, 1 reply; 46+ messages in thread
From: Abel Gordon @ 2013-04-14 13:47 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm



Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 02:16:02 PM:

> On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> >
> > > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > > >>>
> > > > >>>
> > > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > > >>>
> > > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > > >>>>>
> > > > >>>>> Ok, so then you prefer to add the inline functions to
> > > > >>>>> read/write to the vmcs12 fields, (to set the request bit if
> > > > >>>>> shadowed field changed) and you are not concerned about any
> > > > >>>>> merge/rebase mess. I will work on this direction.
> > > > >>>>> I'll first send an independent patch to introduce the
> > > > >>>>> accessors. Once you apply this patch, I'll continue and send
> > > > >>>>> you v2 patches for shadow vmcs.
> > > > >>>>>
> > > > >>>>> Do you agree ?
> > > > >>>> Yes.
> > > > >>>
> > > > >>> Looking again at the code it seems like we could avoid adding
the
> > > > >>> accessors.
> > > > >>> We could just set a flag in nested_vmx_vmexit and
> > > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> > > the flag and
> > > > >>> call copy_vmcs12_to_shadow (if required).
> > > > >>>
> > > > >>> What do you think ?
> > > > >> Good idea! With accessors we can do further optimization by
> > > > >> copying only things that changed, but it will be premature
> > > > >> optimization at this point.
> > > > >>
> > > > > Actually this is a good idea only if we know for sure that VMX
> > > > > emulation changes vmcs12 only during guest entry/exit. Is this
> > > > > the case? I think so.
> > > >
> > > > Some vmcs12 fields that are exposed to L1 are changed outside
> > > > L2<->L1 transitions. What comes to my mind: L0 emulates some change
> > > > that L1 does not trap, e.g. CRx accesses. Or what do you mean?
> > > >
> > > If vmcs12 is changed by L0 while L2 is running, this is OK. If L0
> > > changes a shadowed vmcs12 field while L1 is running, this is not OK.
> > > So, for instance, if field XXX is R/W but we allow only reads to be
> > > shadowed, then write emulation in L0 has to sync the new value back
> > > to the shadow before going back to L1.
> >
> > Exactly.
> >
> > While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> > executes vmwrite).
> > VMCS12 fields are changed once L1 launches/resumes L2 and there is a
> > L2 exit.
> >
> > L0 can change VMCS12 while it handles a L2 exit directly which is not
> > forwarded to L1. But that's OK because L1 will eventually see the
change
> > once we switch to L1 due to other exit that L0 let L1 handle.
> >
> > L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> > execute any vmlaunch, vmresume or vmwrite instruction.
> >
> The question is: is there a case like I described above when we shadow
> only reads from R/W field and handle vmwrites in L0?

No, there is no such thing with the patches I submitted.
The shadow VMCS patches assume that only R/O fields can be shadowed
for read and not for write (the shadow_read_only_fields array). R/O
fields can't be written by L1, thus L0 will trap a vmwrite to any R/O
field and emulate a failure
(VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT).

Writable fields are shadowed for both read and write
(the shadow_read_write_fields array).

There are no writable fields shadowed for write and not for read.
There are no writable fields shadowed for read and not for write.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-14 13:47                                   ` Abel Gordon
@ 2013-04-14 14:41                                     ` Gleb Natapov
  0 siblings, 0 replies; 46+ messages in thread
From: Gleb Natapov @ 2013-04-14 14:41 UTC (permalink / raw)
  To: Abel Gordon
  Cc: dongxiao.xu, Jan Kiszka, jun.nakajima, kvm, kvm-owner,
	Nadav Har'El, owasserm

On Sun, Apr 14, 2013 at 04:47:48PM +0300, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 02:16:02 PM:
> 
> > On Sun, Apr 14, 2013 at 01:49:44PM +0300, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 14/04/2013 01:34:52 PM:
> > >
> > > > On Sun, Apr 14, 2013 at 12:27:10PM +0200, Jan Kiszka wrote:
> > > > > On 2013-04-14 12:07, Gleb Natapov wrote:
> > > > > > On Sun, Apr 14, 2013 at 01:00:10PM +0300, Gleb Natapov wrote:
> > > > > >> On Sun, Apr 14, 2013 at 12:51:34PM +0300, Abel Gordon wrote:
> > > > > >>>
> > > > > >>>
> > > > > >>> Gleb Natapov <gleb@redhat.com> wrote on 12/04/2013 01:48:04 PM:
> > > > > >>>
> > > > > >>>> On Fri, Apr 12, 2013 at 01:44:14PM +0300, Abel Gordon wrote:
> > > > > >>>>>
> > > > > >>>>> Ok, so then you prefer to add the inline functions to
> > > > > >>>>> read/write to the vmcs12 fields, (to set the request bit if
> > > > > >>>>> shadowed field changed) and you are not concerned about any
> > > > > >>>>> merge/rebase mess. I will work on this direction.
> > > > > >>>>> I'll first send an independent patch to introduce the
> > > > > >>>>> accessors. Once you apply this patch, I'll continue and send
> > > > > >>>>> you v2 patches for shadow vmcs.
> > > > > >>>>>
> > > > > >>>>> Do you agree ?
> > > > > >>>> Yes.
> > > > > >>>
> > > > > >>> Looking again at the code it seems like we could avoid adding
> the
> > > > > >>> accessors.
> > > > > >>> We could just set a flag in nested_vmx_vmexit and
> > > > > >>> nested_vmx_entry_failure. Then, in vmx_vcpu_run we check/reset
> > > > the flag and
> > > > > >>> call copy_vmcs12_to_shadow (if required).
> > > > > >>>
> > > > > >>> What do you think ?
> > > > > >> Good idea! With accessors we can do further optimization by
> > > > > >> copying only things that changed, but it will be premature
> > > > > >> optimization at this point.
> > > > > >>
> > > > > > Actually this is a good idea only if we know for sure that VMX
> > > > > > emulation changes vmcs12 only during guest entry/exit. Is this
> > > > > > the case? I think so.
> > > > >
> > > > > Some vmcs12 fields that are exposed to L1 are changed outside
> > > > > L2<->L1 transitions. What comes to my mind: L0 emulates some
> > > > > change that L1 does not trap, e.g. CRx accesses. Or what do you
> > > > > mean?
> > > > >
> > > > If vmcs12 is changed by L0 while L2 is running, this is OK. If L0
> > > > changes a shadowed vmcs12 field while L1 is running, this is not
> > > > OK. So, for instance, if field XXX is R/W but we allow only reads
> > > > to be shadowed, then write emulation in L0 has to sync the new
> > > > value back to the shadow before going back to L1.
> > >
> > > Exactly.
> > >
> > > While L1 runs (L1 root mode), L0 does NOT change VMCS12 (unless L1
> > > executes vmwrite).
> > > VMCS12 fields are changed once L1 launches/resumes L2 and there is
> > > an L2 exit.
> > >
> > > L0 can change VMCS12 while it handles an L2 exit directly that is not
> > > forwarded to L1. But that's OK because L1 will eventually see the
> > > change once we switch to L1 due to another exit that L0 lets L1
> > > handle.
> > >
> > > L0 should NOT change VMCS12 fields if L1 is running and L1 didn't
> > > execute any vmlaunch, vmresume or vmwrite instruction.
> > >
> > The question is: is there a case like I described above when we shadow
> > only reads from R/W field and handle vmwrites in L0?
> 
> No, there is no such thing with the patches I submitted.
> The shadow VMCS patches assumes that only R/O fields can be shadowed
> for read and not for write (the shadow_read_only_fields array). R/O
> fields can't be written by L1 thus L0 will trap vmwrite
> to any r/o field and emulate a fail
> (VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT).
> 
> Writable fields are shadowed for both read and write
> (the shadow_read_write_fields array).
> 
> There are no writable fields shadowed for write and not for read.
> There are no writable fields shadowed for read and not for write
> 
Got it, thanks. It may make sense for some fields to be shadowed for
read but not for write; those can be individually synced with the
shadow during vmwrite emulation.

--
			Gleb.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
  2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
@ 2013-04-18 11:39 ` Abel Gordon
  0 siblings, 0 replies; 46+ messages in thread
From: Abel Gordon @ 2013-04-18 11:39 UTC (permalink / raw)
  To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg

Synchronize between the VMCS12 software-controlled structure and the
processor-specific shadow vmcs.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2013-04-18 13:31:35.000000000 +0300
@@ -356,6 +356,11 @@ struct nested_vmx {
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
 	struct vmcs *current_shadow_vmcs;
+	/*
+	 * Indicates if the shadow vmcs must be updated with the
+	 * data held by vmcs12
+	 */
+	bool sync_shadow_vmcs;
 
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
@@ -5611,6 +5616,14 @@ static int nested_vmx_check_permission(s
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
+	if (enable_shadow_vmcs) {
+		if (vmx->nested.current_vmcs12 != NULL) {
+			/* copy to memory all shadowed fields in case
+			   they were modified */
+			copy_shadow_to_vmcs12(vmx);
+			vmx->nested.sync_shadow_vmcs = false;
+		}
+	}
 	kunmap(vmx->nested.current_vmcs12_page);
 	nested_release_page(vmx->nested.current_vmcs12_page);
 }
@@ -5739,6 +5752,10 @@ static void nested_vmx_failValid(struct 
 			    X86_EFLAGS_SF | X86_EFLAGS_OF))
 			| X86_EFLAGS_ZF);
 	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
+	/*
+	 * We don't need to force a shadow sync because
+	 * VM_INSTRUCTION_ERROR is not shadowed
+	 */
 }
 
 /* Emulate the VMCLEAR instruction */
@@ -6137,6 +6154,9 @@ static int handle_vmptrld(struct kvm_vcp
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
+		if (enable_shadow_vmcs) {
+			vmx->nested.sync_shadow_vmcs = true;
+		}
 	}
 
 	nested_vmx_succeed(vcpu);
@@ -6895,6 +6915,11 @@ static void __noclone vmx_vcpu_run(struc
 	if (vmx->emulation_required)
 		return;
 
+	if (vmx->nested.sync_shadow_vmcs) {
+		copy_vmcs12_to_shadow(vmx);
+		vmx->nested.sync_shadow_vmcs = false;
+	}
+
 	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
 	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
@@ -7504,6 +7529,9 @@ static int nested_vmx_run(struct kvm_vcp
 	skip_emulated_instruction(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
+	if (enable_shadow_vmcs)
+		copy_shadow_to_vmcs12(vmx);
+
 	/*
 	 * The nested entry process starts with enforcing various prerequisites
 	 * on vmcs12 as required by the Intel SDM, and act appropriately when
@@ -7950,6 +7978,8 @@ static void nested_vmx_vmexit(struct kvm
 		nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
 	} else
 		nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		vmx->nested.sync_shadow_vmcs = true;
 }
 
 /*
@@ -7967,6 +7997,8 @@ static void nested_vmx_entry_failure(str
 	vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
 	vmcs12->exit_qualification = qualification;
 	nested_vmx_succeed(vcpu);
+	if (enable_shadow_vmcs)
+		to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
 }
 
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,




Thread overview: 46+ messages
-- links below jump to the message on this page --
2013-03-10 16:03 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Abel Gordon
2013-03-10 16:03 ` [PATCH 01/11] KVM: nVMX: Stats counters for nVMX Abel Gordon
2013-04-08 10:27   ` Gleb Natapov
2013-04-10 19:08     ` Abel Gordon
2013-04-11  6:10       ` Gleb Natapov
2013-03-10 16:04 ` [PATCH 02/11] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
2013-03-10 16:04 ` [PATCH 03/11] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
2013-04-08 11:12   ` Gleb Natapov
2013-04-10 19:14     ` Abel Gordon
2013-03-10 16:05 ` [PATCH 04/11] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
2013-04-08 11:50   ` Gleb Natapov
2013-04-10 19:14     ` Abel Gordon
2013-03-10 16:05 ` [PATCH 05/11] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
2013-04-09 11:05   ` Gleb Natapov
2013-04-10 20:35     ` Abel Gordon
2013-03-10 16:06 ` [PATCH 06/11] KVM: nVMX: Allocate shadow vmcs Abel Gordon
2013-03-10 16:06 ` [PATCH 07/11] KVM: nVMX: Release " Abel Gordon
2013-03-10 16:07 ` [PATCH 08/11] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
2013-03-10 16:07 ` [PATCH 09/11] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
2013-04-09 12:47   ` Gleb Natapov
2013-04-10 19:15     ` Abel Gordon
2013-03-10 16:08 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
2013-03-10 22:43   ` Nadav Har'El
2013-03-11  7:54     ` Abel Gordon
2013-04-09 13:14       ` Gleb Natapov
2013-04-10 19:15         ` Abel Gordon
2013-04-11  6:54           ` Gleb Natapov
2013-04-12 10:26             ` Abel Gordon
2013-04-12 10:31               ` Gleb Natapov
2013-04-12 10:44                 ` Abel Gordon
2013-04-12 10:48                   ` Gleb Natapov
2013-04-14  9:51                     ` Abel Gordon
2013-04-14 10:00                       ` Gleb Natapov
2013-04-14 10:07                         ` Gleb Natapov
2013-04-14 10:27                           ` Jan Kiszka
2013-04-14 10:34                             ` Abel Gordon
2013-04-14 10:34                             ` Gleb Natapov
2013-04-14 10:49                               ` Abel Gordon
2013-04-14 11:16                                 ` Gleb Natapov
2013-04-14 13:47                                   ` Abel Gordon
2013-04-14 14:41                                     ` Gleb Natapov
2013-03-10 16:08 ` [PATCH 11/11] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
2013-03-21 12:22 ` [PATCH 0/11] KVM: nVMX: shadow VMCS support, v1 Orit Wasserman
2013-03-21 13:56   ` Abel Gordon
2013-04-18  8:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v4 Abel Gordon
2013-04-18  8:39 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Abel Gordon
2013-04-18 11:34 [PATCH 0/11] KVM: nVMX: shadow VMCS support, v5 Abel Gordon
2013-04-18 11:39 ` [PATCH 10/11] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs Abel Gordon
