linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V
@ 2018-02-26 17:11 Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE Vitaly Kuznetsov
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

Changes since v1:
- The only comment I got for v1 was from kbuild test robot. The issue
  was addressed by moving HV_X64_ENLIGHTENED_VMCS_RECOMMENDED definition
  to PATCH2.
- Rebased to current kvm/queue.

When running nested KVM on Hyper-V it's possible to use the so-called
'Enlightened VMCS' and do normal memory reads/writes instead of
VMWRITE/VMREAD instructions. In addition, the clean field mask
provides huge room for optimization on L0's side.
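
For illustration, with an enlightened VMCS a field update boils down to
something like the following minimal sketch (not a hunk from the series;
the struct and the clean-field bit names are the ones introduced by
patches 3 and 5, the helper itself is made up):

    static void evmcs_set_guest_rsp(struct hv_enlightened_vmcs *evmcs, u64 rsp)
    {
            /* plain memory store instead of a VMWRITE instruction */
            evmcs->guest_rsp = rsp;
            /* tell L0 which field group it has to reload on VM entry */
            evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC;
    }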

Tight CPUID loop test shows significant speedup (current kvm/queue on
E5-2667 v4 @ 3.20GHz):
 Before: 20766 cycles
 After: 8912 cycles

The series is based on current kvm/queue tree.

Ladi Prosek (1):
  x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to
    HV_X64_MSR_VP_ASSIST_PAGE

Vitaly Kuznetsov (4):
  x86/hyper-v: allocate and use Virtual Processor Assist Pages
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  x86/hyper-v: detect nested features
  x86/kvm: use Enlightened VMCS when running on Hyper-V

 arch/x86/hyperv/hv_init.c          |  33 +++
 arch/x86/include/asm/mshyperv.h    |  12 +
 arch/x86/include/uapi/asm/hyperv.h | 223 ++++++++++++++-
 arch/x86/kernel/cpu/mshyperv.c     |   3 +
 arch/x86/kvm/hyperv.c              |   8 +-
 arch/x86/kvm/lapic.h               |   2 +-
 arch/x86/kvm/vmx.c                 | 561 ++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                 |   2 +-
 8 files changed, 825 insertions(+), 19 deletions(-)

-- 
2.14.3


* [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
@ 2018-02-26 17:11 ` Vitaly Kuznetsov
  2018-03-07 16:19   ` Radim Krčmář
  2018-02-26 17:11 ` [PATCH v2 2/5] x86/hyper-v: allocate and use Virtual Processor Assist Pages Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

From: Ladi Prosek <lprosek@redhat.com>

The assist page has been used only for the paravirtual EOI so far, hence
the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's
called "Virtual VP Assist MSR".

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/uapi/asm/hyperv.h | 10 +++++-----
 arch/x86/kvm/hyperv.c              |  8 ++++----
 arch/x86/kvm/lapic.h               |  2 +-
 arch/x86/kvm/x86.c                 |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 1c12aaf33915..45cc62352040 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -189,7 +189,7 @@
 #define HV_X64_MSR_EOI				0x40000070
 #define HV_X64_MSR_ICR				0x40000071
 #define HV_X64_MSR_TPR				0x40000072
-#define HV_X64_MSR_APIC_ASSIST_PAGE		0x40000073
+#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
 
 /* Define synthetic interrupt controller model specific registers. */
 #define HV_X64_MSR_SCONTROL			0x40000080
@@ -275,10 +275,10 @@ struct hv_tsc_emulation_status {
 #define HVCALL_POST_MESSAGE			0x005c
 #define HVCALL_SIGNAL_EVENT			0x005d
 
-#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE		0x00000001
-#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
-#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
-		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
+#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK	\
+		(~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
 
 #define HV_X64_MSR_TSC_REFERENCE_ENABLE		0x00000001
 #define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT	12
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8e38a6ef84cc..d99465d7002f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1010,17 +1010,17 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
 			return 1;
 		hv->vp_index = (u32)data;
 		break;
-	case HV_X64_MSR_APIC_ASSIST_PAGE: {
+	case HV_X64_MSR_VP_ASSIST_PAGE: {
 		u64 gfn;
 		unsigned long addr;
 
-		if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) {
+		if (!(data & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) {
 			hv->hv_vapic = data;
 			if (kvm_lapic_enable_pv_eoi(vcpu, 0))
 				return 1;
 			break;
 		}
-		gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
+		gfn = data >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT;
 		addr = kvm_vcpu_gfn_to_hva(vcpu, gfn);
 		if (kvm_is_error_hva(addr))
 			return 1;
@@ -1130,7 +1130,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata);
 	case HV_X64_MSR_TPR:
 		return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata);
-	case HV_X64_MSR_APIC_ASSIST_PAGE:
+	case HV_X64_MSR_VP_ASSIST_PAGE:
 		data = hv->hv_vapic;
 		break;
 	case HV_X64_MSR_VP_RUNTIME:
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 56c36014f7b7..edce055e9fd7 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -109,7 +109,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 
 static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
+	return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
 }
 
 int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3ed81031f1..1d7be99ced4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1034,7 +1034,7 @@ static u32 emulated_msrs[] = {
 	HV_X64_MSR_VP_RUNTIME,
 	HV_X64_MSR_SCONTROL,
 	HV_X64_MSR_STIMER0_CONFIG,
-	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	HV_X64_MSR_VP_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_KVM_PV_EOI_EN,
 
 	MSR_IA32_TSC_ADJUST,
-- 
2.14.3


* [PATCH v2 2/5] x86/hyper-v: allocate and use Virtual Processor Assist Pages
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE Vitaly Kuznetsov
@ 2018-02-26 17:11 ` Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 3/5] x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

Virtual Processor Assist Pages usage allows us to do optimized EOI
processing for APIC, enable Enlightened VMCS support in KVM and more.
struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.
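
A minimal usage sketch of the new accessor (hypothetical caller, not a
hunk from this patch; evmcs_phys_addr is a placeholder and the real
consumer only shows up in the last patch of the series):

    struct hv_vp_assist_page *vp_ap = hv_get_vp_assist_page(smp_processor_id());

    if (vp_ap) {
            /* e.g. point Hyper-V at an enlightened VMCS before VM entry */
            vp_ap->current_nested_vmcs = evmcs_phys_addr;
            vp_ap->enlighten_vmentry = 1;
    }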

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1:
  move HV_X64_ENLIGHTENED_VMCS_RECOMMENDED definition to this patch
---
 arch/x86/hyperv/hv_init.c          | 33 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h    | 10 ++++++++++
 arch/x86/include/uapi/asm/hyperv.h | 13 +++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 2edc49e7409b..acf21fa93e2c 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -88,6 +88,9 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 u32 *hv_vp_index;
 EXPORT_SYMBOL_GPL(hv_vp_index);
 
+struct hv_vp_assist_page **hv_vp_assist_page;
+EXPORT_SYMBOL_GPL(hv_vp_assist_page);
+
 u32 hv_max_vp_index;
 
 static int hv_cpu_init(unsigned int cpu)
@@ -101,6 +104,23 @@ static int hv_cpu_init(unsigned int cpu)
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
 
+	if (!hv_vp_assist_page)
+		return 0;
+
+	if (!hv_vp_assist_page[smp_processor_id()])
+		hv_vp_assist_page[smp_processor_id()] =
+			__vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+
+	if (hv_vp_assist_page[smp_processor_id()]) {
+		u64 val;
+
+		val = vmalloc_to_pfn(hv_vp_assist_page[smp_processor_id()]);
+		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
+			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
+	}
+
 	return 0;
 }
 
@@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
 	struct hv_reenlightenment_control re_ctrl;
 	unsigned int new_cpu;
 
+	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
+		vfree(hv_vp_assist_page[cpu]);
+		hv_vp_assist_page[cpu] = NULL;
+	}
+
 	if (hv_reenlightenment_cb == NULL)
 		return 0;
 
@@ -241,6 +267,13 @@ void hyperv_init(void)
 	if (!hv_vp_index)
 		return;
 
+	hv_vp_assist_page = kcalloc(num_possible_cpus(),
+				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	if (!hv_vp_assist_page) {
+		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
+		return;
+	}
+
 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 			      hv_cpu_init, hv_cpu_die) < 0)
 		goto free_vp_index;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 25283f7eb299..778d2efd34f1 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -294,6 +294,12 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
  */
 extern u32 *hv_vp_index;
 extern u32 hv_max_vp_index;
+extern struct hv_vp_assist_page **hv_vp_assist_page;
+
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return hv_vp_assist_page[cpu];
+}
 
 /**
  * hv_cpu_number_to_vp_number() - Map CPU to VP.
@@ -330,6 +336,10 @@ static inline void hyperv_setup_mmu_ops(void) {}
 static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
 static inline void clear_hv_tscchange_cb(void) {}
 static inline void hyperv_stop_tsc_emulation(void) {};
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return NULL;
+}
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 45cc62352040..bd7a2f020f68 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -156,6 +156,9 @@
 /* Recommend using the newer ExProcessorMasks interface */
 #define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED	(1 << 11)
 
+/* Recommend using enlightened VMCS */
+#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED    (1 << 14)
+
 /*
  * Crash notification flag.
  */
@@ -414,6 +417,16 @@ struct hv_timer_message_payload {
 	__u64 delivery_time;	/* When the message was delivered */
 };
 
+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+	__u32 apic_assist;
+	__u32 reserved;
+	__u64 vtl_control[2];
+	__u64 nested_enlightenments_control[2];
+	__u32 enlighten_vmentry;
+	__u64 current_nested_vmcs;
+};
+
 #define HV_STIMER_ENABLE		(1ULL << 0)
 #define HV_STIMER_PERIODIC		(1ULL << 1)
 #define HV_STIMER_LAZY			(1ULL << 2)
-- 
2.14.3


* [PATCH v2 3/5] x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 2/5] x86/hyper-v: allocate and use Virtual Processor Assist Pages Vitaly Kuznetsov
@ 2018-02-26 17:11 ` Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 4/5] x86/hyper-v: detect nested features Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

The definitions are according to the Hyper-V TLFS v5.0. KVM on Hyper-V will
use these.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/uapi/asm/hyperv.h | 200 +++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index bd7a2f020f68..d029897f241f 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -427,6 +427,206 @@ struct hv_vp_assist_page {
 	__u64 current_nested_vmcs;
 };
 
+struct hv_enlightened_vmcs {
+	u32 revision_id;
+	u32 abort;
+
+	u16 host_es_selector;
+	u16 host_cs_selector;
+	u16 host_ss_selector;
+	u16 host_ds_selector;
+	u16 host_fs_selector;
+	u16 host_gs_selector;
+	u16 host_tr_selector;
+
+	u64 host_ia32_pat;
+	u64 host_ia32_efer;
+
+	u64 host_cr0;
+	u64 host_cr3;
+	u64 host_cr4;
+
+	u64 host_ia32_sysenter_esp;
+	u64 host_ia32_sysenter_eip;
+	u64 host_rip;
+	u32 host_ia32_sysenter_cs;
+
+	u32 pin_based_vm_exec_control;
+	u32 vm_exit_controls;
+	u32 secondary_vm_exec_control;
+
+	u64 io_bitmap_a;
+	u64 io_bitmap_b;
+	u64 msr_bitmap;
+
+	u16 guest_es_selector;
+	u16 guest_cs_selector;
+	u16 guest_ss_selector;
+	u16 guest_ds_selector;
+	u16 guest_fs_selector;
+	u16 guest_gs_selector;
+	u16 guest_ldtr_selector;
+	u16 guest_tr_selector;
+
+	u32 guest_es_limit;
+	u32 guest_cs_limit;
+	u32 guest_ss_limit;
+	u32 guest_ds_limit;
+	u32 guest_fs_limit;
+	u32 guest_gs_limit;
+	u32 guest_ldtr_limit;
+	u32 guest_tr_limit;
+	u32 guest_gdtr_limit;
+	u32 guest_idtr_limit;
+
+	u32 guest_es_ar_bytes;
+	u32 guest_cs_ar_bytes;
+	u32 guest_ss_ar_bytes;
+	u32 guest_ds_ar_bytes;
+	u32 guest_fs_ar_bytes;
+	u32 guest_gs_ar_bytes;
+	u32 guest_ldtr_ar_bytes;
+	u32 guest_tr_ar_bytes;
+
+	u64 guest_es_base;
+	u64 guest_cs_base;
+	u64 guest_ss_base;
+	u64 guest_ds_base;
+	u64 guest_fs_base;
+	u64 guest_gs_base;
+	u64 guest_ldtr_base;
+	u64 guest_tr_base;
+	u64 guest_gdtr_base;
+	u64 guest_idtr_base;
+
+	u64 padding64_1[3];
+
+	u64 vm_exit_msr_store_addr;
+	u64 vm_exit_msr_load_addr;
+	u64 vm_entry_msr_load_addr;
+
+	u64 cr3_target_value0;
+	u64 cr3_target_value1;
+	u64 cr3_target_value2;
+	u64 cr3_target_value3;
+
+	u32 page_fault_error_code_mask;
+	u32 page_fault_error_code_match;
+
+	u32 cr3_target_count;
+	u32 vm_exit_msr_store_count;
+	u32 vm_exit_msr_load_count;
+	u32 vm_entry_msr_load_count;
+
+	u64 tsc_offset;
+	u64 virtual_apic_page_addr;
+	u64 vmcs_link_pointer;
+
+	u64 guest_ia32_debugctl;
+	u64 guest_ia32_pat;
+	u64 guest_ia32_efer;
+
+	u64 guest_pdptr0;
+	u64 guest_pdptr1;
+	u64 guest_pdptr2;
+	u64 guest_pdptr3;
+
+	u64 guest_pending_dbg_exceptions;
+	u64 guest_sysenter_esp;
+	u64 guest_sysenter_eip;
+
+	u32 guest_activity_state;
+	u32 guest_sysenter_cs;
+
+	u64 cr0_guest_host_mask;
+	u64 cr4_guest_host_mask;
+	u64 cr0_read_shadow;
+	u64 cr4_read_shadow;
+	u64 guest_cr0;
+	u64 guest_cr3;
+	u64 guest_cr4;
+	u64 guest_dr7;
+
+	u64 host_fs_base;
+	u64 host_gs_base;
+	u64 host_tr_base;
+	u64 host_gdtr_base;
+	u64 host_idtr_base;
+	u64 host_rsp;
+
+	u64 ept_pointer;
+
+	u16 virtual_processor_id;
+	u16 padding16[3];
+
+	u64 padding64_2[5];
+	u64 guest_physical_address;
+
+	u32 vm_instruction_error;
+	u32 vm_exit_reason;
+	u32 vm_exit_intr_info;
+	u32 vm_exit_intr_error_code;
+	u32 idt_vectoring_info_field;
+	u32 idt_vectoring_error_code;
+	u32 vm_exit_instruction_len;
+	u32 vmx_instruction_info;
+
+	u64 exit_qualification;
+	u64 exit_io_instruction_ecx;
+	u64 exit_io_instruction_esi;
+	u64 exit_io_instruction_edi;
+	u64 exit_io_instruction_eip;
+
+	u64 guest_linear_address;
+	u64 guest_rsp;
+	u64 guest_rflags;
+
+	u32 guest_interruptibility_info;
+	u32 cpu_based_vm_exec_control;
+	u32 exception_bitmap;
+	u32 vm_entry_controls;
+	u32 vm_entry_intr_info_field;
+	u32 vm_entry_exception_error_code;
+	u32 vm_entry_instruction_len;
+	u32 tpr_threshold;
+
+	u64 guest_rip;
+
+	u32 hv_clean_fields;
+	u32 hv_padding_32;
+	u32 hv_synthetic_controls;
+	u32 hv_enlightenments_control;
+	u32 hv_vp_id;
+
+	u64 hv_vm_id;
+	u64 partition_assist_page;
+	u64 padding64_4[4];
+	u64 guest_bndcfgs;
+	u64 padding64_5[7];
+	u64 xss_exit_bitmap;
+	u64 padding64_6[7];
+};
+
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE			0
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP		BIT(0)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP		BIT(1)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2		BIT(2)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP1		BIT(3)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_PROC		BIT(4)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EVENT		BIT(5)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_ENTRY		BIT(6)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EXCPN		BIT(7)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR			BIT(8)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT		BIT(9)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC		BIT(10)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1		BIT(11)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2		BIT(12)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER		BIT(13)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1		BIT(14)
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL	BIT(15)
+
+#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL			0xFFFF
+
 #define HV_STIMER_ENABLE		(1ULL << 0)
 #define HV_STIMER_PERIODIC		(1ULL << 1)
 #define HV_STIMER_LAZY			(1ULL << 2)
-- 
2.14.3


* [PATCH v2 4/5] x86/hyper-v: detect nested features
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2018-02-26 17:11 ` [PATCH v2 3/5] x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits Vitaly Kuznetsov
@ 2018-02-26 17:11 ` Vitaly Kuznetsov
  2018-02-26 17:11 ` [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V Vitaly Kuznetsov
  2018-02-28 17:19 ` [PATCH v2 0/5] Enlightened VMCS support for KVM " Thomas Gleixner
  5 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

TLFS 5.0 says: "Support for an enlightened VMCS interface is reported with
CPUID leaf 0x40000004. If an enlightened VMCS interface is supported,
additional nested enlightenments may be discovered by reading the CPUID
leaf 0x4000000A (see 2.4.11)."
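
In pseudo-code the detection amounts to the following (illustrative
sketch only; evmcs_supported_version() is a made-up helper, the
definitions come from this patch and the previous ones in the series):

    static u32 evmcs_supported_version(void)
    {
            /* Hyper-V has to recommend enlightened VMCS first... */
            if (!(ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED))
                    return 0;

            /* ...then the low byte of leaf 0x4000000A EAX holds the version */
            return cpuid_eax(HVCPUID_NESTED_FEATURES) & 0xff;
    }

The patch itself just caches the whole leaf in ms_hyperv.nested_features
and leaves the version check to KVM (see the last patch).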

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/mshyperv.h | 2 ++
 arch/x86/kernel/cpu/mshyperv.c  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 778d2efd34f1..355844d5ac4f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -26,12 +26,14 @@ enum hv_cpuid_function {
 	HVCPUID_FEATURES			= 0x40000003,
 	HVCPUID_ENLIGHTENMENT_INFO		= 0x40000004,
 	HVCPUID_IMPLEMENTATION_LIMITS		= 0x40000005,
+	HVCPUID_NESTED_FEATURES			= 0x4000000A
 };
 
 struct ms_hyperv_info {
 	u32 features;
 	u32 misc_features;
 	u32 hints;
+	u32 nested_features;
 	u32 max_vp_index;
 	u32 max_lp_index;
 };
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 9340f41ce8d3..7a387f7a28b8 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -207,6 +207,9 @@ static void __init ms_hyperv_init_platform(void)
 		x86_platform.calibrate_cpu = hv_get_tsc_khz;
 	}
 
+	if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED)
+		ms_hyperv.nested_features = cpuid_eax(HVCPUID_NESTED_FEATURES);
+
 #ifdef CONFIG_X86_LOCAL_APIC
 	if (ms_hyperv.features & HV_X64_ACCESS_FREQUENCY_MSRS &&
 	    ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
-- 
2.14.3


* [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2018-02-26 17:11 ` [PATCH v2 4/5] x86/hyper-v: detect nested features Vitaly Kuznetsov
@ 2018-02-26 17:11 ` Vitaly Kuznetsov
  2018-03-07 17:56   ` Radim Krčmář
  2018-02-28 17:19 ` [PATCH v2 0/5] Enlightened VMCS support for KVM " Thomas Gleixner
  5 siblings, 1 reply; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-02-26 17:11 UTC (permalink / raw)
  To: kvm
  Cc: x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

An Enlightened VMCS is just a structure in memory; the main benefit,
besides avoiding the somewhat slower VMREAD/VMWRITE instructions, is the
clean field mask: we tell the underlying hypervisor which fields were
modified since the last VMEXIT so it doesn't need to inspect them all.
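
The flip side of the clean field mask is that right after a VMEXIT
everything is in sync again, so the mask is reset to "all clean" and only
subsequent writes mark groups dirty again (sketch; the real hunk is in
vmx_vcpu_run() below):

    /* all fields are clean right after a VMEXIT */
    current_evmcs->hv_clean_fields |= HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;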

Tight CPUID loop test shows significant speedup:
Before: 20766 cycles
After: 8912 cycles

A static key is used to avoid a performance penalty for non-Hyper-V
deployments. Tests show we add around 3 (three) CPU cycles on each
VMEXIT (1077.5 cycles before, 1080.7 cycles after for the same CPUID
loop on bare metal). We could probably avoid one test/jmp in
vmx_vcpu_run(), but I don't see a clean way to use a static key in
assembly.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx.c | 561 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 553 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 130fca0ea1bf..f7c8ab2df6c1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -52,6 +52,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/mmu_context.h>
 #include <asm/nospec-branch.h>
+#include <asm/mshyperv.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -999,6 +1000,442 @@ static const u32 vmx_msr_index[] = {
 	MSR_EFER, MSR_TSC_AUX, MSR_STAR,
 };
 
+DEFINE_STATIC_KEY_FALSE(enable_evmcs);
+
+#define current_evmcs ((struct hv_enlightened_vmcs *)this_cpu_read(current_vmcs))
+
+#if IS_ENABLED(CONFIG_HYPERV)
+static bool __read_mostly enlightened_vmcs = true;
+module_param(enlightened_vmcs, bool, 0444);
+
+#define EVMCS1_OFFSET(x) offsetof(struct hv_enlightened_vmcs, x)
+#define EVMCS1_FIELD(number, name, clean_mask)[ROL16(number, 6)] = \
+		(u32)EVMCS1_OFFSET(name) | ((u32)clean_mask << 16)
+
+/*
+ * Lower 16 bits encode offset of the field in struct hv_enlightened_vmcs,
+ * upper 16 bits hold the clean field mask.
+ */
+static const u32 vmcs_field_to_evmcs_1[] = {
+	/* 64 bit rw */
+	EVMCS1_FIELD(GUEST_RIP, guest_rip,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(GUEST_RSP, guest_rsp,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC),
+	EVMCS1_FIELD(GUEST_RFLAGS, guest_rflags,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC),
+	EVMCS1_FIELD(HOST_IA32_PAT, host_ia32_pat,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_IA32_EFER, host_ia32_efer,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_CR0, host_cr0,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_CR3, host_cr3,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_CR4, host_cr4,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_IA32_SYSENTER_ESP, host_ia32_sysenter_esp,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_RIP, host_rip,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(IO_BITMAP_A, io_bitmap_a,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP),
+	EVMCS1_FIELD(IO_BITMAP_B, io_bitmap_b,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP),
+	EVMCS1_FIELD(MSR_BITMAP, msr_bitmap,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP),
+	EVMCS1_FIELD(GUEST_ES_BASE, guest_es_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_CS_BASE, guest_cs_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_SS_BASE, guest_ss_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_DS_BASE, guest_ds_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_FS_BASE, guest_fs_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GS_BASE, guest_gs_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_LDTR_BASE, guest_ldtr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_TR_BASE, guest_tr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GDTR_BASE, guest_gdtr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_IDTR_BASE, guest_idtr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(TSC_OFFSET, tsc_offset,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+	EVMCS1_FIELD(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+	EVMCS1_FIELD(VMCS_LINK_POINTER, vmcs_link_pointer,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_IA32_PAT, guest_ia32_pat,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_IA32_EFER, guest_ia32_efer,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_PDPTR0, guest_pdptr0,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_PDPTR1, guest_pdptr1,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_PDPTR2, guest_pdptr2,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_PDPTR3, guest_pdptr3,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(CR0_GUEST_HOST_MASK, cr0_guest_host_mask,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(CR4_GUEST_HOST_MASK, cr4_guest_host_mask,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(CR0_READ_SHADOW, cr0_read_shadow,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(CR4_READ_SHADOW, cr4_read_shadow,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(GUEST_CR0, guest_cr0,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(GUEST_CR3, guest_cr3,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(GUEST_CR4, guest_cr4,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(GUEST_DR7, guest_dr7,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CRDR),
+	EVMCS1_FIELD(HOST_FS_BASE, host_fs_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(HOST_GS_BASE, host_gs_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(HOST_TR_BASE, host_tr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(HOST_GDTR_BASE, host_gdtr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(HOST_IDTR_BASE, host_idtr_base,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(HOST_RSP, host_rsp,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER),
+	EVMCS1_FIELD(EPT_POINTER, ept_pointer,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT),
+	EVMCS1_FIELD(GUEST_BNDCFGS, guest_bndcfgs,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(XSS_EXIT_BITMAP, xss_exit_bitmap,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP2),
+
+	/* 64 bit read only */
+	EVMCS1_FIELD(GUEST_PHYSICAL_ADDRESS, guest_physical_address,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(EXIT_QUALIFICATION, exit_qualification,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	/*
+	 * Not defined in KVM:
+	 *
+	 * EVMCS1_FIELD(0x00006402, exit_io_instruction_ecx,
+	 *		HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE);
+	 * EVMCS1_FIELD(0x00006404, exit_io_instruction_esi,
+	 *		HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE);
+	 * EVMCS1_FIELD(0x00006406, exit_io_instruction_edi,
+	 *		HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE);
+	 * EVMCS1_FIELD(0x00006408, exit_io_instruction_eip,
+	 *		HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE);
+	 */
+	EVMCS1_FIELD(GUEST_LINEAR_ADDRESS, guest_linear_address,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+
+	/*
+	 * No mask defined in the spec as Hyper-V doesn't currently support
+	 * these. Future proof by resetting the whole clean field mask on
+	 * access.
+	 */
+	EVMCS1_FIELD(VM_EXIT_MSR_STORE_ADDR, vm_exit_msr_store_addr,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(VM_EXIT_MSR_LOAD_ADDR, vm_exit_msr_load_addr,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(CR3_TARGET_VALUE0, cr3_target_value0,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(CR3_TARGET_VALUE1, cr3_target_value1,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(CR3_TARGET_VALUE2, cr3_target_value2,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(CR3_TARGET_VALUE3, cr3_target_value3,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+
+	/* 32 bit rw */
+	EVMCS1_FIELD(TPR_THRESHOLD, tpr_threshold,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(GUEST_INTERRUPTIBILITY_INFO, guest_interruptibility_info,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC),
+	EVMCS1_FIELD(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_PROC),
+	EVMCS1_FIELD(EXCEPTION_BITMAP, exception_bitmap,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EXCPN),
+	EVMCS1_FIELD(VM_ENTRY_CONTROLS, vm_entry_controls,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_ENTRY),
+	EVMCS1_FIELD(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EVENT),
+	EVMCS1_FIELD(VM_ENTRY_EXCEPTION_ERROR_CODE,
+		     vm_entry_exception_error_code,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EVENT),
+	EVMCS1_FIELD(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_EVENT),
+	EVMCS1_FIELD(HOST_IA32_SYSENTER_CS, host_ia32_sysenter_cs,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP1),
+	EVMCS1_FIELD(VM_EXIT_CONTROLS, vm_exit_controls,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP1),
+	EVMCS1_FIELD(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_GRP1),
+	EVMCS1_FIELD(GUEST_ES_LIMIT, guest_es_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_CS_LIMIT, guest_cs_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_SS_LIMIT, guest_ss_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_DS_LIMIT, guest_ds_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_FS_LIMIT, guest_fs_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GS_LIMIT, guest_gs_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_LDTR_LIMIT, guest_ldtr_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_TR_LIMIT, guest_tr_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GDTR_LIMIT, guest_gdtr_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_IDTR_LIMIT, guest_idtr_limit,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_ES_AR_BYTES, guest_es_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_CS_AR_BYTES, guest_cs_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_SS_AR_BYTES, guest_ss_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_DS_AR_BYTES, guest_ds_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_FS_AR_BYTES, guest_fs_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GS_AR_BYTES, guest_gs_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_LDTR_AR_BYTES, guest_ldtr_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_TR_AR_BYTES, guest_tr_ar_bytes,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_ACTIVITY_STATE, guest_activity_state,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+	EVMCS1_FIELD(GUEST_SYSENTER_CS, guest_sysenter_cs,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1),
+
+	/* 32 bit read only */
+	EVMCS1_FIELD(VM_INSTRUCTION_ERROR, vm_instruction_error,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(VM_EXIT_REASON, vm_exit_reason,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(VM_EXIT_INTR_INFO, vm_exit_intr_info,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+	EVMCS1_FIELD(VMX_INSTRUCTION_INFO, vmx_instruction_info,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE),
+
+	/* No mask defined in the spec (not used) */
+	EVMCS1_FIELD(PAGE_FAULT_ERROR_CODE_MASK, page_fault_error_code_mask,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(PAGE_FAULT_ERROR_CODE_MATCH, page_fault_error_code_match,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(CR3_TARGET_COUNT, cr3_target_count,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(VM_EXIT_MSR_STORE_COUNT, vm_exit_msr_store_count,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(VM_EXIT_MSR_LOAD_COUNT, vm_exit_msr_load_count,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+	EVMCS1_FIELD(VM_ENTRY_MSR_LOAD_COUNT, vm_entry_msr_load_count,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL),
+
+	/* 16 bit rw */
+	EVMCS1_FIELD(HOST_ES_SELECTOR, host_es_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_CS_SELECTOR, host_cs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_SS_SELECTOR, host_ss_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_DS_SELECTOR, host_ds_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_FS_SELECTOR, host_fs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_GS_SELECTOR, host_gs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(HOST_TR_SELECTOR, host_tr_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_GRP1),
+	EVMCS1_FIELD(GUEST_ES_SELECTOR, guest_es_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_CS_SELECTOR, guest_cs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_SS_SELECTOR, guest_ss_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_DS_SELECTOR, guest_ds_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_FS_SELECTOR, guest_fs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_GS_SELECTOR, guest_gs_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_LDTR_SELECTOR, guest_ldtr_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(GUEST_TR_SELECTOR, guest_tr_selector,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2),
+	EVMCS1_FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id,
+		     HV_VMX_ENLIGHTENED_CLEAN_FIELD_CONTROL_XLAT),
+
+/*
+ *  Enlightened VMCSv1 doesn't support these:
+ *	POSTED_INTR_NV                  = 0x00000002,
+ *	GUEST_INTR_STATUS               = 0x00000810,
+ *	GUEST_PML_INDEX			= 0x00000812,
+ *	PML_ADDRESS			= 0x0000200e,
+ *	APIC_ACCESS_ADDR		= 0x00002014,
+ *	POSTED_INTR_DESC_ADDR           = 0x00002016,
+ *	VM_FUNCTION_CONTROL             = 0x00002018,
+ *	EOI_EXIT_BITMAP0                = 0x0000201c,
+ *	EOI_EXIT_BITMAP1                = 0x0000201e,
+ *	EOI_EXIT_BITMAP2                = 0x00002020,
+ *	EOI_EXIT_BITMAP3                = 0x00002022,
+ *	EPTP_LIST_ADDRESS               = 0x00002024,
+ *	VMREAD_BITMAP                   = 0x00002026,
+ *	VMWRITE_BITMAP                  = 0x00002028,
+ *	TSC_MULTIPLIER                  = 0x00002032,
+ *	GUEST_IA32_PERF_GLOBAL_CTRL	= 0x00002808,
+ *	GUEST_IA32_RTIT_CTL		= 0x00002814,
+ *	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
+ *	PLE_GAP                         = 0x00004020,
+ *	PLE_WINDOW                      = 0x00004022,
+ *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
+ */
+};
+
+static inline u16 get_evmcs_offset(unsigned long field)
+{
+	unsigned int index = ROL16(field, 6);
+
+	if (index >= ARRAY_SIZE(vmcs_field_to_evmcs_1))
+		return 0;
+
+	return (u16)vmcs_field_to_evmcs_1[index];
+}
+
+static inline u16 get_evmcs_offset_cf(unsigned long field, u16 *clean_field)
+{
+	unsigned int index = ROL16(field, 6);
+	u32 evmcs_field;
+
+	if (index >= ARRAY_SIZE(vmcs_field_to_evmcs_1))
+		return 0;
+
+	evmcs_field = vmcs_field_to_evmcs_1[index];
+
+	*clean_field = evmcs_field >> 16;
+
+	return (u16)evmcs_field;
+}
+
+static inline void evmcs_write64(unsigned long field, u64 value)
+{
+	u16 clean_field;
+	u16 offset = get_evmcs_offset_cf(field, &clean_field);
+
+	if (!offset)
+		return;
+
+	*(u64 *)((char *)current_evmcs + offset) = value;
+
+	current_evmcs->hv_clean_fields &= ~clean_field;
+}
+
+static inline void evmcs_write32(unsigned long field, u32 value)
+{
+	u16 clean_field;
+	u16 offset = get_evmcs_offset_cf(field, &clean_field);
+
+	if (!offset)
+		return;
+
+	*(u32 *)((char *)current_evmcs + offset) = value;
+	current_evmcs->hv_clean_fields &= ~clean_field;
+}
+
+static inline void evmcs_write16(unsigned long field, u16 value)
+{
+	u16 clean_field;
+	u16 offset = get_evmcs_offset_cf(field, &clean_field);
+
+	if (!offset)
+		return;
+
+	*(u16 *)((char *)current_evmcs + offset) = value;
+	current_evmcs->hv_clean_fields &= ~clean_field;
+}
+
+static inline u64 evmcs_read64(unsigned long field)
+{
+	u16 offset = get_evmcs_offset(field);
+
+	if (!offset)
+		return 0;
+
+	return *(u64 *)((char *)current_evmcs + offset);
+}
+
+static inline u32 evmcs_read32(unsigned long field)
+{
+	u16 offset = get_evmcs_offset(field);
+
+	if (!offset)
+		return 0;
+
+	return *(u32 *)((char *)current_evmcs + offset);
+}
+
+static inline u16 evmcs_read16(unsigned long field)
+{
+	u16 offset = get_evmcs_offset(field);
+
+	if (!offset)
+		return 0;
+
+	return *(u16 *)((char *)current_evmcs + offset);
+}
+
+static void vmcs_load_enlightened(u64 phys_addr)
+{
+	struct hv_vp_assist_page *vp_ap =
+		hv_get_vp_assist_page(smp_processor_id());
+
+	vp_ap->current_nested_vmcs = phys_addr;
+	vp_ap->enlighten_vmentry = 1;
+}
+#else /* !IS_ENABLED(CONFIG_HYPERV) */
+static inline void evmcs_write64(unsigned long field, u64 value) {}
+static inline void evmcs_write32(unsigned long field, u32 value) {}
+static inline void evmcs_write16(unsigned long field, u16 value) {}
+static inline u64 evmcs_read64(unsigned long field) { return 0; }
+static inline u32 evmcs_read32(unsigned long field) { return 0; }
+static inline u16 evmcs_read16(unsigned long field) { return 0; }
+static inline void vmcs_load_enlightened(u64 phys_addr) {}
+#endif /* IS_ENABLED(CONFIG_HYPERV) */
+
 static inline bool is_exception_n(u32 intr_info, u8 vector)
 {
 	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK |
@@ -1472,6 +1909,9 @@ static void vmcs_load(struct vmcs *vmcs)
 	u64 phys_addr = __pa(vmcs);
 	u8 error;
 
+	if (static_branch_unlikely(&enable_evmcs))
+		return vmcs_load_enlightened(phys_addr);
+
 	asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
 			: "=qm"(error) : "a"(&phys_addr), "m"(phys_addr)
 			: "cc", "memory");
@@ -1645,18 +2085,24 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field)
 static __always_inline u16 vmcs_read16(unsigned long field)
 {
 	vmcs_check16(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_read16(field);
 	return __vmcs_readl(field);
 }
 
 static __always_inline u32 vmcs_read32(unsigned long field)
 {
 	vmcs_check32(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_read32(field);
 	return __vmcs_readl(field);
 }
 
 static __always_inline u64 vmcs_read64(unsigned long field)
 {
 	vmcs_check64(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_read64(field);
 #ifdef CONFIG_X86_64
 	return __vmcs_readl(field);
 #else
@@ -1667,6 +2113,8 @@ static __always_inline u64 vmcs_read64(unsigned long field)
 static __always_inline unsigned long vmcs_readl(unsigned long field)
 {
 	vmcs_checkl(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_read64(field);
 	return __vmcs_readl(field);
 }
 
@@ -1690,18 +2138,27 @@ static __always_inline void __vmcs_writel(unsigned long field, unsigned long val
 static __always_inline void vmcs_write16(unsigned long field, u16 value)
 {
 	vmcs_check16(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write16(field, value);
+
 	__vmcs_writel(field, value);
 }
 
 static __always_inline void vmcs_write32(unsigned long field, u32 value)
 {
 	vmcs_check32(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write32(field, value);
+
 	__vmcs_writel(field, value);
 }
 
 static __always_inline void vmcs_write64(unsigned long field, u64 value)
 {
 	vmcs_check64(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write64(field, value);
+
 	__vmcs_writel(field, value);
 #ifndef CONFIG_X86_64
 	asm volatile ("");
@@ -1712,6 +2169,9 @@ static __always_inline void vmcs_write64(unsigned long field, u64 value)
 static __always_inline void vmcs_writel(unsigned long field, unsigned long value)
 {
 	vmcs_checkl(field);
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write64(field, value);
+
 	__vmcs_writel(field, value);
 }
 
@@ -1719,6 +2179,9 @@ static __always_inline void vmcs_clear_bits(unsigned long field, u32 mask)
 {
         BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x2000,
 			 "vmcs_clear_bits does not support 64-bit fields");
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write32(field, evmcs_read32(field) & ~mask);
+
 	__vmcs_writel(field, __vmcs_readl(field) & ~mask);
 }
 
@@ -1726,6 +2189,9 @@ static __always_inline void vmcs_set_bits(unsigned long field, u32 mask)
 {
         BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x2000,
 			 "vmcs_set_bits does not support 64-bit fields");
+	if (static_branch_unlikely(&enable_evmcs))
+		return evmcs_write32(field, evmcs_read32(field) | mask);
+
 	__vmcs_writel(field, __vmcs_readl(field) | mask);
 }
 
@@ -3595,6 +4061,14 @@ static int hardware_enable(void)
 	if (cr4_read_shadow() & X86_CR4_VMXE)
 		return -EBUSY;
 
+	/*
+	 * This can happen if we hot-added a CPU but failed to allocate
+	 * VP assist page for it.
+	 */
+	if (static_branch_unlikely(&enable_evmcs) &&
+	    !hv_get_vp_assist_page(cpu))
+		return -EFAULT;
+
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
 	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
 	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
@@ -3828,7 +4302,12 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	vmcs_conf->size = vmx_msr_high & 0x1fff;
 	vmcs_conf->order = get_order(vmcs_conf->size);
 	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
-	vmcs_conf->revision_id = vmx_msr_low;
+
+	/* KVM supports Enlightened VMCS v1 only */
+	if (static_branch_unlikely(&enable_evmcs))
+		vmcs_conf->revision_id = 1;
+	else
+		vmcs_conf->revision_id = vmx_msr_low;
 
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
@@ -9395,7 +9874,7 @@ static void vmx_arm_hv_timer(struct kvm_vcpu *vcpu)
 static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long cr3, cr4;
+	unsigned long cr3, cr4, evmcs_rsp;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -9461,6 +9940,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
+
+	evmcs_rsp = static_branch_unlikely(&enable_evmcs) ?
+		(unsigned long)&current_evmcs->host_rsp : 0;
+
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -9469,15 +9952,21 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		"cmp %%" _ASM_SP ", %c[host_rsp](%0) \n\t"
 		"je 1f \n\t"
 		"mov %%" _ASM_SP ", %c[host_rsp](%0) \n\t"
+		/* Avoid VMWRITE when Enlightened VMCS is in use */
+		"test %%" _ASM_SI ", %%" _ASM_SI " \n\t"
+		"jz 2f \n\t"
+		"mov %%" _ASM_SP ", (%%" _ASM_SI ") \n\t"
+		"jmp 1f \n\t"
+		"2: \n\t"
 		__ex(ASM_VMX_VMWRITE_RSP_RDX) "\n\t"
 		"1: \n\t"
 		/* Reload cr2 if changed */
 		"mov %c[cr2](%0), %%" _ASM_AX " \n\t"
 		"mov %%cr2, %%" _ASM_DX " \n\t"
 		"cmp %%" _ASM_AX ", %%" _ASM_DX " \n\t"
-		"je 2f \n\t"
+		"je 3f \n\t"
 		"mov %%" _ASM_AX", %%cr2 \n\t"
-		"2: \n\t"
+		"3: \n\t"
 		/* Check if vmlaunch of vmresume is needed */
 		"cmpl $0, %c[launched](%0) \n\t"
 		/* Load guest registers.  Don't clobber flags. */
@@ -9546,7 +10035,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		".global vmx_return \n\t"
 		"vmx_return: " _ASM_PTR " 2b \n\t"
 		".popsection"
-	      : : "c"(vmx), "d"((unsigned long)HOST_RSP),
+	      : : "c"(vmx), "d"((unsigned long)HOST_RSP), "S"(evmcs_rsp),
 		[launched]"i"(offsetof(struct vcpu_vmx, __launched)),
 		[fail]"i"(offsetof(struct vcpu_vmx, fail)),
 		[host_rsp]"i"(offsetof(struct vcpu_vmx, host_rsp)),
@@ -9571,10 +10060,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		[wordsize]"i"(sizeof(ulong))
 	      : "cc", "memory"
 #ifdef CONFIG_X86_64
-		, "rax", "rbx", "rdi", "rsi"
+		, "rax", "rbx", "rdi"
 		, "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
 #else
-		, "eax", "ebx", "edi", "esi"
+		, "eax", "ebx", "edi"
 #endif
 	      );
 
@@ -9602,6 +10091,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
+	/* All fields are clean at this point */
+	if (static_branch_unlikely(&enable_evmcs))
+		current_evmcs->hv_clean_fields |=
+			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
+
 	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
 	if (vmx->host_debugctlmsr)
 		update_debugctlmsr(vmx->host_debugctlmsr);
@@ -12414,7 +12908,35 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 static int __init vmx_init(void)
 {
-	int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+	int r;
+
+#if IS_ENABLED(CONFIG_HYPERV)
+	/*
+	 * Enlightened VMCS usage should be recommended and the host needs
+	 * to support eVMCS v1 or above. We can also disable eVMCS support
+	 * with module parameter.
+	 */
+	if (enlightened_vmcs &&
+	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
+	    (ms_hyperv.nested_features & 0xff) >= 1) {
+		int cpu;
+
+		/* Check that we have assist pages on all online CPUs */
+		for_each_online_cpu(cpu) {
+			if (!hv_get_vp_assist_page(cpu)) {
+				enlightened_vmcs = false;
+				break;
+			}
+		}
+
+		if (enlightened_vmcs) {
+			pr_info("kvm: vmx: using Hyper-V Enlightened VMCS\n");
+			static_branch_enable(&enable_evmcs);
+		}
+	}
+#endif
+
+	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
                      __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
 		return r;
@@ -12435,6 +12957,29 @@ static void __exit vmx_exit(void)
 #endif
 
 	kvm_exit();
+
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (static_branch_unlikely(&enable_evmcs)) {
+		int cpu;
+		struct hv_vp_assist_page *vp_ap;
+		/*
+		 * Reset everything to support using non-enlightened VMCS
+		 * access later (e.g. when we reload the module with
+		 * enlightened_vmcs=0)
+		 */
+		for_each_online_cpu(cpu) {
+			vp_ap =	hv_get_vp_assist_page(cpu);
+
+			if (!vp_ap)
+				continue;
+
+			vp_ap->current_nested_vmcs = 0;
+			vp_ap->enlighten_vmentry = 0;
+		}
+
+		static_branch_disable(&enable_evmcs);
+	}
+#endif
 }
 
 module_init(vmx_init)
-- 
2.14.3


* Re: [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V
  2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2018-02-26 17:11 ` [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V Vitaly Kuznetsov
@ 2018-02-28 17:19 ` Thomas Gleixner
  5 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2018-02-28 17:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, Radim Krčmář,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

On Mon, 26 Feb 2018, Vitaly Kuznetsov wrote:

For the non-KVM parts of this series:

Acked-by: Thomas Gleixner <tglx@linutronix.de>


> Changes since v1:
> - The only comment I got for v1 was from kbuild test robot. The issue
>   was addressed by moving HV_X64_ENLIGHTENED_VMCS_RECOMMENDED definition
>   to PATCH2.
> - Rebased to current kvm/queue.
> 
> When running nested KVM on Hyper-V it's possible to use so called
> 'Enlightened VMCS' and do normal memory reads/writes instead of
> doing VMWRITE/VMREAD instructions. In addition, clean field mask
> provides a huge room for optimization on L0's side.
> 
> Tight CPUID loop test shows significant speedup (current kvm/queue on
> E5-2667 v4 @ 3.20GHz):
>  Before: 20766 cycles
>  After: 8912 cycles
> 
> The series is based on current kvm/queue tree.
> 
> Ladi Prosek (1):
>   x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to
>     HV_X64_MSR_VP_ASSIST_PAGE
> 
> Vitaly Kuznetsov (4):
>   x86/hyper-v: allocate and use Virtual Processor Assist Pages
>   x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
>   x86/hyper-v: detect nested features
>   x86/kvm: use Enlightened VMCS when running on Hyper-V
> 
>  arch/x86/hyperv/hv_init.c          |  33 +++
>  arch/x86/include/asm/mshyperv.h    |  12 +
>  arch/x86/include/uapi/asm/hyperv.h | 223 ++++++++++++++-
>  arch/x86/kernel/cpu/mshyperv.c     |   3 +
>  arch/x86/kvm/hyperv.c              |   8 +-
>  arch/x86/kvm/lapic.h               |   2 +-
>  arch/x86/kvm/vmx.c                 | 561 ++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                 |   2 +-
>  8 files changed, 825 insertions(+), 19 deletions(-)
> 
> -- 
> 2.14.3
> 
> 


* Re: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-02-26 17:11 ` [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE Vitaly Kuznetsov
@ 2018-03-07 16:19   ` Radim Krčmář
  2018-03-07 16:48     ` Roman Kagan
  0 siblings, 1 reply; 16+ messages in thread
From: Radim Krčmář @ 2018-03-07 16:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

2018-02-26 18:11+0100, Vitaly Kuznetsov:
> From: Ladi Prosek <lprosek@redhat.com>
> 
> The assist page has been used only for the paravirtual EOI so far, hence
> the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's
> called "Virtual VP Assist MSR".
> 
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/uapi/asm/hyperv.h | 10 +++++-----
>  arch/x86/kvm/hyperv.c              |  8 ++++----
>  arch/x86/kvm/lapic.h               |  2 +-
>  arch/x86/kvm/x86.c                 |  2 +-
>  4 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index 1c12aaf33915..45cc62352040 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -189,7 +189,7 @@
>  #define HV_X64_MSR_EOI				0x40000070
>  #define HV_X64_MSR_ICR				0x40000071
>  #define HV_X64_MSR_TPR				0x40000072
> -#define HV_X64_MSR_APIC_ASSIST_PAGE		0x40000073
> +#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
>  
>  /* Define synthetic interrupt controller model specific registers. */
>  #define HV_X64_MSR_SCONTROL			0x40000080
> @@ -275,10 +275,10 @@ struct hv_tsc_emulation_status {
>  #define HVCALL_POST_MESSAGE			0x005c
>  #define HVCALL_SIGNAL_EVENT			0x005d
>  
> -#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE		0x00000001
> -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
> -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
> -		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

Removing definitions from userspace api isn't a good idea.

I have no idea why hyper.h is a userspace api, though -- Linux doesn't
define any of those, so we could copy the definitions to a private
header, rename, and never look at this file again.


* Re: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-03-07 16:19   ` Radim Krčmář
@ 2018-03-07 16:48     ` Roman Kagan
  2018-03-07 18:04       ` Radim Krčmář
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Kagan @ 2018-03-07 16:48 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Vitaly Kuznetsov, kvm, x86, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

On Wed, Mar 07, 2018 at 05:19:44PM +0100, Radim Krčmář wrote:
> 2018-02-26 18:11+0100, Vitaly Kuznetsov:
> > From: Ladi Prosek <lprosek@redhat.com>
> > 
> > The assist page has been used only for the paravirtual EOI so far, hence
> > the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's
> > called "Virtual VP Assist MSR".
> > 
> > Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> >  arch/x86/include/uapi/asm/hyperv.h | 10 +++++-----
> >  arch/x86/kvm/hyperv.c              |  8 ++++----
> >  arch/x86/kvm/lapic.h               |  2 +-
> >  arch/x86/kvm/x86.c                 |  2 +-
> >  4 files changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> > index 1c12aaf33915..45cc62352040 100644
> > --- a/arch/x86/include/uapi/asm/hyperv.h
> > +++ b/arch/x86/include/uapi/asm/hyperv.h
> > @@ -189,7 +189,7 @@
> >  #define HV_X64_MSR_EOI				0x40000070
> >  #define HV_X64_MSR_ICR				0x40000071
> >  #define HV_X64_MSR_TPR				0x40000072
> > -#define HV_X64_MSR_APIC_ASSIST_PAGE		0x40000073
> > +#define HV_X64_MSR_VP_ASSIST_PAGE		0x40000073
> >  
> >  /* Define synthetic interrupt controller model specific registers. */
> >  #define HV_X64_MSR_SCONTROL			0x40000080
> > @@ -275,10 +275,10 @@ struct hv_tsc_emulation_status {
> >  #define HVCALL_POST_MESSAGE			0x005c
> >  #define HVCALL_SIGNAL_EVENT			0x005d
> >  
> > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE		0x00000001
> > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
> > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
> > -		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
> 
> Removing definitions from userspace api isn't a good idea.
> 
> I have no idea why hyperv.h is a userspace api, though -- Linux doesn't
> define any of those, so we could copy the definitions to a private
> header, rename, and never look at this file again.

That was a thinko when it was moved to uapi, and it has already been
identified as a problem, so now QEMU has its own header with the
definitions it needs, and I'm unaware of any other userspace project
that depends on this stuff.  So I've been planning to remove it from
uapi but still haven't got around to posting the patch :(

Roman.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V
  2018-02-26 17:11 ` [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V Vitaly Kuznetsov
@ 2018-03-07 17:56   ` Radim Krčmář
  2018-03-08 10:23     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 16+ messages in thread
From: Radim Krčmář @ 2018-03-07 17:56 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

2018-02-26 18:11+0100, Vitaly Kuznetsov:
> Enlightened VMCS is just a structure in memory, the main benefit
> besides avoiding somewhat slower VMREAD/VMWRITE is using clean field
> mask: we tell the underlying hypervisor which fields were modified
> since VMEXIT so there's no need to inspect them all.
> 
> Tight CPUID loop test shows significant speedup:
> Before: 20766 cycles
> After: 8912 cycles
> 
> Static key is being used to avoid performance penalty for non-Hyper-V
> deployments. Tests show we add around 3 (three) CPU cycles on each
> VMEXIT (1077.5 cycles before, 1080.7 cycles after for the same CPUID
> loop on bare metal). We can probably avoid one test/jmp in vmx_vcpu_run()
> but I don't see a clean way to use static key in assembly.

Patching the correct instruction should be simpler than replicating
static_branch (because we have to support assemblers without ASM_GOTO),
but I'd care about that later, if ever.
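
(For context, the C side of the pattern being discussed is roughly the sketch
below -- the static key compiles down to a single patched jump, so the cost on
non-Hyper-V hosts comes only from the remaining test+jmp around the asm block
in vmx_vcpu_run(); evmcs_write64()/__vmcs_writel() stand in for the real
accessors:)

	static DEFINE_STATIC_KEY_FALSE(enable_evmcs);

	static __always_inline void vmcs_write64(unsigned long field, u64 value)
	{
		if (static_branch_unlikely(&enable_evmcs))
			evmcs_write64(field, value);	/* plain store into the eVMCS page */
		else
			__vmcs_writel(field, value);	/* real VMWRITE */
	}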

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -999,6 +1000,442 @@ static const u32 vmx_msr_index[] = {
> +/*
> + *  Enlightened VMCSv1 doesn't support these:

I take it that the code assumes that the hypervisor will never enable
these features if we have enlightened VMCS.  I think it would be best to
disable those features when turning on evmcs.

> + *	POSTED_INTR_NV                  = 0x00000002,
> + *	GUEST_INTR_STATUS               = 0x00000810,
> + *	APIC_ACCESS_ADDR		= 0x00002014,
> + *	POSTED_INTR_DESC_ADDR           = 0x00002016,
> + *	EOI_EXIT_BITMAP0                = 0x0000201c,
> + *	EOI_EXIT_BITMAP1                = 0x0000201e,
> + *	EOI_EXIT_BITMAP2                = 0x00002020,
> + *	EOI_EXIT_BITMAP3                = 0x00002022,

enable_apicv, flexpriority_enabled

> + *	GUEST_PML_INDEX			= 0x00000812,
> + *	PML_ADDRESS			= 0x0000200e,

enable_pml

> + *	VM_FUNCTION_CONTROL             = 0x00002018,
> + *	EPTP_LIST_ADDRESS               = 0x00002024,

(only vm controls)

> + *	VMREAD_BITMAP                   = 0x00002026,
> + *	VMWRITE_BITMAP                  = 0x00002028,

enable_shadow_vmcs

> + *	TSC_MULTIPLIER                  = 0x00002032,

(only vm controls)

> + *	GUEST_IA32_PERF_GLOBAL_CTRL	= 0x00002808,
> + *	GUEST_IA32_RTIT_CTL		= 0x00002814,
> + *	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,

(only vm controls)

> + *	PLE_GAP                         = 0x00004020,
> + *	PLE_WINDOW                      = 0x00004022,

ple_gap

> + *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,

enable_preemption_timer

> + */
> +};
> +
> +static inline u16 get_evmcs_offset(unsigned long field)
> +{
> +	unsigned int index = ROL16(field, 6);
> +
> +	if (index >= ARRAY_SIZE(vmcs_field_to_evmcs_1))
> +		return 0;

Please add a warning when trying to use an EVMCS that doesn't exist,
just like the VMREAD/WRITE does -- it's an internal KVM error.

> +
> +	return (u16)vmcs_field_to_evmcs_1[index];
> +}
> +
> @@ -3828,7 +4302,12 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	vmcs_conf->size = vmx_msr_high & 0x1fff;
>  	vmcs_conf->order = get_order(vmcs_conf->size);
>  	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
> -	vmcs_conf->revision_id = vmx_msr_low;
> +
> +	/* KVM supports Enlightened VMCS v1 only */
> +	if (static_branch_unlikely(&enable_evmcs))
> +		vmcs_conf->revision_id = 1;

I think we have to put the bottom bits from ms_hyperv.nested_features
here. 16.5.1 Enlightened VMCS Versioning:

  Each enlightened VMCS structure contains a version field, which is
  reported by the L0 hypervisor (see 2.4.11)
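
i.e. something like this (just a sketch):

	vmcs_conf->revision_id = ms_hyperv.nested_features & 0xff;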



> +	else
> +		vmcs_conf->revision_id = vmx_msr_low;
>  
>  	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
>  	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
> @@ -12414,7 +12908,35 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  
>  static int __init vmx_init(void)
>  {
> -	int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
> +	int r;
> +
> +#if IS_ENABLED(CONFIG_HYPERV)
> +	/*
> +	 * Enlightened VMCS usage should be recommended and the host needs
> +	 * to support eVMCS v1 or above. We can also disable eVMCS support
> +	 * with module parameter.
> +	 */
> +	if (enlightened_vmcs &&
> +	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
> +	    (ms_hyperv.nested_features & 0xff) >= 1) {

Please add macros like HV_X64_ENLIGHTENED_VMCS_VERSION and
KVM_EVMCS_VERSION for those numbers.
I think we should not proceed with version other than 1 for now -- there
is no mention of backward compatibility in the spec.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-03-07 16:48     ` Roman Kagan
@ 2018-03-07 18:04       ` Radim Krčmář
  2018-03-08 10:17         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 16+ messages in thread
From: Radim Krčmář @ 2018-03-07 18:04 UTC (permalink / raw)
  To: Roman Kagan, Vitaly Kuznetsov, kvm, x86, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

2018-03-07 19:48+0300, Roman Kagan:
> On Wed, Mar 07, 2018 at 05:19:44PM +0100, Radim Krčmář wrote:
> > 2018-02-26 18:11+0100, Vitaly Kuznetsov:
> > > diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> > > @@ -275,10 +275,10 @@ struct hv_tsc_emulation_status {
> > >  #define HVCALL_POST_MESSAGE			0x005c
> > >  #define HVCALL_SIGNAL_EVENT			0x005d
> > >  
> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE		0x00000001
> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
> > > -		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
> > 
> > Removing definitions from userspace api isn't a good idea.
> > 
> > I have no idea why hyperv.h is a userspace api, though -- Linux doesn't
> > define any of those, so we could copy the definitions to a private
> > header, rename, and never look at this file again.
> 
> That was a thinko when it was moved to uapi, and it has already been
> identified as a problem, so now QEMU has its own header with the
> definitions it needs, and I'm unaware of any other userspace project
> that depends on this stuff.  So I've been planning to remove it from
> uapi but still haven't got around to posting the patch :(

Great, let's be bold here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-03-07 18:04       ` Radim Krčmář
@ 2018-03-08 10:17         ` Vitaly Kuznetsov
  2018-03-08 16:29           ` Michael Kelley (EOSG)
  0 siblings, 1 reply; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-03-08 10:17 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Roman Kagan, kvm, x86, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

Radim Krčmář <rkrcmar@redhat.com> writes:

> 2018-03-07 19:48+0300, Roman Kagan:
>> On Wed, Mar 07, 2018 at 05:19:44PM +0100, Radim Krčmář wrote:
>> > 2018-02-26 18:11+0100, Vitaly Kuznetsov:
>> > > diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
>> > > @@ -275,10 +275,10 @@ struct hv_tsc_emulation_status {
>> > >  #define HVCALL_POST_MESSAGE			0x005c
>> > >  #define HVCALL_SIGNAL_EVENT			0x005d
>> > >  
>> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE		0x00000001
>> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
>> > > -#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
>> > > -		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
>> > 
>> > Removing definitions from userspace api isn't a good idea.
>> > 
>> > I have no idea why hyperv.h is a userspace api, though -- Linux doesn't
>> > define any of those, so we could copy the definitions to a private
>> > header, rename, and never look at this file again.
>> 
>> That was a thinko when it was moved to uapi, and it has already been
>> identified as a problem, so now QEMU has its own header with the
>> definitions it needs, and I'm unaware of any other userspace project
>> that depends on this stuff.  So I've been planning to remove it from
>> uapi but still haven't got around to posting the patch :(
>
> Great, let's be bold here.

asm/hyperv.h is not uapi.

I would include a patch renaming arch/x86/include/uapi/asm/hyperv.h to
arch/x86/include/asm/hyperv.h but we already have 'mshyperv.h' there and
I don't quite understand the difference. We can either merge them or
come up with a rule distinguishing them.

K. Y., Michael, what do you think?

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V
  2018-03-07 17:56   ` Radim Krčmář
@ 2018-03-08 10:23     ` Vitaly Kuznetsov
  2018-03-08 16:56       ` Radim Krčmář
  0 siblings, 1 reply; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-03-08 10:23 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: kvm, x86, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

Radim Krčmář <rkrcmar@redhat.com> writes:

> 2018-02-26 18:11+0100, Vitaly Kuznetsov:
>> Enlightened VMCS is just a structure in memory, the main benefit
>> besides avoiding somewhat slower VMREAD/VMWRITE is using clean field
>> mask: we tell the underlying hypervisor which fields were modified
>> since VMEXIT so there's no need to inspect them all.
>> 
>> Tight CPUID loop test shows significant speedup:
>> Before: 20766 cycles
>> After: 8912 cycles
>> 
>> Static key is being used to avoid performance penalty for non-Hyper-V
>> deployments. Tests show we add around 3 (three) CPU cycles on each
>> VMEXIT (1077.5 cycles before, 1080.7 cycles after for the same CPUID
>> loop on bare metal). We can probably avoid one test/jmp in vmx_vcpu_run()
>> but I don't see a clean way to use static key in assembly.
>
> Patching the correct instruction should be simpler than replicating
> static_branch (because we have to support assemblers without ASM_GOTO),
> but I'd care about that later, if ever.
>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -999,6 +1000,442 @@ static const u32 vmx_msr_index[] = {
>> +/*
>> + *  Enlightened VMCSv1 doesn't support these:
>
> I take it that the code assumes that the hypervisor will never enable
> these features if we have enlightened VMCS.  I think it would be best to
> disable those features when turning on evmcs.

Sure.
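
Something along these lines, I suppose (sketch only, the exact knob names
need to be double-checked against vmx.c):

	/* eVMCSv1 has no fields for these features, so force them off */
	if (static_branch_unlikely(&enable_evmcs)) {
		enable_apicv = 0;
		flexpriority_enabled = 0;
		enable_pml = 0;
		enable_shadow_vmcs = 0;
		ple_gap = 0;
		enable_preemption_timer = 0;
	}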

>
>> + *	POSTED_INTR_NV                  = 0x00000002,
>> + *	GUEST_INTR_STATUS               = 0x00000810,
>> + *	APIC_ACCESS_ADDR		= 0x00002014,
>> + *	POSTED_INTR_DESC_ADDR           = 0x00002016,
>> + *	EOI_EXIT_BITMAP0                = 0x0000201c,
>> + *	EOI_EXIT_BITMAP1                = 0x0000201e,
>> + *	EOI_EXIT_BITMAP2                = 0x00002020,
>> + *	EOI_EXIT_BITMAP3                = 0x00002022,
>
> enable_apicv, flexpriority_enabled
>
>> + *	GUEST_PML_INDEX			= 0x00000812,
>> + *	PML_ADDRESS			= 0x0000200e,
>
> enable_pml
>
>> + *	VM_FUNCTION_CONTROL             = 0x00002018,
>> + *	EPTP_LIST_ADDRESS               = 0x00002024,
>
> (only vm controls)
>
>> + *	VMREAD_BITMAP                   = 0x00002026,
>> + *	VMWRITE_BITMAP                  = 0x00002028,
>
> enable_shadow_vmcs
>
>> + *	TSC_MULTIPLIER                  = 0x00002032,
>
> (only vm controls)
>
>> + *	GUEST_IA32_PERF_GLOBAL_CTRL	= 0x00002808,
>> + *	GUEST_IA32_RTIT_CTL		= 0x00002814,
>> + *	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
>
> (only vm controls)
>
>> + *	PLE_GAP                         = 0x00004020,
>> + *	PLE_WINDOW                      = 0x00004022,
>
> ple_gap
>
>> + *	VMX_PREEMPTION_TIMER_VALUE      = 0x0000482E,
>
> enable_preemption_timer
>
>> + */
>> +};
>> +
>> +static inline u16 get_evmcs_offset(unsigned long field)
>> +{
>> +	unsigned int index = ROL16(field, 6);
>> +
>> +	if (index >= ARRAY_SIZE(vmcs_field_to_evmcs_1))
>> +		return 0;
>
> Please add a warning when trying to use an EVMCS that doesn't exist,
> just like the VMREAD/WRITE does -- it's an internal KVM error.
>

Will do.
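
Probably something like this (sketch):

	static inline u16 get_evmcs_offset(unsigned long field)
	{
		unsigned int index = ROL16(field, 6);

		if (index >= ARRAY_SIZE(vmcs_field_to_evmcs_1)) {
			/* internal KVM error: the field has no eVMCSv1 mapping */
			WARN_ONCE(1, "kvm: unsupported eVMCS field %lx\n", field);
			return 0;
		}

		return (u16)vmcs_field_to_evmcs_1[index];
	}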

>> +
>> +	return (u16)vmcs_field_to_evmcs_1[index];
>> +}
>> +
>> @@ -3828,7 +4302,12 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>  	vmcs_conf->size = vmx_msr_high & 0x1fff;
>>  	vmcs_conf->order = get_order(vmcs_conf->size);
>>  	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>> -	vmcs_conf->revision_id = vmx_msr_low;
>> +
>> +	/* KVM supports Enlightened VMCS v1 only */
>> +	if (static_branch_unlikely(&enable_evmcs))
>> +		vmcs_conf->revision_id = 1;
>
> I think we have to put the bottom bits from ms_hyperv.nested_features
> here. 16.5.1 Enlightened VMCS Versioning:
>
>   Each enlightened VMCS structure contains a version field, which is
>   reported by the L0 hypervisor (see 2.4.11)

see below

>
>> +	else
>> +		vmcs_conf->revision_id = vmx_msr_low;
>>  
>>  	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
>>  	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
>> @@ -12414,7 +12908,35 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>  
>>  static int __init vmx_init(void)
>>  {
>> -	int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
>> +	int r;
>> +
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> +	/*
>> +	 * Enlightened VMCS usage should be recommended and the host needs
>> +	 * to support eVMCS v1 or above. We can also disable eVMCS support
>> +	 * with module parameter.
>> +	 */
>> +	if (enlightened_vmcs &&
>> +	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
>> +	    (ms_hyperv.nested_features & 0xff) >= 1) {
>
> Please add macros like HV_X64_ENLIGHTENED_VMCS_VERSION and
> KVM_EVMCS_VERSION for those numbers.

This can be arranged
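
E.g. (sketch, the names are open for discussion):

	/* version of the eVMCS structure KVM knows how to fill in */
	#define KVM_EVMCS_VERSION	1

	if (enlightened_vmcs &&
	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
	    (ms_hyperv.nested_features & 0xff) >= KVM_EVMCS_VERSION)
		static_branch_enable(&enable_evmcs);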

> I think we should not proceed with version other than 1 for now -- there
> is no mention of backward compatibility in the spec.

I think the check is correct. eVMCS version represents the format of the
structure in memory. New versions may have nothing in common but at the
same time Ver1 support can't be dropped from future Hyper-V versions
without regressing L1s which don't support the new version.

So currently we check that L0 supports at least ver1 (or some newer
version which implies all previously released versions are supported
too) and we declare that KVM supports ver1 exactly (this is the format
of the structure we put to memory and share with L0). We can't declare
support for any other version.

Currently, it is only Ver1 on WS2016.
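
With such a macro the hunk above would read (sketch):

	/* KVM supports Enlightened VMCS v1 only */
	if (static_branch_unlikely(&enable_evmcs))
		vmcs_conf->revision_id = KVM_EVMCS_VERSION;
	else
		vmcs_conf->revision_id = vmx_msr_low;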

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-03-08 10:17         ` Vitaly Kuznetsov
@ 2018-03-08 16:29           ` Michael Kelley (EOSG)
  2018-03-08 16:54             ` Vitaly Kuznetsov
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Kelley (EOSG) @ 2018-03-08 16:29 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Radim Krčmář
  Cc: Roman Kagan, kvm, x86, Paolo Bonzini, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Mohammed Gamal, Cathy Avery,
	Bandan Das, linux-kernel

> >> > Removing definitions from userspace api isn't a good idea.
> >> >
> >> > I have no idea why hyperv.h is a userspace api, though -- Linux doesn't
> >> > define any of those, so we could copy the definitions to a private
> >> > header, rename, and never look at this file again.
> >>
> >> That was a thinko when it was moved to uapi, and it has already been
> >> identified as a problem, so now QEMU has its own header with the
> >> definitions it needs, and I'm unaware of any other userspace project
> >> that depends on this stuff.  So I've been planning to remove it from
> >> uapi but still haven't got around to posting the patch :(
> >
> > Great, let's be bold here.
> 
> asm/hyperv.h is not uapi.
> 
> I would include a patch renaming arch/x86/include/uapi/asm/hyperv.h to
> arch/x86/include/asm/hyperv.h but we already have 'mshyperv.h' there and
> I don't quite understand the difference. We can either merge them or
> come up with a rule distinguishing them.
> 
> K. Y., Michael, what do you think?

Good timing for this topic, as I'm now looking at cloning these two
files into the arch/arm64 tree for Hyper-V on ARM64.  It would be great
to get a plan agreed on so I can be consistent on the arm64 side.

I would suggest keeping two files:  one with just the data structures and
#defines that come from the Hyper-V Top-Level Functional Spec (TLFS),  
and the other with additional data structures, macros, function prototypes,
etc. that are specific to Linux guest code.   uapi/asm/hyperv.h is already
the first one, and asm/mshyperv.h is mostly the second one, though it has
some things that should probably move to the first one.   There are also a
few stray definitions from the Hyper-V TLFS in drivers/pci/host/pci-hyperv.c
(HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF, for example)
that really belong in the first file.

And since we're already changing the location of the first file, let's rename
it to hyperv-tlfs.h or something similar.

Michael

> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE
  2018-03-08 16:29           ` Michael Kelley (EOSG)
@ 2018-03-08 16:54             ` Vitaly Kuznetsov
  0 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2018-03-08 16:54 UTC (permalink / raw)
  To: Michael Kelley (EOSG)
  Cc: Radim Krčmář,
	Roman Kagan, kvm, x86, Paolo Bonzini, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Mohammed Gamal, Cathy Avery,
	Bandan Das, linux-kernel

"Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> writes:

>> >> > Removing definitions from userspace api isn't a good idea.
>> >> >
>> >> > I have no idea why hyperv.h is a userspace api, though -- Linux doesn't
>> >> > define any of those, so we could copy the definitions to a private
>> >> > header, rename, and never look at this file again.
>> >>
>> >> That was a thinko when it was moved to uapi, and it has already been
>> >> identified as a problem, so now QEMU has its own header with the
>> >> definitions it needs, and I'm unaware of any other userspace project
>> >> that depends on this stuff.  So I've been planning to remove it from
>> >> uapi but still haven't got around to posting the patch :(
>> >
>> > Great, let's be bold here.
>> 
>> asm/hyperv.h is not uapi.
>> 
>> I would include a patch renaming arch/x86/include/uapi/asm/hyperv.h to
>> arch/x86/include/asm/hyperv.h but we already have 'mshyperv.h' there and
>> I don't quite understand the difference. We can either merge them or
>> come up with a rule distinguishing them.
>> 
>> K. Y., Michael, what do you think?
>
> Good timing for this topic, as I'm now looking at cloning these two
> files into the arch/arm64 tree for Hyper-V on ARM64.  It would be great
> to get a plan agreed on so I can be consistent on the arm64 side.
>
> I would suggest keeping two files:  one with just the data structures and
> #defines that come from the Hyper-V Top-Level Functional Spec (TLFS),  
> and the other with additional data structures, macros, function prototypes,
> etc. that are specific to Linux guest code.   uapi/asm/hyperv.h is already
> the first one, and asm/mshyperv.h is mostly the second one, though it has
> some things that should probably move to the first one.   There are also a
> few stray definitions from the Hyper-V TLFS in drivers/pci/host/pci-hyperv.c
> (HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF, for example)
> that really belong in the first file.
>
> And since we're already changing the location of the first file, let's rename
> it to hyperv-tlfs.h or something similar.
>

Sounds good,

I'll add a patch or two to my eVMCS series.

Thanks!

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V
  2018-03-08 10:23     ` Vitaly Kuznetsov
@ 2018-03-08 16:56       ` Radim Krčmář
  0 siblings, 0 replies; 16+ messages in thread
From: Radim Krčmář @ 2018-03-08 16:56 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Michael Kelley (EOSG),
	Mohammed Gamal, Cathy Avery, Bandan Das, linux-kernel

2018-03-08 11:23+0100, Vitaly Kuznetsov:
> Radim Krčmář <rkrcmar@redhat.com> writes:
> > 2018-02-26 18:11+0100, Vitaly Kuznetsov:
> >> @@ -3828,7 +4302,12 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >>  	vmcs_conf->size = vmx_msr_high & 0x1fff;
> >>  	vmcs_conf->order = get_order(vmcs_conf->size);
> >>  	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
> >> -	vmcs_conf->revision_id = vmx_msr_low;
> >> +
> >> +	/* KVM supports Enlightened VMCS v1 only */
> >> +	if (static_branch_unlikely(&enable_evmcs))
> >> +		vmcs_conf->revision_id = 1;
> >
> > I think we have to put the bottom bits from ms_hyperv.nested_features
> > here. 16.5.1 Enlightened VMCS Versioning:
> >
> >   Each enlightened VMCS structure contains a version field, which is
> >   reported by the L0 hypervisor (see 2.4.11)
> 
> see below
> 
> >
> >> +	else
> >> +		vmcs_conf->revision_id = vmx_msr_low;
> >>  
> >>  	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
> >>  	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
> >> @@ -12414,7 +12908,35 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >>  
> >>  static int __init vmx_init(void)
> >>  {
> >> -	int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
> >> +	int r;
> >> +
> >> +#if IS_ENABLED(CONFIG_HYPERV)
> >> +	/*
> >> +	 * Enlightened VMCS usage should be recommended and the host needs
> >> +	 * to support eVMCS v1 or above. We can also disable eVMCS support
> >> +	 * with module parameter.
> >> +	 */
> >> +	if (enlightened_vmcs &&
> >> +	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
> >> +	    (ms_hyperv.nested_features & 0xff) >= 1) {
> 
> > I think we should not proceed with version other than 1 for now -- there
> > is no mention of backward compatibility in the spec.
> 
> I think the check is correct. eVMCS version represents the format of the
> structure in memory. New versions may have nothing in common but at the
> same time Ver1 support can't be dropped from future Hyper-V versions
> without regressing L1s which don't support the new version.

I am concerned because Intel doesn't promise compatibility between VMCS
revisions and eVMCS is based on VMCS, so maybe they are going to use the
live migration notifier to disable eVMCS on incompatible versions.  The
TLFS suggests that they plan to handle eVMCS versions the same way as
Intel, but the spec might just be misleading.

> So currently we check that L0 supports at least ver1 (or some newer
> version which implies all previously released versions are supported
> too) and we declare that KVM supports ver1 exactly (this is the format
> of the structure we put to memory and share with L0). We can't declare
> support for any other version.

Yes, I'm saying that we don't proceed with any other version than 1.

> Currently, it is only Ver1 on WS2016.

Right.  It should be an easy bug to spot, so I'm ok with the current
version.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-03-08 16:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-26 17:11 [PATCH v2 0/5] Enlightened VMCS support for KVM on Hyper-V Vitaly Kuznetsov
2018-02-26 17:11 ` [PATCH v2 1/5] x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE Vitaly Kuznetsov
2018-03-07 16:19   ` Radim Krčmář
2018-03-07 16:48     ` Roman Kagan
2018-03-07 18:04       ` Radim Krčmář
2018-03-08 10:17         ` Vitaly Kuznetsov
2018-03-08 16:29           ` Michael Kelley (EOSG)
2018-03-08 16:54             ` Vitaly Kuznetsov
2018-02-26 17:11 ` [PATCH v2 2/5] x86/hyper-v: allocate and use Virtual Processor Assist Pages Vitaly Kuznetsov
2018-02-26 17:11 ` [PATCH v2 3/5] x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits Vitaly Kuznetsov
2018-02-26 17:11 ` [PATCH v2 4/5] x86/hyper-v: detect nested features Vitaly Kuznetsov
2018-02-26 17:11 ` [PATCH v2 5/5] x86/kvm: use Enlightened VMCS when running on Hyper-V Vitaly Kuznetsov
2018-03-07 17:56   ` Radim Krčmář
2018-03-08 10:23     ` Vitaly Kuznetsov
2018-03-08 16:56       ` Radim Krčmář
2018-02-28 17:19 ` [PATCH v2 0/5] Enlightened VMCS support for KVM " Thomas Gleixner
