linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] IPI virtualization support for VM
@ 2021-07-16  6:48 Zeng Guang
  2021-07-16  6:48 ` [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang

Current IPI process in guest VM will virtualize the writing to interrupt
command register(ICR) of the local APIC which will cause VM-exit anyway
on source vCPU. Frequent VM-exit could induce much overhead accumulated
if running IPI intensive task.

IPI virtualization as a new VT-x feature targets to eliminate VM-exits
when issuing IPI on source vCPU. It introduces a new VM-execution
control - "IPI virtualization"(bit4) in the tertiary processor-based
VM-exection controls and a new data structure - "PID-pointer table
address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
virtualization" is enabled, processor emulateds following kind of writes
to APIC registers that would send IPIs, moreover without causing VM-exits.
- Memory-mapped ICR writes
- MSR-mapped ICR writes
- SENDUIPI execution

This patch series implement IPI virtualization support in KVM.

Patches 1-3 add tertiary processor-based VM-execution support
framework. 

Patch 4 implement interrupt dispatch support in x2APIC mode with
APIC-write VM exit. In previous platform, no CPU would produce
APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
mode" VM-execution control was 1.

Patch 5 implement IPI virtualization related function including
feature enabling through tertiary processor-based VM-execution in
various scenario of VMCS configuration, PID table setup in vCPU creation
and vCPU block consideration.     

Document for IPI virtualization is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

We did experiment to measure average time sending IPI from source vCPU
to the target vCPU completing the IPI handling by kvm unittest w/ and
w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
respectly.

KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
exit impact on target vCPU. 
 
		Cycles of IPI 				
		xAPIC mode		x2APIC mode	
	test	w/o IPIv  w/ IPIv	w/o IPIv  w/ IPIv
	1	6106	  4816		4265	  3768
	2	6244	  4656		4404	  3546
	3	6165	  4658		4233	  3474
	4	5992	  4710		4363	  3430
	5	6083	  4741		4215	  3551
	6	6238	  4904		4304	  3547
	7	6164	  4617		4263	  3709
	8	5984	  4763		4518	  3779
	9	5931	  4712		4645	  3667
	10	5955	  4530		4332	  3724
	11	5897	  4673		4283	  3569
	12	6140	  4794		4178	  3598
	13	6183	  4728		4363	  3628
	14	5991	  4994		4509	  3842
	15	5866	  4665		4520	  3739
	16	6032	  4654		4229	  3701
	17	6050	  4653		4185	  3726
	18	6004	  4792		4319	  3746
	19	5961	  4626		4196	  3392
	20	6194	  4576		4433	  3760
					
Average cycles	6059	  4713.1	4337.85	  3644.8
%Reduction		  -22.21%		  -15.98%

Gao Chao (1):
  KVM: VMX: enable IPI virtualization

Robert Hoo (4):
  x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit
    variation
  KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
  KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well

Zeng Guang (1):
  KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM
    exit

 arch/x86/include/asm/msr-index.h   |   1 +
 arch/x86/include/asm/vmx.h         |  11 +++
 arch/x86/include/asm/vmxfeatures.h |   5 +-
 arch/x86/kernel/cpu/feat_ctl.c     |   9 +++
 arch/x86/kvm/lapic.c               |   9 ++-
 arch/x86/kvm/vmx/capabilities.h    |   8 ++
 arch/x86/kvm/vmx/evmcs.c           |   2 +
 arch/x86/kvm/vmx/evmcs.h           |   1 +
 arch/x86/kvm/vmx/posted_intr.c     |  22 ++++--
 arch/x86/kvm/vmx/vmcs.h            |   1 +
 arch/x86/kvm/vmx/vmx.c             | 123 +++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h             |  30 ++++---
 12 files changed, 198 insertions(+), 24 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-28 23:44   ` Sean Christopherson
  2021-07-16  6:48 ` [PATCH 2/6] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang, Robert Hoo

From: Robert Hoo <robert.hu@linux.intel.com>

New VMX capability MSR IA32_VMX_PROCBASED_CTLS3 conresponse to this new
VM-Execution control field. And it is 64bit allow-1 semantics, not like
previous capability MSRs 32bit allow-0 and 32bit allow-1. So with Tertiary
VM-Execution control field introduced, 2 vmx_feature leaves are introduced,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/include/asm/msr-index.h   | 1 +
 arch/x86/include/asm/vmxfeatures.h | 3 ++-
 arch/x86/kernel/cpu/feat_ctl.c     | 9 +++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..3df26e27b554 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -919,6 +919,7 @@
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS      0x0000048f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
+#define MSR_IA32_VMX_PROCBASED_CTLS3	0x00000492
 
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_VMCS_SIZE_SHIFT	32
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index d9a74681a77d..27e76eeca05b 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -5,7 +5,7 @@
 /*
  * Defines VMX CPU feature bits
  */
-#define NVMXINTS			3 /* N 32-bit words worth of info */
+#define NVMXINTS			5 /* N 32-bit words worth of info */
 
 /*
  * Note: If the comment begins with a quoted string, that string is used
@@ -43,6 +43,7 @@
 #define VMX_FEATURE_RDTSC_EXITING	( 1*32+ 12) /* "" VM-Exit on RDTSC */
 #define VMX_FEATURE_CR3_LOAD_EXITING	( 1*32+ 15) /* "" VM-Exit on writes to CR3 */
 #define VMX_FEATURE_CR3_STORE_EXITING	( 1*32+ 16) /* "" VM-Exit on reads from CR3 */
+#define VMX_FEATURE_TER_CONTROLS	(1*32 + 17) /* "" Enable Tertiary VM-Execution Controls */
 #define VMX_FEATURE_CR8_LOAD_EXITING	( 1*32+ 19) /* "" VM-Exit on writes to CR8 */
 #define VMX_FEATURE_CR8_STORE_EXITING	( 1*32+ 20) /* "" VM-Exit on reads from CR8 */
 #define VMX_FEATURE_VIRTUAL_TPR		( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index da696eb4821a..2e0272d127e4 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -15,6 +15,8 @@ enum vmx_feature_leafs {
 	MISC_FEATURES = 0,
 	PRIMARY_CTLS,
 	SECONDARY_CTLS,
+	TERTIARY_CTLS_LOW,
+	TERTIARY_CTLS_HIGH,
 	NR_VMX_FEATURE_WORDS,
 };
 
@@ -42,6 +44,13 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
 	c->vmx_capability[SECONDARY_CTLS] = supported;
 
+	/*
+	 * For tertiary execution controls MSR, it's actually a 64bit allowed-1.
+	 */
+	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &ign, &supported);
+	c->vmx_capability[TERTIARY_CTLS_LOW] = ign;
+	c->vmx_capability[TERTIARY_CTLS_HIGH] = supported;
+
 	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
 	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/6] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
  2021-07-16  6:48 ` [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-16  6:48 ` [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang, Robert Hoo

From: Robert Hoo <robert.hu@linux.intel.com>

The Tertiary VM-Exec Control, different from previous control fields, is 64
bit. So extend BUILD_CONTROLS_SHADOW() by adding a 'bit' parameter, to
support both 32 bit and 64 bit fields' auxiliary functions building.
Also, define the auxiliary functions for Tertiary control field here, using
the new BUILD_CONTROLS_SHADOW().

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/vmx/vmx.h | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 3979a947933a..945c6639ce24 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -413,31 +413,32 @@ static inline u8 vmx_get_rvi(void)
 	return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
 }
 
-#define BUILD_CONTROLS_SHADOW(lname, uname)				    \
-static inline void lname##_controls_set(struct vcpu_vmx *vmx, u32 val)	    \
+#define BUILD_CONTROLS_SHADOW(lname, uname, bits)			    \
+static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
 {									    \
 	if (vmx->loaded_vmcs->controls_shadow.lname != val) {		    \
-		vmcs_write32(uname, val);				    \
+		vmcs_write##bits(uname, val);				    \
 		vmx->loaded_vmcs->controls_shadow.lname = val;		    \
 	}								    \
 }									    \
-static inline u32 lname##_controls_get(struct vcpu_vmx *vmx)		    \
+static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)	    \
 {									    \
 	return vmx->loaded_vmcs->controls_shadow.lname;			    \
 }									    \
-static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u32 val)   \
+static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)   \
 {									    \
 	lname##_controls_set(vmx, lname##_controls_get(vmx) | val);	    \
 }									    \
-static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u32 val) \
+static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \
 {									    \
 	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);	    \
 }
-BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS)
-BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS)
-BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL)
-BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL)
-BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL)
+BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
+BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
+BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
 
 static inline void vmx_register_cache_reset(struct kvm_vcpu *vcpu)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
  2021-07-16  6:48 ` [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
  2021-07-16  6:48 ` [PATCH 2/6] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-29  0:03   ` Sean Christopherson
  2021-07-16  6:48 ` [PATCH 4/6] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Zeng Guang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang, Robert Hoo

From: Robert Hoo <robert.hu@linux.intel.com>

Check VMX feature on tertiary execution control in VMCS config setup.
Currently it's not supported for hyper-v and disabled for now.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/include/asm/vmx.h      |  3 +++
 arch/x86/kvm/vmx/capabilities.h |  7 ++++++
 arch/x86/kvm/vmx/evmcs.c        |  2 ++
 arch/x86/kvm/vmx/evmcs.h        |  1 +
 arch/x86/kvm/vmx/vmcs.h         |  1 +
 arch/x86/kvm/vmx/vmx.c          | 44 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  1 +
 7 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0ffaa3156a4e..15652047f2db 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -31,6 +31,7 @@
 #define CPU_BASED_RDTSC_EXITING                 VMCS_CONTROL_BIT(RDTSC_EXITING)
 #define CPU_BASED_CR3_LOAD_EXITING		VMCS_CONTROL_BIT(CR3_LOAD_EXITING)
 #define CPU_BASED_CR3_STORE_EXITING		VMCS_CONTROL_BIT(CR3_STORE_EXITING)
+#define CPU_BASED_ACTIVATE_TERTIARY_CONTROLS	VMCS_CONTROL_BIT(TER_CONTROLS)
 #define CPU_BASED_CR8_LOAD_EXITING              VMCS_CONTROL_BIT(CR8_LOAD_EXITING)
 #define CPU_BASED_CR8_STORE_EXITING             VMCS_CONTROL_BIT(CR8_STORE_EXITING)
 #define CPU_BASED_TPR_SHADOW                    VMCS_CONTROL_BIT(VIRTUAL_TPR)
@@ -221,6 +222,8 @@ enum vmcs_field {
 	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
 	TSC_MULTIPLIER                  = 0x00002032,
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
+	TERTIARY_VM_EXEC_CONTROL	= 0x00002034,
+	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 4705ad55abb5..38d414f64e61 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -59,6 +59,7 @@ struct vmcs_config {
 	u32 pin_based_exec_ctrl;
 	u32 cpu_based_exec_ctrl;
 	u32 cpu_based_2nd_exec_ctrl;
+	u64 cpu_based_3rd_exec_ctrl;
 	u32 vmexit_ctrl;
 	u32 vmentry_ctrl;
 	struct nested_vmx_msrs nested;
@@ -131,6 +132,12 @@ static inline bool cpu_has_secondary_exec_ctrls(void)
 		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 }
 
+static inline bool cpu_has_tertiary_exec_ctrls(void)
+{
+	return vmcs_config.cpu_based_exec_ctrl &
+		CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
+}
+
 static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 896b2a50b4aa..03c15e1e5807 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -299,8 +299,10 @@ const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
 
 __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
 {
+	vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
 	vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
 	vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+	vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
 
 	vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
 	vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 2ec9b46f0d0c..8a20295f4f0f 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -50,6 +50,7 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
  */
 #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \
 				    PIN_BASED_VMX_PREEMPTION_TIMER)
+#define EVMCS1_UNSUPPORTED_EXEC_CTRL (CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
 #define EVMCS1_UNSUPPORTED_2NDEXEC					\
 	(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |				\
 	 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |			\
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 4b9957e2bf5b..83e2065a955d 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -48,6 +48,7 @@ struct vmcs_controls_shadow {
 	u32 pin;
 	u32 exec;
 	u32 secondary_exec;
+	u64 tertiary_exec;
 };
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 927a552393b9..728873971913 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2391,6 +2391,23 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
 	return 0;
 }
 
+static __init int adjust_vmx_controls_64(u64 ctl_min, u64 ctl_opt,
+					 u32 msr, u64 *result)
+{
+	u64 vmx_msr;
+	u64 ctl = ctl_min | ctl_opt;
+
+	rdmsrl(msr, vmx_msr);
+	ctl &= vmx_msr; /* bit == 1 means it can be set */
+
+	/* Ensure minimum (required) set of control bits are supported. */
+	if (ctl_min & ~ctl)
+		return -EIO;
+
+	*result = ctl;
+	return 0;
+}
+
 static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				    struct vmx_capability *vmx_cap)
 {
@@ -2399,6 +2416,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
+	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
 
@@ -2420,7 +2438,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
-	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
+	      CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
 				&_cpu_based_exec_control) < 0)
 		return -EIO;
@@ -2494,6 +2513,16 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				"1-setting enable VPID VM-execution control\n");
 	}
 
+	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
+		u64 opt3 = 0;
+		u64 min3 = 0;
+
+		if (adjust_vmx_controls_64(min3, opt3,
+					   MSR_IA32_VMX_PROCBASED_CTLS3,
+					   &_cpu_based_3rd_exec_control))
+			return -EIO;
+	}
+
 	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -2581,6 +2610,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
 	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
+	vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
 	vmcs_conf->vmexit_ctrl         = _vmexit_control;
 	vmcs_conf->vmentry_ctrl        = _vmentry_control;
 
@@ -4204,6 +4234,13 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
 #define vmx_adjust_sec_exec_exiting(vmx, exec_control, lname, uname) \
 	vmx_adjust_sec_exec_control(vmx, exec_control, lname, uname, uname##_EXITING, true)
 
+static void vmx_compute_tertiary_exec_control(struct vcpu_vmx *vmx)
+{
+	u32 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
+
+	vmx->tertiary_exec_control = exec_control;
+}
+
 static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 {
 	struct kvm_vcpu *vcpu = &vmx->vcpu;
@@ -4319,6 +4356,11 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		secondary_exec_controls_set(vmx, vmx->secondary_exec_control);
 	}
 
+	if (cpu_has_tertiary_exec_ctrls()) {
+		vmx_compute_tertiary_exec_control(vmx);
+		tertiary_exec_controls_set(vmx, vmx->tertiary_exec_control);
+	}
+
 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
 		vmcs_write64(EOI_EXIT_BITMAP1, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 945c6639ce24..c356ceebe84c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -266,6 +266,7 @@ struct vcpu_vmx {
 	u32		      msr_ia32_umwait_control;
 
 	u32 secondary_exec_control;
+	u64 tertiary_exec_control;
 
 	/*
 	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 4/6] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
                   ` (2 preceding siblings ...)
  2021-07-16  6:48 ` [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-16  6:48 ` [PATCH 5/6] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit Zeng Guang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang, Robert Hoo

From: Robert Hoo <robert.hu@linux.intel.com>

Add tertiary_exec_control field report in dump_vmcs()

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 728873971913..820fc131d258 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5749,6 +5749,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmentry_ctl, vmexit_ctl;
 	u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
+	u64 tertiary_exec_control = 0;
 	unsigned long cr4;
 	int efer_slot;
 
@@ -5766,6 +5767,9 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (cpu_has_secondary_exec_ctrls())
 		secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
 
+	if (cpu_has_tertiary_exec_ctrls())
+		tertiary_exec_control = vmcs_read64(TERTIARY_VM_EXEC_CONTROL);
+
 	pr_err("VMCS %p, last attempted VM-entry on CPU %d\n",
 	       vmx->loaded_vmcs->vmcs, vcpu->arch.last_vmentry_cpu);
 	pr_err("*** Guest State ***\n");
@@ -5864,8 +5868,9 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
 
 	pr_err("*** Control State ***\n");
-	pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
-	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control);
+	pr_err("PinBased=0x%08x CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
+	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control,
+	       tertiary_exec_control);
 	pr_err("EntryControls=%08x ExitControls=%08x\n", vmentry_ctl, vmexit_ctl);
 	pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
 	       vmcs_read32(EXCEPTION_BITMAP),
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 5/6] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
                   ` (3 preceding siblings ...)
  2021-07-16  6:48 ` [PATCH 4/6] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-16  6:48 ` [PATCH 6/6] KVM: VMX: enable IPI virtualization Zeng Guang
  2021-07-16  9:25 ` [PATCH 0/5] IPI virtualization support for VM Wanpeng Li
  6 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang

Since IA x86 platform introduce features of IPI virtualization and
User Interrupts, new behavior applies to the execution of WRMSR ICR
register that causes APIC-write VM exit instead of MSR-write VM exit
in x2APIC mode.

This requires KVM to emulate writing 64-bit value to offset 300H on
the virtual-APIC page(VICR) for guest running in x2APIC mode when
APIC-wrtie VM exit occurs. Prevoisely KVM doesn't consider this
situation as CPU never produce APIC-write VM exit in x2APIC mode before.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/kvm/lapic.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ba5a27879f1d..0b0f0ce96679 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2188,7 +2188,14 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	if (apic_x2apic_mode(vcpu->arch.apic) && (offset == APIC_ICR)) {
+		u64 icr_val = *((u64 *)(vcpu->arch.apic->regs + offset));
+
+		kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR2, (u32)(icr_val>>32));
+		val = (u32)icr_val;
+	} else {
+		kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	}
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
 	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
                   ` (4 preceding siblings ...)
  2021-07-16  6:48 ` [PATCH 5/6] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit Zeng Guang
@ 2021-07-16  6:48 ` Zeng Guang
  2021-07-16  9:52   ` Paolo Bonzini
  2021-07-16  9:25 ` [PATCH 0/5] IPI virtualization support for VM Wanpeng Li
  6 siblings, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-16  6:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Zeng Guang

From: Gao Chao <chao.gao@intel.com>

With IPI virtualization enabled, the processor emulates writes
to APIC registers that would send IPIs. The processor sets the
bit corresponding to the vector in target vCPU's PIR and may send
a notification (IPI) specified by NDST and NV fields in target vCPU's
PID. It is similar to what IOMMU engine does when dealing with posted
interrupt from devices.

A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID.

Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a VCPU is preempted, its posted interrupt notification
is suppressed.

Note that IPI virtualization can only virualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
VM exit and need to be handled by VMM.

Signed-off-by: Gao Chao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/include/asm/vmx.h         |  8 ++++
 arch/x86/include/asm/vmxfeatures.h |  2 +
 arch/x86/kvm/vmx/capabilities.h    |  1 +
 arch/x86/kvm/vmx/posted_intr.c     | 22 ++++++---
 arch/x86/kvm/vmx/vmx.c             | 72 ++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h             |  6 +++
 6 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 15652047f2db..e97cf7b9ff12 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -76,6 +76,11 @@
 #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	VMCS_CONTROL_BIT(USR_WAIT_PAUSE)
 #define SECONDARY_EXEC_BUS_LOCK_DETECTION	VMCS_CONTROL_BIT(BUS_LOCK_DETECTION)
 
+/*
+ * Definitions of Tertiary Processor-Based VM-Execution Controls.
+ */
+#define TERTIARY_EXEC_IPI_VIRT			VMCS_CONTROL_BIT(IPI_VIRT)
+
 #define PIN_BASED_EXT_INTR_MASK                 VMCS_CONTROL_BIT(INTR_EXITING)
 #define PIN_BASED_NMI_EXITING                   VMCS_CONTROL_BIT(NMI_EXITING)
 #define PIN_BASED_VIRTUAL_NMIS                  VMCS_CONTROL_BIT(VIRTUAL_NMIS)
@@ -159,6 +164,7 @@ static inline int vmx_misc_mseg_revid(u64 vmx_misc)
 enum vmcs_field {
 	VIRTUAL_PROCESSOR_ID            = 0x00000000,
 	POSTED_INTR_NV                  = 0x00000002,
+	LAST_PID_POINTER_INDEX		= 0x00000008,
 	GUEST_ES_SELECTOR               = 0x00000800,
 	GUEST_CS_SELECTOR               = 0x00000802,
 	GUEST_SS_SELECTOR               = 0x00000804,
@@ -224,6 +230,8 @@ enum vmcs_field {
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL	= 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
+	PID_POINTER_TABLE		= 0x00002042,
+	PID_POINTER_TABLE_HIGH		= 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 27e76eeca05b..e821e8126097 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -86,4 +86,6 @@
 #define VMX_FEATURE_ENCLV_EXITING	( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
 #define VMX_FEATURE_BUS_LOCK_DETECTION	( 2*32+ 30) /* "" VM-Exit when bus lock caused */
 
+/* Tertiary Processor-Based VM-Execution Controls, word 3 */
+#define VMX_FEATURE_IPI_VIRT		( 3*32 + 4) /* "" Enable IPI virtualization */
 #endif /* _ASM_X86_VMXFEATURES_H */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 38d414f64e61..9e9710c3ee51 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
 extern bool __read_mostly enable_unrestricted_guest;
 extern bool __read_mostly enable_ept_ad_bits;
 extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_ipiv;
 extern int __read_mostly pt_mode;
 
 #define PT_MODE_SYSTEM		0
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..d817331bfb05 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -81,9 +81,12 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
-	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
-		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
-		!kvm_vcpu_apicv_active(vcpu))
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return;
+
+	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
+		!to_vmx(vcpu)->ipiv_active)
 		return;
 
 	/* Set SN when the vCPU is preempted */
@@ -141,9 +144,16 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
 	struct pi_desc old, new;
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
-	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
-		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
-		!kvm_vcpu_apicv_active(vcpu))
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	/* Put vCPU into a list and set NV to wakeup vector if it is
+	 * one of the following cases:
+	 * 1. any assigned device is in use.
+	 * 2. IPI virtualization is enabled.
+	 */
+	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
 		return 0;
 
 	WARN_ON(irqs_disabled());
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 820fc131d258..8a45f45b263c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -104,6 +104,9 @@ module_param(fasteoi, bool, S_IRUGO);
 
 module_param(enable_apicv, bool, S_IRUGO);
 
+bool __read_mostly enable_ipiv = 1;
+module_param(enable_ipiv, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -225,6 +228,7 @@ static const struct {
 };
 
 #define L1D_CACHE_ORDER 4
+#define PID_TABLE_ORDER get_order(KVM_MAX_VCPU_ID << 3)
 static void *vmx_l1d_flush_pages;
 
 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
@@ -2514,7 +2518,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	}
 
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
-		u64 opt3 = 0;
+		u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
 		u64 min3 = 0;
 
 		if (adjust_vmx_controls_64(min3, opt3,
@@ -2523,6 +2527,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			return -EIO;
 	}
 
+	if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
+		enable_ipiv = 0;
+
 	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -3870,6 +3877,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
 		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
 		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
 		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
+		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
+				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
 	}
 }
 
@@ -4138,14 +4147,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 
 	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
 	if (cpu_has_secondary_exec_ctrls()) {
-		if (kvm_vcpu_apicv_active(vcpu))
+		if (kvm_vcpu_apicv_active(vcpu)) {
 			secondary_exec_controls_setbit(vmx,
 				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-		else
+			if (cpu_has_tertiary_exec_ctrls() && enable_ipiv)
+				tertiary_exec_controls_setbit(vmx,
+					TERTIARY_EXEC_IPI_VIRT);
+		} else {
 			secondary_exec_controls_clearbit(vmx,
 					SECONDARY_EXEC_APIC_REGISTER_VIRT |
 					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+			if (cpu_has_tertiary_exec_ctrls())
+				tertiary_exec_controls_clearbit(vmx,
+					TERTIARY_EXEC_IPI_VIRT);
+		}
 	}
 
 	if (cpu_has_vmx_msr_bitmap())
@@ -4236,8 +4252,14 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
 
 static void vmx_compute_tertiary_exec_control(struct vcpu_vmx *vmx)
 {
+	struct kvm_vcpu *vcpu = &vmx->vcpu;
 	u32 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
 
+	if (!kvm_vcpu_apicv_active(vcpu))
+		exec_control &= ~TERTIARY_EXEC_IPI_VIRT;
+
+	vmx->ipiv_active = (exec_control & TERTIARY_EXEC_IPI_VIRT) ? true : false;
+
 	vmx->tertiary_exec_control = exec_control;
 }
 
@@ -4332,6 +4354,17 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 
 #define VMX_XSS_EXIT_BITMAP 0
 
+static void install_pid(struct vcpu_vmx *vmx)
+{
+	struct kvm_vmx *kvm_vmx = to_kvm_vmx(vmx->vcpu.kvm);
+
+	BUG_ON(vmx->vcpu.vcpu_id > kvm_vmx->pid_last_index);
+	/* Bit 0 is the valid bit */
+	kvm_vmx->pid_table[vmx->vcpu.vcpu_id] = __pa(&vmx->pi_desc) | 1;
+	vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table));
+	vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index);
+}
+
 /*
  * Noting that the initialization of Guest-state Area of VMCS is in
  * vmx_vcpu_reset().
@@ -4430,6 +4463,9 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmx->pt_desc.guest.output_mask = 0x7F;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
 	}
+
+	if (vmx->ipiv_active)
+		install_pid(vmx);
 }
 
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -6969,6 +7005,22 @@ static int vmx_vm_init(struct kvm *kvm)
 			break;
 		}
 	}
+
+	if (enable_ipiv) {
+		struct page *pages;
+
+		/* Allocate pages for PID table in order of PID_TABLE_ORDER
+		 * depending on KVM_MAX_VCPU_ID. Each PID entry is 8 bytes.
+		 */
+		pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, PID_TABLE_ORDER);
+
+		if (!pages)
+			return -ENOMEM;
+
+		to_kvm_vmx(kvm)->pid_table = (void *)page_address(pages);
+		to_kvm_vmx(kvm)->pid_last_index = KVM_MAX_VCPU_ID;
+	}
+
 	return 0;
 }
 
@@ -7579,6 +7631,14 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+static void vmx_vm_destroy(struct kvm *kvm)
+{
+	struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+
+	if (kvm_vmx->pid_table)
+		free_pages((unsigned long)kvm_vmx->pid_table, PID_TABLE_ORDER);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_unsetup = hardware_unsetup,
 
@@ -7589,6 +7649,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_init = vmx_vm_init,
+	.vm_destroy = vmx_vm_destroy,
 
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
@@ -7828,6 +7889,11 @@ static __init int hardware_setup(void)
 		vmx_x86_ops.sync_pir_to_irr = NULL;
 	}
 
+	if (!enable_apicv) {
+		enable_ipiv = 0;
+		vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
+	}
+
 	if (cpu_has_vmx_tsc_scaling()) {
 		kvm_has_tsc_control = true;
 		kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c356ceebe84c..0dee1e4c628c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -344,6 +344,9 @@ struct vcpu_vmx {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 	} shadow_msr_intercept;
+
+	/* IPI virtualization status */
+	bool ipiv_active;
 };
 
 struct kvm_vmx {
@@ -352,6 +355,9 @@ struct kvm_vmx {
 	unsigned int tss_addr;
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;
+	/* PID table for IPI virtualization */
+	u64 *pid_table;
+	u16 pid_last_index;
 };
 
 bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] IPI virtualization support for VM
  2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
                   ` (5 preceding siblings ...)
  2021-07-16  6:48 ` [PATCH 6/6] KVM: VMX: enable IPI virtualization Zeng Guang
@ 2021-07-16  9:25 ` Wanpeng Li
  2021-07-17  1:46   ` Zeng Guang
  2021-07-19  7:26   ` Zeng Guang
  6 siblings, 2 replies; 25+ messages in thread
From: Wanpeng Li @ 2021-07-16  9:25 UTC (permalink / raw)
  To: Zeng Guang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang, the arch/x86 maintainers, LKML, Robert Hu, Gao Chao

On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@intel.com> wrote:
>
> Current IPI process in guest VM will virtualize the writing to interrupt
> command register(ICR) of the local APIC which will cause VM-exit anyway
> on source vCPU. Frequent VM-exit could induce much overhead accumulated
> if running IPI intensive task.
>
> IPI virtualization as a new VT-x feature targets to eliminate VM-exits
> when issuing IPI on source vCPU. It introduces a new VM-execution
> control - "IPI virtualization"(bit4) in the tertiary processor-based
> VM-exection controls and a new data structure - "PID-pointer table
> address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
> virtualization" is enabled, processor emulateds following kind of writes
> to APIC registers that would send IPIs, moreover without causing VM-exits.
> - Memory-mapped ICR writes
> - MSR-mapped ICR writes
> - SENDUIPI execution
>
> This patch series implement IPI virtualization support in KVM.
>
> Patches 1-3 add tertiary processor-based VM-execution support
> framework.
>
> Patch 4 implement interrupt dispatch support in x2APIC mode with
> APIC-write VM exit. In previous platform, no CPU would produce
> APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
> mode" VM-execution control was 1.
>
> Patch 5 implement IPI virtualization related function including
> feature enabling through tertiary processor-based VM-execution in
> various scenario of VMCS configuration, PID table setup in vCPU creation
> and vCPU block consideration.
>
> Document for IPI virtualization is now available at the latest "Intel
> Architecture Instruction Set Extensions Programming Reference".
>
> Document Link:
> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>
> We did experiment to measure average time sending IPI from source vCPU
> to the target vCPU completing the IPI handling by kvm unittest w/ and
> w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
> 22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
> respectly.
>
> KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
> exit impact on target vCPU.
>
>                 Cycles of IPI
>                 xAPIC mode              x2APIC mode
>         test    w/o IPIv  w/ IPIv       w/o IPIv  w/ IPIv
>         1       6106      4816          4265      3768
>         2       6244      4656          4404      3546
>         3       6165      4658          4233      3474
>         4       5992      4710          4363      3430
>         5       6083      4741          4215      3551
>         6       6238      4904          4304      3547
>         7       6164      4617          4263      3709
>         8       5984      4763          4518      3779
>         9       5931      4712          4645      3667
>         10      5955      4530          4332      3724
>         11      5897      4673          4283      3569
>         12      6140      4794          4178      3598
>         13      6183      4728          4363      3628
>         14      5991      4994          4509      3842
>         15      5866      4665          4520      3739
>         16      6032      4654          4229      3701
>         17      6050      4653          4185      3726
>         18      6004      4792          4319      3746
>         19      5961      4626          4196      3392
>         20      6194      4576          4433      3760
>
> Average cycles  6059      4713.1        4337.85   3644.8
> %Reduction                -22.21%                 -15.98%

Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
fastpath) mentioned that the whole ipi fastpath feature reduces the
latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
hardware acceleration is worse than software emulation? In addition,
please post the IPI microbenchmark score w/ and w/o the
patchset.(https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com),
I found that the hardware acceleration is not always outstanding.
https://lore.kernel.org/kvm/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com

    Wanpeng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-16  6:48 ` [PATCH 6/6] KVM: VMX: enable IPI virtualization Zeng Guang
@ 2021-07-16  9:52   ` Paolo Bonzini
  2021-07-17  3:55     ` Zeng Guang
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2021-07-16  9:52 UTC (permalink / raw)
  To: Zeng Guang, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Emanuele Giuseppe Esposito

On 16/07/21 08:48, Zeng Guang wrote:
>  
> +	if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
> +		enable_ipiv = 0;
> +
>   	}

Please move this to hardware_setup(), using a new function 
cpu_has_vmx_ipiv() in vmx/capabilities.h.

>  	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
> -		u64 opt3 = 0;
> +		u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>  		u64 min3 = 0;

I like the idea of changing opt3, but it's different from how 
setup_vmcs_config works for the other execution controls.  Let me think 
if it makes sense to clean this up, and move the handling of other 
module parameters from hardware_setup() to setup_vmcs_config().

> +
> +	if (vmx->ipiv_active)
> +		install_pid(vmx);

This should be if (enable_ipiv) instead, I think.

In fact, in all other places that are using vmx->ipiv_active, you can 
actually replace it with enable_ipiv; they are all reached only with 
kvm_vcpu_apicv_active(vcpu) == true.

> +	if (!enable_apicv) {
> +		enable_ipiv = 0;
> +		vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
> +	}

The assignment to vmcs_config.cpu_based_3rd_exec_ctrl should not be 
necessary; kvm_vcpu_apicv_active will always be false in that case and 
IPI virtualization would never be enabled.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] IPI virtualization support for VM
  2021-07-16  9:25 ` [PATCH 0/5] IPI virtualization support for VM Wanpeng Li
@ 2021-07-17  1:46   ` Zeng Guang
  2021-07-19  7:26   ` Zeng Guang
  1 sibling, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-17  1:46 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang, the arch/x86 maintainers, LKML, Robert Hu, Gao Chao

On 7/16/2021 5:25 PM, Wanpeng Li wrote:
> On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@intel.com> wrote:
>> Current IPI process in guest VM will virtualize the writing to interrupt
>> command register(ICR) of the local APIC which will cause VM-exit anyway
>> on source vCPU. Frequent VM-exit could induce much overhead accumulated
>> if running IPI intensive task.
>>
>> IPI virtualization as a new VT-x feature targets to eliminate VM-exits
>> when issuing IPI on source vCPU. It introduces a new VM-execution
>> control - "IPI virtualization"(bit4) in the tertiary processor-based
>> VM-exection controls and a new data structure - "PID-pointer table
>> address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
>> virtualization" is enabled, processor emulateds following kind of writes
>> to APIC registers that would send IPIs, moreover without causing VM-exits.
>> - Memory-mapped ICR writes
>> - MSR-mapped ICR writes
>> - SENDUIPI execution
>>
>> This patch series implement IPI virtualization support in KVM.
>>
>> Patches 1-3 add tertiary processor-based VM-execution support
>> framework.
>>
>> Patch 4 implement interrupt dispatch support in x2APIC mode with
>> APIC-write VM exit. In previous platform, no CPU would produce
>> APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
>> mode" VM-execution control was 1.
>>
>> Patch 5 implement IPI virtualization related function including
>> feature enabling through tertiary processor-based VM-execution in
>> various scenario of VMCS configuration, PID table setup in vCPU creation
>> and vCPU block consideration.
>>
>> Document for IPI virtualization is now available at the latest "Intel
>> Architecture Instruction Set Extensions Programming Reference".
>>
>> Document Link:
>> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>>
>> We did experiment to measure average time sending IPI from source vCPU
>> to the target vCPU completing the IPI handling by kvm unittest w/ and
>> w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
>> 22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
>> respectly.
>>
>> KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
>> exit impact on target vCPU.
>>
>>                  Cycles of IPI
>>                  xAPIC mode              x2APIC mode
>>          test    w/o IPIv  w/ IPIv       w/o IPIv  w/ IPIv
>>          1       6106      4816          4265      3768
>>          2       6244      4656          4404      3546
>>          3       6165      4658          4233      3474
>>          4       5992      4710          4363      3430
>>          5       6083      4741          4215      3551
>>          6       6238      4904          4304      3547
>>          7       6164      4617          4263      3709
>>          8       5984      4763          4518      3779
>>          9       5931      4712          4645      3667
>>          10      5955      4530          4332      3724
>>          11      5897      4673          4283      3569
>>          12      6140      4794          4178      3598
>>          13      6183      4728          4363      3628
>>          14      5991      4994          4509      3842
>>          15      5866      4665          4520      3739
>>          16      6032      4654          4229      3701
>>          17      6050      4653          4185      3726
>>          18      6004      4792          4319      3746
>>          19      5961      4626          4196      3392
>>          20      6194      4576          4433      3760
>>
>> Average cycles  6059      4713.1        4337.85   3644.8
>> %Reduction                -22.21%                 -15.98%
> Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
> fastpath) mentioned that the whole ipi fastpath feature reduces the
> latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
> hardware acceleration is worse than software emulation? In addition,
> please post the IPI microbenchmark score w/ and w/o the
> patchset.(https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com),
> I found that the hardware acceleration is not always outstanding.
> https://lore.kernel.org/kvm/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com
We will check on it and get back later.  Thanks for your point.
>      Wanpeng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-16  9:52   ` Paolo Bonzini
@ 2021-07-17  3:55     ` Zeng Guang
  2021-07-18 20:32       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-17  3:55 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Emanuele Giuseppe Esposito

On 7/16/2021 5:52 PM, Paolo Bonzini wrote:
> On 16/07/21 08:48, Zeng Guang wrote:
>>
>> +    if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
>> +        enable_ipiv = 0;
>> +
>>       }
>
> Please move this to hardware_setup(), using a new function 
> cpu_has_vmx_ipiv() in vmx/capabilities.h.
>
ok, we will change it to follow current framework.
>>      if (_cpu_based_exec_control & 
>> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
>> -        u64 opt3 = 0;
>> +        u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>>          u64 min3 = 0;
>
> I like the idea of changing opt3, but it's different from how 
> setup_vmcs_config works for the other execution controls.  Let me 
> think if it makes sense to clean this up, and move the handling of 
> other module parameters from hardware_setup() to setup_vmcs_config().
>
May be an exception for ipiv feature ?
>> +
>> +    if (vmx->ipiv_active)
>> +        install_pid(vmx);
>
> This should be if (enable_ipiv) instead, I think.
>
> In fact, in all other places that are using vmx->ipiv_active, you can 
> actually replace it with enable_ipiv; they are all reached only with 
> kvm_vcpu_apicv_active(vcpu) == true.
>
enable_ipiv as a global variable indicates the hardware capability to 
enable IPIv. Each VM may have different IPIv configuration according to 
kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose 
IPIv related operations.
>> +    if (!enable_apicv) {
>> +        enable_ipiv = 0;
>> +        vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
>> +    }
>
> The assignment to vmcs_config.cpu_based_3rd_exec_ctrl should not be 
> necessary; kvm_vcpu_apicv_active will always be false in that case and 
> IPI virtualization would never be enabled.
>
We originally intend to make vmcs_config consistent with the actual ipiv 
capability and decouple it from other factors. As you mentioned , it's 
not necessary to update vmcs_config.cpu_based_3rd_exec_ctrl in this 
case. We will remove it.

Thanks.

> Paolo
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-17  3:55     ` Zeng Guang
@ 2021-07-18 20:32       ` Paolo Bonzini
  2021-07-19 12:38         ` Zeng Guang
  2021-07-19 13:16         ` Zeng Guang
  0 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-07-18 20:32 UTC (permalink / raw)
  To: Zeng Guang, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
  Cc: x86, linux-kernel, Robert Hu, Gao Chao, Emanuele Giuseppe Esposito

On 17/07/21 05:55, Zeng Guang wrote:
>>>      if (_cpu_based_exec_control & 
>>> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
>>> -        u64 opt3 = 0;
>>> +        u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>>>          u64 min3 = 0;
>>
>> I like the idea of changing opt3, but it's different from how 
>> setup_vmcs_config works for the other execution controls.  Let me 
>> think if it makes sense to clean this up, and move the handling of 
>> other module parameters from hardware_setup() to setup_vmcs_config().
>>
> May be an exception for ipiv feature ?

Yes, possibly.  I'll look into using this idea for other parameters.

>>> +    if (vmx->ipiv_active)
>>> +        install_pid(vmx);
>>
>> This should be if (enable_ipiv) instead, I think.
>>
>> In fact, in all other places that are using vmx->ipiv_active, you can 
>> actually replace it with enable_ipiv; they are all reached only with 
>> kvm_vcpu_apicv_active(vcpu) == true.
>>
> enable_ipiv as a global variable indicates the hardware capability to 
> enable IPIv. Each VM may have different IPIv configuration according to 
> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose 
> IPIv related operations.

Understood, but in practice all uses of vmx->ipiv_active are guarded by 
kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active 
== enable_ipiv.

The one above instead seems wrong and should just use enable_ipiv.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] IPI virtualization support for VM
  2021-07-16  9:25 ` [PATCH 0/5] IPI virtualization support for VM Wanpeng Li
  2021-07-17  1:46   ` Zeng Guang
@ 2021-07-19  7:26   ` Zeng Guang
  2021-07-19  7:37     ` Wanpeng Li
  1 sibling, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-19  7:26 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang, the arch/x86 maintainers, LKML, Robert Hu, Gao Chao

On 7/16/2021 5:25 PM, Wanpeng Li wrote:
> On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@intel.com> wrote:
>> Current IPI process in guest VM will virtualize the writing to interrupt
>> command register(ICR) of the local APIC which will cause VM-exit anyway
>> on source vCPU. Frequent VM-exit could induce much overhead accumulated
>> if running IPI intensive task.
>>
>> IPI virtualization as a new VT-x feature targets to eliminate VM-exits
>> when issuing IPI on source vCPU. It introduces a new VM-execution
>> control - "IPI virtualization"(bit4) in the tertiary processor-based
>> VM-exection controls and a new data structure - "PID-pointer table
>> address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
>> virtualization" is enabled, processor emulateds following kind of writes
>> to APIC registers that would send IPIs, moreover without causing VM-exits.
>> - Memory-mapped ICR writes
>> - MSR-mapped ICR writes
>> - SENDUIPI execution
>>
>> This patch series implement IPI virtualization support in KVM.
>>
>> Patches 1-3 add tertiary processor-based VM-execution support
>> framework.
>>
>> Patch 4 implement interrupt dispatch support in x2APIC mode with
>> APIC-write VM exit. In previous platform, no CPU would produce
>> APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
>> mode" VM-execution control was 1.
>>
>> Patch 5 implement IPI virtualization related function including
>> feature enabling through tertiary processor-based VM-execution in
>> various scenario of VMCS configuration, PID table setup in vCPU creation
>> and vCPU block consideration.
>>
>> Document for IPI virtualization is now available at the latest "Intel
>> Architecture Instruction Set Extensions Programming Reference".
>>
>> Document Link:
>> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>>
>> We did experiment to measure average time sending IPI from source vCPU
>> to the target vCPU completing the IPI handling by kvm unittest w/ and
>> w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
>> 22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
>> respectly.
>>
>> KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
>> exit impact on target vCPU.
>>
>>                  Cycles of IPI
>>                  xAPIC mode              x2APIC mode
>>          test    w/o IPIv  w/ IPIv       w/o IPIv  w/ IPIv
>>          1       6106      4816          4265      3768
>>          2       6244      4656          4404      3546
>>          3       6165      4658          4233      3474
>>          4       5992      4710          4363      3430
>>          5       6083      4741          4215      3551
>>          6       6238      4904          4304      3547
>>          7       6164      4617          4263      3709
>>          8       5984      4763          4518      3779
>>          9       5931      4712          4645      3667
>>          10      5955      4530          4332      3724
>>          11      5897      4673          4283      3569
>>          12      6140      4794          4178      3598
>>          13      6183      4728          4363      3628
>>          14      5991      4994          4509      3842
>>          15      5866      4665          4520      3739
>>          16      6032      4654          4229      3701
>>          17      6050      4653          4185      3726
>>          18      6004      4792          4319      3746
>>          19      5961      4626          4196      3392
>>          20      6194      4576          4433      3760
>>
>> Average cycles  6059      4713.1        4337.85   3644.8
>> %Reduction                -22.21%                 -15.98%
> Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
> fastpath) mentioned that the whole ipi fastpath feature reduces the
> latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
> hardware acceleration is worse than software emulation? In addition,

Actually this performance data was measured on the basis of fastpath 
optimization while cpu runs at base frequency.

As a result, IPI virtualization could have extra 15.98% cost reduction 
over IPI fastpath process in x2apic mode.

> please post the IPI microbenchmark score w/ and w/o the
> patchset.(https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com),
> I found that the hardware acceleration is not always outstanding.
> https://lore.kernel.org/kvm/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com
>
>      Wanpeng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] IPI virtualization support for VM
  2021-07-19  7:26   ` Zeng Guang
@ 2021-07-19  7:37     ` Wanpeng Li
  2021-07-23  6:15       ` Zeng Guang
  0 siblings, 1 reply; 25+ messages in thread
From: Wanpeng Li @ 2021-07-19  7:37 UTC (permalink / raw)
  To: Zeng Guang
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Tony Luck,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang, the arch/x86 maintainers, LKML, Robert Hu, Gao Chao

On Mon, 19 Jul 2021 at 15:26, Zeng Guang <guang.zeng@intel.com> wrote:
>
> On 7/16/2021 5:25 PM, Wanpeng Li wrote:
> > On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@intel.com> wrote:
> >> Current IPI process in guest VM will virtualize the writing to interrupt
> >> command register(ICR) of the local APIC which will cause VM-exit anyway
> >> on source vCPU. Frequent VM-exit could induce much overhead accumulated
> >> if running IPI intensive task.
> >>
> >> IPI virtualization as a new VT-x feature targets to eliminate VM-exits
> >> when issuing IPI on source vCPU. It introduces a new VM-execution
> >> control - "IPI virtualization"(bit4) in the tertiary processor-based
> >> VM-exection controls and a new data structure - "PID-pointer table
> >> address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
> >> virtualization" is enabled, processor emulateds following kind of writes
> >> to APIC registers that would send IPIs, moreover without causing VM-exits.
> >> - Memory-mapped ICR writes
> >> - MSR-mapped ICR writes
> >> - SENDUIPI execution
> >>
> >> This patch series implement IPI virtualization support in KVM.
> >>
> >> Patches 1-3 add tertiary processor-based VM-execution support
> >> framework.
> >>
> >> Patch 4 implement interrupt dispatch support in x2APIC mode with
> >> APIC-write VM exit. In previous platform, no CPU would produce
> >> APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
> >> mode" VM-execution control was 1.
> >>
> >> Patch 5 implement IPI virtualization related function including
> >> feature enabling through tertiary processor-based VM-execution in
> >> various scenario of VMCS configuration, PID table setup in vCPU creation
> >> and vCPU block consideration.
> >>
> >> Document for IPI virtualization is now available at the latest "Intel
> >> Architecture Instruction Set Extensions Programming Reference".
> >>
> >> Document Link:
> >> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> >>
> >> We did experiment to measure average time sending IPI from source vCPU
> >> to the target vCPU completing the IPI handling by kvm unittest w/ and
> >> w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
> >> 22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
> >> respectly.
> >>
> >> KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
> >> exit impact on target vCPU.
> >>
> >>                  Cycles of IPI
> >>                  xAPIC mode              x2APIC mode
> >>          test    w/o IPIv  w/ IPIv       w/o IPIv  w/ IPIv
> >>          1       6106      4816          4265      3768
> >>          2       6244      4656          4404      3546
> >>          3       6165      4658          4233      3474
> >>          4       5992      4710          4363      3430
> >>          5       6083      4741          4215      3551
> >>          6       6238      4904          4304      3547
> >>          7       6164      4617          4263      3709
> >>          8       5984      4763          4518      3779
> >>          9       5931      4712          4645      3667
> >>          10      5955      4530          4332      3724
> >>          11      5897      4673          4283      3569
> >>          12      6140      4794          4178      3598
> >>          13      6183      4728          4363      3628
> >>          14      5991      4994          4509      3842
> >>          15      5866      4665          4520      3739
> >>          16      6032      4654          4229      3701
> >>          17      6050      4653          4185      3726
> >>          18      6004      4792          4319      3746
> >>          19      5961      4626          4196      3392
> >>          20      6194      4576          4433      3760
> >>
> >> Average cycles  6059      4713.1        4337.85   3644.8
> >> %Reduction                -22.21%                 -15.98%
> > Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
> > fastpath) mentioned that the whole ipi fastpath feature reduces the
> > latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
> > hardware acceleration is worse than software emulation? In addition,
>
> Actually this performance data was measured on the basis of fastpath
> optimization while cpu runs at base frequency.
>
> As a result, IPI virtualization could have extra 15.98% cost reduction
> over IPI fastpath process in x2apic mode.

I observed that adaptive advance lapic timer and adaptive halt-polling
will influence kvm-unit-tests/vmexit.flat IPI testing score, could you
post the score after disabling these features as commit a9ab13ff6e
(KVM: X86: Improve latency for single target IPI fastpath) mentioned?
In addition, please post the hackbench(./hackbench -l 1000000) and ipi
microbenchmark scores.

    Wanpeng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-18 20:32       ` Paolo Bonzini
@ 2021-07-19 12:38         ` Zeng Guang
  2021-07-19 13:58           ` Paolo Bonzini
  2021-07-19 13:16         ` Zeng Guang
  1 sibling, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-07-19 12:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Luck, Tony,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Huang, Kai
  Cc: x86, linux-kernel, Hu, Robert, Gao, Chao, Emanuele Giuseppe Esposito

On 7/19/2021 4:32 AM, Paolo Bonzini wrote:
> On 17/07/21 05:55, Zeng Guang wrote:
>>>> +    if (vmx->ipiv_active)
>>>> +        install_pid(vmx);
>>> This should be if (enable_ipiv) instead, I think.
>>>
>>> In fact, in all other places that are using vmx->ipiv_active, you can
>>> actually replace it with enable_ipiv; they are all reached only with
>>> kvm_vcpu_apicv_active(vcpu) == true.
>>>
>> enable_ipiv as a global variable indicates the hardware capability to
>> enable IPIv. Each VM may have different IPIv configuration according to
>> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose
>> IPIv related operations.
> Understood, but in practice all uses of vmx->ipiv_active are guarded by
> kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active
> == enable_ipiv.
>
> The one above instead seems wrong and should just use enable_ipiv.

enable_ipiv associate with "IPI virtualization" setting in tertiary exec 
controls and
enable_apicv which depends on cpu_has_vmx_apicv(). kvm_vcpu_apicv_active 
still
can be false even if enable_ipiv is true, e.g. in case irqchip not 
emulated in kernel.

It's ok to use enable_ipiv here. vmcs setup succeed for IPIv but it 
won't take into effect as
false kvm_vcpu_apicv_active disable “IPI virtualization" in this case.

> Paolo
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-18 20:32       ` Paolo Bonzini
  2021-07-19 12:38         ` Zeng Guang
@ 2021-07-19 13:16         ` Zeng Guang
  1 sibling, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-19 13:16 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Luck, Tony,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Huang, Kai
  Cc: x86, linux-kernel, Hu, Robert, Gao, Chao, Emanuele Giuseppe Esposito

On 7/19/2021 4:32 AM, Paolo Bonzini wrote:
> On 17/07/21 05:55, Zeng Guang wrote:
>>>> +    if (vmx->ipiv_active)
>>>> +        install_pid(vmx);
>>> This should be if (enable_ipiv) instead, I think.
>>>
>>> In fact, in all other places that are using vmx->ipiv_active, you can
>>> actually replace it with enable_ipiv; they are all reached only with
>>> kvm_vcpu_apicv_active(vcpu) == true.
>>>
>> enable_ipiv as a global variable indicates the hardware capability to
>> enable IPIv. Each VM may have different IPIv configuration according to
>> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose
>> IPIv related operations.
> Understood, but in practice all uses of vmx->ipiv_active are guarded by
> kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active
> == enable_ipiv.
>
> The one above instead seems wrong and should just use enable_ipiv.
enable_ipiv associate with "IPI virtualization" setting in tertiary exec 
controls and
enable_apicv which depends on cpu_has_vmx_apicv(). kvm_vcpu_apicv_active 
still
can be false even if enable_ipiv is true, e.g. in case irqchip not 
emulated in kernel.

Though it's ok to use enable_ipiv here, vmcs setup succeed for IPIv but 
it won't take into effect as
false kvm_vcpu_apicv_active runtime disable “IPI virtualization" for VM 
in this case and
leads vmx->ipiv_active become false as well. So vmx->ipiv_active is more 
accurate to reflect
runtime IPIv status.
> Paolo
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-19 12:38         ` Zeng Guang
@ 2021-07-19 13:58           ` Paolo Bonzini
  2021-07-20  1:07             ` Zeng Guang
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2021-07-19 13:58 UTC (permalink / raw)
  To: Zeng Guang, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Luck, Tony,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Huang, Kai
  Cc: x86, linux-kernel, Hu, Robert, Gao, Chao, Emanuele Giuseppe Esposito

On 19/07/21 14:38, Zeng Guang wrote:
>> Understood, but in practice all uses of vmx->ipiv_active are
>> guarded by kvm_vcpu_apicv_active so they are always reached with
>> vmx->ipiv_active == enable_ipiv.
>> 
>> The one above instead seems wrong and should just use enable_ipiv.
> 
> enable_ipiv associate with "IPI virtualization" setting in tertiary
> exec controls and enable_apicv which depends on cpu_has_vmx_apicv().
> kvm_vcpu_apicv_active still can be false even if enable_ipiv is true,
> e.g. in case irqchip not emulated in kernel.

Right, kvm_vcpu_apicv_active *is* set in init_vmcs.  But there's an 
     "if (kvm_vcpu_apicv_active(&vmx->vcpu))" above.  You can just stick

	if (enable_ipicv)
		install_pid(vmx);

inside there.  As to the other occurrences of vmx->ipiv_active, look here:

> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return;
> +
> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
> +		!to_vmx(vcpu)->ipiv_active)
>  		return;
>  

This one can be enable_ipiv because APICv must be active.

> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return 0;
> +
> +	/* Put vCPU into a list and set NV to wakeup vector if it is
> +	 * one of the following cases:
> +	 * 1. any assigned device is in use.
> +	 * 2. IPI virtualization is enabled.
> +	 */
> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
>  		return 0;

This one can be !enable_ipiv because APICv must be active.

> 
> @@ -3870,6 +3877,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
>  		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
>  		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
>  		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
> +		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
> +				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
>  	}
>  }

Is inside "if (mode & MSR_BITMAP_MODE_X2APIC_APICV)" so APICv must be 
activ; so it can be enable_ipiv as well.

In conclusion, you do not need vmx->ipiv_active.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] KVM: VMX: enable IPI virtualization
  2021-07-19 13:58           ` Paolo Bonzini
@ 2021-07-20  1:07             ` Zeng Guang
  0 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-20  1:07 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Luck, Tony,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Huang, Kai
  Cc: x86, linux-kernel, Hu, Robert, Gao, Chao, Emanuele Giuseppe Esposito

On 7/19/2021 9:58 PM, Paolo Bonzini wrote:
> On 19/07/21 14:38, Zeng Guang wrote:
>>> Understood, but in practice all uses of vmx->ipiv_active are
>>> guarded by kvm_vcpu_apicv_active so they are always reached with
>>> vmx->ipiv_active == enable_ipiv.
>>>
>>> The one above instead seems wrong and should just use enable_ipiv.
>> enable_ipiv associate with "IPI virtualization" setting in tertiary
>> exec controls and enable_apicv which depends on cpu_has_vmx_apicv().
>> kvm_vcpu_apicv_active still can be false even if enable_ipiv is true,
>> e.g. in case irqchip not emulated in kernel.
> Right, kvm_vcpu_apicv_active *is* set in init_vmcs.  But there's an
>       "if (kvm_vcpu_apicv_active(&vmx->vcpu))" above.  You can just stick
>
> 	if (enable_ipicv)
> 		install_pid(vmx);
Ok, got your point now. I will revise to remove vmx->ipiv_active. Thanks.
> inside there.  As to the other occurrences of vmx->ipiv_active, look here:
>
>> +	if (!kvm_vcpu_apicv_active(vcpu))
>> +		return;
>> +
>> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> +		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
>> +		!to_vmx(vcpu)->ipiv_active)
>>   		return;
>>   
> This one can be enable_ipiv because APICv must be active.
>
>> +	if (!kvm_vcpu_apicv_active(vcpu))
>> +		return 0;
>> +
>> +	/* Put vCPU into a list and set NV to wakeup vector if it is
>> +	 * one of the following cases:
>> +	 * 1. any assigned device is in use.
>> +	 * 2. IPI virtualization is enabled.
>> +	 */
>> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> +		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
>>   		return 0;
> This one can be !enable_ipiv because APICv must be active.
>
>> @@ -3870,6 +3877,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
>>   		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
>>   		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
>>   		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
>> +		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
>> +				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
>>   	}
>>   }
> Is inside "if (mode & MSR_BITMAP_MODE_X2APIC_APICV)" so APICv must be
> activ; so it can be enable_ipiv as well.
>
> In conclusion, you do not need vmx->ipiv_active.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] IPI virtualization support for VM
  2021-07-19  7:37     ` Wanpeng Li
@ 2021-07-23  6:15       ` Zeng Guang
  0 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-07-23  6:15 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, Dave Hansen, Luck, Tony,
	Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Huang, Kai, the arch/x86 maintainers, LKML, Hu, Robert, Gao,
	Chao

On 7/19/2021 3:37 PM, Wanpeng Li wrote:
> On Mon, 19 Jul 2021 at 15:26, Zeng Guang <guang.zeng@intel.com> wrote:
>> On 7/16/2021 5:25 PM, Wanpeng Li wrote:
>>> On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@intel.com> wrote:
>>>> Current IPI process in guest VM will virtualize the writing to interrupt
>>>> command register(ICR) of the local APIC which will cause VM-exit anyway
>>>> on source vCPU. Frequent VM-exit could induce much overhead accumulated
>>>> if running IPI intensive task.
>>>>
>>>> IPI virtualization as a new VT-x feature targets to eliminate VM-exits
>>>> when issuing IPI on source vCPU. It introduces a new VM-execution
>>>> control - "IPI virtualization"(bit4) in the tertiary processor-based
>>>> VM-exection controls and a new data structure - "PID-pointer table
>>>> address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
>>>> virtualization" is enabled, processor emulateds following kind of writes
>>>> to APIC registers that would send IPIs, moreover without causing VM-exits.
>>>> - Memory-mapped ICR writes
>>>> - MSR-mapped ICR writes
>>>> - SENDUIPI execution
>>>>
>>>> This patch series implement IPI virtualization support in KVM.
>>>>
>>>> Patches 1-3 add tertiary processor-based VM-execution support
>>>> framework.
>>>>
>>>> Patch 4 implement interrupt dispatch support in x2APIC mode with
>>>> APIC-write VM exit. In previous platform, no CPU would produce
>>>> APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
>>>> mode" VM-execution control was 1.
>>>>
>>>> Patch 5 implement IPI virtualization related function including
>>>> feature enabling through tertiary processor-based VM-execution in
>>>> various scenario of VMCS configuration, PID table setup in vCPU creation
>>>> and vCPU block consideration.
>>>>
>>>> Document for IPI virtualization is now available at the latest "Intel
>>>> Architecture Instruction Set Extensions Programming Reference".
>>>>
>>>> Document Link:
>>>> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>>>>
>>>> We did experiment to measure average time sending IPI from source vCPU
>>>> to the target vCPU completing the IPI handling by kvm unittest w/ and
>>>> w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
>>>> 22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
>>>> respectly.
>>>>
>>>> KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
>>>> exit impact on target vCPU.
>>>>
>>>>                   Cycles of IPI
>>>>                   xAPIC mode              x2APIC mode
>>>>           test    w/o IPIv  w/ IPIv       w/o IPIv  w/ IPIv
>>>>           1       6106      4816          4265      3768
>>>>           2       6244      4656          4404      3546
>>>>           3       6165      4658          4233      3474
>>>>           4       5992      4710          4363      3430
>>>>           5       6083      4741          4215      3551
>>>>           6       6238      4904          4304      3547
>>>>           7       6164      4617          4263      3709
>>>>           8       5984      4763          4518      3779
>>>>           9       5931      4712          4645      3667
>>>>           10      5955      4530          4332      3724
>>>>           11      5897      4673          4283      3569
>>>>           12      6140      4794          4178      3598
>>>>           13      6183      4728          4363      3628
>>>>           14      5991      4994          4509      3842
>>>>           15      5866      4665          4520      3739
>>>>           16      6032      4654          4229      3701
>>>>           17      6050      4653          4185      3726
>>>>           18      6004      4792          4319      3746
>>>>           19      5961      4626          4196      3392
>>>>           20      6194      4576          4433      3760
>>>>
>>>> Average cycles  6059      4713.1        4337.85   3644.8
>>>> %Reduction                -22.21%                 -15.98%
>>> Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
>>> fastpath) mentioned that the whole ipi fastpath feature reduces the
>>> latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
>>> hardware acceleration is worse than software emulation? In addition,
>> Actually this performance data was measured on the basis of fastpath
>> optimization while cpu runs at base frequency.
>>
>> As a result, IPI virtualization could have extra 15.98% cost reduction
>> over IPI fastpath process in x2apic mode.
> I observed that adaptive advance lapic timer and adaptive halt-polling
> will influence kvm-unit-tests/vmexit.flat IPI testing score, could you
> post the score after disabling these features as commit a9ab13ff6e
> (KVM: X86: Improve latency for single target IPI fastpath) mentioned?
> In addition, please post the hackbench(./hackbench -l 1000000) and ipi
> microbenchmark scores.

We modified unittest to make AP runing with idle loop instead of hlt . 
This eliminates the impact
from adaptive halt-polling. So far we don't observe the influence from 
adaptive advance lapic timer
by test either. vmexit/ipi test should not involve lapic timer.

We post the hackbench and ipi microbenchmark score in patch V2 for your 
reference.

Thanks.

>
>      Wanpeng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  2021-07-16  6:48 ` [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
@ 2021-07-28 23:44   ` Sean Christopherson
  2021-07-29 15:56     ` Paolo Bonzini
  2021-08-02  8:22     ` Zeng Guang
  0 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-07-28 23:44 UTC (permalink / raw)
  To: Zeng Guang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Dave Hansen, Tony Luck, Kan Liang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Kim Phillips, Jarkko Sakkinen, Jethro Beekman, Kai Huang, x86,
	linux-kernel, Robert Hu, Gao Chao, Robert Hoo

On Fri, Jul 16, 2021, Zeng Guang wrote:
> From: Robert Hoo <robert.hu@linux.intel.com>
> 
> New VMX capability MSR IA32_VMX_PROCBASED_CTLS3 conresponse to this new
> VM-Execution control field. And it is 64bit allow-1 semantics, not like
> previous capability MSRs 32bit allow-0 and 32bit allow-1. So with Tertiary
> VM-Execution control field introduced, 2 vmx_feature leaves are introduced,
> TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

...

>  /*
>   * Note: If the comment begins with a quoted string, that string is used
> @@ -43,6 +43,7 @@
>  #define VMX_FEATURE_RDTSC_EXITING	( 1*32+ 12) /* "" VM-Exit on RDTSC */
>  #define VMX_FEATURE_CR3_LOAD_EXITING	( 1*32+ 15) /* "" VM-Exit on writes to CR3 */
>  #define VMX_FEATURE_CR3_STORE_EXITING	( 1*32+ 16) /* "" VM-Exit on reads from CR3 */
> +#define VMX_FEATURE_TER_CONTROLS	(1*32 + 17) /* "" Enable Tertiary VM-Execution Controls */

Maybe spell out TERTIARY?   SEC_CONTROLS is at least somewhat guessable, I doubt
TERTIARY is the first thing that comes to mind for most people when seeing "TER" :-)

>  #define VMX_FEATURE_CR8_LOAD_EXITING	( 1*32+ 19) /* "" VM-Exit on writes to CR8 */
>  #define VMX_FEATURE_CR8_STORE_EXITING	( 1*32+ 20) /* "" VM-Exit on reads from CR8 */
>  #define VMX_FEATURE_VIRTUAL_TPR		( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */
> diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
> index da696eb4821a..2e0272d127e4 100644
> --- a/arch/x86/kernel/cpu/feat_ctl.c
> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> @@ -15,6 +15,8 @@ enum vmx_feature_leafs {
>  	MISC_FEATURES = 0,
>  	PRIMARY_CTLS,
>  	SECONDARY_CTLS,
> +	TERTIARY_CTLS_LOW,
> +	TERTIARY_CTLS_HIGH,
>  	NR_VMX_FEATURE_WORDS,
>  };
>  
> @@ -42,6 +44,13 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
>  	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
>  	c->vmx_capability[SECONDARY_CTLS] = supported;
>  
> +	/*
> +	 * For tertiary execution controls MSR, it's actually a 64bit allowed-1.
> +	 */
> +	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &ign, &supported);
> +	c->vmx_capability[TERTIARY_CTLS_LOW] = ign;
> +	c->vmx_capability[TERTIARY_CTLS_HIGH] = supported;

Assuming only the lower 32 bits are going to be used for the near future (next
few years), what about defining just TERTIARY_CTLS_LOW and then doing:

	/*
	 * Tertiary controls are 64-bit allowed-1, so unlikely other MSRs, the
	 * upper bits are ignored (because they're not used, yet...).
	 */
	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &supported, &ign);
	c->vmx_capability[TERTIARY_CTLS_LOW] = supported;

I.e. punt the ugliness issue down the road a few years.

> +
>  	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
>  	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
  2021-07-16  6:48 ` [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
@ 2021-07-29  0:03   ` Sean Christopherson
  2021-08-02  6:59     ` Zeng Guang
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-07-29  0:03 UTC (permalink / raw)
  To: Zeng Guang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Dave Hansen, Tony Luck, Kan Liang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Kim Phillips, Jarkko Sakkinen, Jethro Beekman, Kai Huang, x86,
	linux-kernel, Robert Hu, Gao Chao, Robert Hoo

On Fri, Jul 16, 2021, Zeng Guang wrote:
> @@ -4204,6 +4234,13 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
>  #define vmx_adjust_sec_exec_exiting(vmx, exec_control, lname, uname) \
>  	vmx_adjust_sec_exec_control(vmx, exec_control, lname, uname, uname##_EXITING, true)
>  
> +static void vmx_compute_tertiary_exec_control(struct vcpu_vmx *vmx)
> +{
> +	u32 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;

This is incorrectly truncating the value.

> +
> +	vmx->tertiary_exec_control = exec_control;
> +}
> +
>  static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>  {
>  	struct kvm_vcpu *vcpu = &vmx->vcpu;
> @@ -4319,6 +4356,11 @@ static void init_vmcs(struct vcpu_vmx *vmx)
>  		secondary_exec_controls_set(vmx, vmx->secondary_exec_control);
>  	}
>  
> +	if (cpu_has_tertiary_exec_ctrls()) {
> +		vmx_compute_tertiary_exec_control(vmx);
> +		tertiary_exec_controls_set(vmx, vmx->tertiary_exec_control);

IMO, the existing vmx->secondary_exec_control is an abomination that should not
exist.  Looking at the code, it's actually not hard to get rid, there's just one
annoying use in prepare_vmcs02_early() that requires a bit of extra work to get
rid of.

Anyways, for tertiary controls, I'd prefer to avoid the same mess and instead
follow vmx_exec_control(), both in functionality and in name:

  static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
  {
	return vmcs_config.cpu_based_3rd_exec_ctrl;
  }

and:

	if (cpu_has_tertiary_exec_ctrls())
		tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx));

and then the next patch becomes:

  static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
  {
	u64 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;

	if (!kvm_vcpu_apicv_active(vcpu))
		exec_control &= ~TERTIARY_EXEC_IPI_VIRT;

	return exec_control;
  }


And I'll work on a patch to purge vmx->secondary_exec_control.

> +	}
> +
>  	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
>  		vmcs_write64(EOI_EXIT_BITMAP0, 0);
>  		vmcs_write64(EOI_EXIT_BITMAP1, 0);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 945c6639ce24..c356ceebe84c 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -266,6 +266,7 @@ struct vcpu_vmx {
>  	u32		      msr_ia32_umwait_control;
>  
>  	u32 secondary_exec_control;
> +	u64 tertiary_exec_control;
>  
>  	/*
>  	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  2021-07-28 23:44   ` Sean Christopherson
@ 2021-07-29 15:56     ` Paolo Bonzini
  2021-08-02  8:22     ` Zeng Guang
  1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-07-29 15:56 UTC (permalink / raw)
  To: Sean Christopherson, Zeng Guang
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	Dave Hansen, Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Kim Phillips, Jarkko Sakkinen,
	Jethro Beekman, Kai Huang, x86, linux-kernel, Robert Hu,
	Gao Chao, Robert Hoo

On 29/07/21 01:44, Sean Christopherson wrote:
>> +	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &ign, &supported);
>> +	c->vmx_capability[TERTIARY_CTLS_LOW] = ign;
>> +	c->vmx_capability[TERTIARY_CTLS_HIGH] = supported;
> Assuming only the lower 32 bits are going to be used for the near future (next
> few years), what about defining just TERTIARY_CTLS_LOW and then doing:
> 
> 	/*
> 	 * Tertiary controls are 64-bit allowed-1, so unlikely other MSRs, the
> 	 * upper bits are ignored (because they're not used, yet...).
> 	 */
> 	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &supported, &ign);
> 	c->vmx_capability[TERTIARY_CTLS_LOW] = supported;
> 
> I.e. punt the ugliness issue down the road a few years.
> 

Or use two new variables low/high instead of supported/ign.

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
  2021-07-29  0:03   ` Sean Christopherson
@ 2021-08-02  6:59     ` Zeng Guang
  0 siblings, 0 replies; 25+ messages in thread
From: Zeng Guang @ 2021-08-02  6:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Dave Hansen, Luck, Tony, Kan Liang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Kim Phillips, Jarkko Sakkinen, Jethro Beekman, Huang, Kai, x86,
	linux-kernel, Hu, Robert, Gao, Chao, Robert Hoo

On 7/29/2021 8:03 AM, Sean Christopherson wrote:
> On Fri, Jul 16, 2021, Zeng Guang wrote:
>> @@ -4204,6 +4234,13 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
>>   #define vmx_adjust_sec_exec_exiting(vmx, exec_control, lname, uname) \
>>   	vmx_adjust_sec_exec_control(vmx, exec_control, lname, uname, uname##_EXITING, true)
>>   
>> +static void vmx_compute_tertiary_exec_control(struct vcpu_vmx *vmx)
>> +{
>> +	u32 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
> This is incorrectly truncating the value.
>
>> +
>> +	vmx->tertiary_exec_control = exec_control;
>> +}
>> +
>>   static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>>   {
>>   	struct kvm_vcpu *vcpu = &vmx->vcpu;
>> @@ -4319,6 +4356,11 @@ static void init_vmcs(struct vcpu_vmx *vmx)
>>   		secondary_exec_controls_set(vmx, vmx->secondary_exec_control);
>>   	}
>>   
>> +	if (cpu_has_tertiary_exec_ctrls()) {
>> +		vmx_compute_tertiary_exec_control(vmx);
>> +		tertiary_exec_controls_set(vmx, vmx->tertiary_exec_control);
> IMO, the existing vmx->secondary_exec_control is an abomination that should not
> exist.  Looking at the code, it's actually not hard to get rid, there's just one
> annoying use in prepare_vmcs02_early() that requires a bit of extra work to get
> rid of.
>
> Anyways, for tertiary controls, I'd prefer to avoid the same mess and instead
> follow vmx_exec_control(), both in functionality and in name:
>
>    static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
>    {
> 	return vmcs_config.cpu_based_3rd_exec_ctrl;
>    }
>
> and:
>
> 	if (cpu_has_tertiary_exec_ctrls())
> 		tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx));
>
> and then the next patch becomes:
>
>    static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
>    {
> 	u64 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
>
> 	if (!kvm_vcpu_apicv_active(vcpu))
> 		exec_control &= ~TERTIARY_EXEC_IPI_VIRT;
>
> 	return exec_control;
>    }
>
>
> And I'll work on a patch to purge vmx->secondary_exec_control.
Ok, it looks much concise. I will change as you suggest. Thanks.
>> +	}
>> +
>>   	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
>>   		vmcs_write64(EOI_EXIT_BITMAP0, 0);
>>   		vmcs_write64(EOI_EXIT_BITMAP1, 0);
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 945c6639ce24..c356ceebe84c 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -266,6 +266,7 @@ struct vcpu_vmx {
>>   	u32		      msr_ia32_umwait_control;
>>   
>>   	u32 secondary_exec_control;
>> +	u64 tertiary_exec_control;
>>   
>>   	/*
>>   	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  2021-07-28 23:44   ` Sean Christopherson
  2021-07-29 15:56     ` Paolo Bonzini
@ 2021-08-02  8:22     ` Zeng Guang
  2021-08-02 16:20       ` Sean Christopherson
  1 sibling, 1 reply; 25+ messages in thread
From: Zeng Guang @ 2021-08-02  8:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Dave Hansen, Luck, Tony, Kan Liang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Kim Phillips, Jarkko Sakkinen, Jethro Beekman, Huang, Kai, x86,
	linux-kernel, Hu, Robert, Gao, Chao, Robert Hoo

On 7/29/2021 7:44 AM, Sean Christopherson wrote:
> On Fri, Jul 16, 2021, Zeng Guang wrote:
>> From: Robert Hoo <robert.hu@linux.intel.com>
>>
>> New VMX capability MSR IA32_VMX_PROCBASED_CTLS3 conresponse to this new
>> VM-Execution control field. And it is 64bit allow-1 semantics, not like
>> previous capability MSRs 32bit allow-0 and 32bit allow-1. So with Tertiary
>> VM-Execution control field introduced, 2 vmx_feature leaves are introduced,
>> TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.
> ...
>
>>   /*
>>    * Note: If the comment begins with a quoted string, that string is used
>> @@ -43,6 +43,7 @@
>>   #define VMX_FEATURE_RDTSC_EXITING	( 1*32+ 12) /* "" VM-Exit on RDTSC */
>>   #define VMX_FEATURE_CR3_LOAD_EXITING	( 1*32+ 15) /* "" VM-Exit on writes to CR3 */
>>   #define VMX_FEATURE_CR3_STORE_EXITING	( 1*32+ 16) /* "" VM-Exit on reads from CR3 */
>> +#define VMX_FEATURE_TER_CONTROLS	(1*32 + 17) /* "" Enable Tertiary VM-Execution Controls */
> Maybe spell out TERTIARY?   SEC_CONTROLS is at least somewhat guessable, I doubt
> TERTIARY is the first thing that comes to mind for most people when seeing "TER" :-)
Agree. TERTIARY could be readable without any confusion.
>>   #define VMX_FEATURE_CR8_LOAD_EXITING	( 1*32+ 19) /* "" VM-Exit on writes to CR8 */
>>   #define VMX_FEATURE_CR8_STORE_EXITING	( 1*32+ 20) /* "" VM-Exit on reads from CR8 */
>>   #define VMX_FEATURE_VIRTUAL_TPR		( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */
>> diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
>> index da696eb4821a..2e0272d127e4 100644
>> --- a/arch/x86/kernel/cpu/feat_ctl.c
>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
>> @@ -15,6 +15,8 @@ enum vmx_feature_leafs {
>>   	MISC_FEATURES = 0,
>>   	PRIMARY_CTLS,
>>   	SECONDARY_CTLS,
>> +	TERTIARY_CTLS_LOW,
>> +	TERTIARY_CTLS_HIGH,
>>   	NR_VMX_FEATURE_WORDS,
>>   };
>>   
>> @@ -42,6 +44,13 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
>>   	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
>>   	c->vmx_capability[SECONDARY_CTLS] = supported;
>>   
>> +	/*
>> +	 * For tertiary execution controls MSR, it's actually a 64bit allowed-1.
>> +	 */
>> +	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &ign, &supported);
>> +	c->vmx_capability[TERTIARY_CTLS_LOW] = ign;
>> +	c->vmx_capability[TERTIARY_CTLS_HIGH] = supported;
> Assuming only the lower 32 bits are going to be used for the near future (next
> few years), what about defining just TERTIARY_CTLS_LOW and then doing:
>
> 	/*
> 	 * Tertiary controls are 64-bit allowed-1, so unlikely other MSRs, the
> 	 * upper bits are ignored (because they're not used, yet...).
> 	 */
> 	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &supported, &ign);
> 	c->vmx_capability[TERTIARY_CTLS_LOW] = supported;
>
> I.e. punt the ugliness issue down the road a few years.
Prefer to keep it complete, and use new variables like low/high 
consistent with its function meaning. Ok for that ?
>> +
>>   	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
>>   	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
>>   
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control
  2021-08-02  8:22     ` Zeng Guang
@ 2021-08-02 16:20       ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-08-02 16:20 UTC (permalink / raw)
  To: Zeng Guang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Dave Hansen, Luck, Tony, Kan Liang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Kim Phillips, Jarkko Sakkinen, Jethro Beekman, Huang, Kai, x86,
	linux-kernel, Hu, Robert, Gao, Chao, Robert Hoo

On Mon, Aug 02, 2021, Zeng Guang wrote:
> On 7/29/2021 7:44 AM, Sean Christopherson wrote:
> > On Fri, Jul 16, 2021, Zeng Guang wrote:
> > > @@ -42,6 +44,13 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
> > >   	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
> > >   	c->vmx_capability[SECONDARY_CTLS] = supported;
> > > +	/*
> > > +	 * For tertiary execution controls MSR, it's actually a 64bit allowed-1.
> > > +	 */
> > > +	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &ign, &supported);
> > > +	c->vmx_capability[TERTIARY_CTLS_LOW] = ign;
> > > +	c->vmx_capability[TERTIARY_CTLS_HIGH] = supported;
> > Assuming only the lower 32 bits are going to be used for the near future (next
> > few years), what about defining just TERTIARY_CTLS_LOW and then doing:
> > 
> > 	/*
> > 	 * Tertiary controls are 64-bit allowed-1, so unlikely other MSRs, the
> > 	 * upper bits are ignored (because they're not used, yet...).
> > 	 */
> > 	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &supported, &ign);
> > 	c->vmx_capability[TERTIARY_CTLS_LOW] = supported;
> > 
> > I.e. punt the ugliness issue down the road a few years.
> Prefer to keep it complete, and use new variables like low/high consistent
> with its function meaning. Ok for that ?

Ya, either way works.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-08-02 16:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-16  6:48 [PATCH 0/5] IPI virtualization support for VM Zeng Guang
2021-07-16  6:48 ` [PATCH 1/6] x86/feat_ctl: Add new VMX feature, Tertiary VM-Execution control Zeng Guang
2021-07-28 23:44   ` Sean Christopherson
2021-07-29 15:56     ` Paolo Bonzini
2021-08-02  8:22     ` Zeng Guang
2021-08-02 16:20       ` Sean Christopherson
2021-07-16  6:48 ` [PATCH 2/6] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Zeng Guang
2021-07-16  6:48 ` [PATCH 3/6] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Zeng Guang
2021-07-29  0:03   ` Sean Christopherson
2021-08-02  6:59     ` Zeng Guang
2021-07-16  6:48 ` [PATCH 4/6] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Zeng Guang
2021-07-16  6:48 ` [PATCH 5/6] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit Zeng Guang
2021-07-16  6:48 ` [PATCH 6/6] KVM: VMX: enable IPI virtualization Zeng Guang
2021-07-16  9:52   ` Paolo Bonzini
2021-07-17  3:55     ` Zeng Guang
2021-07-18 20:32       ` Paolo Bonzini
2021-07-19 12:38         ` Zeng Guang
2021-07-19 13:58           ` Paolo Bonzini
2021-07-20  1:07             ` Zeng Guang
2021-07-19 13:16         ` Zeng Guang
2021-07-16  9:25 ` [PATCH 0/5] IPI virtualization support for VM Wanpeng Li
2021-07-17  1:46   ` Zeng Guang
2021-07-19  7:26   ` Zeng Guang
2021-07-19  7:37     ` Wanpeng Li
2021-07-23  6:15       ` Zeng Guang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).