kvm.vger.kernel.org archive mirror
* [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling
@ 2023-03-19  8:49 Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Binbin Wu
                   ` (6 more replies)
  0 siblings, 7 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

===Feature Introduction===

Linear-address masking (LAM) [1] modifies the checking that is applied to
*64-bit* linear addresses, allowing software to use the untranslated address
bits for metadata.

When the feature is virtualized and exposed to the guest, it can be used for
efficient address sanitizer (ASAN) implementations and for optimizations in
JITs and virtual machines.

Regarding which pointer bits are masked and can be used for metadata, LAM has 2
modes:
- LAM_48: metadata bits 62:48, i.e. LAM width of 15.
- LAM_57: metadata bits 62:57, i.e. LAM width of 6.

* For user pointers:
  CR3.LAM_U57 = CR3.LAM_U48 = 0, LAM is off;
  CR3.LAM_U57 = 1, LAM57 is active;
  CR3.LAM_U57 = 0 and CR3.LAM_U48 = 1, LAM48 is active.
* For supervisor pointers:
  CR4.LAM_SUP = 0, LAM is off;
  CR4.LAM_SUP = 1 with 5-level paging mode, LAM57 is active;
  CR4.LAM_SUP = 1 with 4-level paging mode, LAM48 is active.

The modified LAM canonicality check:
* LAM_S48                : [ 1 ][ metadata ][ 1 ]
                             63               47
* LAM_U48                : [ 0 ][ metadata ][ 0 ]
                             63               47
* LAM_S57                : [ 1 ][ metadata ][ 1 ]
                             63               56
* LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
                             63               56
* LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
                             63               56..47

Note:
1. LAM applies only to data addresses, not to instruction fetches.
2. LAM identification of an address as user or supervisor is based solely on the
   value of pointer bit 63 and does not, for the purposes of LAM, depend on the CPL.
3. For user-mode addresses, it is possible that 5-level paging and LAM_U48 are both
   enabled; in this case, the effective usable linear address width is 48 bits. [2]
4. On VM exit, the page-faulting linear address saved in the VMCS field is clean,
   i.e. the metadata is cleared and the address is in canonical form.
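
For illustration only, below is a minimal userspace C sketch (not part of this
series; the helper names are made up) of the two steps implied by the tables
above: pick the sign-extension bit from the CR3/CR4 LAM bits, then drop the
metadata by sign-extending bit 47 (LAM48) or bit 56 (LAM57) while keeping bit 63:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /*
   * Pick the sign-extension bit for a pointer, mirroring the mode table above:
   * user pointers (bit 63 == 0) use CR3.LAM_U57/LAM_U48, supervisor pointers
   * use CR4.LAM_SUP together with the paging level. Returns -1 if LAM is off.
   */
  static int lam_sign_bit(uint64_t addr, bool cr3_u57, bool cr3_u48,
                          bool cr4_sup, bool la57)
  {
          if (!(addr >> 63)) {                    /* user pointer */
                  if (cr3_u57)
                          return 56;
                  if (cr3_u48)
                          return 47;
          } else if (cr4_sup) {                   /* supervisor pointer */
                  return la57 ? 56 : 47;
          }
          return -1;
  }

  /* Untag: sign-extend from 'bit' (47 or 56) but keep the original bit 63. */
  static uint64_t lam_untag(uint64_t addr, int bit)
  {
          int shift = 63 - bit;
          uint64_t ext = (uint64_t)(((int64_t)(addr << shift)) >> shift);

          return (ext & ~(1ULL << 63)) | (addr & (1ULL << 63));
  }

  int main(void)
  {
          uint64_t tagged = 0x7e00001234567890ULL; /* LAM_U57-tagged user pointer */
          int bit = lam_sign_bit(tagged, true, false, false, false);

          if (bit >= 0)
                  printf("%#llx -> %#llx\n", (unsigned long long)tagged,
                         (unsigned long long)lam_untag(tagged, bit));
          return 0;
  }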

===LAM KVM Design===
LAM KVM enabling includes the following parts:
- Feature Enumeration
  The LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26].
  If hardware supports LAM and the host doesn't disable it explicitly (e.g. via
  clearcpuid), the LAM feature will be exposed to the userspace VMM.

- CR4 Virtualization
  LAM uses CR4.LAM_SUP (bit 28) to configure LAM masking on supervisor pointers.
  CR4 bit 28 is allowed to be set if the vCPU supports LAM, including in nested guests.
  CR4.LAM_SUP is allowed to be set even outside 64-bit mode, but it will not take
  effect since LAM only applies to 64-bit linear addresses.
  Changes to the CR4.LAM_SUP bit are intercepted to avoid a VMREAD every time KVM
  fetches its value, with the expectation that the guest won't toggle the bit frequently.
  Hardware is not required to flush the TLB when CR4.LAM_SUP is toggled, so KVM doesn't
  need to emulate a TLB flush based on it.

- CR3 Virtualization
  LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM masking
  for user mode pointers.

  When EPT is on:
  CR3 is fully under control of the guest; guest LAM is thus transparent to KVM.

  When EPT is off (shadow paging):
  KVM needs to handle guest CR3.LAM_U48 and CR3.LAM_U57 toggles.
  The two bits don't participate in page table walking. They should be masked off to
  get the base address of the page table. When shadow paging is used, the two bits
  should be kept as they are in the shadow CR3.
  To be generic, introduce a field 'cr3_ctrl_bits' in kvm_vcpu_arch to record
  the bits used to control supported features related to CR3 (e.g. LAM).
  - Supported control bits are set to cr3_ctrl_bits.
  - Add kvm_vcpu_is_legal_cr3() to validate CR3; it allows the control bits
    for the supported features to be set.
  - cr3_ctrl_bits is used to mask off the control bits when calculating the base
    address of the page table from mmu::get_guest_pgd().
  - Add kvm_get_active_cr3_ctrl_bits() to get the active control bits to
    form a new guest CR3 (in vmx_load_mmu_pgd()).
  For cases where only the control bits toggle, it is unnecessary to make a new
  pgd; just make a request to load the pgd.
  In particular, for LAM-bits-only toggles, skip the TLB flush since hardware
  is not required to flush the TLB when the CR3 LAM bits are toggled.
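
  As an illustration of the intended cr3_ctrl_bits usage, here is a simplified
  userspace sketch (not the actual KVM code; the structure and helper names are
  invented, only the CR3 bit positions and the cr3_ctrl_bits idea follow the
  series):

    #include <stdbool.h>
    #include <stdint.h>

    #define X86_CR3_LAM_U57 (1ULL << 61)
    #define X86_CR3_LAM_U48 (1ULL << 62)

    struct demo_vcpu {
            uint64_t cr3;
            uint64_t cr3_ctrl_bits; /* LAM_U48 | LAM_U57 when LAM is exposed */
    };

    /* Base of the guest page table: control bits don't take part in the walk. */
    static uint64_t guest_pgd(const struct demo_vcpu *v)
    {
            return v->cr3 & ~v->cr3_ctrl_bits;
    }

    /* A CR3 write needs a new PGD only if non-control bits changed. */
    static bool needs_new_pgd(const struct demo_vcpu *v, uint64_t new_cr3)
    {
            return ((new_cr3 ^ v->cr3) & ~v->cr3_ctrl_bits) != 0;
    }

    int main(void)
    {
            struct demo_vcpu v = {
                    .cr3 = 0x1000,
                    .cr3_ctrl_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57,
            };

            /* Toggling only the LAM bits must not force a new PGD. */
            return needs_new_pgd(&v, v.cr3 | X86_CR3_LAM_U57) ? 1 : 0;
    }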

- Modified Canonicality Check and Metadata Mask
  When LAM is enabled, a 64-bit linear address may be tagged with metadata. The
  linear address should be checked for modified canonicality and untagged (i.e.
  the metadata bits should be masked off by sign-extending bit 47 or bit 56) in
  instruction emulation and vmexit handling when LAM is applicable.

  Introduce untag_addr() in kvm_x86_ops to hide the code, which is vendor specific.
  The interface will be called in the x86 instruction emulation path.
  For VMX, the LAM version is implemented.
  For SVM, add a dummy version that does nothing and returns the original address.

  Also, the needed vmexit handler changes are done inside VMX, covering VMX
  instructions and SGX ENCLS.
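
  Put together, the emulation-path handling is essentially "untag first, then
  run the unchanged legacy canonical check on the result". A rough sketch of
  that flow (illustrative only, not the emulator code; the function names are
  invented):

    #include <stdbool.h>
    #include <stdint.h>

    /* Legacy check: bits 63 down to va_bits-1 must all equal bit va_bits-1. */
    static bool is_canonical(uint64_t addr, int va_bits)
    {
            int shift = 64 - va_bits;

            return (uint64_t)(((int64_t)(addr << shift)) >> shift) == addr;
    }

    /* LAM-aware check: untag when LAM applies, then the legacy check as before. */
    static bool lam_check_linear_addr(uint64_t addr, int va_bits,
                                      bool lam_active, int lam_bit)
    {
            if (lam_active) {
                    /* sign-extend from lam_bit (47 or 56), keep bit 63 */
                    int s = 63 - lam_bit;
                    uint64_t ext = (uint64_t)(((int64_t)(addr << s)) >> s);

                    addr = (ext & ~(1ULL << 63)) | (addr & (1ULL << 63));
            }
            return is_canonical(addr, va_bits);
    }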

LAM inside a nested guest is supported.
LAM inside SGX enclave mode is NOT supported by this patch series.

The patch series is based on the misc branch of kvm-x86 plus one patch from Kirill. [3]
The patch series is organized as follows:
Patch 1/2: Code cleanup
Patch 3/4: CR4/CR3 virtualization
Patch 5: Implementation of untag_addr
Patch 6: Untag address when LAM applicable
Patch 7: Expose LAM feature to userspace VMM

===Unit Test===
1. Add a kvm-unit-test [4] for LAM, including LAM_SUP and LAM_{U57,U48}.
   For supervisor mode, this test covers CR4 LAM_SUP bit toggles, memory/MMIO
   access with tagged pointers, and some special instructions (INVLPG, INVPCID,
   INVVPID); the INVVPID cases are also used to cover the VMX instruction vmexit path.
   For user mode, this test covers CR3 LAM bit toggles and memory/MMIO access
   with tagged pointers.
   MMIO cases are used to trigger the instruction emulation path.
   Run the unit test with the LAM feature both on and off (i.e. including negative cases).
2. Run the kernel LAM kselftests in the guest, with both EPT=Y and EPT=N.
3. Launch a nested guest.

All tests have passed in Simics environment.

[1] Intel ISE https://cdrdv2.intel.com/v1/dl/getContent/671368
    Chapter Linear Address Masking (LAM)
[2] Thus currently, the LAM kernel enabling patches only enable LAM_U57.
    https://lore.kernel.org/lkml/20230123220500.21077-1-kirill.shutemov@linux.intel.com/ 
[3] https://lore.kernel.org/lkml/20230123220500.21077-4-kirill.shutemov@linux.intel.com/
[4] https://lore.kernel.org/kvm/20230319083732.29458-1-binbin.wu@linux.intel.com/

---
Changelog
v5 --> v6:
Add Patch 2 to fix the check of 64-bit mode.
Add untag_addr() to kvm_x86_ops to hide vendor specific code.
Simplify the LAM canonicality check per Chao's suggestion.
Add cr3_ctrl_bits to kvm_vcpu_arch to simplify cr3 invalidation/extract/mask (Chao Gao)
Extend the patchset scope to include nested virtualization and SGX ENCLS handling.
- Add X86_CR4_LAM_SUP in cr4_fixed1_update for nested vmx. (Chao Gao)
- Add SGX ENCLS vmexit handling
- Add VMX instruction vmexit handling
More descriptions in cover letter.
Add Chao's reviewed-by on Patch 4.
Add more test cases in kvm-unit-test.

v4 --> v5:
Reorder and merge the patches surrounding the CR3.LAM bits into Patch 3 of this
version.
Revise Patch 1's subject and description
Drop Patch 3
Use kvm_read_cr4_bits() instead of kvm_read_cr4()
Fix: no need to untag the address when writing to an MSR; it should use the legacy canonical check
Rename kvm_is_valid_cr3() --> kvm_vcpu_is_valid_cr3(), and update some call
sites of kvm_vcpu_is_valid_cr3() to use kvm_is_valid_cr3().
Other refactors and miscellaneous changes.

v3 --> v4:
Drop unrelated Patch 1 in v3 (Binbin, Sean, Xiaoyao)
Intercept CR4.LAM_SUP instead of pass through to guest (Sean)
Just filter out CR3.LAM_{U48, U57}, instead of all reserved high bits
(Sean, Yuan)
Use the existing __canonical_address() helper instead of writing a new one (Weijiang)
Add LAM handling in KVM emulation (Yu, Yuan)
Add Jingqi's reviewed-by on Patch 7
Rebased to Kirill's latest code, which is 6.2-rc1 base.

v2 --> v3:
As LAM Kernel patches are in tip tree now, rebase to it.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/

v1 --> v2:
1. Fixes i386-allyesconfig build error on get_pgd(), where
   CR3_HIGH_RSVD_MASK isn't applicable.
   (Reported-by: kernel test robot <lkp@intel.com>)
2. In kvm_set_cr3(), be conservative about skipping the TLB flush when only
   the LAM bits toggle. (Kirill)

Binbin Wu (3):
  KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  KVM: x86: Introduce untag_addr() in kvm_x86_ops
  KVM: x86: Untag address when LAM applicable

Robert Hoo (4):
  KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  KVM: x86: Virtualize CR4.LAM_SUP
  KVM: x86: Virtualize CR3.LAM_{U48,U57}
  KVM: x86: Expose LAM feature to userspace VMM

 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    | 15 ++++++-
 arch/x86/kvm/cpuid.c               |  2 +-
 arch/x86/kvm/cpuid.h               |  5 +++
 arch/x86/kvm/emulate.c             | 25 +++++++----
 arch/x86/kvm/mmu.h                 |  5 +++
 arch/x86/kvm/mmu/mmu.c             |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h     |  2 +-
 arch/x86/kvm/svm/svm.c             |  7 +++
 arch/x86/kvm/vmx/nested.c          | 10 +++--
 arch/x86/kvm/vmx/sgx.c             |  5 ++-
 arch/x86/kvm/vmx/vmx.c             | 69 +++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h             |  2 +
 arch/x86/kvm/x86.c                 | 35 ++++++++++++---
 arch/x86/kvm/x86.h                 |  2 +
 15 files changed, 161 insertions(+), 26 deletions(-)


base-commit: e73ba25fdc241c06ab48a1f708a30305d6036e66
prerequisite-patch-id: cc6bd2338cdf17334431476fbb88f930adeb28ab
-- 
2.25.1



* [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-20  1:30   ` Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode Binbin Wu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

From: Robert Hoo <robert.hu@linux.intel.com>

kvm_read_cr4_bits() returns an ulong; explicitly cast it to bool when
assigning it to a bool variable.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 810cbe3b3ba6..410327e7eb55 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1238,7 +1238,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	bool skip_tlb_flush = false;
 	unsigned long pcid = 0;
 #ifdef CONFIG_X86_64
-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
 	if (pcid_enabled) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
-- 
2.25.1



* [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-20 12:36   ` Chao Gao
  2023-03-20 22:36   ` Huang, Kai
  2023-03-19  8:49 ` [PATCH v6 3/7] KVM: x86: Virtualize CR4.LAM_SUP Binbin Wu
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode() to check
for 64-bit mode. They should use is_64_bit_mode() instead.

Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/vmx/nested.c | 2 +-
 arch/x86/kvm/vmx/sgx.c    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 557b9c468734..0f84cc05f57c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4959,7 +4959,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
 
 	/* Checks for #GP/#SS exceptions. */
 	exn = false;
-	if (is_long_mode(vcpu)) {
+	if (is_64_bit_mode(vcpu)) {
 		/*
 		 * The virtual/linear address is never truncated in 64-bit
 		 * mode, e.g. a 32-bit address size can yield a 64-bit virtual
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index aa53c98034bf..0574030b071f 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -29,14 +29,14 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
 
 	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
 	*gva = offset;
-	if (!is_long_mode(vcpu)) {
+	if (!is_64_bit_mode(vcpu)) {
 		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
 		*gva += s.base;
 	}
 
 	if (!IS_ALIGNED(*gva, alignment)) {
 		fault = true;
-	} else if (likely(is_long_mode(vcpu))) {
+	} else if (likely(is_64_bit_mode(vcpu))) {
 		fault = is_noncanonical_address(*gva, vcpu);
 	} else {
 		*gva &= 0xffffffff;
-- 
2.25.1



* [PATCH v6 3/7] KVM: x86: Virtualize CR4.LAM_SUP
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57} Binbin Wu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

From: Robert Hoo <robert.hu@linux.intel.com>

Allow setting of CR4.LAM_SUP (bit 28) by guest if vCPU supports LAM,
and intercept the bit (as it already is).

LAM uses CR4.LAM_SUP to configure LAM masking on supervisor-mode addresses.
To virtualize that, move CR4.LAM_SUP out of CR4_RESERVED_BITS; its
reservation now depends on whether the vCPU has the LAM feature.
CR4.LAM_SUP is allowed to be set even outside 64-bit mode, but it will not
take effect since LAM only applies to 64-bit linear addresses.

Leave the bit intercepted to avoid a VMREAD every time KVM fetches its
value, with the expectation that the guest won't toggle the bit frequently.

Hardware is not required to flush the TLB when CR4.LAM_SUP is toggled, so KVM
doesn't need to emulate a TLB flush based on it.
There's no connection to other features or vmx_exec_controls, therefore no
code needs to be added in kvm/vmx_set_cr4().

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/vmx/vmx.c          | 3 +++
 arch/x86/kvm/x86.h              | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8b38a4cb2e29..742fd84c7997 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_LAM_SUP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bbf60bda877e..66a50224293e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7617,6 +7617,9 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
 
+	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
+	cr4_fixed1_update(X86_CR4_LAM_SUP,    eax, feature_bit(LAM));
+
 #undef cr4_fixed1_update
 }
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9de72586f406..fe32554e0da6 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -475,6 +475,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_LAM))           \
+		__reserved_bits |= X86_CR4_LAM_SUP;     \
 	__reserved_bits;                                \
 })
 
-- 
2.25.1



* [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
                   ` (2 preceding siblings ...)
  2023-03-19  8:49 ` [PATCH v6 3/7] KVM: x86: Virtualize CR4.LAM_SUP Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-30  8:33   ` Yang, Weijiang
  2023-03-19  8:49 ` [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops Binbin Wu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

From: Robert Hoo <robert.hu@linux.intel.com>

LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM
masking for user mode pointers.

When EPT is on:
CR3 is fully under control of guest, guest LAM is thus transparent to KVM.

When EPT is off (shadow paging):
KVM needs to handle guest CR3.LAM_U48 and CR3.LAM_U57 toggles.
The two bits don't participate in page table walking. They should be masked
off to get the base address of the page table. When shadow paging is used, the
two bits should be kept as they are in the shadow CR3.
To be generic, introduce a field 'cr3_ctrl_bits' in kvm_vcpu_arch to record
the bits used to control supported features related to CR3 (e.g. LAM).
- Supported control bits are set to cr3_ctrl_bits.
- Add kvm_vcpu_is_legal_cr3() to validate CR3; it allows the control bits
  for the supported features to be set.
- cr3_ctrl_bits is used to mask off the control bits when calculating the base
  address of the page table from mmu::get_guest_pgd().
- Add kvm_get_active_cr3_ctrl_bits() to get the active control bits to form
  a new guest CR3 (in vmx_load_mmu_pgd()).
- For cases where only the control bits toggle, it is unnecessary to make a new
  pgd; just make a request to load the pgd.
  In particular, for LAM-bits-only toggles, skip the TLB flush since hardware
  is not required to flush the TLB when the CR3 LAM bits are toggled.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/cpuid.h            |  5 +++++
 arch/x86/kvm/mmu.h              |  5 +++++
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/vmx/nested.c       |  6 +++---
 arch/x86/kvm/vmx/vmx.c          |  6 +++++-
 arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++------
 8 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 742fd84c7997..2174ad27013b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -730,6 +730,13 @@ struct kvm_vcpu_arch {
 	unsigned long cr0_guest_owned_bits;
 	unsigned long cr2;
 	unsigned long cr3;
+	/*
+	 * Bits in CR3 used to enable certain features. These bits don't
+	 * participate in page table walking. They should be masked to
+	 * get the base address of page table. When shadow paging is
+	 * used, these bits should be kept as they are in the shadow CR3.
+	 */
+	u64 cr3_ctrl_bits;
 	unsigned long cr4;
 	unsigned long cr4_guest_owned_bits;
 	unsigned long cr4_guest_rsvd_bits;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..ef8e1b912d7d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 	return vcpu->arch.maxphyaddr;
 }
 
+static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
+}
+
 static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	return !(gpa & vcpu->arch.reserved_gpa_bits);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..29985eeb8e12 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }
 
+static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
 	u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aeb240b339f5..e0b86ace7326 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3722,7 +3722,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	int quadrant, i, r;
 	hpa_t root;
 
-	root_pgd = mmu->get_guest_pgd(vcpu);
+	root_pgd = mmu->get_guest_pgd(vcpu) & ~vcpu->arch.cr3_ctrl_bits;
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c..8887615534b0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
 	walker->level = mmu->cpu_role.base.level;
-	pte           = mmu->get_guest_pgd(vcpu);
+	pte           = mmu->get_guest_pgd(vcpu) & ~vcpu->arch.cr3_ctrl_bits;
 	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
 #if PTTYPE == 64
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0f84cc05f57c..2eb258992d63 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1079,7 +1079,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 			       bool nested_ept, bool reload_pdptrs,
 			       enum vm_entry_failure_code *entry_failure_code)
 {
-	if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
+	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
@@ -1101,7 +1101,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	kvm_init_mmu(vcpu);
 
 	if (!nested_ept)
-		kvm_mmu_new_pgd(vcpu, cr3);
+		kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
 
 	return 0;
 }
@@ -2907,7 +2907,7 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 
 	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
 	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
-	    CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
+	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
 		return -EINVAL;
 
 	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 66a50224293e..9638a3000256 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3390,7 +3390,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			update_guest_cr3 = false;
 		vmx_ept_load_pdptrs(vcpu);
 	} else {
-		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+		            kvm_get_active_cr3_ctrl_bits(vcpu);
 	}
 
 	if (update_guest_cr3)
@@ -7750,6 +7751,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;
 
+	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+		vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 410327e7eb55..e74af72f53ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1236,7 +1236,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
-	unsigned long pcid = 0;
+	unsigned long pcid = 0, old_cr3;
 #ifdef CONFIG_X86_64
 	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
@@ -1247,8 +1247,9 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	}
 #endif
 
+	old_cr3 = kvm_read_cr3(vcpu);
 	/* PDPTRs are always reloaded for PAE paging. */
-	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
+	if (cr3 == old_cr3 && !is_pae_paging(vcpu))
 		goto handle_tlb_flush;
 
 	/*
@@ -1256,14 +1257,30 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
 	 * the current vCPU mode is accurate.
 	 */
-	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+	if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
 		return 1;
 
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
 		return 1;
 
-	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3);
+	if (cr3 != old_cr3) {
+		if ((cr3 ^ old_cr3) & ~vcpu->arch.cr3_ctrl_bits) {
+			kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
+		} else {
+			/*
+			 * Though only control (LAM) bits changed, make the
+			 * request to force an update on guest CR3 because the
+			 * control (LAM) bits are stale
+			 */
+			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+			/*
+			 * HW is not required to flush TLB when CR3 LAM bits toggled.
+			 * Currently only LAM bits in cr3_ctrl_bits, if more bits added in
+			 * the future, need to check whether to skip TLB flush or not.
+			 */
+			skip_tlb_flush = true;
+		}
+	}
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
@@ -11305,7 +11322,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 		 */
 		if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
 			return false;
-		if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
+		if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
 			return false;
 	} else {
 		/*
-- 
2.25.1



* [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
                   ` (3 preceding siblings ...)
  2023-03-19  8:49 ` [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57} Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-20 12:07   ` Chao Gao
  2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
  2023-03-19  8:49 ` [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM Binbin Wu
  6 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

Introduce a new interface untag_addr() in kvm_x86_ops to untag the metadata
from a linear address. Implement the LAM version in VMX and a dummy version
in SVM.

When a feature like Intel Linear Address Masking (LAM) or AMD Upper
Address Ignore (UAI) is enabled, a linear address may be tagged with
metadata. The linear address should be checked for modified canonicality
and untagged in instruction emulation or vmexit handling if LAM or UAI
is applicable.

Introduce untag_addr() to kvm_x86_ops to hide the code related to vendor
specific details.
- For VMX, LAM version is implemented.
  LAM has a modified canonical check when applicable:
  * LAM_S48                : [ 1 ][ metadata ][ 1 ]
                               63               47
  * LAM_U48                : [ 0 ][ metadata ][ 0 ]
                               63               47
  * LAM_S57                : [ 1 ][ metadata ][ 1 ]
                               63               56
  * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
                               63               56
  * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
                               63               56..47
  If LAM is applicable to a given address, untag the metadata bits by
  replacing them with the value of bit 47 (LAM48) or bit 56 (LAM57). The
  untagged address then goes through the legacy canonical check, so the LAM
  canonical check and mask are covered by "untag + legacy canonical check".

  For cases where LAM is not applicable, 'flags' is passed to the interface
  to skip the untag.

- For SVM, add a dummy version that does nothing and returns the original
  address.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  5 +++
 arch/x86/kvm/svm/svm.c             |  7 ++++
 arch/x86/kvm/vmx/vmx.c             | 60 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 +
 5 files changed, 75 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8dc345cc6318..7d63d1b942ac 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg)
 KVM_X86_OP(get_rflags)
 KVM_X86_OP(set_rflags)
 KVM_X86_OP(get_if_flag)
+KVM_X86_OP(untag_addr)
 KVM_X86_OP(flush_tlb_all)
 KVM_X86_OP(flush_tlb_current)
 KVM_X86_OP_OPTIONAL(tlb_remote_flush)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2174ad27013b..dd34041e3a4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -69,6 +69,9 @@
 #define KVM_X86_NOTIFY_VMEXIT_VALID_BITS	(KVM_X86_NOTIFY_VMEXIT_ENABLED | \
 						 KVM_X86_NOTIFY_VMEXIT_USER)
 
+/* flags for kvm_x86_ops::untag_addr() */
+#define KVM_X86_UNTAG_ADDR_SKIP_LAM	_BITULL(0)
+
 /* x86-specific vcpu->requests bit members */
 #define KVM_REQ_MIGRATE_TIMER		KVM_ARCH_REQ(0)
 #define KVM_REQ_REPORT_TPR_ACCESS	KVM_ARCH_REQ(1)
@@ -1597,6 +1600,8 @@ struct kvm_x86_ops {
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 	bool (*get_if_flag)(struct kvm_vcpu *vcpu);
 
+	u64 (*untag_addr)(struct kvm_vcpu *vcpu, u64 la, u64 flags);
+
 	void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
 	void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
 	int  (*tlb_remote_flush)(struct kvm *kvm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d13cf53e7390..48ce80235728 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4696,6 +4696,11 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static u64 svm_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags)
+{
+	return addr;
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -4745,6 +4750,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_rflags = svm_set_rflags,
 	.get_if_flag = svm_get_if_flag,
 
+	.untag_addr = svm_untag_addr,
+
 	.flush_tlb_all = svm_flush_tlb_current,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9638a3000256..736544f20709 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8124,6 +8124,64 @@ static void vmx_vm_destroy(struct kvm *kvm)
 	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }
 
+
+#define LAM_S57_EN_MASK (X86_CR4_LAM_SUP | X86_CR4_LA57)
+
+static int lam_sign_extend_bit(bool user, struct kvm_vcpu *vcpu)
+{
+	u64 cr3, cr4;
+
+	if (user) {
+		cr3 = kvm_read_cr3(vcpu);
+		if (cr3 & X86_CR3_LAM_U57)
+			return 56;
+		if (cr3 & X86_CR3_LAM_U48)
+			return 47;
+	} else {
+		cr4 = kvm_read_cr4_bits(vcpu, LAM_S57_EN_MASK);
+		if (cr4 == LAM_S57_EN_MASK)
+			return 56;
+		if (cr4 & X86_CR4_LAM_SUP)
+			return 47;
+	}
+	return -1;
+}
+
+/*
+ * Only called in 64-bit mode.
+ *
+ * Mask metadata in pointers by sign-extending the value of bit 47 (LAM48) or
+ * 56 (LAM57). Metadata are bits [62:48] in LAM48 and are [62:57] in LAM57.
+ * The resulting address after untagging isn't guaranteed to be canonical.
+ * Callers should perform the original canonical check and raise #GP/#SS if the
+ * address is non-canonical.
+ */
+u64 vmx_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags)
+{
+	int sign_ext_bit;
+
+	/*
+	 * Instead of calling relatively expensive guest_cpuid_has(), just check
+	 * LAM_U48 in cr3_ctrl_bits. If not set, vCPU doesn't supports LAM.
+	 */
+	if (!(vcpu->arch.cr3_ctrl_bits & X86_CR3_LAM_U48) ||
+	    (flags & KVM_X86_UNTAG_ADDR_SKIP_LAM))
+		return addr;
+
+	if(!is_64_bit_mode(vcpu)){
+		WARN_ONCE(1, "Only be called in 64-bit mode");
+		return addr;
+	}
+
+	sign_ext_bit = lam_sign_extend_bit(!(addr >> 63), vcpu);
+
+	if (sign_ext_bit < 0)
+		return addr;
+
+	return (sign_extend64(addr, sign_ext_bit) & ~BIT_ULL(63)) |
+	       (addr & BIT_ULL(63));
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -8172,6 +8230,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.set_rflags = vmx_set_rflags,
 	.get_if_flag = vmx_get_if_flag,
 
+	.untag_addr = vmx_untag_addr,
+
 	.flush_tlb_all = vmx_flush_tlb_all,
 	.flush_tlb_current = vmx_flush_tlb_current,
 	.flush_tlb_gva = vmx_flush_tlb_gva,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a3da84f4ea45..023d9b359ded 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -433,6 +433,8 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
 u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
 
+u64 vmx_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags);
+
 static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 					     int type, bool value)
 {
-- 
2.25.1



* [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
                   ` (4 preceding siblings ...)
  2023-03-19  8:49 ` [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-20 11:51   ` Chao Gao
                     ` (2 more replies)
  2023-03-19  8:49 ` [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM Binbin Wu
  6 siblings, 3 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu

Untag the address for 64-bit memory/MMIO operands in instruction emulation
and vmexit handlers when LAM is applicable.

For instruction emulation, untag the address in __linearize() before the
canonical check. LAM doesn't apply to instruction fetches and INVLPG;
use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip the LAM untag.

For vmexit handling related to 64-bit linear addresses:
- Cases that need to untag the address:
  Operand(s) of VMX instructions and INVPCID
  Operand(s) of SGX ENCLS
  Linear address in the INVVPID descriptor
- Cases LAM doesn't apply to (no change needed):
  Operand of INVLPG
  Linear address in the INVPCID descriptor

Co-developed-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/emulate.c    | 25 +++++++++++++++++--------
 arch/x86/kvm/vmx/nested.c |  2 ++
 arch/x86/kvm/vmx/sgx.c    |  1 +
 arch/x86/kvm/x86.c        |  4 ++++
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a630c5db971c..c46f0162498e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
 				       struct segmented_address addr,
 				       unsigned *max_size, unsigned size,
 				       bool write, bool fetch,
-				       enum x86emul_mode mode, ulong *linear)
+				       enum x86emul_mode mode, ulong *linear,
+				       u64 untag_flags)
 {
 	struct desc_struct desc;
 	bool usable;
@@ -701,9 +702,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
 	*max_size = 0;
 	switch (mode) {
 	case X86EMUL_MODE_PROT64:
-		*linear = la;
+		*linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);
+
 		va_bits = ctxt_virt_addr_bits(ctxt);
-		if (!__is_canonical_address(la, va_bits))
+		if (!__is_canonical_address(*linear, va_bits))
 			goto bad;
 
 		*max_size = min_t(u64, ~0u, (1ull << va_bits) - la);
@@ -757,8 +759,8 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
 		     ulong *linear)
 {
 	unsigned max_size;
-	return __linearize(ctxt, addr, &max_size, size, write, false,
-			   ctxt->mode, linear);
+	return __linearize(ctxt, addr, &max_size, size, false, false,
+			   ctxt->mode, linear, 0);
 }
 
 static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
@@ -771,7 +773,9 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
 
 	if (ctxt->op_bytes != sizeof(unsigned long))
 		addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
-	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
+	/* skip LAM untag for instruction */
+	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode,
+		         &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
 	if (rc == X86EMUL_CONTINUE)
 		ctxt->_eip = addr.ea;
 	return rc;
@@ -906,9 +910,11 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
 	 * __linearize is called with size 0 so that it does not do any
 	 * boundary check itself.  Instead, we use max_size to check
 	 * against op_size.
+	 *
+	 * skip LAM untag for instruction
 	 */
 	rc = __linearize(ctxt, addr, &max_size, 0, false, true, ctxt->mode,
-			 &linear);
+			 &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
 	if (unlikely(rc != X86EMUL_CONTINUE))
 		return rc;
 
@@ -3433,8 +3439,11 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
 	ulong linear;
+	unsigned max_size;
 
-	rc = linearize(ctxt, ctxt->src.addr.mem, 1, false, &linear);
+	/* skip untag for invlpg since LAM is not applied to invlpg */
+	rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1, false, false,
+			 ctxt->mode, &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
 	if (rc == X86EMUL_CONTINUE)
 		ctxt->ops->invlpg(ctxt, linear);
 	/* Disable writeback. */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2eb258992d63..dd1d28a0d147 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4970,6 +4970,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
 		else
 			*ret = off;
 
+		*ret = vmx_untag_addr(vcpu, *ret, 0);
 		/* Long mode: #GP(0)/#SS(0) if the memory address is in a
 		 * non-canonical form. This is the only check on the memory
 		 * destination for long mode!
@@ -5787,6 +5788,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	vpid02 = nested_get_vpid02(vcpu);
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+		operand.gla = vmx_untag_addr(vcpu, operand.gla, 0);
 		if (!operand.vpid ||
 		    is_noncanonical_address(operand.gla, vcpu))
 			return nested_vmx_fail(vcpu,
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 0574030b071f..527f1a902c65 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -37,6 +37,7 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
 	if (!IS_ALIGNED(*gva, alignment)) {
 		fault = true;
 	} else if (likely(is_64_bit_mode(vcpu))) {
+		*gva = vmx_untag_addr(vcpu, *gva, 0);
 		fault = is_noncanonical_address(*gva, vcpu);
 	} else {
 		*gva &= 0xffffffff;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e74af72f53ec..d85f87a19f58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13233,6 +13233,10 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 
 	switch (type) {
 	case INVPCID_TYPE_INDIV_ADDR:
+		/*
+		 * LAM doesn't apply to the linear address in the descriptor,
+		 * still need to be canonical
+		 */
 		if ((!pcid_enabled && (operand.pcid != 0)) ||
 		    is_noncanonical_address(operand.gla, vcpu)) {
 			kvm_inject_gp(vcpu, 0);
-- 
2.25.1



* [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM
  2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
                   ` (5 preceding siblings ...)
  2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
@ 2023-03-19  8:49 ` Binbin Wu
  2023-03-20  8:57   ` Chao Gao
  6 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-19  8:49 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu, binbin.wu, Jingqi Liu

From: Robert Hoo <robert.hu@linux.intel.com>

LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26].
Expose the feature to guest as the final step after the following
supports:
- CR4.LAM_SUP virtualization
- CR3.LAM_U48 and CR3.LAM_U57 virtualization
- Check and untag 64-bit linear address when LAM applies in instruction
  emulations and vmexit handlers.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8f8edeaf8177..13840ef897c7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -670,7 +670,7 @@ void kvm_set_cpu_caps(void)
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
 		F(FZRM) | F(FSRS) | F(FSRC) |
-		F(AMX_FP16) | F(AVX_IFMA)
+		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
-- 
2.25.1



* Re: [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-19  8:49 ` [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Binbin Wu
@ 2023-03-20  1:30   ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-20  1:30 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu

I will drop this patch in this LAM KVM enabling patch series per Sean's 
comments in V5.


On 3/19/2023 4:49 PM, Binbin Wu wrote:
> From: Robert Hoo <robert.hu@linux.intel.com>
>
> kvm_read_cr4_bits() returns ulong, explicitly cast it bool when assign to a
> bool variable.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
>   arch/x86/kvm/x86.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 810cbe3b3ba6..410327e7eb55 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1238,7 +1238,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   	bool skip_tlb_flush = false;
>   	unsigned long pcid = 0;
>   #ifdef CONFIG_X86_64
> -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>   
>   	if (pcid_enabled) {
>   		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;


* Re: [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM
  2023-03-19  8:49 ` [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM Binbin Wu
@ 2023-03-20  8:57   ` Chao Gao
  2023-03-20 12:00     ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Chao Gao @ 2023-03-20  8:57 UTC (permalink / raw)
  To: Binbin Wu; +Cc: kvm, seanjc, pbonzini, robert.hu, Jingqi Liu

On Sun, Mar 19, 2023 at 04:49:27PM +0800, Binbin Wu wrote:
>From: Robert Hoo <robert.hu@linux.intel.com>
>
>LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26].
>Expose the feature to guest as the final step after the following

Nit: s/guest/userspace/. Because it is QEMU that decides whether or not
to expose a feature to guests.

>supports:
>- CR4.LAM_SUP virtualization
>- CR3.LAM_U48 and CR3.LAM_U57 virtualization
>- Check and untag 64-bit linear address when LAM applies in instruction
>  emulations and vmexit handlers.
>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

>---
> arch/x86/kvm/cpuid.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 8f8edeaf8177..13840ef897c7 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -670,7 +670,7 @@ void kvm_set_cpu_caps(void)
> 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
> 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
> 		F(FZRM) | F(FSRS) | F(FSRC) |
>-		F(AMX_FP16) | F(AVX_IFMA)
>+		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
> 	);
> 
> 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
>-- 
>2.25.1
>


* Re: [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable
  2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
@ 2023-03-20 11:51   ` Chao Gao
  2023-03-20 11:56     ` Binbin Wu
  2023-03-20 12:04   ` Binbin Wu
  2023-03-29  5:02   ` Binbin Wu
  2 siblings, 1 reply; 43+ messages in thread
From: Chao Gao @ 2023-03-20 11:51 UTC (permalink / raw)
  To: Binbin Wu; +Cc: kvm, seanjc, pbonzini, robert.hu

On Sun, Mar 19, 2023 at 04:49:26PM +0800, Binbin Wu wrote:
>Untag address for 64-bit memory/mmio operand in instruction emulations
>and vmexit handlers when LAM is applicable.
>
>For instruction emulation, untag address in __linearize() before
>canonical check. LAM doesn't apply to instruction fetch and invlpg,
>use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip LAM untag.
>
>For vmexit handlings related to 64-bit linear address:
>- Cases need to untag address
>  Operand(s) of VMX instructions and INVPCID
>  Operand(s) of SGX ENCLS
>  Linear address in INVVPID descriptor.
>- Cases LAM doesn't apply to (no change needed)
>  Operand of INVLPG
>  Linear address in INVPCID descriptor
>
>Co-developed-by: Robert Hoo <robert.hu@linux.intel.com>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>---
> arch/x86/kvm/emulate.c    | 25 +++++++++++++++++--------
> arch/x86/kvm/vmx/nested.c |  2 ++
> arch/x86/kvm/vmx/sgx.c    |  1 +
> arch/x86/kvm/x86.c        |  4 ++++
> 4 files changed, 24 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>index a630c5db971c..c46f0162498e 100644
>--- a/arch/x86/kvm/emulate.c
>+++ b/arch/x86/kvm/emulate.c
>@@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
> 				       struct segmented_address addr,
> 				       unsigned *max_size, unsigned size,
> 				       bool write, bool fetch,
>-				       enum x86emul_mode mode, ulong *linear)
>+				       enum x86emul_mode mode, ulong *linear,
>+				       u64 untag_flags)
> {
> 	struct desc_struct desc;
> 	bool usable;
>@@ -701,9 +702,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
> 	*max_size = 0;
> 	switch (mode) {
> 	case X86EMUL_MODE_PROT64:
>-		*linear = la;
>+		*linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);
>+
> 		va_bits = ctxt_virt_addr_bits(ctxt);
>-		if (!__is_canonical_address(la, va_bits))
>+		if (!__is_canonical_address(*linear, va_bits))
> 			goto bad;
> 
> 		*max_size = min_t(u64, ~0u, (1ull << va_bits) - la);
>@@ -757,8 +759,8 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
> 		     ulong *linear)
> {
> 	unsigned max_size;
>-	return __linearize(ctxt, addr, &max_size, size, write, false,
>-			   ctxt->mode, linear);
>+	return __linearize(ctxt, addr, &max_size, size, false, false,

							^^^^^
							Should be "write".

>+			   ctxt->mode, linear, 0);
> }
> 
> static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
>@@ -771,7 +773,9 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
> 
> 	if (ctxt->op_bytes != sizeof(unsigned long))
> 		addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
>-	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
>+	/* skip LAM untag for instruction */

I think it would be more accurate to quote the spec:

LAM does not apply to addresses used for instruction fetches or to those
that specify the targets of jump and call instructions

>+	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode,
>+		         &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
> 	if (rc == X86EMUL_CONTINUE)
> 		ctxt->_eip = addr.ea;
> 	return rc;
>@@ -906,9 +910,11 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
> 	 * __linearize is called with size 0 so that it does not do any
> 	 * boundary check itself.  Instead, we use max_size to check
> 	 * against op_size.
>+	 *
>+	 * skip LAM untag for instruction

ditto


* Re: [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable
  2023-03-20 11:51   ` Chao Gao
@ 2023-03-20 11:56     ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-20 11:56 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, robert.hu


On 3/20/2023 7:51 PM, Chao Gao wrote:
> On Sun, Mar 19, 2023 at 04:49:26PM +0800, Binbin Wu wrote:
>> Untag address for 64-bit memory/mmio operand in instruction emulations
>> and vmexit handlers when LAM is applicable.
>>
>> For instruction emulation, untag address in __linearize() before
>> canonical check. LAM doesn't apply to instruction fetch and invlpg,
>> use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip LAM untag.
>>
>> For vmexit handlings related to 64-bit linear address:
>> - Cases need to untag address
>>   Operand(s) of VMX instructions and INVPCID
>>   Operand(s) of SGX ENCLS
>>   Linear address in INVVPID descriptor.
>> - Cases LAM doesn't apply to (no change needed)
>>   Operand of INVLPG
>>   Linear address in INVPCID descriptor
>>
>> Co-developed-by: Robert Hoo <robert.hu@linux.intel.com>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> arch/x86/kvm/emulate.c    | 25 +++++++++++++++++--------
>> arch/x86/kvm/vmx/nested.c |  2 ++
>> arch/x86/kvm/vmx/sgx.c    |  1 +
>> arch/x86/kvm/x86.c        |  4 ++++
>> 4 files changed, 24 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index a630c5db971c..c46f0162498e 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>> 				       struct segmented_address addr,
>> 				       unsigned *max_size, unsigned size,
>> 				       bool write, bool fetch,
>> -				       enum x86emul_mode mode, ulong *linear)
>> +				       enum x86emul_mode mode, ulong *linear,
>> +				       u64 untag_flags)
>> {
>> 	struct desc_struct desc;
>> 	bool usable;
>> @@ -701,9 +702,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>> 	*max_size = 0;
>> 	switch (mode) {
>> 	case X86EMUL_MODE_PROT64:
>> -		*linear = la;
>> +		*linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);
>> +
>> 		va_bits = ctxt_virt_addr_bits(ctxt);
>> -		if (!__is_canonical_address(la, va_bits))
>> +		if (!__is_canonical_address(*linear, va_bits))
>> 			goto bad;
>>
>> 		*max_size = min_t(u64, ~0u, (1ull << va_bits) - la);
>> @@ -757,8 +759,8 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
>> 		     ulong *linear)
>> {
>> 	unsigned max_size;
>> -	return __linearize(ctxt, addr, &max_size, size, write, false,
>> -			   ctxt->mode, linear);
>> +	return __linearize(ctxt, addr, &max_size, size, false, false,
> 							^^^^^
> 							Should be "write".

Oops, thanks for catching it.


>
>> +			   ctxt->mode, linear, 0);
>> }
>>
>> static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
>> @@ -771,7 +773,9 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
>>
>> 	if (ctxt->op_bytes != sizeof(unsigned long))
>> 		addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
>> -	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
>> +	/* skip LAM untag for instruction */
> I think it would be more accurate to quote the spec:
>
> LAM does not apply to addresses used for instruction fetches or to those
> that specify the targets of jump and call instructions


OK.

>
>> +	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode,
>> +		         &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
>> 	if (rc == X86EMUL_CONTINUE)
>> 		ctxt->_eip = addr.ea;
>> 	return rc;
>> @@ -906,9 +910,11 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
>> 	 * __linearize is called with size 0 so that it does not do any
>> 	 * boundary check itself.  Instead, we use max_size to check
>> 	 * against op_size.
>> +	 *
>> +	 * skip LAM untag for instruction
> ditto


* Re: [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM
  2023-03-20  8:57   ` Chao Gao
@ 2023-03-20 12:00     ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-20 12:00 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, robert.hu, Jingqi Liu


On 3/20/2023 4:57 PM, Chao Gao wrote:
> On Sun, Mar 19, 2023 at 04:49:27PM +0800, Binbin Wu wrote:
>> From: Robert Hoo <robert.hu@linux.intel.com>
>>
>> LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26].
>> Expose the feature to guest as the final step after the following
> Nit: s/guest/userspace/. Because it is QEMU that decides whther or not
> to expose a feature to guests.


OK. Will update it, and thanks for the Reviewed-by.

>
>> supports:
>> - CR4.LAM_SUP virtualization
>> - CR3.LAM_U48 and CR3.LAM_U57 virtualization
>> - Check and untag 64-bit linear address when LAM applies in instruction
>>   emulations and vmexit handlers.
>>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
>
>> ---
>> arch/x86/kvm/cpuid.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 8f8edeaf8177..13840ef897c7 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -670,7 +670,7 @@ void kvm_set_cpu_caps(void)
>> 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
>> 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
>> 		F(FZRM) | F(FSRS) | F(FSRC) |
>> -		F(AMX_FP16) | F(AVX_IFMA)
>> +		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
>> 	);
>>
>> 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
>> -- 
>> 2.25.1
>>


* Re: [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable
  2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
  2023-03-20 11:51   ` Chao Gao
@ 2023-03-20 12:04   ` Binbin Wu
  2023-03-29  5:02   ` Binbin Wu
  2 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-20 12:04 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu


On 3/19/2023 4:49 PM, Binbin Wu wrote:
> Untag address for 64-bit memory/mmio operand in instruction emulations
> and vmexit handlers when LAM is applicable.
>
> For instruction emulation, untag address in __linearize() before
> canonical check. LAM doesn't apply to instruction fetch and invlpg,
> use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip LAM untag.
>
> For vmexit handlings related to 64-bit linear address:
> - Cases need to untag address
>    Operand(s) of VMX instructions and INVPCID
>    Operand(s) of SGX ENCLS
>    Linear address in INVVPID descriptor.
> - Cases LAM doesn't apply to (no change needed)
>    Operand of INVLPG
>    Linear address in INVPCID descriptor
>
> Co-developed-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
>   arch/x86/kvm/emulate.c    | 25 +++++++++++++++++--------
>   arch/x86/kvm/vmx/nested.c |  2 ++
>   arch/x86/kvm/vmx/sgx.c    |  1 +
>   arch/x86/kvm/x86.c        |  4 ++++
>   4 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index a630c5db971c..c46f0162498e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>   				       struct segmented_address addr,
>   				       unsigned *max_size, unsigned size,
>   				       bool write, bool fetch,
> -				       enum x86emul_mode mode, ulong *linear)
> +				       enum x86emul_mode mode, ulong *linear,
> +				       u64 untag_flags)
>   {
>   	struct desc_struct desc;
>   	bool usable;
> @@ -701,9 +702,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>   	*max_size = 0;
>   	switch (mode) {
>   	case X86EMUL_MODE_PROT64:
> -		*linear = la;
> +		*linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);

Per Sean's comment "Dereferencing ctxt->vcpu in the emulator is not allowed" 
in v5, I will update this as follows:

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c46f0162498e..5fbce7bb3bc8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -702,7 +702,7 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
         *max_size = 0;
         switch (mode) {
         case X86EMUL_MODE_PROT64:
-               *linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);
+               *linear = ctxt->ops->untag_addr(ctxt, la, untag_flags);

                 va_bits = ctxt_virt_addr_bits(ctxt);
                 if (!__is_canonical_address(*linear, va_bits))
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 2d9662be8333..14b32c7c2abb 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -224,6 +224,8 @@ struct x86_emulate_ops {
         int (*leave_smm)(struct x86_emulate_ctxt *ctxt);
         void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
         int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
+
+       u64 (*untag_addr)(struct x86_emulate_ctxt *ctxt, u64 addr, u64 flags);
  };

  /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d85f87a19f58..a3560ea7560d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8190,6 +8190,11 @@ static void emulator_vm_bugged(struct x86_emulate_ctxt *ctxt)
                 kvm_vm_bugged(kvm);
  }

+static u64 emulator_untag_addr(struct x86_emulate_ctxt *ctxt, u64 addr, u64 flags)
+{
+       return static_call(kvm_x86_untag_addr)(emul_to_vcpu(ctxt), addr, flags);
+}
+
  static const struct x86_emulate_ops emulate_ops = {
         .vm_bugged           = emulator_vm_bugged,
         .read_gpr            = emulator_read_gpr,
@@ -8234,6 +8239,7 @@ static const struct x86_emulate_ops emulate_ops = {
         .leave_smm           = emulator_leave_smm,
         .triple_fault        = emulator_triple_fault,
         .set_xcr             = emulator_set_xcr,
+       .untag_addr          = emulator_untag_addr,
  };


> +
>   		va_bits = ctxt_virt_addr_bits(ctxt);
> -		if (!__is_canonical_address(la, va_bits))
> +		if (!__is_canonical_address(*linear, va_bits))
>   			goto bad;
>   
>   		*max_size = min_t(u64, ~0u, (1ull << va_bits) - la);
> @@ -757,8 +759,8 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
>   		     ulong *linear)
>   {
>   	unsigned max_size;
> -	return __linearize(ctxt, addr, &max_size, size, write, false,
> -			   ctxt->mode, linear);
> +	return __linearize(ctxt, addr, &max_size, size, false, false,
> +			   ctxt->mode, linear, 0);
>   }
>   
>   static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
> @@ -771,7 +773,9 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
>   
>   	if (ctxt->op_bytes != sizeof(unsigned long))
>   		addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
> -	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
> +	/* skip LAM untag for instruction */
> +	rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode,
> +		         &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
>   	if (rc == X86EMUL_CONTINUE)
>   		ctxt->_eip = addr.ea;
>   	return rc;
> @@ -906,9 +910,11 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
>   	 * __linearize is called with size 0 so that it does not do any
>   	 * boundary check itself.  Instead, we use max_size to check
>   	 * against op_size.
> +	 *
> +	 * skip LAM untag for instruction
>   	 */
>   	rc = __linearize(ctxt, addr, &max_size, 0, false, true, ctxt->mode,
> -			 &linear);
> +			 &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
>   	if (unlikely(rc != X86EMUL_CONTINUE))
>   		return rc;
>   
> @@ -3433,8 +3439,11 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt)
>   {
>   	int rc;
>   	ulong linear;
> +	unsigned max_size;
>   
> -	rc = linearize(ctxt, ctxt->src.addr.mem, 1, false, &linear);
> +	/* skip untag for invlpg since LAM is not applied to invlpg */
> +	rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1, false, false,
> +			 ctxt->mode, &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM);
>   	if (rc == X86EMUL_CONTINUE)
>   		ctxt->ops->invlpg(ctxt, linear);
>   	/* Disable writeback. */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2eb258992d63..dd1d28a0d147 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4970,6 +4970,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>   		else
>   			*ret = off;
>   
> +		*ret = vmx_untag_addr(vcpu, *ret, 0);
>   		/* Long mode: #GP(0)/#SS(0) if the memory address is in a
>   		 * non-canonical form. This is the only check on the memory
>   		 * destination for long mode!
> @@ -5787,6 +5788,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>   	vpid02 = nested_get_vpid02(vcpu);
>   	switch (type) {
>   	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
> +		operand.gla = vmx_untag_addr(vcpu, operand.gla, 0);
>   		if (!operand.vpid ||
>   		    is_noncanonical_address(operand.gla, vcpu))
>   			return nested_vmx_fail(vcpu,
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index 0574030b071f..527f1a902c65 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -37,6 +37,7 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
>   	if (!IS_ALIGNED(*gva, alignment)) {
>   		fault = true;
>   	} else if (likely(is_64_bit_mode(vcpu))) {
> +		*gva = vmx_untag_addr(vcpu, *gva, 0);
>   		fault = is_noncanonical_address(*gva, vcpu);
>   	} else {
>   		*gva &= 0xffffffff;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e74af72f53ec..d85f87a19f58 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13233,6 +13233,10 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
>   
>   	switch (type) {
>   	case INVPCID_TYPE_INDIV_ADDR:
> +		/*
> +		 * LAM doesn't apply to the linear address in the descriptor,
> +		 * still need to be canonical
> +		 */
>   		if ((!pcid_enabled && (operand.pcid != 0)) ||
>   		    is_noncanonical_address(operand.gla, vcpu)) {
>   			kvm_inject_gp(vcpu, 0);

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops
  2023-03-19  8:49 ` [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops Binbin Wu
@ 2023-03-20 12:07   ` Chao Gao
  2023-03-20 12:23     ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Chao Gao @ 2023-03-20 12:07 UTC (permalink / raw)
  To: Binbin Wu; +Cc: kvm, seanjc, pbonzini, robert.hu

On Sun, Mar 19, 2023 at 04:49:25PM +0800, Binbin Wu wrote:
>Introduce a new interface untag_addr() to kvm_x86_ops to untag the metadata
>from linear address. Implement LAM version in VMX and dummy version in SVM.
>
>When enabled feature like Intel Linear Address Masking or AMD Upper
>Address Ignore, linear address may be tagged with metadata. Linear
>address should be checked for modified canonicality and untagged in
>instruction emulations or vmexit handlings if LAM or UAI is applicable.
>
>Introduce untag_addr() to kvm_x86_ops to hide the code related to vendor
>specific details.
>- For VMX, LAM version is implemented.
>  LAM has a modified canonical check when applicable:
>  * LAM_S48                : [ 1 ][ metadata ][ 1 ]
>                               63               47
>  * LAM_U48                : [ 0 ][ metadata ][ 0 ]
>                               63               47
>  * LAM_S57                : [ 1 ][ metadata ][ 1 ]
>                               63               56
>  * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
>                               63               56
>  * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
>                               63               56..47
>  If LAM is applicable to certain address, untag the metadata bits and
>  replace them with the value of bit 47 (LAM48) or bit 56 (LAM57). Later
>  the untagged address will do legacy canonical check. So that LAM canonical
>  check and mask can be covered by "untag + legacy canonical check".
>
>  For cases LAM is not applicable, 'flags' is passed to the interface
>  to skip untag.
>
>- For SVM, add a dummy version to do nothing, but return the original
>  address.
>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>---
> arch/x86/include/asm/kvm-x86-ops.h |  1 +
> arch/x86/include/asm/kvm_host.h    |  5 +++
> arch/x86/kvm/svm/svm.c             |  7 ++++
> arch/x86/kvm/vmx/vmx.c             | 60 ++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h             |  2 +
> 5 files changed, 75 insertions(+)
>
>diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
>index 8dc345cc6318..7d63d1b942ac 100644
>--- a/arch/x86/include/asm/kvm-x86-ops.h
>+++ b/arch/x86/include/asm/kvm-x86-ops.h
>@@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg)
> KVM_X86_OP(get_rflags)
> KVM_X86_OP(set_rflags)
> KVM_X86_OP(get_if_flag)
>+KVM_X86_OP(untag_addr)

Suppose AMD doesn't/won't use CR4.LAM_SUP and CR3.LAM_U48/U57 for other
purposes, it is fine to use a common x86 function to perform LAM masking
for pointers. It won't do anything harmful on AMD parts because those
enabling bits shouldn't be set and then no bits will be masked out by
the common x86 function.

Probably we can defer the introduction of the hook to when the
assumption becomes wrong.
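
To make that concrete, a common helper could look roughly like the sketch
below. This is not code from this series: the X86_CR4_LAM_SUP and
X86_CR3_LAM_U48/U57 macro names are assumptions, and it simply follows the
"untag + legacy canonical check" approach described in the changelog, i.e.
replace the metadata bits with the value of bit 47 or bit 56 and let the
existing canonical check do the rest.

static u64 kvm_untag_addr(struct kvm_vcpu *vcpu, u64 addr)
{
	unsigned int lam_bit;

	/* LAM only applies to 64-bit data addresses. */
	if (!is_64_bit_mode(vcpu))
		return addr;

	if (addr & BIT_ULL(63)) {
		/* Supervisor pointers: enabled by CR4.LAM_SUP. */
		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_LAM_SUP))
			return addr;
		lam_bit = kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 56 : 47;
	} else {
		/* User pointers: enabled by CR3.LAM_U57 or CR3.LAM_U48. */
		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
			lam_bit = 56;
		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48)
			lam_bit = 47;
		else
			return addr;
	}

	/*
	 * Fill bits 62:lam_bit+1 with the value of bit lam_bit, keep bit 63
	 * as-is, and let the legacy canonical check reject anything that
	 * violates the LAM rules.
	 */
	return (sign_extend64(addr, lam_bit) & ~BIT_ULL(63)) | (addr & BIT_ULL(63));
}

On AMD parts none of those enabling bits can be set, so such a helper would
just return the address unchanged.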

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops
  2023-03-20 12:07   ` Chao Gao
@ 2023-03-20 12:23     ` Binbin Wu
  2023-03-29  1:54       ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-20 12:23 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, robert.hu


On 3/20/2023 8:07 PM, Chao Gao wrote:
> On Sun, Mar 19, 2023 at 04:49:25PM +0800, Binbin Wu wrote:
>> Introduce a new interface untag_addr() to kvm_x86_ops to untag the metadata
> >from linear address. Implement LAM version in VMX and dummy version in SVM.
>> When enabled feature like Intel Linear Address Masking or AMD Upper
>> Address Ignore, linear address may be tagged with metadata. Linear
>> address should be checked for modified canonicality and untagged in
>> instruction emulations or vmexit handlings if LAM or UAI is applicable.
>>
>> Introduce untag_addr() to kvm_x86_ops to hide the code related to vendor
>> specific details.
>> - For VMX, LAM version is implemented.
>>   LAM has a modified canonical check when applicable:
>>   * LAM_S48                : [ 1 ][ metadata ][ 1 ]
>>                                63               47
>>   * LAM_U48                : [ 0 ][ metadata ][ 0 ]
>>                                63               47
>>   * LAM_S57                : [ 1 ][ metadata ][ 1 ]
>>                                63               56
>>   * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
>>                                63               56
>>   * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
>>                                63               56..47
>>   If LAM is applicable to certain address, untag the metadata bits and
>>   replace them with the value of bit 47 (LAM48) or bit 56 (LAM57). Later
>>   the untagged address will do legacy canonical check. So that LAM canonical
>>   check and mask can be covered by "untag + legacy canonical check".
>>
>>   For cases LAM is not applicable, 'flags' is passed to the interface
>>   to skip untag.
>>
>> - For SVM, add a dummy version to do nothing, but return the original
>>   address.
>>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> arch/x86/include/asm/kvm-x86-ops.h |  1 +
>> arch/x86/include/asm/kvm_host.h    |  5 +++
>> arch/x86/kvm/svm/svm.c             |  7 ++++
>> arch/x86/kvm/vmx/vmx.c             | 60 ++++++++++++++++++++++++++++++
>> arch/x86/kvm/vmx/vmx.h             |  2 +
>> 5 files changed, 75 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
>> index 8dc345cc6318..7d63d1b942ac 100644
>> --- a/arch/x86/include/asm/kvm-x86-ops.h
>> +++ b/arch/x86/include/asm/kvm-x86-ops.h
>> @@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg)
>> KVM_X86_OP(get_rflags)
>> KVM_X86_OP(set_rflags)
>> KVM_X86_OP(get_if_flag)
>> +KVM_X86_OP(untag_addr)
> Suppose AMD doesn't/won't use CR4.LAM_SUP and CR3.LAM_U48/U57 for other
> purposes, it is fine to use a common x86 function to perform LAM masking
> for pointers. It won't do anything harmful on AMD parts because those
> enabling bits shouldn't be set and then no bits will be masked out by
> the common x86 function.
>
> Probably we can defer the introduction of the hook to when the
> assumption becomes wrong.

Another reason I introduced the hook is that I noticed AMD Upper Address 
Ignore uses bits [63:57] as metadata, so the untag implementation will be 
different. But indeed, that is also a future issue.

Let's hear more opinions from others; if more people think the hook is 
unnecessary for now, I can switch back to a common x86 function.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-19  8:49 ` [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode Binbin Wu
@ 2023-03-20 12:36   ` Chao Gao
  2023-03-20 12:51     ` Binbin Wu
  2023-03-21 21:35     ` Sean Christopherson
  2023-03-20 22:36   ` Huang, Kai
  1 sibling, 2 replies; 43+ messages in thread
From: Chao Gao @ 2023-03-20 12:36 UTC (permalink / raw)
  To: Binbin Wu; +Cc: kvm, seanjc, pbonzini, robert.hu

On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>to check 64-bit mode. Should use is_64_bit_mode() instead.
>
>Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")

It is better to split this patch into two: one for nested and one for
SGX.

It is possible that there is a kernel release which has just one of
above two flawed commits, then this fix patch cannot be applied cleanly
to the release.

>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>---
> arch/x86/kvm/vmx/nested.c | 2 +-
> arch/x86/kvm/vmx/sgx.c    | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>index 557b9c468734..0f84cc05f57c 100644
>--- a/arch/x86/kvm/vmx/nested.c
>+++ b/arch/x86/kvm/vmx/nested.c
>@@ -4959,7 +4959,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
> 
> 	/* Checks for #GP/#SS exceptions. */
> 	exn = false;
>-	if (is_long_mode(vcpu)) {
>+	if (is_64_bit_mode(vcpu)) {
> 		/*
> 		 * The virtual/linear address is never truncated in 64-bit
> 		 * mode, e.g. a 32-bit address size can yield a 64-bit virtual
>diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
>index aa53c98034bf..0574030b071f 100644
>--- a/arch/x86/kvm/vmx/sgx.c
>+++ b/arch/x86/kvm/vmx/sgx.c
>@@ -29,14 +29,14 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
> 
> 	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
> 	*gva = offset;
>-	if (!is_long_mode(vcpu)) {
>+	if (!is_64_bit_mode(vcpu)) {
> 		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
> 		*gva += s.base;
> 	}
> 
> 	if (!IS_ALIGNED(*gva, alignment)) {
> 		fault = true;
>-	} else if (likely(is_long_mode(vcpu))) {
>+	} else if (likely(is_64_bit_mode(vcpu))) {
> 		fault = is_noncanonical_address(*gva, vcpu);
> 	} else {
> 		*gva &= 0xffffffff;
>-- 
>2.25.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-20 12:36   ` Chao Gao
@ 2023-03-20 12:51     ` Binbin Wu
  2023-03-21 21:35     ` Sean Christopherson
  1 sibling, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-20 12:51 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, robert.hu


On 3/20/2023 8:36 PM, Chao Gao wrote:
> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>
>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> It is better to split this patch into two: one for nested and one for
> SGX.
>
> It is possible that there is a kernel release which has just one of
> above two flawed commits, then this fix patch cannot be applied cleanly
> to the release.


OK.

>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> arch/x86/kvm/vmx/nested.c | 2 +-
>> arch/x86/kvm/vmx/sgx.c    | 4 ++--
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 557b9c468734..0f84cc05f57c 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -4959,7 +4959,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>>
>> 	/* Checks for #GP/#SS exceptions. */
>> 	exn = false;
>> -	if (is_long_mode(vcpu)) {
>> +	if (is_64_bit_mode(vcpu)) {
>> 		/*
>> 		 * The virtual/linear address is never truncated in 64-bit
>> 		 * mode, e.g. a 32-bit address size can yield a 64-bit virtual
>> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
>> index aa53c98034bf..0574030b071f 100644
>> --- a/arch/x86/kvm/vmx/sgx.c
>> +++ b/arch/x86/kvm/vmx/sgx.c
>> @@ -29,14 +29,14 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
>>
>> 	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
>> 	*gva = offset;
>> -	if (!is_long_mode(vcpu)) {
>> +	if (!is_64_bit_mode(vcpu)) {
>> 		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
>> 		*gva += s.base;
>> 	}
>>
>> 	if (!IS_ALIGNED(*gva, alignment)) {
>> 		fault = true;
>> -	} else if (likely(is_long_mode(vcpu))) {
>> +	} else if (likely(is_64_bit_mode(vcpu))) {
>> 		fault = is_noncanonical_address(*gva, vcpu);
>> 	} else {
>> 		*gva &= 0xffffffff;
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-19  8:49 ` [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode Binbin Wu
  2023-03-20 12:36   ` Chao Gao
@ 2023-03-20 22:36   ` Huang, Kai
  1 sibling, 0 replies; 43+ messages in thread
From: Huang, Kai @ 2023-03-20 22:36 UTC (permalink / raw)
  To: kvm, pbonzini, Christopherson,, Sean, binbin.wu; +Cc: robert.hu, Gao, Chao

On Sun, 2023-03-19 at 16:49 +0800, Binbin Wu wrote:
> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> to check 64-bit mode. Should use is_64_bit_mode() instead.
> 
> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 2 +-
>  arch/x86/kvm/vmx/sgx.c    | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 557b9c468734..0f84cc05f57c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4959,7 +4959,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>  
>  	/* Checks for #GP/#SS exceptions. */
>  	exn = false;
> -	if (is_long_mode(vcpu)) {
> +	if (is_64_bit_mode(vcpu)) {
>  		/*
>  		 * The virtual/linear address is never truncated in 64-bit
>  		 * mode, e.g. a 32-bit address size can yield a 64-bit virtual
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index aa53c98034bf..0574030b071f 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -29,14 +29,14 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
>  
>  	/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
>  	*gva = offset;
> -	if (!is_long_mode(vcpu)) {
> +	if (!is_64_bit_mode(vcpu)) {
>  		vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
>  		*gva += s.base;
>  	}
>  
>  	if (!IS_ALIGNED(*gva, alignment)) {
>  		fault = true;
> -	} else if (likely(is_long_mode(vcpu))) {
> +	} else if (likely(is_64_bit_mode(vcpu))) {
>  		fault = is_noncanonical_address(*gva, vcpu);
>  	} else {
>  		*gva &= 0xffffffff;

For SGX part,

Reviewed-by: Kai Huang <kai.huang@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-20 12:36   ` Chao Gao
  2023-03-20 12:51     ` Binbin Wu
@ 2023-03-21 21:35     ` Sean Christopherson
  2023-03-22  1:09       ` Binbin Wu
  2023-03-28 23:33       ` Huang, Kai
  1 sibling, 2 replies; 43+ messages in thread
From: Sean Christopherson @ 2023-03-21 21:35 UTC (permalink / raw)
  To: Chao Gao; +Cc: Binbin Wu, kvm, pbonzini, robert.hu

On Mon, Mar 20, 2023, Chao Gao wrote:
> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
> >get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> >to check 64-bit mode. Should use is_64_bit_mode() instead.
> >
> >Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> >Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> 
> It is better to split this patch into two: one for nested and one for
> SGX.
> 
> It is possible that there is a kernel release which has just one of
> above two flawed commits, then this fix patch cannot be applied cleanly
> to the release.

The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
just drop the nVMX side of things.

I could have sworn ENCLS had the same behavior, but the SDM disagrees.  Though why
on earth ENCLS is allowed in compatibility mode is beyond me.  ENCLU I can kinda
sorta understand, but ENCLS?!?!!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-21 21:35     ` Sean Christopherson
@ 2023-03-22  1:09       ` Binbin Wu
  2023-03-28 23:33       ` Huang, Kai
  1 sibling, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-22  1:09 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao; +Cc: kvm, pbonzini, robert.hu


On 3/22/2023 5:35 AM, Sean Christopherson wrote:
> On Mon, Mar 20, 2023, Chao Gao wrote:
>> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>>
>>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
>> It is better to split this patch into two: one for nested and one for
>> SGX.
>>
>> It is possible that there is a kernel release which has just one of
>> above two flawed commits, then this fix patch cannot be applied cleanly
>> to the release.
> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
> just drop the nVMX side of things.

Got it.
Do you mind if I add a comment about it in code?


>
> I could have sworn ENCLS had the same behavior, but the SDM disagrees.  Though why
> on earth ENCLS is allowed in compatibility mode is beyond me.  ENCLU I can kinda
> sorta understand, but ENCLS?!?!!

Yes, the SDM does define the behavior "outside 64-bit mode (IA32_EFER.LMA = 0 
|| CS.L = 0)".

IMO, the change has very little impact on performance, since the two 
intercepted ENCLS leaves, EINIT and ECREATE, will only be called once per 
lifecycle of an SGX enclave.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-21 21:35     ` Sean Christopherson
  2023-03-22  1:09       ` Binbin Wu
@ 2023-03-28 23:33       ` Huang, Kai
  2023-03-29  1:27         ` Binbin Wu
  1 sibling, 1 reply; 43+ messages in thread
From: Huang, Kai @ 2023-03-28 23:33 UTC (permalink / raw)
  To: Christopherson,, Sean, Gao, Chao; +Cc: kvm, pbonzini, robert.hu, binbin.wu

On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
> On Mon, Mar 20, 2023, Chao Gao wrote:
> > On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
> > > get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> > > to check 64-bit mode. Should use is_64_bit_mode() instead.
> > > 
> > > Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> > > Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> > 
> > It is better to split this patch into two: one for nested and one for
> > SGX.
> > 
> > It is possible that there is a kernel release which has just one of
> > above two flawed commits, then this fix patch cannot be applied cleanly
> > to the release.
> 
> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
> just drop the nVMX side of things.

But it looks the old code doesn't unconditionally inject #UD when in
compatibility mode?

        /* Checks for #GP/#SS exceptions. */                                   
        exn = false;                                                           
        if (is_long_mode(vcpu)) {                                              
                /*                                                             
                 * The virtual/linear address is never truncated in 64-bit     
                 * mode, e.g. a 32-bit address size can yield a 64-bit virtual 
                 * address when using FS/GS with a non-zero base.              
                 */                                                            
                if (seg_reg == VCPU_SREG_FS || seg_reg == VCPU_SREG_GS)        
                        *ret = s.base + off;                                   
                else                                                           
                        *ret = off;                                            
                                                                                                                                                   
                /* Long mode: #GP(0)/#SS(0) if the memory address is in a      
                 * non-canonical form. This is the only check on the memory    
                 * destination for long mode!                                  
                 */                                                            
                exn = is_noncanonical_address(*ret, vcpu); 
	}
	...

The logic of only adding seg.base for FS/GS to linear address (and ignoring
seg.base for all other segs) only applies to 64 bit mode, but the code only
checks _long_ mode.

Am I missing something?
 
> 
> I could have sworn ENCLS had the same behavior, but the SDM disagrees.  Though why
> on earth ENCLS is allowed in compatibility mode is beyond me.  ENCLU I can kinda
> sorta understand, but ENCLS?!?!!

I can reach out to Intel guys to (try to) find the answer if you want me to? :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-28 23:33       ` Huang, Kai
@ 2023-03-29  1:27         ` Binbin Wu
  2023-03-29  2:04           ` Huang, Kai
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-29  1:27 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean, Gao, Chao; +Cc: kvm, pbonzini, robert.hu


On 3/29/2023 7:33 AM, Huang, Kai wrote:
> On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
>> On Mon, Mar 20, 2023, Chao Gao wrote:
>>> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>>>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>>>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>>>
>>>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>>>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
>>> It is better to split this patch into two: one for nested and one for
>>> SGX.
>>>
>>> It is possible that there is a kernel release which has just one of
>>> above two flawed commits, then this fix patch cannot be applied cleanly
>>> to the release.
>> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
>> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
>> just drop the nVMX side of things.
> But it looks the old code doesn't unconditionally inject #UD when in
> compatibility mode?

I think Sean means VMX instructions are not valid in compatibility mode, so 
the processor triggers #UD in non-root mode, which has higher priority than 
VM-Exit.

So if there is a VM-Exit due to a VMX instruction, it is in 64-bit mode for 
sure if it is in long mode.
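
For reference, the difference between the two helpers is roughly the
following (paraphrased from arch/x86/kvm/x86.h; exact details may vary by
kernel version): is_long_mode() only checks EFER.LMA, while is_64_bit_mode()
additionally requires CS.L, i.e. it excludes compatibility mode.

static inline bool is_long_mode(struct kvm_vcpu *vcpu)
{
	return !!(vcpu->arch.efer & EFER_LMA);
}

static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
{
	int cs_db, cs_l;

	if (!is_long_mode(vcpu))
		return false;
	static_call(kvm_x86_get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
	return cs_l;
}

So the #UD guarantees CS.L = 1 for any VMX-instruction VM-Exit taken while
EFER.LMA = 1, which is why the two checks end up equivalent on that path.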


>
>          /* Checks for #GP/#SS exceptions. */
>          exn = false;
>          if (is_long_mode(vcpu)) {
>                  /*
>                   * The virtual/linear address is never truncated in 64-bit
>                   * mode, e.g. a 32-bit address size can yield a 64-bit virtual
>                   * address when using FS/GS with a non-zero base.
>                   */
>                  if (seg_reg == VCPU_SREG_FS || seg_reg == VCPU_SREG_GS)
>                          *ret = s.base + off;
>                  else
>                          *ret = off;
>                                                                                                                                                     
>                  /* Long mode: #GP(0)/#SS(0) if the memory address is in a
>                   * non-canonical form. This is the only check on the memory
>                   * destination for long mode!
>                   */
>                  exn = is_noncanonical_address(*ret, vcpu);
> 	}
> 	...
>
> The logic of only adding seg.base for FS/GS to linear address (and ignoring
> seg.base for all other segs) only applies to 64 bit mode, but the code only
> checks _long_ mode.
>
> Am I missing something?
>   
>> I could have sworn ENCLS had the same behavior, but the SDM disagrees.  Though why
>> on earth ENCLS is allowed in compatibility mode is beyond me.  ENCLU I can kinda
>> sorta understand, but ENCLS?!?!!
> I can reach out to Intel guys to (try to) find the answer if you want me to? :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops
  2023-03-20 12:23     ` Binbin Wu
@ 2023-03-29  1:54       ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-29  1:54 UTC (permalink / raw)
  To: seanjc; +Cc: kvm, pbonzini, Chao Gao, robert.hu


On 3/20/2023 8:23 PM, Binbin Wu wrote:
>
> On 3/20/2023 8:07 PM, Chao Gao wrote:
>> On Sun, Mar 19, 2023 at 04:49:25PM +0800, Binbin Wu wrote:
>>> Introduce a new interface untag_addr() to kvm_x86_ops to untag the 
>>> metadata
>> >from linear address. Implement LAM version in VMX and dummy version 
>> in SVM.
>>> When enabled feature like Intel Linear Address Masking or AMD Upper
>>> Address Ignore, linear address may be tagged with metadata. Linear
>>> address should be checked for modified canonicality and untagged in
>>> instruction emulations or vmexit handlings if LAM or UAI is applicable.
>>>
>>> Introduce untag_addr() to kvm_x86_ops to hide the code related to 
>>> vendor
>>> specific details.
>>> - For VMX, LAM version is implemented.
>>>   LAM has a modified canonical check when applicable:
>>>   * LAM_S48                : [ 1 ][ metadata ][ 1 ]
>>>                                63               47
>>>   * LAM_U48                : [ 0 ][ metadata ][ 0 ]
>>>                                63               47
>>>   * LAM_S57                : [ 1 ][ metadata ][ 1 ]
>>>                                63               56
>>>   * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
>>>                                63               56
>>>   * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
>>>                                63               56..47
>>>   If LAM is applicable to certain address, untag the metadata bits and
>>>   replace them with the value of bit 47 (LAM48) or bit 56 (LAM57). 
>>> Later
>>>   the untagged address will do legacy canonical check. So that LAM 
>>> canonical
>>>   check and mask can be covered by "untag + legacy canonical check".
>>>
>>>   For cases LAM is not applicable, 'flags' is passed to the interface
>>>   to skip untag.
>>>
>>> - For SVM, add a dummy version to do nothing, but return the original
>>>   address.
>>>
>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>> ---
>>> arch/x86/include/asm/kvm-x86-ops.h |  1 +
>>> arch/x86/include/asm/kvm_host.h    |  5 +++
>>> arch/x86/kvm/svm/svm.c             |  7 ++++
>>> arch/x86/kvm/vmx/vmx.c             | 60 ++++++++++++++++++++++++++++++
>>> arch/x86/kvm/vmx/vmx.h             |  2 +
>>> 5 files changed, 75 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm-x86-ops.h 
>>> b/arch/x86/include/asm/kvm-x86-ops.h
>>> index 8dc345cc6318..7d63d1b942ac 100644
>>> --- a/arch/x86/include/asm/kvm-x86-ops.h
>>> +++ b/arch/x86/include/asm/kvm-x86-ops.h
>>> @@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg)
>>> KVM_X86_OP(get_rflags)
>>> KVM_X86_OP(set_rflags)
>>> KVM_X86_OP(get_if_flag)
>>> +KVM_X86_OP(untag_addr)
>> Suppose AMD doesn't/won't use CR4.LAM_SUP and CR3.LAM_U48/U57 for other
>> purposes, it is fine to use a common x86 function to perform LAM masking
>> for pointers. It won't do anything harmful on AMD parts because those
>> enabling bits shouldn't be set and then no bits will be masked out by
>> the common x86 function.
>>
>> Probably we can defer the introduction of the hook to when the
>> assumption becomes wrong.
>
> Another reason I introduced the hook is I noticed the AMD Upper 
> Address Ignore using [63:57] as metadata.
> So the untag implementation will be different. But indeed, it also will 
> be a future issue.
>
> Let's hear more opinions from others, if more guys think the hook is 
> unnecessary for now, I can switch back to
> a common x86 function.

Hi Sean,

What's your opinion? Do you think it is too early to introduce the hook?



>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29  1:27         ` Binbin Wu
@ 2023-03-29  2:04           ` Huang, Kai
  2023-03-29  2:08             ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Huang, Kai @ 2023-03-29  2:04 UTC (permalink / raw)
  To: Christopherson,, Sean, binbin.wu, Gao, Chao; +Cc: kvm, pbonzini, robert.hu

On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
> On 3/29/2023 7:33 AM, Huang, Kai wrote:
> > On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
> > > On Mon, Mar 20, 2023, Chao Gao wrote:
> > > > On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
> > > > > get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> > > > > to check 64-bit mode. Should use is_64_bit_mode() instead.
> > > > > 
> > > > > Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> > > > > Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> > > > It is better to split this patch into two: one for nested and one for
> > > > SGX.
> > > > 
> > > > It is possible that there is a kernel release which has just one of
> > > > above two flawed commits, then this fix patch cannot be applied cleanly
> > > > to the release.
> > > The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
> > > for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
> > > just drop the nVMX side of things.
> > But it looks the old code doesn't unconditionally inject #UD when in
> > compatibility mode?
> 
> I think Sean means VMX instructions is not valid in compatibility mode 
> and it triggers #UD, which has higher priority than VM-Exit, by the 
> processor in non-root mode.
> 
> So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode 
> for sure if it is in long mode.

Oh I see thanks.

Then is it better to add some comment to explain, or add a WARN() if it's not in
64-bit mode?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29  2:04           ` Huang, Kai
@ 2023-03-29  2:08             ` Binbin Wu
  2023-03-29 17:34               ` Sean Christopherson
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-03-29  2:08 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean, Gao, Chao; +Cc: kvm, pbonzini, robert.hu


On 3/29/2023 10:04 AM, Huang, Kai wrote:
> On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
>> On 3/29/2023 7:33 AM, Huang, Kai wrote:
>>> On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
>>>> On Mon, Mar 20, 2023, Chao Gao wrote:
>>>>> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>>>>>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>>>>>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>>>>>
>>>>>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>>>>>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
>>>>> It is better to split this patch into two: one for nested and one for
>>>>> SGX.
>>>>>
>>>>> It is possible that there is a kernel release which has just one of
>>>>> above two flawed commits, then this fix patch cannot be applied cleanly
>>>>> to the release.
>>>> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
>>>> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
>>>> just drop the nVMX side of things.
>>> But it looks the old code doesn't unconditionally inject #UD when in
>>> compatibility mode?
>> I think Sean means VMX instructions is not valid in compatibility mode
>> and it triggers #UD, which has higher priority than VM-Exit, by the
>> processor in non-root mode.
>>
>> So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode
>> for sure if it is in long mode.
> Oh I see thanks.
>
> Then is it better to add some comment to explain, or add a WARN() if it's not in
> 64-bit mode?

I also prefer to add a comment if there is no objection.

Seems I am not the only one who didn't get it  : )

>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable
  2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
  2023-03-20 11:51   ` Chao Gao
  2023-03-20 12:04   ` Binbin Wu
@ 2023-03-29  5:02   ` Binbin Wu
  2 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-29  5:02 UTC (permalink / raw)
  To: kvm, seanjc, pbonzini; +Cc: chao.gao, robert.hu


On 3/19/2023 4:49 PM, Binbin Wu wrote:
> Untag address for 64-bit memory/mmio operand in instruction emulations
> and vmexit handlers when LAM is applicable.
>
> For instruction emulation, untag address in __linearize() before
> canonical check. LAM doesn't apply to instruction fetch and invlpg,
> use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip LAM untag.
>
> For vmexit handlings related to 64-bit linear address:
> - Cases need to untag address
>    Operand(s) of VMX instructions and INVPCID
>    Operand(s) of SGX ENCLS
>    Linear address in INVVPID descriptor.
> - Cases LAM doesn't apply to (no change needed)
>    Operand of INVLPG
>    Linear address in INVPCID descriptor
>
> Co-developed-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
>   arch/x86/kvm/emulate.c    | 25 +++++++++++++++++--------
>   arch/x86/kvm/vmx/nested.c |  2 ++
>   arch/x86/kvm/vmx/sgx.c    |  1 +
>   arch/x86/kvm/x86.c        |  4 ++++
>   4 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index a630c5db971c..c46f0162498e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>   				       struct segmented_address addr,
>   				       unsigned *max_size, unsigned size,
>   				       bool write, bool fetch,
> -				       enum x86emul_mode mode, ulong *linear)
> +				       enum x86emul_mode mode, ulong *linear,
> +				       u64 untag_flags)
>   {
>   	struct desc_struct desc;
>   	bool usable;
> @@ -701,9 +702,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>   	*max_size = 0;
>   	switch (mode) {
>   	case X86EMUL_MODE_PROT64:
> -		*linear = la;
> +		*linear = static_call(kvm_x86_untag_addr)(ctxt->vcpu, la, untag_flags);
> +
>   		va_bits = ctxt_virt_addr_bits(ctxt);
> -		if (!__is_canonical_address(la, va_bits))
> +		if (!__is_canonical_address(*linear, va_bits))
>   			goto bad;

Found that la is a local variable that will be used later.
This part will be updated as follows to untag la and avoid further changes:

@@ -701,6 +702,7 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
         *max_size = 0;
         switch (mode) {
         case X86EMUL_MODE_PROT64:
+               la = ctxt->ops->untag_addr(ctxt, la, untag_flags);
                 *linear = la;
                 va_bits = ctxt_virt_addr_bits(ctxt);
                 if (!__is_canonical_address(la, va_bits))



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29  2:08             ` Binbin Wu
@ 2023-03-29 17:34               ` Sean Christopherson
  2023-03-29 22:46                 ` Huang, Kai
  2023-04-04  6:14                 ` Binbin Wu
  0 siblings, 2 replies; 43+ messages in thread
From: Sean Christopherson @ 2023-03-29 17:34 UTC (permalink / raw)
  To: Binbin Wu; +Cc: Kai Huang, Chao Gao, kvm, pbonzini, robert.hu

On Wed, Mar 29, 2023, Binbin Wu wrote:
> 
> On 3/29/2023 10:04 AM, Huang, Kai wrote:
> > On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
> > > On 3/29/2023 7:33 AM, Huang, Kai wrote:
> > > > On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
> > > > > On Mon, Mar 20, 2023, Chao Gao wrote:
> > > > > > On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
> > > > > > > get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> > > > > > > to check 64-bit mode. Should use is_64_bit_mode() instead.
> > > > > > > 
> > > > > > > Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> > > > > > > Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> > > > > > It is better to split this patch into two: one for nested and one for
> > > > > > SGX.
> > > > > > 
> > > > > > It is possible that there is a kernel release which has just one of
> > > > > > above two flawed commits, then this fix patch cannot be applied cleanly
> > > > > > to the release.
> > > > > The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
> > > > > for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
> > > > > just drop the nVMX side of things.
> > > > But it looks the old code doesn't unconditionally inject #UD when in
> > > > compatibility mode?
> > > I think Sean means VMX instructions is not valid in compatibility mode
> > > and it triggers #UD, which has higher priority than VM-Exit, by the
> > > processor in non-root mode.
> > > 
> > > So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode
> > > for sure if it is in long mode.
> > Oh I see thanks.
> > 
> > Then is it better to add some comment to explain, or add a WARN() if it's not in
> > 64-bit mode?
> 
> I also prefer to add a comment if no objection.
> 
> Seems I am not the only one who didn't get it : )

I would rather have a code change than a comment, e.g. 

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f63b28f46a71..0460ca219f96 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4931,7 +4931,8 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
        int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
        bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
 
-       if (is_reg) {
+       if (is_reg ||
+           WARN_ON_ONCE(is_long_mode(vcpu) && !is_64_bit_mode(vcpu))) {
                kvm_queue_exception(vcpu, UD_VECTOR);
                return 1;
        }


The only downside is that querying is_64_bit_mode() could unnecessarily trigger a
VMREAD to get the current CS.L bit, but a measurable performance regressions is
extremely unlikely because is_64_bit_mode() all but guaranteed to be called in
these paths anyways (and KVM caches segment info), e.g. by kvm_register_read().

And then in a follow-up, we should also be able to do:

@@ -5402,7 +5403,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
        if (instr_info & BIT(10)) {
                kvm_register_write(vcpu, (((instr_info) >> 3) & 0xf), value);
        } else {
-               len = is_64_bit_mode(vcpu) ? 8 : 4;
+               len = is_long_mode(vcpu) ? 8 : 4;
                if (get_vmx_mem_address(vcpu, exit_qualification,
                                        instr_info, true, len, &gva))
                        return 1;
@@ -5476,7 +5477,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
        if (instr_info & BIT(10))
                value = kvm_register_read(vcpu, (((instr_info) >> 3) & 0xf));
        else {
-               len = is_64_bit_mode(vcpu) ? 8 : 4;
+               len = is_long_mode(vcpu) ? 8 : 4;
                if (get_vmx_mem_address(vcpu, exit_qualification,
                                        instr_info, false, len, &gva))
                        return 1;


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29 17:34               ` Sean Christopherson
@ 2023-03-29 22:46                 ` Huang, Kai
  2023-04-03  3:37                   ` Binbin Wu
  2023-04-04  6:14                 ` Binbin Wu
  1 sibling, 1 reply; 43+ messages in thread
From: Huang, Kai @ 2023-03-29 22:46 UTC (permalink / raw)
  To: Christopherson,, Sean, binbin.wu; +Cc: kvm, pbonzini, robert.hu, Gao, Chao

On Wed, 2023-03-29 at 10:34 -0700, Sean Christopherson wrote:
> On Wed, Mar 29, 2023, Binbin Wu wrote:
> > 
> > On 3/29/2023 10:04 AM, Huang, Kai wrote:
> > > On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
> > > > On 3/29/2023 7:33 AM, Huang, Kai wrote:
> > > > > On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
> > > > > > On Mon, Mar 20, 2023, Chao Gao wrote:
> > > > > > > On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
> > > > > > > > get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
> > > > > > > > to check 64-bit mode. Should use is_64_bit_mode() instead.
> > > > > > > > 
> > > > > > > > Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
> > > > > > > > Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> > > > > > > It is better to split this patch into two: one for nested and one for
> > > > > > > SGX.
> > > > > > > 
> > > > > > > It is possible that there is a kernel release which has just one of
> > > > > > > above two flawed commits, then this fix patch cannot be applied cleanly
> > > > > > > to the release.
> > > > > > The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
> > > > > > for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
> > > > > > just drop the nVMX side of things.
> > > > > But it looks the old code doesn't unconditionally inject #UD when in
> > > > > compatibility mode?
> > > > I think Sean means VMX instructions is not valid in compatibility mode
> > > > and it triggers #UD, which has higher priority than VM-Exit, by the
> > > > processor in non-root mode.
> > > > 
> > > > So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode
> > > > for sure if it is in long mode.
> > > Oh I see thanks.
> > > 
> > > Then is it better to add some comment to explain, or add a WARN() if it's not in
> > > 64-bit mode?
> > 
> > I also prefer to add a comment if no objection.
> > 
> > Seems I am not the only one who didn't get it : )
> 
> I would rather have a code change than a comment, e.g. 
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f63b28f46a71..0460ca219f96 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4931,7 +4931,8 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>         int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
>         bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
>  
> -       if (is_reg) {
> +       if (is_reg ||
> +           WARN_ON_ONCE(is_long_mode(vcpu) && !is_64_bit_mode(vcpu))) {
>                 kvm_queue_exception(vcpu, UD_VECTOR);
>                 return 1;
>         }
> 
> 

Looks good to me.

> The only downside is that querying is_64_bit_mode() could unnecessarily trigger a
> VMREAD to get the current CS.L bit, but a measurable performance regressions is
> extremely unlikely because is_64_bit_mode() all but guaranteed to be called in
> these paths anyways (and KVM caches segment info), e.g. by kvm_register_read().

Agreed.

> 
> And then in a follow-up, we should also be able to do:
> 
> @@ -5402,7 +5403,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
>         if (instr_info & BIT(10)) {
>                 kvm_register_write(vcpu, (((instr_info) >> 3) & 0xf), value);
>         } else {
> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
> +               len = is_long_mode(vcpu) ? 8 : 4;
>                 if (get_vmx_mem_address(vcpu, exit_qualification,
>                                         instr_info, true, len, &gva))
>                         return 1;
> @@ -5476,7 +5477,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>         if (instr_info & BIT(10))
>                 value = kvm_register_read(vcpu, (((instr_info) >> 3) & 0xf));
>         else {
> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
> +               len = is_long_mode(vcpu) ? 8 : 4;
>                 if (get_vmx_mem_address(vcpu, exit_qualification,
>                                         instr_info, false, len, &gva))
>                         return 1;
> 

Yeah, although it's a little bit weird that the actual WARN() happens after the
above code change.  But I don't know how to make the code better.  Maybe we
should put the WARN() at the very beginning, but this would require duplicated
code in each handle_xxx() for VMX instructions.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-19  8:49 ` [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57} Binbin Wu
@ 2023-03-30  8:33   ` Yang, Weijiang
  2023-03-30  8:40     ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Yang, Weijiang @ 2023-03-30  8:33 UTC (permalink / raw)
  To: Binbin Wu; +Cc: chao.gao, robert.hu, kvm, seanjc, pbonzini


On 3/19/2023 4:49 PM, Binbin Wu wrote:
> From: Robert Hoo <robert.hu@linux.intel.com>
>
> LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM
> masking for user mode pointers.
>
> When EPT is on:
> CR3 is fully under control of guest, guest LAM is thus transparent to KVM.
>
> When EPT is off (shadow paging):
> KVM needs to handle guest CR3.LAM_U48 and CR3.LAM_U57 toggles.
> The two bits don't participate in page table walking. They should be masked
> to get the base address of page table. When shadow paging is used, the two
> bits should be kept as they are in the shadow CR3.
> To be generic, introduce a field 'cr3_ctrl_bits' in kvm_vcpu_arch to record
> the bits used to control supported features related to CR3 (e.g. LAM).
> - Supported control bits are set to cr3_ctrl_bits.
> - Add kvm_vcpu_is_legal_cr3() to validate CR3, allow setting of the control
>    bits for the supported features.
> - cr3_ctrl_bits is used to mask the control bits when calculate the base
>    address of page table from mmu::get_guest_pgd().
> - Add kvm_get_active_cr3_ctrl_bits() to get the active control bits to form
>    a new guest CR3 (in vmx_load_mmu_pgd()).
> - For only control bits toggle cases, it is unnecessary to make new pgd, but
>    just make request of load pgd.
>    Especially, for ONLY-LAM-bits toggle cases, skip TLB flush since hardware
>    is not required to flush TLB when CR3 LAM bits toggled.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  7 +++++++
>   arch/x86/kvm/cpuid.h            |  5 +++++
>   arch/x86/kvm/mmu.h              |  5 +++++
>   arch/x86/kvm/mmu/mmu.c          |  2 +-
>   arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
>   arch/x86/kvm/vmx/nested.c       |  6 +++---
>   arch/x86/kvm/vmx/vmx.c          |  6 +++++-
>   arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++------
>   8 files changed, 50 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 742fd84c7997..2174ad27013b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -730,6 +730,13 @@ struct kvm_vcpu_arch {
>   	unsigned long cr0_guest_owned_bits;
>   	unsigned long cr2;
>   	unsigned long cr3;
> +	/*
> +	 * Bits in CR3 used to enable certain features. These bits don't
> +	 * participate in page table walking. They should be masked to
> +	 * get the base address of page table. When shadow paging is
> +	 * used, these bits should be kept as they are in the shadow CR3.
> +	 */
> +	u64 cr3_ctrl_bits;

The "ctrl_bits" turns out to be LAM bits only, so better to change the 
name as cr3_lam_bits

to make it specific.

>   	unsigned long cr4;
>   	unsigned long cr4_guest_owned_bits;
>   	unsigned long cr4_guest_rsvd_bits;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index b1658c0de847..ef8e1b912d7d 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>   	return vcpu->arch.maxphyaddr;
>   }
>   
> +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> +{
> +	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
> +}
> +
>   static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>   {
>   	return !(gpa & vcpu->arch.reserved_gpa_bits);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 168c46fd8dd1..29985eeb8e12 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>   	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>   }
>   
> +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
> +}
Same as above, change the function name to kvm_get_active_cr3_lam_bits().
> +
>   static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>   {
>   	u64 root_hpa = vcpu->arch.mmu->root.hpa;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index aeb240b339f5..e0b86ace7326 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3722,7 +3722,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>   	int quadrant, i, r;
>   	hpa_t root;
>   
> -	root_pgd = mmu->get_guest_pgd(vcpu);
> +	root_pgd = mmu->get_guest_pgd(vcpu) & ~vcpu->arch.cr3_ctrl_bits;
>   	root_gfn = root_pgd >> PAGE_SHIFT;
>   
>   	if (mmu_check_root(vcpu, root_gfn))
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e5662dbd519c..8887615534b0 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>   	trace_kvm_mmu_pagetable_walk(addr, access);
>   retry_walk:
>   	walker->level = mmu->cpu_role.base.level;
> -	pte           = mmu->get_guest_pgd(vcpu);
> +	pte           = mmu->get_guest_pgd(vcpu) & ~vcpu->arch.cr3_ctrl_bits;
>   	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
>   
>   #if PTTYPE == 64
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0f84cc05f57c..2eb258992d63 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1079,7 +1079,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
>   			       bool nested_ept, bool reload_pdptrs,
>   			       enum vm_entry_failure_code *entry_failure_code)
>   {
> -	if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
> +	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) {
>   		*entry_failure_code = ENTRY_FAIL_DEFAULT;
>   		return -EINVAL;
>   	}
> @@ -1101,7 +1101,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
>   	kvm_init_mmu(vcpu);
>   
>   	if (!nested_ept)
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +		kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
>   
>   	return 0;
>   }
> @@ -2907,7 +2907,7 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
>   
>   	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
>   	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
> -	    CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
> +	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
>   		return -EINVAL;
>   
>   	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 66a50224293e..9638a3000256 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3390,7 +3390,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>   			update_guest_cr3 = false;
>   		vmx_ept_load_pdptrs(vcpu);
>   	} else {
> -		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
> +		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
> +		            kvm_get_active_cr3_ctrl_bits(vcpu);
>   	}
>   
>   	if (update_guest_cr3)
> @@ -7750,6 +7751,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		vmx->msr_ia32_feature_control_valid_bits &=
>   			~FEAT_CTL_SGX_LC_ENABLED;
>   
> +	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
> +		vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
> +
>   	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>   	vmx_update_exception_bitmap(vcpu);
>   }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 410327e7eb55..e74af72f53ec 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1236,7 +1236,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
>   int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   {
>   	bool skip_tlb_flush = false;
> -	unsigned long pcid = 0;
> +	unsigned long pcid = 0, old_cr3;
>   #ifdef CONFIG_X86_64
>   	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>   
> @@ -1247,8 +1247,9 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   	}
>   #endif
>   
> +	old_cr3 = kvm_read_cr3(vcpu);
>   	/* PDPTRs are always reloaded for PAE paging. */
> -	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
> +	if (cr3 == old_cr3 && !is_pae_paging(vcpu))
>   		goto handle_tlb_flush;
>   
>   	/*
> @@ -1256,14 +1257,30 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
>   	 * the current vCPU mode is accurate.
>   	 */
> -	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
> +	if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
>   		return 1;
>   
>   	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>   		return 1;
>   
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +	if (cr3 != old_cr3) {
> +		if ((cr3 ^ old_cr3) & ~vcpu->arch.cr3_ctrl_bits) {
> +			kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
> +		} else {
> +			/*
> +			 * Though only control (LAM) bits changed, make the
> +			 * request to force an update on guest CR3 because the
> +			 * control (LAM) bits are stale
> +			 */
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +			/*
> +			 * HW is not required to flush TLB when CR3 LAM bits toggled.
> +			 * Currently only LAM bits in cr3_ctrl_bits, if more bits added in
> +			 * the future, need to check whether to skip TLB flush or not.
> +			 */
> +			skip_tlb_flush = true;
> +		}
> +	}
>   
>   	vcpu->arch.cr3 = cr3;
>   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> @@ -11305,7 +11322,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>   		 */
>   		if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
>   			return false;
> -		if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
> +		if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
>   			return false;
>   	} else {
>   		/*

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-30  8:33   ` Yang, Weijiang
@ 2023-03-30  8:40     ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-03-30  8:40 UTC (permalink / raw)
  To: Yang, Weijiang; +Cc: chao.gao, robert.hu, kvm, seanjc, pbonzini


On 3/30/2023 4:33 PM, Yang, Weijiang wrote:
>
> On 3/19/2023 4:49 PM, Binbin Wu wrote:
>> From: Robert Hoo <robert.hu@linux.intel.com>
>>
>> LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM
>> masking for user mode pointers.
>>
>> When EPT is on:
>> CR3 is fully under control of guest, guest LAM is thus transparent to 
>> KVM.
>>
>> When EPT is off (shadow paging):
>> KVM needs to handle guest CR3.LAM_U48 and CR3.LAM_U57 toggles.
>> The two bits don't participate in page table walking. They should be 
>> masked
>> to get the base address of page table. When shadow paging is used, 
>> the two
>> bits should be kept as they are in the shadow CR3.
>> To be generic, introduce a field 'cr3_ctrl_bits' in kvm_vcpu_arch to 
>> record
>> the bits used to control supported features related to CR3 (e.g. LAM).
>> - Supported control bits are set to cr3_ctrl_bits.
>> - Add kvm_vcpu_is_legal_cr3() to validate CR3, allow setting of the 
>> control
>>    bits for the supported features.
>> - cr3_ctrl_bits is used to mask the control bits when calculate the base
>>    address of page table from mmu::get_guest_pgd().
>> - Add kvm_get_active_cr3_ctrl_bits() to get the active control bits 
>> to form
>>    a new guest CR3 (in vmx_load_mmu_pgd()).
>> - For only control bits toggle cases, it is unnecessary to make new 
>> pgd, but
>>    just make request of load pgd.
>>    Especially, for ONLY-LAM-bits toggle cases, skip TLB flush since 
>> hardware
>>    is not required to flush TLB when CR3 LAM bits toggled.
>>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  7 +++++++
>>   arch/x86/kvm/cpuid.h            |  5 +++++
>>   arch/x86/kvm/mmu.h              |  5 +++++
>>   arch/x86/kvm/mmu/mmu.c          |  2 +-
>>   arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
>>   arch/x86/kvm/vmx/nested.c       |  6 +++---
>>   arch/x86/kvm/vmx/vmx.c          |  6 +++++-
>>   arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++------
>>   8 files changed, 50 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index 742fd84c7997..2174ad27013b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -730,6 +730,13 @@ struct kvm_vcpu_arch {
>>       unsigned long cr0_guest_owned_bits;
>>       unsigned long cr2;
>>       unsigned long cr3;
>> +    /*
>> +     * Bits in CR3 used to enable certain features. These bits don't
>> +     * participate in page table walking. They should be masked to
>> +     * get the base address of page table. When shadow paging is
>> +     * used, these bits should be kept as they are in the shadow CR3.
>> +     */
>> +    u64 cr3_ctrl_bits;
>
> The "ctrl_bits" turns out to be LAM bits only, so better to change the 
> name as cr3_lam_bits
>
> to make it specific.

The purpose of adding the field here is to make it generic so it can easily
be extended to other future features (if any) that also use the reserved
high bit(s), although currently it is only used by LAM.
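
Just for illustration (X86_FEATURE_FOO and X86_CR3_FOO_BIT below are made-up
names), a future feature would only need to OR its bit into cr3_ctrl_bits the
same way LAM does in vmx_vcpu_after_set_cpuid(), and the masking/legality
helpers would pick it up without further changes:

	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;

	/* Hypothetical future feature, for illustration only: */
	if (guest_cpuid_has(vcpu, X86_FEATURE_FOO))
		vcpu->arch.cr3_ctrl_bits |= X86_CR3_FOO_BIT;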


>
>>       unsigned long cr4;
>>       unsigned long cr4_guest_owned_bits;
>>       unsigned long cr4_guest_rsvd_bits;
>> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
>> index b1658c0de847..ef8e1b912d7d 100644
>> --- a/arch/x86/kvm/cpuid.h
>> +++ b/arch/x86/kvm/cpuid.h
>> @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu 
>> *vcpu)
>>       return vcpu->arch.maxphyaddr;
>>   }
>>   +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, 
>> unsigned long cr3)
>> +{
>> +    return !((cr3 & vcpu->arch.reserved_gpa_bits) & 
>> ~vcpu->arch.cr3_ctrl_bits);
>> +}
>> +
>>   static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, 
>> gpa_t gpa)
>>   {
>>       return !(gpa & vcpu->arch.reserved_gpa_bits);
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 168c46fd8dd1..29985eeb8e12 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -142,6 +142,11 @@ static inline unsigned long 
>> kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>>       return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>>   }
>>   +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
>> +{
>> +    return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
>> +}
> Same as above, change the function name to kvm_get_active_cr3_lam_bits().
>> +
>>   static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>>   {
>>       u64 root_hpa = vcpu->arch.mmu->root.hpa;
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index aeb240b339f5..e0b86ace7326 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3722,7 +3722,7 @@ static int mmu_alloc_shadow_roots(struct 
>> kvm_vcpu *vcpu)
>>       int quadrant, i, r;
>>       hpa_t root;
>>   -    root_pgd = mmu->get_guest_pgd(vcpu);
>> +    root_pgd = mmu->get_guest_pgd(vcpu) & ~vcpu->arch.cr3_ctrl_bits;
>>       root_gfn = root_pgd >> PAGE_SHIFT;
>>         if (mmu_check_root(vcpu, root_gfn))
>> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h 
>> b/arch/x86/kvm/mmu/paging_tmpl.h
>> index e5662dbd519c..8887615534b0 100644
>> --- a/arch/x86/kvm/mmu/paging_tmpl.h
>> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
>> @@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct 
>> guest_walker *walker,
>>       trace_kvm_mmu_pagetable_walk(addr, access);
>>   retry_walk:
>>       walker->level = mmu->cpu_role.base.level;
>> -    pte           = mmu->get_guest_pgd(vcpu);
>> +    pte           = mmu->get_guest_pgd(vcpu) & 
>> ~vcpu->arch.cr3_ctrl_bits;
>>       have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
>>     #if PTTYPE == 64
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 0f84cc05f57c..2eb258992d63 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1079,7 +1079,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu 
>> *vcpu, unsigned long cr3,
>>                      bool nested_ept, bool reload_pdptrs,
>>                      enum vm_entry_failure_code *entry_failure_code)
>>   {
>> -    if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
>> +    if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) {
>>           *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>           return -EINVAL;
>>       }
>> @@ -1101,7 +1101,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu 
>> *vcpu, unsigned long cr3,
>>       kvm_init_mmu(vcpu);
>>         if (!nested_ept)
>> -        kvm_mmu_new_pgd(vcpu, cr3);
>> +        kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
>>         return 0;
>>   }
>> @@ -2907,7 +2907,7 @@ static int nested_vmx_check_host_state(struct 
>> kvm_vcpu *vcpu,
>>         if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
>>           CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
>> -        CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
>> +        CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
>>           return -EINVAL;
>>         if 
>> (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 66a50224293e..9638a3000256 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -3390,7 +3390,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu 
>> *vcpu, hpa_t root_hpa,
>>               update_guest_cr3 = false;
>>           vmx_ept_load_pdptrs(vcpu);
>>       } else {
>> -        guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
>> +        guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
>> +                    kvm_get_active_cr3_ctrl_bits(vcpu);
>>       }
>>         if (update_guest_cr3)
>> @@ -7750,6 +7751,9 @@ static void vmx_vcpu_after_set_cpuid(struct 
>> kvm_vcpu *vcpu)
>>           vmx->msr_ia32_feature_control_valid_bits &=
>>               ~FEAT_CTL_SGX_LC_ENABLED;
>>   +    if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
>> +        vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
>> +
>>       /* Refresh #PF interception to account for MAXPHYADDR changes. */
>>       vmx_update_exception_bitmap(vcpu);
>>   }
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 410327e7eb55..e74af72f53ec 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1236,7 +1236,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu 
>> *vcpu, unsigned long pcid)
>>   int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>   {
>>       bool skip_tlb_flush = false;
>> -    unsigned long pcid = 0;
>> +    unsigned long pcid = 0, old_cr3;
>>   #ifdef CONFIG_X86_64
>>       bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>   @@ -1247,8 +1247,9 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, 
>> unsigned long cr3)
>>       }
>>   #endif
>>   +    old_cr3 = kvm_read_cr3(vcpu);
>>       /* PDPTRs are always reloaded for PAE paging. */
>> -    if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
>> +    if (cr3 == old_cr3 && !is_pae_paging(vcpu))
>>           goto handle_tlb_flush;
>>         /*
>> @@ -1256,14 +1257,30 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, 
>> unsigned long cr3)
>>        * stuff CR3, e.g. for RSM emulation, and there is no guarantee 
>> that
>>        * the current vCPU mode is accurate.
>>        */
>> -    if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
>> +    if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
>>           return 1;
>>         if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>>           return 1;
>>   -    if (cr3 != kvm_read_cr3(vcpu))
>> -        kvm_mmu_new_pgd(vcpu, cr3);
>> +    if (cr3 != old_cr3) {
>> +        if ((cr3 ^ old_cr3) & ~vcpu->arch.cr3_ctrl_bits) {
>> +            kvm_mmu_new_pgd(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
>> +        } else {
>> +            /*
>> +             * Though only control (LAM) bits changed, make the
>> +             * request to force an update on guest CR3 because the
>> +             * control (LAM) bits are stale
>> +             */
>> +            kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>> +            /*
>> +             * HW is not required to flush TLB when CR3 LAM bits 
>> toggled.
>> +             * Currently only LAM bits in cr3_ctrl_bits, if more 
>> bits added in
>> +             * the future, need to check whether to skip TLB flush 
>> or not.
>> +             */
>> +            skip_tlb_flush = true;
>> +        }
>> +    }
>>         vcpu->arch.cr3 = cr3;
>>       kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>> @@ -11305,7 +11322,7 @@ static bool kvm_is_valid_sregs(struct 
>> kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>>            */
>>           if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
>>               return false;
>> -        if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
>> +        if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
>>               return false;
>>       } else {
>>           /*

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29 22:46                 ` Huang, Kai
@ 2023-04-03  3:37                   ` Binbin Wu
  2023-04-03 11:24                     ` Huang, Kai
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-04-03  3:37 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


On 3/30/2023 6:46 AM, Huang, Kai wrote:
> On Wed, 2023-03-29 at 10:34 -0700, Sean Christopherson wrote:
>> On Wed, Mar 29, 2023, Binbin Wu wrote:
>>> On 3/29/2023 10:04 AM, Huang, Kai wrote:
>>>> On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
>>>>> On 3/29/2023 7:33 AM, Huang, Kai wrote:
>>>>>> On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
>>>>>>> On Mon, Mar 20, 2023, Chao Gao wrote:
>>>>>>>> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>>>>>>>>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>>>>>>>>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>>>>>>>>
>>>>>>>>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>>>>>>>>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
>>>>>>>> It is better to split this patch into two: one for nested and one for
>>>>>>>> SGX.
>>>>>>>>
>>>>>>>> It is possible that there is a kernel release which has just one of
>>>>>>>> above two flawed commits, then this fix patch cannot be applied cleanly
>>>>>>>> to the release.
>>>>>>> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
>>>>>>> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
>>>>>>> just drop the nVMX side of things.
>>>>>> But it looks the old code doesn't unconditionally inject #UD when in
>>>>>> compatibility mode?
>>>>> I think Sean means VMX instructions is not valid in compatibility mode
>>>>> and it triggers #UD, which has higher priority than VM-Exit, by the
>>>>> processor in non-root mode.
>>>>>
>>>>> So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode
>>>>> for sure if it is in long mode.
>>>> Oh I see thanks.
>>>>
>>>> Then is it better to add some comment to explain, or add a WARN() if it's not in
>>>> 64-bit mode?
>>> I also prefer to add a comment if no objection.
>>>
>> Seems I am not the only one who didn't get it : )
>> I would rather have a code change than a comment, e.g.
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index f63b28f46a71..0460ca219f96 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -4931,7 +4931,8 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>>          int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
>>          bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
>>   
>> -       if (is_reg) {
>> +       if (is_reg ||
>> +           WARN_ON_ONCE(is_long_mode(vcpu) && !is_64_bit_mode(vcpu))) {
>>                  kvm_queue_exception(vcpu, UD_VECTOR);
>>                  return 1;
>>          }
>>
>>
> Looks good to me.
>
>> The only downside is that querying is_64_bit_mode() could unnecessarily trigger a
>> VMREAD to get the current CS.L bit, but a measurable performance regressions is
>> extremely unlikely because is_64_bit_mode() all but guaranteed to be called in
>> these paths anyways (and KVM caches segment info), e.g. by kvm_register_read().
> Agreed.
>
>> And then in a follow-up, we should also be able to do:
>>
>> @@ -5402,7 +5403,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
>>          if (instr_info & BIT(10)) {
>>                  kvm_register_write(vcpu, (((instr_info) >> 3) & 0xf), value);
>>          } else {
>> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
>> +               len = is_long_mode(vcpu) ? 8 : 4;
>>                  if (get_vmx_mem_address(vcpu, exit_qualification,
>>                                          instr_info, true, len, &gva))
>>                          return 1;
>> @@ -5476,7 +5477,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>>          if (instr_info & BIT(10))
>>                  value = kvm_register_read(vcpu, (((instr_info) >> 3) & 0xf));
>>          else {
>> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
>> +               len = is_long_mode(vcpu) ? 8 : 4;
>>                  if (get_vmx_mem_address(vcpu, exit_qualification,
>>                                          instr_info, false, len, &gva))
>>                          return 1;
>>
> Yeah, although it's a little bit weird the actual WARN() happens after above
> code change.  But I don't know how to make the code better.  Maybe we should put
> the WARN() at the very beginning but this would require duplicated code in each
> handle_xxx() for VMX instructions.

I checked the code again and found the comment of
nested_vmx_check_permission().

"/*
  * Intel's VMX Instruction Reference specifies a common set of 
prerequisites
  * for running VMX instructions (except VMXON, whose prerequisites are
  * slightly different). It also specifies what exception to inject 
otherwise.
  * Note that many of these exceptions have priority over VM exits, so they
  * don't have to be checked again here.
  */"

I think the Note part of the comment has tried to call out why the check
for compatibility mode is unnecessary.

But I have a question here: nested_vmx_check_permission() checks that the
vcpu has done VMXON, otherwise it will inject a #UD. Why is this #UD handled
specifically in the VM-exit handler?
Don't all #UDs have higher priority than VM exits?

According to SDM Section "Relative Priority of Faults and VM Exits":
"Certain exceptions have priority over VM exits. These include 
invalid-opcode exceptions, ..."
There seems to be no further classification of #UDs.

Anyway, I will separate this patch from the LAM KVM enabling patchset and
send it separately if needed later.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-03  3:37                   ` Binbin Wu
@ 2023-04-03 11:24                     ` Huang, Kai
  2023-04-03 15:02                       ` Sean Christopherson
                                         ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Huang, Kai @ 2023-04-03 11:24 UTC (permalink / raw)
  To: Christopherson,, Sean, binbin.wu; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


> 
> I checked the code again and find the comment of 
> nested_vmx_check_permission().
> 
> "/*
>   * Intel's VMX Instruction Reference specifies a common set of 
> prerequisites
>   * for running VMX instructions (except VMXON, whose prerequisites are
>   * slightly different). It also specifies what exception to inject 
> otherwise.
>   * Note that many of these exceptions have priority over VM exits, so they
>   * don't have to be checked again here.
>   */"
> 
> I think the Note part in the comment has tried to callout why the check 
> for compatibility mode is unnecessary.
> 
> But I have a question here, nested_vmx_check_permission() checks that 
> the vcpu is vmxon,
> otherwise it will inject a #UD. Why this #UD is handled in the VMExit 
> handler specifically?
> Not all #UDs have higher priority than VM exits?
> 
> According to SDM Section "Relative Priority of Faults and VM Exits":
> "Certain exceptions have priority over VM exits. These include 
> invalid-opcode exceptions, ..."
> Seems not further classifications of #UDs.

This is clarified in the pseudo code of VMX instructions in the SDM.  If you
look at the pseudo code, all VMX instructions except VMXON (obviously) have
something like below:

	IF (not in VMX operation) ...
		THEN #UD;
	ELSIF in VMX non-root operation
		THEN VMexit;

So to me "this particular" #UD has higher priority over VM exits (while other
#UDs may not).

But IIUC the above #UD won't happen when running a VMX instruction in the guest,
because if there's any live guest, the CPU must already be in VMX
operation.  So the check below in nested_vmx_check_permission():

	if (!to_vmx(vcpu)->nested.vmxon) {                                            
                kvm_queue_exception(vcpu, UD_VECTOR);                          
                return 0;                                                      
        }

is needed to emulate the case where the guest runs any other VMX instruction before
VMXON.
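
For context, every VMX instruction exit handler does roughly this before anything
else (a sketch, details elided):

	static int handle_vmclear(struct kvm_vcpu *vcpu)
	{
		if (!nested_vmx_check_permission(vcpu))
			return 1;

		/* Only reached when L1 has already done VMXON. */
		/* ... actual VMCLEAR emulation ... */
	}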

> 
> Anyway, I will seperate this patch from the LAM KVM enabling patch. And 
> send a patch seperately if
> needed later.
> 

I think your change for SGX is still needed based on the pseudo code of ENCLS.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-03 11:24                     ` Huang, Kai
@ 2023-04-03 15:02                       ` Sean Christopherson
  2023-04-03 23:13                         ` Huang, Kai
  2023-04-04  1:21                       ` Binbin Wu
  2023-04-04  1:31                       ` Binbin Wu
  2 siblings, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2023-04-03 15:02 UTC (permalink / raw)
  To: Kai Huang; +Cc: binbin.wu, kvm, pbonzini, robert.hu, Chao Gao

On Mon, Apr 03, 2023, Huang, Kai wrote:
> > 
> > I checked the code again and find the comment of 
> > nested_vmx_check_permission().
> > 
> > "/*
> >  * Intel's VMX Instruction Reference specifies a common set of prerequisites
> >  * for running VMX instructions (except VMXON, whose prerequisites are
> >  * slightly different). It also specifies what exception to inject otherwise.
> >  * Note that many of these exceptions have priority over VM exits, so they
> >  * don't have to be checked again here.
> >  */"
> > 
> > I think the Note part in the comment has tried to callout why the check 
> > for compatibility mode is unnecessary.
> > 
> > But I have a question here, nested_vmx_check_permission() checks that the
> > vcpu is vmxon, otherwise it will inject a #UD. Why this #UD is handled in
> > the VMExit handler specifically?  Not all #UDs have higher priority than VM
> > exits?
> > 
> > According to SDM Section "Relative Priority of Faults and VM Exits":
> > "Certain exceptions have priority over VM exits. These include 
> > invalid-opcode exceptions, ..."
> > Seems not further classifications of #UDs.
> 
> This is clarified in the pseudo code of VMX instructions in the SDM.  If you
> look at the pseudo code, all VMX instructions except VMXON (obviously) have
> something like below:
> 
> 	IF (not in VMX operation) ...
> 		THEN #UD;
> 	ELSIF in VMX non-root operation
> 		THEN VMexit;
> 
> So to me "this particular" #UD has higher priority over VM exits (while other
> #UDs may not).

> But IIUC above #UD won't happen when running VMX instruction in the guest,
> because if there's any live guest, the CPU must already have been in VMX
> operation.  So below check in nested_vmx_check_permission():
> 
> 	if (!to_vmx(vcpu)->nested.vmxon) {                                            
>                 kvm_queue_exception(vcpu, UD_VECTOR);                          
>                 return 0;                                                      
>         }
> 
> is needed to emulate the case that guest runs any other VMX instructions before
> VMXON.

Yep.  IMO, the pseudocode is misleading/confusing, the "in VMX non-root operation"
check should really come first.  The VMXON pseudocode has the same awkward
sequence:

    IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
        THEN #UD;
    ELSIF not in VMX operation
        THEN
            IF (CPL > 0) or (in A20M mode) or
            (the values of CR0 and CR4 are not supported in VMX operation)
                THEN #GP(0);
    ELSIF in VMX non-root operation
        THEN VMexit;
    ELSIF CPL > 0
        THEN #GP(0);
    ELSE VMfail("VMXON executed in VMX root operation");
    FI;


whereas I find this sequence for VMXON more representative of what actually happens:

    IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
        THEN #UD

    IF in VMX non-root operation
        THEN VMexit;

    IF CPL > 0
        THEN #GP(0)

    IF in VMX operation
        THEN VMfail("VMXON executed in VMX root operation");

    IF (in A20M mode) or
       (the values of CR0 and CR4 are not supported in VMX operation)
        THEN #GP(0);

> > Anyway, I will seperate this patch from the LAM KVM enabling patch. And 
> > send a patch seperately if needed later.
> 
> I think your change for SGX is still needed based on the pseudo code of ENCLS.

Agreed.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-03 15:02                       ` Sean Christopherson
@ 2023-04-03 23:13                         ` Huang, Kai
  0 siblings, 0 replies; 43+ messages in thread
From: Huang, Kai @ 2023-04-03 23:13 UTC (permalink / raw)
  To: Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, binbin.wu, Gao, Chao

On Mon, 2023-04-03 at 08:02 -0700, Sean Christopherson wrote:
> On Mon, Apr 03, 2023, Huang, Kai wrote:
> > > 
> > > I checked the code again and find the comment of 
> > > nested_vmx_check_permission().
> > > 
> > > "/*
> > >  * Intel's VMX Instruction Reference specifies a common set of prerequisites
> > >  * for running VMX instructions (except VMXON, whose prerequisites are
> > >  * slightly different). It also specifies what exception to inject otherwise.
> > >  * Note that many of these exceptions have priority over VM exits, so they
> > >  * don't have to be checked again here.
> > >  */"
> > > 
> > > I think the Note part in the comment has tried to callout why the check 
> > > for compatibility mode is unnecessary.
> > > 
> > > But I have a question here, nested_vmx_check_permission() checks that the
> > > vcpu is vmxon, otherwise it will inject a #UD. Why this #UD is handled in
> > > the VMExit handler specifically?  Not all #UDs have higher priority than VM
> > > exits?
> > > 
> > > According to SDM Section "Relative Priority of Faults and VM Exits":
> > > "Certain exceptions have priority over VM exits. These include 
> > > invalid-opcode exceptions, ..."
> > > Seems not further classifications of #UDs.
> > 
> > This is clarified in the pseudo code of VMX instructions in the SDM.  If you
> > look at the pseudo code, all VMX instructions except VMXON (obviously) have
> > something like below:
> > 
> > 	IF (not in VMX operation) ...
> > 		THEN #UD;
> > 	ELSIF in VMX non-root operation
> > 		THEN VMexit;
> > 
> > So to me "this particular" #UD has higher priority over VM exits (while other
> > #UDs may not).
> 
> > But IIUC above #UD won't happen when running VMX instruction in the guest,
> > because if there's any live guest, the CPU must already have been in VMX
> > operation.  So below check in nested_vmx_check_permission():
> > 
> > 	if (!to_vmx(vcpu)->nested.vmxon) {                                            
> >                 kvm_queue_exception(vcpu, UD_VECTOR);                          
> >                 return 0;                                                      
> >         }
> > 
> > is needed to emulate the case that guest runs any other VMX instructions before
> > VMXON.
> 
> Yep.  IMO, the pseucode is misleading/confusing, the "in VMX non-root operation"
> check should really come first.  The VMXON pseudocode has the same awkward
> sequence:
> 
>     IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
>         THEN #UD;
>     ELSIF not in VMX operation
>         THEN
>             IF (CPL > 0) or (in A20M mode) or
>             (the values of CR0 and CR4 are not supported in VMX operation)
>                 THEN #GP(0);
>     ELSIF in VMX non-root operation
>         THEN VMexit;
>     ELSIF CPL > 0
>         THEN #GP(0);
>     ELSE VMfail("VMXON executed in VMX root operation");
>     FI;
> 
> 
> whereas I find this sequence for VMXON more representative of what actually happens:
> 
>     IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
>         THEN #UD
> 
>     IF in VMX non-root operation
>         THEN VMexit;
> 
>     IF CPL > 0
>         THEN #GP(0)
> 
>     IF in VMX operation
>         THEN VMfail("VMXON executed in VMX root operation");
> 
>     IF (in A20M mode) or
>        (the values of CR0 and CR4 are not supported in VMX operation)
>         THEN #GP(0);

Perhaps we need to live with the fact that the pseudo code in the SDM can be
buggy :)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-03 11:24                     ` Huang, Kai
  2023-04-03 15:02                       ` Sean Christopherson
@ 2023-04-04  1:21                       ` Binbin Wu
  2023-04-04  1:53                         ` Huang, Kai
  2023-04-04  1:31                       ` Binbin Wu
  2 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  1:21 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


On 4/3/2023 7:24 PM, Huang, Kai wrote:
>>
>>
>> Anyway, I will seperate this patch from the LAM KVM enabling patch. And
>> send a patch seperately if
>> needed later.
>>
> I think your change for SGX is still needed based on the pseudo code of ENCLS.

Yes, I meant I would separate out the VMX part since it is not a bug after all;
the SGX part will still be in the patchset.


>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-03 11:24                     ` Huang, Kai
  2023-04-03 15:02                       ` Sean Christopherson
  2023-04-04  1:21                       ` Binbin Wu
@ 2023-04-04  1:31                       ` Binbin Wu
  2 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  1:31 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


On 4/3/2023 7:24 PM, Huang, Kai wrote:
>> I checked the code again and find the comment of
>> nested_vmx_check_permission().
>>
>> "/*
>>    * Intel's VMX Instruction Reference specifies a common set of
>> prerequisites
>>    * for running VMX instructions (except VMXON, whose prerequisites are
>>    * slightly different). It also specifies what exception to inject
>> otherwise.
>>    * Note that many of these exceptions have priority over VM exits, so they
>>    * don't have to be checked again here.
>>    */"
>>
>> I think the Note part in the comment has tried to callout why the check
>> for compatibility mode is unnecessary.
>>
>> But I have a question here, nested_vmx_check_permission() checks that
>> the vcpu is vmxon,
>> otherwise it will inject a #UD. Why this #UD is handled in the VMExit
>> handler specifically?
>> Not all #UDs have higher priority than VM exits?
>>
>> According to SDM Section "Relative Priority of Faults and VM Exits":
>> "Certain exceptions have priority over VM exits. These include
>> invalid-opcode exceptions, ..."
>> Seems not further classifications of #UDs.
> This is clarified in the pseudo code of VMX instructions in the SDM.  If you
> look at the pseudo code, all VMX instructions except VMXON (obviously) have
> something like below:
>
> 	IF (not in VMX operation) ...
> 		THEN #UD;
> 	ELSIF in VMX non-root operation
> 		THEN VMexit;
>
> So to me "this particular" #UD has higher priority over VM exits (while other
> #UDs may not).
>
> But IIUC above #UD won't happen when running VMX instruction in the guest,
> because if there's any live guest, the CPU must already have been in VMX
> operation.  So below check in nested_vmx_check_permission():
>
> 	if (!to_vmx(vcpu)->nested.vmxon) {
>                  kvm_queue_exception(vcpu, UD_VECTOR);
>                  return 0;
>          }
>
> is needed to emulate the case that guest runs any other VMX instructions before
> VMXON.
>
Yes, you are right. I get the point now, thanks.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-04  1:21                       ` Binbin Wu
@ 2023-04-04  1:53                         ` Huang, Kai
  2023-04-04  2:45                           ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Huang, Kai @ 2023-04-04  1:53 UTC (permalink / raw)
  To: Christopherson,, Sean, binbin.wu; +Cc: kvm, pbonzini, robert.hu, Gao, Chao

On Tue, 2023-04-04 at 09:21 +0800, Binbin Wu wrote:
> On 4/3/2023 7:24 PM, Huang, Kai wrote:
> > > 
> > > 
> > > Anyway, I will seperate this patch from the LAM KVM enabling patch. And
> > > send a patch seperately if
> > > needed later.
> > > 
> > I think your change for SGX is still needed based on the pseudo code of ENCLS.
> 
> Yes, I meant I would seperate VMX part since it is not a bug after all, 
> SGX will still be in the patchset.
> 
> 

Shouldn't the SGX part also be split out as a bug fix patch?

Does it have anything to do with this LAM support series?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-04  1:53                         ` Huang, Kai
@ 2023-04-04  2:45                           ` Binbin Wu
  2023-04-04  3:09                             ` Huang, Kai
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  2:45 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


On 4/4/2023 9:53 AM, Huang, Kai wrote:
> On Tue, 2023-04-04 at 09:21 +0800, Binbin Wu wrote:
>> On 4/3/2023 7:24 PM, Huang, Kai wrote:
>>>>
>>>> Anyway, I will seperate this patch from the LAM KVM enabling patch. And
>>>> send a patch seperately if
>>>> needed later.
>>>>
>>> I think your change for SGX is still needed based on the pseudo code of ENCLS.
>> Yes, I meant I would seperate VMX part since it is not a bug after all,
>> SGX will still be in the patchset.
>>
>>
> Shouldn't SGX part be also split out as a bug fix patch?
>
> Does it have anything to do with this LAM support series?

It is related to LAM support because LAM is only effective in 64-bit mode,
so the untag action should only be done in 64-bit mode.

If the SGX fix patch is not included, that means LAM untag could be
called in compatibility mode in the SGX ENCLS handler.
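
To be concrete, the SGX fix boils down to roughly the following (a sketch; the
exact context in sgx_get_encls_gva() may differ slightly):

	-	if (!is_long_mode(vcpu)) {
	+	if (!is_64_bit_mode(vcpu)) {
			/* fall back to segmented addressing for the ENCLS operand */

so that the untag step added later in this series is only reached in 64-bit mode.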



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-04  2:45                           ` Binbin Wu
@ 2023-04-04  3:09                             ` Huang, Kai
  2023-04-04  3:15                               ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Huang, Kai @ 2023-04-04  3:09 UTC (permalink / raw)
  To: Christopherson,, Sean, binbin.wu; +Cc: kvm, pbonzini, robert.hu, Gao, Chao

On Tue, 2023-04-04 at 10:45 +0800, Binbin Wu wrote:
> On 4/4/2023 9:53 AM, Huang, Kai wrote:
> > On Tue, 2023-04-04 at 09:21 +0800, Binbin Wu wrote:
> > > On 4/3/2023 7:24 PM, Huang, Kai wrote:
> > > > > 
> > > > > Anyway, I will seperate this patch from the LAM KVM enabling patch. And
> > > > > send a patch seperately if
> > > > > needed later.
> > > > > 
> > > > I think your change for SGX is still needed based on the pseudo code of ENCLS.
> > > Yes, I meant I would seperate VMX part since it is not a bug after all,
> > > SGX will still be in the patchset.
> > > 
> > > 
> > Shouldn't SGX part be also split out as a bug fix patch?
> > 
> > Does it have anything to do with this LAM support series?
> 
> It is related to LAM support because LAM only effective in 64-bit mode,
> so the untag action should only be done in 64-bit mode.
> 
> If the SGX fix patch is not included, that means LAM untag could be 
> called in compatiblity mode in SGX ENCLS handler.
> 
> 

Yes I got this point, and your patch 6/7 depends on it.

But my point is this fix is needed anyway regardless of the LAM support, and it
should be merged asap as a bug fix (and CC stable perhaps) --
while the LAM support is a feature and can be merged in a different time frame.

Of course just my 2cents and this is up to maintainers.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-04  3:09                             ` Huang, Kai
@ 2023-04-04  3:15                               ` Binbin Wu
  2023-04-04  3:27                                 ` Binbin Wu
  0 siblings, 1 reply; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  3:15 UTC (permalink / raw)
  To: Huang, Kai, Christopherson,, Sean; +Cc: kvm, pbonzini, robert.hu, Gao, Chao


On 4/4/2023 11:09 AM, Huang, Kai wrote:
> On Tue, 2023-04-04 at 10:45 +0800, Binbin Wu wrote:
>> On 4/4/2023 9:53 AM, Huang, Kai wrote:
>>> On Tue, 2023-04-04 at 09:21 +0800, Binbin Wu wrote:
>>>> On 4/3/2023 7:24 PM, Huang, Kai wrote:
>>>>>> Anyway, I will seperate this patch from the LAM KVM enabling patch. And
>>>>>> send a patch seperately if
>>>>>> needed later.
>>>>>>
>>>>> I think your change for SGX is still needed based on the pseudo code of ENCLS.
>>>> Yes, I meant I would seperate VMX part since it is not a bug after all,
>>>> SGX will still be in the patchset.
>>>>
>>>>
>>> Shouldn't SGX part be also split out as a bug fix patch?
>>>
>>> Does it have anything to do with this LAM support series?
>> It is related to LAM support because LAM only effective in 64-bit mode,
>> so the untag action should only be done in 64-bit mode.
>>
>> If the SGX fix patch is not included, that means LAM untag could be
>> called in compatiblity mode in SGX ENCLS handler.
>>
>>
> Yes I got this point, and your patch 6/7 depends on it.
>
> But my point is this fix is needed anyway regardless the LAM support, and it
> should be merged, for instance, asap as a bug fix (and CC stable perhaps) --
> while the LAM support is a feature, and can be merged at a different time frame.

OK, I can separate the patch.


>
> Of course just my 2cents and this is up to maintainers.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-04-04  3:15                               ` Binbin Wu
@ 2023-04-04  3:27                                 ` Binbin Wu
  0 siblings, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  3:27 UTC (permalink / raw)
  To: Huang, Kai; +Cc: kvm, Christopherson,, Sean, pbonzini, robert.hu, Gao, Chao


On 4/4/2023 11:15 AM, Binbin Wu wrote:
>
> On 4/4/2023 11:09 AM, Huang, Kai wrote:
>> On Tue, 2023-04-04 at 10:45 +0800, Binbin Wu wrote:
>>> On 4/4/2023 9:53 AM, Huang, Kai wrote:
>>>> On Tue, 2023-04-04 at 09:21 +0800, Binbin Wu wrote:
>>>>> On 4/3/2023 7:24 PM, Huang, Kai wrote:
>>>>>>> Anyway, I will seperate this patch from the LAM KVM enabling 
>>>>>>> patch. And
>>>>>>> send a patch seperately if
>>>>>>> needed later.
>>>>>>>
>>>>>> I think your change for SGX is still needed based on the pseudo 
>>>>>> code of ENCLS.
>>>>> Yes, I meant I would seperate VMX part since it is not a bug after 
>>>>> all,
>>>>> SGX will still be in the patchset.
>>>>>
>>>>>
>>>> Shouldn't SGX part be also split out as a bug fix patch?
>>>>
>>>> Does it have anything to do with this LAM support series?
>>> It is related to LAM support because LAM only effective in 64-bit mode,
>>> so the untag action should only be done in 64-bit mode.
>>>
>>> If the SGX fix patch is not included, that means LAM untag could be
>>> called in compatiblity mode in SGX ENCLS handler.
>>>
>>>
>> Yes I got this point, and your patch 6/7 depends on it.
>>
>> But my point is this fix is needed anyway regardless the LAM support, 
>> and it
>> should be merged, for instance, asap as a bug fix (and CC stable 
>> perhaps) --
>> while the LAM support is a feature, and can be merged at a different 
>> time frame.
>
> OK, I can seperate the patch.

Kai,

I sent the patch separately and added your Reviewed-by from the previous
review.
https://lore.kernel.org/kvm/20230404032502.27798-1-binbin.wu@linux.intel.com/T/#u


>
>
>>
>> Of course just my 2cents and this is up to maintainers.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode
  2023-03-29 17:34               ` Sean Christopherson
  2023-03-29 22:46                 ` Huang, Kai
@ 2023-04-04  6:14                 ` Binbin Wu
  1 sibling, 0 replies; 43+ messages in thread
From: Binbin Wu @ 2023-04-04  6:14 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Kai Huang, Chao Gao, kvm, pbonzini, robert.hu


On 3/30/2023 1:34 AM, Sean Christopherson wrote:
> On Wed, Mar 29, 2023, Binbin Wu wrote:
>> On 3/29/2023 10:04 AM, Huang, Kai wrote:
>>> On Wed, 2023-03-29 at 09:27 +0800, Binbin Wu wrote:
>>>> On 3/29/2023 7:33 AM, Huang, Kai wrote:
>>>>> On Tue, 2023-03-21 at 14:35 -0700, Sean Christopherson wrote:
>>>>>> On Mon, Mar 20, 2023, Chao Gao wrote:
>>>>>>> On Sun, Mar 19, 2023 at 04:49:22PM +0800, Binbin Wu wrote:
>>>>>>>> get_vmx_mem_address() and sgx_get_encls_gva() use is_long_mode()
>>>>>>>> to check 64-bit mode. Should use is_64_bit_mode() instead.
>>>>>>>>
>>>>>>>> Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
>>>>>>>> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
>>>>>>> It is better to split this patch into two: one for nested and one for
>>>>>>> SGX.
>>>>>>>
>>>>>>> It is possible that there is a kernel release which has just one of
>>>>>>> above two flawed commits, then this fix patch cannot be applied cleanly
>>>>>>> to the release.
>>>>>> The nVMX code isn't buggy, VMX instructions #UD in compatibility mode, and except
>>>>>> for VMCALL, that #UD has higher priority than VM-Exit interception.  So I'd say
>>>>>> just drop the nVMX side of things.
>>>>> But it looks the old code doesn't unconditionally inject #UD when in
>>>>> compatibility mode?
>>>> I think Sean means VMX instructions is not valid in compatibility mode
>>>> and it triggers #UD, which has higher priority than VM-Exit, by the
>>>> processor in non-root mode.
>>>>
>>>> So if there is a VM-Exit due to VMX instruction , it is in 64-bit mode
>>>> for sure if it is in long mode.
>>> Oh I see thanks.
>>>
>>> Then is it better to add some comment to explain, or add a WARN() if it's not in
>>> 64-bit mode?
>> I also prefer to add a comment if no objection.
>>
>> Seems I am not the only one who didn't get it : )
> I would rather have a code change than a comment, e.g.
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f63b28f46a71..0460ca219f96 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4931,7 +4931,8 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
>          int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
>          bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
>   
> -       if (is_reg) {
> +       if (is_reg ||
> +           WARN_ON_ONCE(is_long_mode(vcpu) && !is_64_bit_mode(vcpu))) {
>                  kvm_queue_exception(vcpu, UD_VECTOR);
>                  return 1;
>          }
>
>
> The only downside is that querying is_64_bit_mode() could unnecessarily trigger a
> VMREAD to get the current CS.L bit, but a measurable performance regressions is
> extremely unlikely because is_64_bit_mode() all but guaranteed to be called in
> these paths anyways (and KVM caches segment info), e.g. by kvm_register_read().
>
> And then in a follow-up, we should also be able to do:
>
> @@ -5402,7 +5403,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
>          if (instr_info & BIT(10)) {
>                  kvm_register_write(vcpu, (((instr_info) >> 3) & 0xf), value);
>          } else {
> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
> +               len = is_long_mode(vcpu) ? 8 : 4;
>                  if (get_vmx_mem_address(vcpu, exit_qualification,
>                                          instr_info, true, len, &gva))
>                          return 1;
> @@ -5476,7 +5477,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>          if (instr_info & BIT(10))
>                  value = kvm_register_read(vcpu, (((instr_info) >> 3) & 0xf));
>          else {
> -               len = is_64_bit_mode(vcpu) ? 8 : 4;
> +               len = is_long_mode(vcpu) ? 8 : 4;
>                  if (get_vmx_mem_address(vcpu, exit_qualification,
>                                          instr_info, false, len, &gva))
>                          return 1;

I agree to replace is_64_bit_mode() with is_long_mode().
But, based on the implementation and comment of
nested_vmx_check_permission(),
do you think the check for compatibility mode still needs to be added?



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2023-04-04  6:15 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-19  8:49 [PATCH v6 0/7] Linear Address Masking (LAM) KVM Enabling Binbin Wu
2023-03-19  8:49 ` [PATCH v6 1/7] KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Binbin Wu
2023-03-20  1:30   ` Binbin Wu
2023-03-19  8:49 ` [PATCH v6 2/7] KVM: VMX: Use is_64_bit_mode() to check 64-bit mode Binbin Wu
2023-03-20 12:36   ` Chao Gao
2023-03-20 12:51     ` Binbin Wu
2023-03-21 21:35     ` Sean Christopherson
2023-03-22  1:09       ` Binbin Wu
2023-03-28 23:33       ` Huang, Kai
2023-03-29  1:27         ` Binbin Wu
2023-03-29  2:04           ` Huang, Kai
2023-03-29  2:08             ` Binbin Wu
2023-03-29 17:34               ` Sean Christopherson
2023-03-29 22:46                 ` Huang, Kai
2023-04-03  3:37                   ` Binbin Wu
2023-04-03 11:24                     ` Huang, Kai
2023-04-03 15:02                       ` Sean Christopherson
2023-04-03 23:13                         ` Huang, Kai
2023-04-04  1:21                       ` Binbin Wu
2023-04-04  1:53                         ` Huang, Kai
2023-04-04  2:45                           ` Binbin Wu
2023-04-04  3:09                             ` Huang, Kai
2023-04-04  3:15                               ` Binbin Wu
2023-04-04  3:27                                 ` Binbin Wu
2023-04-04  1:31                       ` Binbin Wu
2023-04-04  6:14                 ` Binbin Wu
2023-03-20 22:36   ` Huang, Kai
2023-03-19  8:49 ` [PATCH v6 3/7] KVM: x86: Virtualize CR4.LAM_SUP Binbin Wu
2023-03-19  8:49 ` [PATCH v6 4/7] KVM: x86: Virtualize CR3.LAM_{U48,U57} Binbin Wu
2023-03-30  8:33   ` Yang, Weijiang
2023-03-30  8:40     ` Binbin Wu
2023-03-19  8:49 ` [PATCH v6 5/7] KVM: x86: Introduce untag_addr() in kvm_x86_ops Binbin Wu
2023-03-20 12:07   ` Chao Gao
2023-03-20 12:23     ` Binbin Wu
2023-03-29  1:54       ` Binbin Wu
2023-03-19  8:49 ` [PATCH v6 6/7] KVM: x86: Untag address when LAM applicable Binbin Wu
2023-03-20 11:51   ` Chao Gao
2023-03-20 11:56     ` Binbin Wu
2023-03-20 12:04   ` Binbin Wu
2023-03-29  5:02   ` Binbin Wu
2023-03-19  8:49 ` [PATCH v6 7/7] KVM: x86: Expose LAM feature to userspace VMM Binbin Wu
2023-03-20  8:57   ` Chao Gao
2023-03-20 12:00     ` Binbin Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).