* [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling
@ 2023-02-27  8:45 Robert Hoo
  2023-02-27  8:45 ` [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP Robert Hoo
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo

===Feature Introduction===

Linear-address masking (LAM) [1] modifies the checking that is applied to
*64-bit* linear addresses, allowing software to use the untranslated upper
address bits for metadata.
LAM has two modes that differ in which upper bits of the linear address can
be borrowed: LAM48 (bits 62:48, i.e. a LAM width of 15) and LAM57 (bits
62:57, i.e. a LAM width of 6). They are controlled by these new bits:
CR3[62] (LAM_U48), CR3[61] (LAM_U57), and CR4[28] (LAM_SUP).

* The LAM_U48 and LAM_U57 bits control LAM for user-mode addresses, i.e. if
  CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and CR3.LAM_U57 = 0,
  LAM48 is applied.
* The LAM_SUP bit, combined with the paging mode (4-level or 5-level),
  determines LAM for supervisor-mode addresses, i.e. when CR4.LAM_SUP = 1,
  4-level paging uses LAM48 for supervisor-mode addresses while 5-level
  paging uses LAM57. (A minimal sketch of the masking rule follows this
  list.)
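
For illustration only, a minimal sketch of the masking rule in C (the helper
names are made up for this example and are not part of the patches):

	#include <stdint.h>

	/* LAM57: bits 62:57 are replaced by copies of bit 56. */
	static inline uint64_t lam57_mask_metadata(uint64_t ptr)
	{
		const uint64_t meta = 0x7E00000000000000ULL;	/* bits 62:57 */

		return (ptr & ~meta) | ((ptr & (1ULL << 56)) ? meta : 0);
	}

	/* LAM48: bits 62:48 are replaced by copies of bit 47. */
	static inline uint64_t lam48_mask_metadata(uint64_t ptr)
	{
		const uint64_t meta = 0x7FFF000000000000ULL;	/* bits 62:48 */

		return (ptr & ~meta) | ((ptr & (1ULL << 47)) ? meta : 0);
	}

Bit 63 is left untouched; the masking is applied before the canonical check,
as described above.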

Note:
1. LAM applies only to data addresses, not to instruction fetches.
2. LAM identification of an address as user or supervisor is based solely on the
   value of pointer bit 63 and does not, for the purposes of LAM, depend on the CPL.
3. For user-mode addresses, it is possible that 5-level paging and LAM_U48
   are both enabled; in this case, the effective usable linear address width
   is 48. [2]
4. On VM exit, the problematic address saved in the VMCS field is clean, i.e.
   the metadata is cleared and the address is in canonical form.


===LAM KVM Design===

KVM intercepts CR4.LAM_SUP to avoid reading the VMCS field on every access,
with the expectation that the guest won't toggle this bit frequently.

Under EPT mode, CR3 is fully under guest control, so guest LAM is transparent
to KVM.

For shadow paging (EPT = off), KVM needs to handle guest CR3.LAM_U48 and
CR3.LAM_U57 toggles.

In the KVM emulator, when a linear address is calculated, imitate the
hardware LAM rules according to the LAM settings.

===Unit Test===
1. Run the kernel LAM kselftests in a guest, with both EPT=Y and EPT=N.
2. Add a kvm-unit-test [3] for the CR4.LAM_SUP part, as the kernel LAM
selftests don't cover this yet. This test covers toggling the CR4.LAM_SUP
bit, LAM supervisor-mode address masking, and the KVM emulator code patch.
Run the unit test with the LAM feature both on and off (i.e. including
negative cases).
3. Launch a nested guest.

All tests have passed in a Simics environment.

[1] ISE Chap 10 https://cdrdv2.intel.com/v1/dl/getContent/671368 (Section 10.6 VMX interaction)
[2] Thus currently, the kernel enabling patches only enable LAM_U57. https://lore.kernel.org/lkml/20230123220500.21077-1-kirill.shutemov@linux.intel.com/
[3] https://lore.kernel.org/kvm/20230227082557.403584-1-robert.hu@linux.intel.com/

---
Changelog
v4 --> v5:
Reorder and fold the patches surrounding the CR3.LAM bits into Patch 3 of
this version.
Revise Patch 1's subject and description.
Drop Patch 3.
Use kvm_read_cr4_bits() instead of kvm_read_cr4().
Fix: no need to untag the address when writing to an MSR; it should be a
legacy canonical check.
Rename kvm_is_valid_cr3() --> kvm_vcpu_is_valid_cr3(), and update some call
sites of kvm_vcpu_is_valid_cr3() to use kvm_is_valid_cr3().
Other refactoring and misc changes.

v3 --> v4:
Drop unrelated Patch 1 in v3 (Binbin, Sean, Xiaoyao).
Intercept CR4.LAM_SUP instead of passing it through to the guest (Sean).
Just filter out CR3.LAM_{U48, U57}, instead of all reserved high bits
(Sean, Yuan).
Use the existing __canonical_address() helper instead of writing a new one
(Weijiang).
Add LAM handling in KVM emulation (Yu, Yuan).
Add Jingqi's reviewed-by on Patch 7.
Rebased onto Kirill's latest code, which is based on 6.2-rc1.

v2 --> v3:
As the LAM kernel patches are in the tip tree now, rebase onto it.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/

v1 --> v2:
1. Fixes i386-allyesconfig build error on get_pgd(), where
   CR3_HIGH_RSVD_MASK isn't applicable.
   (Reported-by: kernel test robot <lkp@intel.com>)
2. In kvm_set_cr3(), be conservative about skipping the TLB flush when only
   the LAM bits toggle. (Kirill)


Robert Hoo (5):
  KVM: x86: Virtualize CR4.LAM_SUP
  [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  KVM: x86: Virtualize CR3.LAM_{U48,U57}
  KVM: x86: emulation: Apply LAM mask when emulating data access in
    64-bit mode
  KVM: x86: LAM: Expose LAM CPUID to user space VMM

 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/emulate.c          | 13 ++++++
 arch/x86/kvm/mmu.h              |  5 +++
 arch/x86/kvm/mmu/mmu.c          |  9 +++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/vmx/nested.c       |  4 +-
 arch/x86/kvm/vmx/vmx.c          |  3 +-
 arch/x86/kvm/x86.c              | 35 +++++++++++++---
 arch/x86/kvm/x86.h              | 73 +++++++++++++++++++++++++++++++++
 10 files changed, 136 insertions(+), 13 deletions(-)

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
base-commit: 03334443640f226f56f71b5dfa3b1be6d4a1a1bc
-- 
2.31.1



* [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP
  2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
@ 2023-02-27  8:45 ` Robert Hoo
  2023-03-02  7:17   ` Chao Gao
  2023-02-27  8:45 ` [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Robert Hoo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo

The LAM feature uses CR4 bit[28] (LAM_SUP) to enable/configure LAM masking
on supervisor-mode addresses. To virtualize that, move CR4.LAM_SUP out of
the unconditional CR4_RESERVED_BITS; its reservation now depends on whether
the vCPU has the LAM feature.

The bit is intercepted rather than passed through to the guest, to avoid
reading the VMCS field every time KVM fetches its value, with the
expectation that the guest won't toggle this bit frequently.

There are no connections to other features/vmx_exec_controls, therefore no
code needs to be added to kvm/vmx_set_cr4().

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/x86.h              | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f35f1ff4427b..4684896698f4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_LAM_SUP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9de72586f406..8ec5cc983062 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -475,6 +475,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
+		__reserved_bits |= X86_CR4_LAM_SUP;	\
 	__reserved_bits;                                \
 })
 
-- 
2.31.1



* [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2023-02-27  8:45 ` [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP Robert Hoo
@ 2023-02-27  8:45 ` Robert Hoo
  2023-03-02  7:24   ` Chao Gao
  2023-02-27  8:45 ` [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Robert Hoo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo

kvm_read_cr4_bits() returns ulong; explicitly cast it to bool when assigning
it to a bool variable.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 312aea1854ae..b9611690561d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1236,7 +1236,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	bool skip_tlb_flush = false;
 	unsigned long pcid = 0;
 #ifdef CONFIG_X86_64
-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
 	if (pcid_enabled) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
-- 
2.31.1



* [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2023-02-27  8:45 ` [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP Robert Hoo
  2023-02-27  8:45 ` [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Robert Hoo
@ 2023-02-27  8:45 ` Robert Hoo
  2023-03-03  6:21   ` Chao Gao
  2023-03-10 20:12   ` Sean Christopherson
  2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
  2023-02-27  8:45 ` [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
  4 siblings, 2 replies; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo

The LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61 for
LAM_U57) to enable/configure LAM masking for user-mode addresses. The LAM
masking is done before the legacy canonical checks.

To virtualize LAM's use of the CR3 bits, this patch:
1. Doesn't reserve these 2 bits when LAM is enabled on the vCPU. Previously,
when validating CR3, KVM used kvm_vcpu_is_legal_gpa(); now define
kvm_vcpu_is_valid_cr3(), which is kvm_vcpu_is_legal_gpa() plus CR3.LAM bits
validation. Substitute kvm_vcpu_is_legal/illegal_gpa() with
kvm_vcpu_is_valid_cr3() in call sites that validate CR3 rather than a pure
GPA.
2. mmu::get_guest_pgd() is implemented by get_cr3(), which returns the whole
guest CR3 value. Strip the LAM bits in call sites that need a pure PGD
value, e.g. mmu_alloc_shadow_roots(), FNAME(walk_addr_generic)().
3. When forming a new guest CR3 (vmx_load_mmu_pgd()), fold in the LAM bits
(kvm_get_active_lam()).
4. When the guest sets CR3, identify the only-LAM-bits-toggling case, where
it is unnecessary to build a new pgd; instead just request a load of the
pgd, so that the new CR3.LAM bit configuration is folded in (point 3 above).
To be conservative, this case still does a TLB flush.
5. For nested VM entry, allow the 2 CR3 bits to be set in the corresponding
VMCS host/guest fields.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/mmu.h             |  5 +++++
 arch/x86/kvm/mmu/mmu.c         |  9 ++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 arch/x86/kvm/vmx/nested.c      |  4 ++--
 arch/x86/kvm/vmx/vmx.c         |  3 ++-
 arch/x86/kvm/x86.c             | 33 ++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.h             |  1 +
 7 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..866f2b7cb509 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }
 
+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
 	u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 835426254e76..3efec7f8d8c6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3699,7 +3699,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	int quadrant, i, r;
 	hpa_t root;
 
-	root_pgd = mmu->get_guest_pgd(vcpu);
+	/*
+	 * The guest_cpuid_has(X86_FEATURE_LAM) check is omitted; the CR3 LAM
+	 * bits can be stripped unconditionally because they reside in the
+	 * high reserved bits, and with or without LAM those high bits must
+	 * be stripped anyway when CR3 is interpreted as a PGD.
+	 */
+	root_pgd = mmu->get_guest_pgd(vcpu) &
+		   ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
 	root_gfn = root_pgd >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0f6455072055..57f39c7492ed 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
 	walker->level = mmu->cpu_role.base.level;
-	pte           = mmu->get_guest_pgd(vcpu);
+	pte           = mmu->get_guest_pgd(vcpu) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
 	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
 #if PTTYPE == 64
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b6f4411b613e..301912155dd1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1078,7 +1078,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 			       bool nested_ept, bool reload_pdptrs,
 			       enum vm_entry_failure_code *entry_failure_code)
 {
-	if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
+	if (CC(!kvm_vcpu_is_valid_cr3(vcpu, cr3))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
@@ -2906,7 +2906,7 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 
 	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
 	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
-	    CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
+	    CC(!kvm_vcpu_is_valid_cr3(vcpu, vmcs12->host_cr3)))
 		return -EINVAL;
 
 	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fe5615fd8295..66edd091f145 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3289,7 +3289,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			update_guest_cr3 = false;
 		vmx_ept_load_pdptrs(vcpu);
 	} else {
-		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+			    kvm_get_active_lam(vcpu);
 	}
 
 	if (update_guest_cr3)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9611690561d..be6c7c986f0f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1231,10 +1231,21 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
 }
 
+bool kvm_vcpu_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+	else if (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57))
+		return false;
+
+	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_is_valid_cr3);
+
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
-	unsigned long pcid = 0;
+	unsigned long pcid = 0, old_cr3;
 #ifdef CONFIG_X86_64
 	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
@@ -1254,14 +1265,26 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
 	 * the current vCPU mode is accurate.
 	 */
-	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+	if (!kvm_vcpu_is_valid_cr3(vcpu, cr3))
 		return 1;
 
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
 		return 1;
 
-	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3);
+	old_cr3 = kvm_read_cr3(vcpu);
+	if (cr3 != old_cr3) {
+		if ((cr3 ^ old_cr3) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57)) {
+			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
+					X86_CR3_LAM_U57));
+		} else {
+			/*
+			 * Though only the LAM bits change, mark the request
+			 * in order to force an update of the guest CR3,
+			 * because the LAM bits of the old CR3 are stale.
+			 */
+			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+		}
+	}
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
@@ -11185,7 +11208,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 		 */
 		if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
 			return false;
-		if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
+		if (!kvm_vcpu_is_valid_cr3(vcpu, sregs->cr3))
 			return false;
 	} else {
 		/*
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8ec5cc983062..6b6bfddc84e0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -440,6 +440,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+bool kvm_vcpu_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
-- 
2.31.1



* [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (2 preceding siblings ...)
  2023-02-27  8:45 ` [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Robert Hoo
@ 2023-02-27  8:45 ` Robert Hoo
  2023-03-02  6:41   ` Binbin Wu
                     ` (2 more replies)
  2023-02-27  8:45 ` [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
  4 siblings, 3 replies; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo

Emulate hardware LAM masking when doing data accesses in 64-bit mode.

kvm_lam_untag_addr() implements this: per the CR4/CR3 LAM bit configuration,
first check that the linear address conforms to the LAM canonical rule, i.e.
the highest effective address bit matches bit 63. Then mask out the metadata
per the LAM configuration. If either step fails, emulate a #GP to the guest.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/emulate.c | 13 ++++++++
 arch/x86/kvm/x86.h     | 70 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5cc3efa0e21c..77bd13f40711 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -700,6 +700,19 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
 	*max_size = 0;
 	switch (mode) {
 	case X86EMUL_MODE_PROT64:
+		/* LAM applies only on data access */
+		if (!fetch && guest_cpuid_has(ctxt->vcpu, X86_FEATURE_LAM)) {
+			enum lam_type type;
+
+			type = kvm_vcpu_lam_type(la, ctxt->vcpu);
+			if (type == LAM_ILLEGAL) {
+				*linear = la;
+				goto bad;
+			} else {
+				la = kvm_lam_untag_addr(la, type);
+			}
+		}
+
 		*linear = la;
 		va_bits = ctxt_virt_addr_bits(ctxt);
 		if (!__is_canonical_address(la, va_bits))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6b6bfddc84e0..d992e5220602 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -201,6 +201,76 @@ static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
 	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
 }
 
+enum lam_type {
+	LAM_ILLEGAL = -1,
+	LAM_U57,
+	LAM_U48,
+	LAM_S57,
+	LAM_S48,
+	LAM_NONE
+};
+
+#ifdef CONFIG_X86_64
+/*
+ * LAM Canonical Rule:
+ * LAM_U/S48 -- bit 63 == bit 47
+ * LAM_U/S57 -- bit 63 == bit 56
+ */
+static inline bool lam_canonical(u64 addr, int effect_width)
+{
+	return (addr >> 63) == ((addr >> effect_width) & BIT(0));
+}
+
+static inline enum lam_type kvm_vcpu_lam_type(u64 addr, struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(!is_64_bit_mode(vcpu));
+
+	if (addr >> 63 == 0) {
+		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
+			return lam_canonical(addr, 56) ?  LAM_U57 : LAM_ILLEGAL;
+		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48)
+			return lam_canonical(addr, 47) ?  LAM_U48 : LAM_ILLEGAL;
+	} else if (kvm_read_cr4_bits(vcpu, X86_CR4_LAM_SUP)) {
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_LA57))
+			return lam_canonical(addr, 56) ?  LAM_S57 : LAM_ILLEGAL;
+		else
+			return lam_canonical(addr, 47) ?  LAM_S48 : LAM_ILLEGAL;
+	}
+
+	return LAM_NONE;
+}
+
+/* untag addr for guest, according to vCPU's LAM config */
+static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
+{
+	switch (type) {
+	case LAM_U57:
+	case LAM_S57:
+		addr = __canonical_address(addr, 57);
+		break;
+	case LAM_U48:
+	case LAM_S48:
+		addr = __canonical_address(addr, 48);
+		break;
+	case LAM_NONE:
+	default:
+		break;
+	}
+
+	return addr;
+}
+#else
+static inline enum lam_type kvm_vcpu_lam_type(u64 addr, struct kvm_vcpu *vcpu)
+{
+	return LAM_NONE;
+}
+
+static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
+{
+	return addr;
+}
+#endif
+
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
-- 
2.31.1



* [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM
  2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (3 preceding siblings ...)
  2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
@ 2023-02-27  8:45 ` Robert Hoo
  2023-03-03  6:46   ` Chao Gao
  4 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-02-27  8:45 UTC (permalink / raw)
  To: seanjc, pbonzini, chao.gao, binbin.wu; +Cc: kvm, Robert Hoo, Jingqi Liu

The LAM feature is enumerated by CPUID.(EAX=07H, ECX=01H):EAX.LAM[bit 26].

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b14653b61470..79f45cbe587e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -664,7 +664,7 @@ void kvm_set_cpu_caps(void)
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) | F(AMX_FP16) |
-		F(AVX_IFMA)
+		F(AVX_IFMA) | F(LAM)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
-- 
2.31.1



* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
@ 2023-03-02  6:41   ` Binbin Wu
  2023-03-02 13:16     ` Robert Hoo
  2023-03-02  8:55   ` Chao Gao
  2023-03-10 20:23   ` Sean Christopherson
  2 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2023-03-02  6:41 UTC (permalink / raw)
  To: Robert Hoo, seanjc, pbonzini, chao.gao; +Cc: kvm


On 2/27/2023 4:45 PM, Robert Hoo wrote:
> Emulate HW LAM masking when doing data access under 64-bit mode.
>
> kvm_lam_untag_addr() implements this: per CR4/CR3 LAM bits configuration,
> firstly check the linear addr conforms LAM canonical, i.e. the highest
> address bit matches bit 63. Then mask out meta data per LAM configuration.
> If failed in above process, emulate #GP to guest.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>   arch/x86/kvm/emulate.c | 13 ++++++++
>   arch/x86/kvm/x86.h     | 70 ++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 83 insertions(+)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5cc3efa0e21c..77bd13f40711 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -700,6 +700,19 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>   	*max_size = 0;
>   	switch (mode) {
>   	case X86EMUL_MODE_PROT64:
> +		/* LAM applies only on data access */
> +		if (!fetch && guest_cpuid_has(ctxt->vcpu, X86_FEATURE_LAM)) {
> +			enum lam_type type;
> +
> +			type = kvm_vcpu_lam_type(la, ctxt->vcpu);
> +			if (type == LAM_ILLEGAL) {
> +				*linear = la;
> +				goto bad;
> +			} else {
> +				la = kvm_lam_untag_addr(la, type);
> +			}
> +		}
> +

__linearize() is not the only path that needs the modified LAM canonical
check; some VM-exit paths should also be taken care of, like VMX
instructions and SGX ENCLS.

Also, the instructions INVLPG and INVPCID need some special handling,
since LAM is not applied to the memory operands of these two instructions
according to the LAM spec.


>   		*linear = la;
>   		va_bits = ctxt_virt_addr_bits(ctxt);
>   		if (!__is_canonical_address(la, va_bits))
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6b6bfddc84e0..d992e5220602 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -201,6 +201,76 @@ static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
>   	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
>   }
>   
> +enum lam_type {
> +	LAM_ILLEGAL = -1,
> +	LAM_U57,
> +	LAM_U48,
> +	LAM_S57,
> +	LAM_S48,
> +	LAM_NONE
> +};
> +
> +#ifdef CONFIG_X86_64
> +/*
> + * LAM Canonical Rule:
> + * LAM_U/S48 -- bit 63 == bit 47
> + * LAM_U/S57 -- bit 63 == bit 56

The modified LAM canonical check for LAM_U57 + 4-level paging is: bit 63
and bits 56:47 should all be 0.
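
For reference, a minimal sketch of that stricter check (the helper name is
only illustrative; BIT_ULL()/GENMASK_ULL() are from <linux/bits.h>):

	/* LAM_U57 + 4-level paging: bit 63 and bits 56:47 must all be 0 */
	static inline bool lam_u57_la48_canonical(u64 addr)
	{
		return !(addr & (BIT_ULL(63) | GENMASK_ULL(56, 47)));
	}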


> + */
> +static inline bool lam_canonical(u64 addr, int effect_width)
> +{
> +	return (addr >> 63) == ((addr >> effect_width) & BIT(0));
> +}
> +
> +static inline enum lam_type kvm_vcpu_lam_type(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	WARN_ON_ONCE(!is_64_bit_mode(vcpu));
> +
> +	if (addr >> 63 == 0) {
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			return lam_canonical(addr, 56) ?  LAM_U57 : LAM_ILLEGAL;
> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48)
> +			return lam_canonical(addr, 47) ?  LAM_U48 : LAM_ILLEGAL;
> +	} else if (kvm_read_cr4_bits(vcpu, X86_CR4_LAM_SUP)) {
> +		if (kvm_read_cr4_bits(vcpu, X86_CR4_LA57))
> +			return lam_canonical(addr, 56) ?  LAM_S57 : LAM_ILLEGAL;
> +		else
> +			return lam_canonical(addr, 47) ?  LAM_S48 : LAM_ILLEGAL;
> +	}
> +
> +	return LAM_NONE;
> +}
> +
> +/* untag addr for guest, according to vCPU's LAM config */
> +static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
> +{
> +	switch (type) {
> +	case LAM_U57:
> +	case LAM_S57:
> +		addr = __canonical_address(addr, 57);
> +		break;
> +	case LAM_U48:
> +	case LAM_S48:
> +		addr = __canonical_address(addr, 48);
> +		break;
> +	case LAM_NONE:
> +	default:
> +		break;
> +	}
> +
> +	return addr;
> +}
> +#else
> +static inline enum lam_type kvm_vcpu_lam_type(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	return LAM_NONE;
> +}
> +
> +static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
> +{
> +	return addr;
> +}
> +#endif
> +
>   static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>   					gva_t gva, gfn_t gfn, unsigned access)
>   {


* Re: [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP
  2023-02-27  8:45 ` [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP Robert Hoo
@ 2023-03-02  7:17   ` Chao Gao
  2023-03-02 12:03     ` Binbin Wu
  2023-03-02 13:00     ` Robert Hoo
  0 siblings, 2 replies; 33+ messages in thread
From: Chao Gao @ 2023-03-02  7:17 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Mon, Feb 27, 2023 at 04:45:43PM +0800, Robert Hoo wrote:
>LAM feature uses CR4 bit[28] (LAM_SUP) to enable/config LAM masking on
>supervisor mode address. To virtualize that, move CR4.LAM_SUP out of
>unconditional CR4_RESERVED_BITS; its reservation now depends on vCPU has
>LAM feature or not.
>
>Not passing through to guest but intercept it, is to avoid read VMCS field
>every time when KVM fetch its value, with expectation that guest won't
>toggle this bit frequently.
>
>There's no other features/vmx_exec_controls connections, therefore no code
>need to be complemented in kvm/vmx_set_cr4().
>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>---
> arch/x86/include/asm/kvm_host.h | 3 ++-
> arch/x86/kvm/x86.h              | 2 ++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>index f35f1ff4427b..4684896698f4 100644
>--- a/arch/x86/include/asm/kvm_host.h
>+++ b/arch/x86/include/asm/kvm_host.h
>@@ -125,7 +125,8 @@
> 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
>-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
>+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
>+			  | X86_CR4_LAM_SUP))
> 
> #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
> 
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index 9de72586f406..8ec5cc983062 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -475,6 +475,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
> 		__reserved_bits |= X86_CR4_VMXE;        \
> 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
> 		__reserved_bits |= X86_CR4_PCIDE;       \
>+	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
>+		__reserved_bits |= X86_CR4_LAM_SUP;	\
> 	__reserved_bits;                                \
> })

Add X86_CR4_LAM_SUP to cr4_fixed1 in nested_vmx_cr_fixed1_bits_update()
to indicate CR4.LAM_SUP is allowed to be 0 or 1 in VMX operation.

With this fixed,

Reviewed-by: Chao Gao <chao.gao@intel.com>


* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-02-27  8:45 ` [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Robert Hoo
@ 2023-03-02  7:24   ` Chao Gao
  2023-03-03  3:23     ` Robert Hoo
  0 siblings, 1 reply; 33+ messages in thread
From: Chao Gao @ 2023-03-02  7:24 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Mon, Feb 27, 2023 at 04:45:44PM +0800, Robert Hoo wrote:
>kvm_read_cr4_bits() returns ulong, explicitly cast it bool when assign to a
>bool variable.
>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>---
> arch/x86/kvm/x86.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 312aea1854ae..b9611690561d 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1236,7 +1236,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> 	bool skip_tlb_flush = false;
> 	unsigned long pcid = 0;
> #ifdef CONFIG_X86_64
>-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>+	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> 
> 	if (pcid_enabled) {
> 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;

pcid_enabled is used only once. You can drop it, i.e.,

	if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {

>-- 
>2.31.1
>


* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
  2023-03-02  6:41   ` Binbin Wu
@ 2023-03-02  8:55   ` Chao Gao
  2023-03-02 11:31     ` Binbin Wu
  2023-03-10 20:23   ` Sean Christopherson
  2 siblings, 1 reply; 33+ messages in thread
From: Chao Gao @ 2023-03-02  8:55 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Mon, Feb 27, 2023 at 04:45:46PM +0800, Robert Hoo wrote:
>Emulate HW LAM masking when doing data access under 64-bit mode.
>
>kvm_lam_untag_addr() implements this: per CR4/CR3 LAM bits configuration,
>firstly check the linear addr conforms LAM canonical, i.e. the highest
>address bit matches bit 63. Then mask out meta data per LAM configuration.
>If failed in above process, emulate #GP to guest.
>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>---
> arch/x86/kvm/emulate.c | 13 ++++++++
> arch/x86/kvm/x86.h     | 70 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 83 insertions(+)
>
>diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>index 5cc3efa0e21c..77bd13f40711 100644
>--- a/arch/x86/kvm/emulate.c
>+++ b/arch/x86/kvm/emulate.c
>@@ -700,6 +700,19 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
> 	*max_size = 0;
> 	switch (mode) {
> 	case X86EMUL_MODE_PROT64:
>+		/* LAM applies only on data access */
>+		if (!fetch && guest_cpuid_has(ctxt->vcpu, X86_FEATURE_LAM)) {
>+			enum lam_type type;
>+
>+			type = kvm_vcpu_lam_type(la, ctxt->vcpu);
>+			if (type == LAM_ILLEGAL) {
>+				*linear = la;
>+				goto bad;
>+			} else {
>+				la = kvm_lam_untag_addr(la, type);
>+			}
>+		}
>+
> 		*linear = la;
> 		va_bits = ctxt_virt_addr_bits(ctxt);
> 		if (!__is_canonical_address(la, va_bits))

...

>+static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
>+{
>+	switch (type) {
>+	case LAM_U57:
>+	case LAM_S57:
>+		addr = __canonical_address(addr, 57);
>+		break;
>+	case LAM_U48:
>+	case LAM_S48:
>+		addr = __canonical_address(addr, 48);
>+		break;
>+	case LAM_NONE:
>+	default:
>+		break;
>+	}
>+
>+	return addr;
>+}

LAM's change to canonicality check is:
before performing the check, software metadata in pointers is masked by
sign-extending the value of bit 56/47.

so, to emulate this behavior, in kvm_lam_untag_addr(), we can simply:
1. determine which LAM configuration is enabled, LAM57 or LAM48.
2. mask software metadata by sign-extending bit 56/47, i.e.,

	addr = (sign_extend64(addr, X) & ~BIT_ULL(63)) |
	       (addr & BIT_ULL(63));

	where X=56 for LAM57 and X=47 for LAM48.

Note that this doesn't ensure the resulting @addr is canonical. It
isn't a problem because the original canonicality check
(__is_canonical_address() above) can identify non-canonical addresses
and raise #GP/#SS to the guest.
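
For reference, a minimal sketch of that suggestion in helper form (assuming
sign_extend64() from <linux/bitops.h> and BIT_ULL() from <linux/bits.h>; the
helper name and how the sign bit is picked are only illustrative, not actual
KVM code):

	/*
	 * @sign_bit is 56 when LAM57 applies to @addr and 47 when LAM48
	 * applies. Sign-extend that bit over the metadata bits and
	 * preserve bit 63.
	 */
	static inline u64 lam_untag_sketch(u64 addr, int sign_bit)
	{
		return (sign_extend64(addr, sign_bit) & ~BIT_ULL(63)) |
		       (addr & BIT_ULL(63));
	}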


* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-02  8:55   ` Chao Gao
@ 2023-03-02 11:31     ` Binbin Wu
  0 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2023-03-02 11:31 UTC (permalink / raw)
  To: Chao Gao, Robert Hoo; +Cc: seanjc, pbonzini, kvm


On 3/2/2023 4:55 PM, Chao Gao wrote:
> On Mon, Feb 27, 2023 at 04:45:46PM +0800, Robert Hoo wrote:
>> Emulate HW LAM masking when doing data access under 64-bit mode.
>>
>> kvm_lam_untag_addr() implements this: per CR4/CR3 LAM bits configuration,
>> firstly check the linear addr conforms LAM canonical, i.e. the highest
>> address bit matches bit 63. Then mask out meta data per LAM configuration.
>> If failed in above process, emulate #GP to guest.
>>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> ---
>> arch/x86/kvm/emulate.c | 13 ++++++++
>> arch/x86/kvm/x86.h     | 70 ++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 83 insertions(+)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 5cc3efa0e21c..77bd13f40711 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -700,6 +700,19 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>> 	*max_size = 0;
>> 	switch (mode) {
>> 	case X86EMUL_MODE_PROT64:
>> +		/* LAM applies only on data access */
>> +		if (!fetch && guest_cpuid_has(ctxt->vcpu, X86_FEATURE_LAM)) {
>> +			enum lam_type type;
>> +
>> +			type = kvm_vcpu_lam_type(la, ctxt->vcpu);
>> +			if (type == LAM_ILLEGAL) {
>> +				*linear = la;
>> +				goto bad;
>> +			} else {
>> +				la = kvm_lam_untag_addr(la, type);
>> +			}
>> +		}
>> +
>> 		*linear = la;
>> 		va_bits = ctxt_virt_addr_bits(ctxt);
>> 		if (!__is_canonical_address(la, va_bits))
> ...
>
>> +static inline u64 kvm_lam_untag_addr(u64 addr, enum lam_type type)
>> +{
>> +	switch (type) {
>> +	case LAM_U57:
>> +	case LAM_S57:
>> +		addr = __canonical_address(addr, 57);
>> +		break;
>> +	case LAM_U48:
>> +	case LAM_S48:
>> +		addr = __canonical_address(addr, 48);
>> +		break;
>> +	case LAM_NONE:
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return addr;
>> +}
> LAM's change to canonicality check is:
> before performing the check, software metadata in pointers is masked by
> sign-extending the value of bit 56/47.
>
> so, to emulate this behavior, in kvm_lam_untag_addr(), we can simply:
> 1. determine which LAM configuration is enabled, LAM57 or LAM48.
> 2. mask software metadata by sign-extending the bit56/47, i.e.,
>
> 	addr = (sign_extern64(addr, X) & ~BIT_ULL(63)) |
> 	       (addr & BIT_ULL(63));
>
> 	where X=56 for LAM57 and X=47 for LAM48.
>
> Note that this doesn't ensure the resulting @addr is canonical. It
> isn't a problem because the original canonicality check
> (__is_canonical_address() above) can identify non-canonical addresses
> and raise #GP/#SS to the guest.

Thanks for your suggestion. It's much simpler.




* Re: [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP
  2023-03-02  7:17   ` Chao Gao
@ 2023-03-02 12:03     ` Binbin Wu
  2023-03-02 13:00     ` Robert Hoo
  1 sibling, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2023-03-02 12:03 UTC (permalink / raw)
  To: Chao Gao, Robert Hoo; +Cc: seanjc, pbonzini, kvm


On 3/2/2023 3:17 PM, Chao Gao wrote:
> On Mon, Feb 27, 2023 at 04:45:43PM +0800, Robert Hoo wrote:
>> LAM feature uses CR4 bit[28] (LAM_SUP) to enable/config LAM masking on
>> supervisor mode address. To virtualize that, move CR4.LAM_SUP out of
>> unconditional CR4_RESERVED_BITS; its reservation now depends on vCPU has
>> LAM feature or not.
>>
>> Not passing through to guest but intercept it, is to avoid read VMCS field
>> every time when KVM fetch its value, with expectation that guest won't
>> toggle this bit frequently.
>>
>> There's no other features/vmx_exec_controls connections, therefore no code
>> need to be complemented in kvm/vmx_set_cr4().
>>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 3 ++-
>> arch/x86/kvm/x86.h              | 2 ++
>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f35f1ff4427b..4684896698f4 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -125,7 +125,8 @@
>> 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
>> 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
>> 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
>> -			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
>> +			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
>> +			  | X86_CR4_LAM_SUP))
>>
>> #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 9de72586f406..8ec5cc983062 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -475,6 +475,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
>> 		__reserved_bits |= X86_CR4_VMXE;        \
>> 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
>> 		__reserved_bits |= X86_CR4_PCIDE;       \
>> +	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
>> +		__reserved_bits |= X86_CR4_LAM_SUP;	\
>> 	__reserved_bits;                                \
>> })
> Add X86_CR4_LAM_SUP to cr4_fixed1 in nested_vmx_cr_fixed1_bits_update()
> to indicate CR4.LAM_SUP is allowed to be 0 or 1 in VMX operation.

Thanks for pointing it out. Will fix it in the next version.


>
> With this fixed,
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>


* Re: [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP
  2023-03-02  7:17   ` Chao Gao
  2023-03-02 12:03     ` Binbin Wu
@ 2023-03-02 13:00     ` Robert Hoo
  1 sibling, 0 replies; 33+ messages in thread
From: Robert Hoo @ 2023-03-02 13:00 UTC (permalink / raw)
  To: Chao Gao; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Thu, 2023-03-02 at 15:17 +0800, Chao Gao wrote:
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index 9de72586f406..8ec5cc983062 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -475,6 +475,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32
> > index, u32 type);
> > 		__reserved_bits |= X86_CR4_VMXE;        \
> > 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
> > 		__reserved_bits |= X86_CR4_PCIDE;       \
> > +	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
> > +		__reserved_bits |= X86_CR4_LAM_SUP;	\
> > 	__reserved_bits;                                \
> > })
> 
> Add X86_CR4_LAM_SUP to cr4_fixed1 in
> nested_vmx_cr_fixed1_bits_update()
> to indicate CR4.LAM_SUP is allowed to be 0 or 1 in VMX operation.
> 
Is this going to support nested LAM?

> With this fixed,
> 
> Reviewed-by: Chao Gao <chao.gao@intel.com>



* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-02  6:41   ` Binbin Wu
@ 2023-03-02 13:16     ` Robert Hoo
  2023-03-03  1:08       ` Binbin Wu
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-03-02 13:16 UTC (permalink / raw)
  To: Binbin Wu, seanjc, pbonzini, chao.gao; +Cc: kvm

On Thu, 2023-03-02 at 14:41 +0800, Binbin Wu wrote:
> __linearize is not the only path the modified LAM canonical check 
> needed, also some vmexits path should be taken care of, like VMX,
> SGX 
> ENCLS.
> 
SGX isn't in the scope of this version's implementation, just like nested LAM.

> Also the instruction INVLPG, INVPCID should have some special
> handling 
> since LAM is not applied to the memory operand of the two
> instruction 
> according to the LAM spec.

The spec's meaning for these 2 is: LAM masking doesn't apply to their
operands (the addresses), so the behavior is the same as before the LAM
feature was introduced. No change.
> 
> 
> > +#ifdef CONFIG_X86_64
> > +/*
> > + * LAM Canonical Rule:
> > + * LAM_U/S48 -- bit 63 == bit 47
> > + * LAM_U/S57 -- bit 63 == bit 56
> 
> The modified LAM canonical check for LAM_U57 + 4-level paging is:
> bit 
> 63, bit 56:47 should be all 0s.
> 
Yes, this case was missed. Chao's suggestion of sign-extend + legacy
canonical check can cover this.



* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-02 13:16     ` Robert Hoo
@ 2023-03-03  1:08       ` Binbin Wu
  2023-03-03  3:16         ` Robert Hoo
  2023-03-10 20:26         ` Sean Christopherson
  0 siblings, 2 replies; 33+ messages in thread
From: Binbin Wu @ 2023-03-03  1:08 UTC (permalink / raw)
  To: Robert Hoo, seanjc, pbonzini, chao.gao; +Cc: kvm


On 3/2/2023 9:16 PM, Robert Hoo wrote:
> On Thu, 2023-03-02 at 14:41 +0800, Binbin Wu wrote:
>> __linearize is not the only path the modified LAM canonical check
>> needed, also some vmexits path should be taken care of, like VMX,
>> SGX
>> ENCLS.
>>
> SGX isn't in this version's implementation's scope, like nested LAM.

LAM in SGX enclave mode is not in the scope of this version.
But I think that since the capability is exposed to the guest, we need to
cover the case where the guest uses a supervisor-mode pointer as the operand
of SGX ENCLS operations.


>
>> Also the instruction INVLPG, INVPCID should have some special
>> handling
>> since LAM is not applied to the memory operand of the two
>> instruction
>> according to the LAM spec.
> The spec's meaning on these 2 is: LAM masking doesn't apply to their
> operands (the address), so the behavior is like before LAM feature
> introduced. No change.

Yes, LAM is not applied to the 2 instructions, but __linearize() is
changed.
For example, the emulation of invlpg (em_invlpg) will also call it, so
that case needs to be handled specifically.
We can add a flag as an input to __linearize() to indicate whether the LAM
check and untagging are needed or not.


>>
>>> +#ifdef CONFIG_X86_64
>>> +/*
>>> + * LAM Canonical Rule:
>>> + * LAM_U/S48 -- bit 63 == bit 47
>>> + * LAM_U/S57 -- bit 63 == bit 56
>> The modified LAM canonical check for LAM_U57 + 4-level paging is:
>> bit
>> 63, bit 56:47 should be all 0s.
>>
> Yes, this case was missed. Chao's suggestion on signed-extend + legacy
> canonical check can cover this.
>


* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-03  1:08       ` Binbin Wu
@ 2023-03-03  3:16         ` Robert Hoo
  2023-03-03  3:35           ` Binbin Wu
  2023-03-10 20:26         ` Sean Christopherson
  1 sibling, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-03-03  3:16 UTC (permalink / raw)
  To: Binbin Wu, seanjc, pbonzini, chao.gao; +Cc: kvm

On Fri, 2023-03-03 at 09:08 +0800, Binbin Wu wrote:
> On 3/2/2023 9:16 PM, Robert Hoo wrote:
> > On Thu, 2023-03-02 at 14:41 +0800, Binbin Wu wrote:
> > > __linearize is not the only path the modified LAM canonical check
> > > needed, also some vmexits path should be taken care of, like VMX,
> > > SGX
> > > ENCLS.
> > > 
> > 
> > SGX isn't in this version's implementation's scope, like nested
> > LAM.
> 
> LAM in SGX enclave mode is not the scope of the this version.
> But I think since the capability is exposed to guest, 

I think you can document this, or use some other method to call it out to
the user. Even the kernel enabling doesn't include SGX interaction yet, so
I doubt it's that urgent for KVM to do this at this phase.

> need to cover the 
> case if the guest use the supervisor mode pointer

The pointer's mode has no bearing here, I think.

>  as the operand of SGX 
> EENCS operations.
> 
> 
> > 
> > > Also the instruction INVLPG, INVPCID should have some special
> > > handling
> > > since LAM is not applied to the memory operand of the two
> > > instruction
> > > according to the LAM spec.
> > 
> > The spec's meaning on these 2 is: LAM masking doesn't apply to
> > their
> > operands (the address), so the behavior is like before LAM feature
> > introduced. No change.
> 
> Yes, LAM are not applied to the 2 instrustions, but the __linearize
> is 
> changed.
> For example, the emulation of invlpg (em_invpg) will also call it.
> So 
> need to handle the case specificlly.
> Can add a flag as the input of linearize to indicate the LAM check
> and 
> untag is needed or not.
> 
No need.

"The INVLPG instruction ...
LAM does not apply to the specified memory address. Thus, in 64-bit
mode, ** if the memory address specified is in non-canonical form then
the INVLPG is the same as a NOP. **

The INVPCID instruction ...
LAM does not apply to the specified memory address, and in 64-bit
mode ** if this memory address is in non-canonical form then the
processor generates a #GP(0) exception. **"

You can double-check in the SDM: before and after LAM is introduced, the
behavior hasn't changed. Thus you don't need to worry about these 2
instructions' emulation.



* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-02  7:24   ` Chao Gao
@ 2023-03-03  3:23     ` Robert Hoo
  2023-03-10 20:22       ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-03-03  3:23 UTC (permalink / raw)
  To: Chao Gao; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Thu, 2023-03-02 at 15:24 +0800, Chao Gao wrote:
> > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > 
> > 	if (pcid_enabled) {
> > 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> 
> pcid_enabled is used only once. You can drop it, i.e.,
> 
> 	if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
> 
Emm, that's actually another point.
Though I won't object, wouldn't this be optimized by the compiler anyway?

And my point was: honor the bool type; though in the C implementation it's
0 and !0, it has its own type values: true and false.
Implicit type casting isn't a good habit.



* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-03  3:16         ` Robert Hoo
@ 2023-03-03  3:35           ` Binbin Wu
  2023-03-03  9:00             ` Robert Hoo
  0 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2023-03-03  3:35 UTC (permalink / raw)
  To: Robert Hoo, seanjc, pbonzini, chao.gao; +Cc: kvm


On 3/3/2023 11:16 AM, Robert Hoo wrote:
> On Fri, 2023-03-03 at 09:08 +0800, Binbin Wu wrote:
>> On 3/2/2023 9:16 PM, Robert Hoo wrote:
>>> On Thu, 2023-03-02 at 14:41 +0800, Binbin Wu wrote:
>>>> __linearize is not the only path the modified LAM canonical check
>>>> needed, also some vmexits path should be taken care of, like VMX,
>>>> SGX
>>>> ENCLS.
>>>>
>>> SGX isn't in this version's implementation's scope, like nested
>>> LAM.
>> LAM in SGX enclave mode is not the scope of the this version.
>> But I think since the capability is exposed to guest,
> I think you can document this or other method to call out this to user.
> Even Kernel enabling doesn't include SGX interaction yet, I doubt if
> it's that urgent for KVM to do this at this phase.
>
>> need to cover the
>> case if the guest use the supervisor mode pointer
> No business with pointer mode here, I think.
>
>>   as the operand of SGX
>> EENCS operations.
>>
>>
>>>> Also the instruction INVLPG, INVPCID should have some special
>>>> handling
>>>> since LAM is not applied to the memory operand of the two
>>>> instruction
>>>> according to the LAM spec.
>>> The spec's meaning on these 2 is: LAM masking doesn't apply to
>>> their
>>> operands (the address), so the behavior is like before LAM feature
>>> introduced. No change.
>> Yes, LAM are not applied to the 2 instrustions, but the __linearize
>> is
>> changed.
>> For example, the emulation of invlpg (em_invpg) will also call it.
>> So
>> need to handle the case specificlly.
>> Can add a flag as the input of linearize to indicate the LAM check
>> and
>> untag is needed or not.
>>
> No need.
>
> "The INVLPG instruction ...
> LAM does not apply to the specified memory address. Thus, in 64-bit
> mode, ** if the memory address specified is in non-canonical form then
> the INVLPG is the same as a NOP. **

Based on the current patch, if the address of invlpg is non-canonical, it
will first be checked and converted by the new LAM handling.
After that, it will be canonical and do the invalidation, not a NOP.
Maybe we can say that doing an additional TLB invalidation is not much
different from a NOP, but it needs to be documented/commented somewhere.


>
> The INVPCID instruction ...
> LAM does not apply to the specified memory address, and in 64-bit
> mode ** if this memory address is in non-canonical form then the
> processor generates a #GP(0) exception. **"
>
> You can double confirm in SDM: Before-and-After LAM introduced, the
> behavior hasn't changed. Thus you don't need to worry about these 2
> INS's emulations.

This is because currently, VMX vmexit handling is not considered yet.
The guest's linear address is retrieved from get_vmx_mem_address(), which
will also be called for INVPCID.

What is arguable is whether we need to cover all supervisor-mode pointer
cases in this phase.
But IMO, if these cases are not covered, CR4.LAM_SUP should not be allowed
to be set by the guest.



* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-02-27  8:45 ` [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Robert Hoo
@ 2023-03-03  6:21   ` Chao Gao
  2023-03-03 14:23     ` Robert Hoo
  2023-03-10 20:12   ` Sean Christopherson
  1 sibling, 1 reply; 33+ messages in thread
From: Chao Gao @ 2023-03-03  6:21 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
>LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61 for
>LAM_U57) to enable/config LAM feature for user mode addresses. The LAM
>masking is done before legacy canonical checks.
>
>To virtualize LAM CR3 bits usage, this patch:
>1. Don't reserve these 2 bits when LAM is enable on the vCPU. Previously
>when validate CR3, kvm uses kvm_vcpu_is_legal_gpa(), now define
>kvm_vcpu_is_valid_cr3() which is actually kvm_vcpu_is_legal_gpa()
>+ CR3.LAM bits validation. Substitutes kvm_vcpu_is_legal/illegal_gpa()
>with kvm_vcpu_is_valid_cr3() in call sites where is validating CR3 rather
>than pure GPA.
>2. mmu::get_guest_pgd(), its implementation is get_cr3() which returns
>whole guest CR3 value. Strip LAM bits in those call sites that need pure
>PGD value, e.g. mmu_alloc_shadow_roots(), FNAME(walk_addr_generic)().
>3. When form a new guest CR3 (vmx_load_mmu_pgd()), melt in LAM bit
>(kvm_get_active_lam()).
>4. When guest sets CR3, identify ONLY-LAM-bits-toggling cases, where it is
>unnecessary to make new pgd, but just make request of load pgd, then new
>CR3.LAM bits configuration will be melt in (above point 3). To be
>conservative, this case still do TLB flush.

>5. For nested VM entry, allow the 2 CR3 bits set in corresponding
>VMCS host/guest fields.

isn't this already covered by item #1 above?

>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>---
> arch/x86/kvm/mmu.h             |  5 +++++
> arch/x86/kvm/mmu/mmu.c         |  9 ++++++++-
> arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
> arch/x86/kvm/vmx/nested.c      |  4 ++--
> arch/x86/kvm/vmx/vmx.c         |  3 ++-
> arch/x86/kvm/x86.c             | 33 ++++++++++++++++++++++++++++-----
> arch/x86/kvm/x86.h             |  1 +
> 7 files changed, 47 insertions(+), 10 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>index 6bdaacb6faa0..866f2b7cb509 100644
>--- a/arch/x86/kvm/mmu.h
>+++ b/arch/x86/kvm/mmu.h
>@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> }
> 
>+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
>+{
>+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>+}

I think it is better to define a mask (like reserved_gpa_bits):

kvm_vcpu_arch {
	...

	/*
	 * Bits in CR3 used to enable certain features. These bits don't
	 * participate in page table walking. They should be masked to
	 * get the base address of page table. When shadow paging is
	 * used, these bits should be kept as is in the shadow CR3.
	 */
	u64 cr3_control_bits;

and initialize the mask in kvm_vcpu_after_set_cpuid():

	if (guest_cpuid_has(X86_FEATURE_LAM))
		vcpu->arch.cr3_control_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57;

then add helpers to extract/mask control bits.

It is cleaner and can avoid looking up guest CPUID at runtime. And if
AMD has a similar feature (e.g., some CR3 bits are used as control bits),
it is easy to support that feature.
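
A rough sketch of what such helpers could look like (field and helper names
follow the suggestion above and are only illustrative):

	static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
	{
		/* e.g. CR3.LAM_U48/LAM_U57 when the guest has LAM */
		return kvm_read_cr3(vcpu) & vcpu->arch.cr3_control_bits;
	}

	static inline u64 kvm_cr3_to_pgd_base(struct kvm_vcpu *vcpu, u64 cr3)
	{
		/* page-table base address, without the feature control bits */
		return cr3 & ~vcpu->arch.cr3_control_bits;
	}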


* Re: [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM
  2023-02-27  8:45 ` [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
@ 2023-03-03  6:46   ` Chao Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Chao Gao @ 2023-03-03  6:46 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm, Jingqi Liu

On Mon, Feb 27, 2023 at 04:45:47PM +0800, Robert Hoo wrote:
>LAM feature is enumerated by (EAX=07H, ECX=01H):EAX.LAM[bit26].

Please add some high-level introduction about LAM.

Maybe also call out that LAM's CR3 bits, CR4 bit and the modified
canonicality check are virtualized already. As the last step,
advertise LAM support to user space.

>
>Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
>---
> arch/x86/kvm/cpuid.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index b14653b61470..79f45cbe587e 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -664,7 +664,7 @@ void kvm_set_cpu_caps(void)
> 
> 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
> 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) | F(AMX_FP16) |
>-		F(AVX_IFMA)
>+		F(AVX_IFMA) | F(LAM)
> 	);
> 
> 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
>-- 
>2.31.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-03  3:35           ` Binbin Wu
@ 2023-03-03  9:00             ` Robert Hoo
  2023-03-03 10:18               ` Binbin Wu
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-03-03  9:00 UTC (permalink / raw)
  To: Binbin Wu, seanjc, pbonzini, chao.gao; +Cc: kvm

On Fri, 2023-03-03 at 11:35 +0800, Binbin Wu wrote:
> > > > > Also the instruction INVLPG, INVPCID should have some special
> > > > > handling
> > > > > since LAM is not applied to the memory operand of the two
> > > > > instruction
> > > > > according to the LAM spec.
> > > > 
> > > > The spec's meaning on these 2 is: LAM masking doesn't apply to
> > > > their
> > > > operands (the address), so the behavior is like before LAM
> > > > feature
> > > > introduced. No change.
> > > 
> > > Yes, LAM are not applied to the 2 instrustions, but the
> > > __linearize
> > > is
> > > changed.
> > > For example, the emulation of invlpg (em_invpg) will also call
> > > it.

Can you elaborate on this? Which emulation case of INVLPG do you mean:
VM-exit handling due to the guest executing INVLPG with
VMCS_control.INVLPG_exiting set, or some other case?

> > > So
> > > need to handle the case specificlly.
> > > Can add a flag as the input of linearize to indicate the LAM
> > > check
> > > and
> > > untag is needed or not.
> > > 
> > 
> > No need.
> > 
> > "The INVLPG instruction ...
> > LAM does not apply to the specified memory address. Thus, in 64-bit
> > mode, ** if the memory address specified is in non-canonical form
> > then
> > the INVLPG is the same as a NOP. **
> 
> Based on the current patch, if the address of invlpg is non-canonical, it
> will first be checked and converted by the new LAM handling.
> After that, it will be canonical and the invalidation will be done, rather
> than a NOP. Maybe we can argue that an additional TLB invalidation is no
> different from a NOP in practice, but that needs to be documented/commented
> somewhere.
> 
> 
> > 
> > The INVPCID instruction ...
> > LAM does not apply to the specified memory address, and in 64-bit
> > mode ** if this memory address is in non-canonical form then the
> > processor generates a #GP(0) exception. **"
> > 
> > You can double confirm in SDM: Before-and-After LAM introduced, the
> > behavior hasn't changed. Thus you don't need to worry about these 2
> > INS's emulations.
> 
> This is because currently, VMX vmexit handling is not considered yet.
> The linear address of the guest is retrieved from get_vmx_mem_address(),
> which will also be called by INVPCID.

Again, nested LAM isn't in this patch set's scope.
In terms of handle_invpcid() --> get_vmx_mem_address(), per Spec, no
behavior changes, no changes needed.
> 
> What is arguable is that we need to cover all supervisor mode pointer
> cases in this phase.
> But IMO if these cases are not covered, CR4.LAM_SUP should not be allowed
> to be set by the guest.
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-03  9:00             ` Robert Hoo
@ 2023-03-03 10:18               ` Binbin Wu
  0 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2023-03-03 10:18 UTC (permalink / raw)
  To: Robert Hoo, seanjc, pbonzini, chao.gao; +Cc: kvm


On 3/3/2023 5:00 PM, Robert Hoo wrote:
> On Fri, 2023-03-03 at 11:35 +0800, Binbin Wu wrote:
>>>>>> Also the instruction INVLPG, INVPCID should have some special
>>>>>> handling
>>>>>> since LAM is not applied to the memory operand of the two
>>>>>> instruction
>>>>>> according to the LAM spec.
>>>>> The spec's meaning on these 2 is: LAM masking doesn't apply to
>>>>> their
>>>>> operands (the address), so the behavior is like before LAM
>>>>> feature
>>>>> introduced. No change.
>>>> Yes, LAM are not applied to the 2 instrustions, but the
>>>> __linearize
>>>> is
>>>> changed.
>>>> For example, the emulation of invlpg (em_invpg) will also call
>>>> it.
> Can you elaborate more on this? what emulation case of INVLPG? do you
> mean vm-exit handling due to Guest execute INVLPG when
> VMCS_control.INVLPG_exiting set? or other case?

Not the VM-exit handling code. em_invlpg() belongs to instruction emulation.

But I am not sure whether this code is reachable or not. Let me double check.
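
(For reference, the emulator path in question is roughly the following; this
is a paraphrase of em_invlpg() in arch/x86/kvm/emulate.c, shown only to
illustrate that it funnels through linearize()/__linearize():)

static int em_invlpg(struct x86_emulate_ctxt *ctxt)
{
	ulong linear;
	int rc;

	/* linearize() ends up in __linearize(), where LAM untagging would be added. */
	rc = linearize(ctxt, ctxt->src.addr.mem, 1, false, &linear);
	if (rc == X86EMUL_CONTINUE)
		ctxt->ops->invlpg(ctxt, linear);
	/* Disable writeback. */
	ctxt->dst.type = OP_NONE;
	return X86EMUL_CONTINUE;
}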



>
>>>> So
>>>> need to handle the case specificlly.
>>>> Can add a flag as the input of linearize to indicate the LAM
>>>> check
>>>> and
>>>> untag is needed or not.
>>>>
>>> No need.
>>>
>>> "The INVLPG instruction ...
>>> LAM does not apply to the specified memory address. Thus, in 64-bit
>>> mode, ** if the memory address specified is in non-canonical form
>>> then
>>> the INVLPG is the same as a NOP. **
>> Based on the current patch, if the address of invlpg is non-canonical, it
>> will first be checked and converted by the new LAM handling.
>> After that, it will be canonical and the invalidation will be done, rather
>> than a NOP. Maybe we can argue that an additional TLB invalidation is no
>> different from a NOP in practice, but that needs to be documented/commented
>> somewhere.
>>
>>
>>> The INVPCID instruction ...
>>> LAM does not apply to the specified memory address, and in 64-bit
>>> mode ** if this memory address is in non-canonical form then the
>>> processor generates a #GP(0) exception. **"
>>>
>>> You can double confirm in SDM: Before-and-After LAM introduced, the
>>> behavior hasn't changed. Thus you don't need to worry about these 2
>>> INS's emulations.
>> This is because currently, VMX vmexit handling is not considered yet.
>> The linear address of the guest is retrieved from get_vmx_mem_address(),
>> which will also be called by INVPCID.
> Again, nested LAM isn't in this patch set's scope.
> In terms of handle_invpcid() --> get_vmx_mem_address(), per Spec, no
> behavior changes, no changes needed.

OK.


>> What is arguable is that we need to cover all supervisor mode pointer
>> cases in this phase.
>> But IMO if these cases are not covered, CR4.LAM_SUP should not be allowed
>> to be set by the guest.
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-03  6:21   ` Chao Gao
@ 2023-03-03 14:23     ` Robert Hoo
  2023-03-03 15:53       ` Chao Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Hoo @ 2023-03-03 14:23 UTC (permalink / raw)
  To: Chao Gao; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Fri, 2023-03-03 at 14:21 +0800, Chao Gao wrote:
> On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
> > LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61
> > for
> > LAM_U57) to enable/config LAM feature for user mode addresses. The
> > LAM
> > masking is done before legacy canonical checks.
> > 
> > To virtualize LAM CR3 bits usage, this patch:
> > 1. Don't reserve these 2 bits when LAM is enable on the vCPU.
> > Previously
> > when validate CR3, kvm uses kvm_vcpu_is_legal_gpa(), now define
> > kvm_vcpu_is_valid_cr3() which is actually kvm_vcpu_is_legal_gpa()
> > + CR3.LAM bits validation. Substitutes
> > kvm_vcpu_is_legal/illegal_gpa()
> > with kvm_vcpu_is_valid_cr3() in call sites where is validating CR3
> > rather
> > than pure GPA.
> > 2. mmu::get_guest_pgd(), its implementation is get_cr3() which
> > returns
> > whole guest CR3 value. Strip LAM bits in those call sites that need
> > pure
> > PGD value, e.g. mmu_alloc_shadow_roots(),
> > FNAME(walk_addr_generic)().
> > 3. When form a new guest CR3 (vmx_load_mmu_pgd()), melt in LAM bit
> > (kvm_get_active_lam()).
> > 4. When guest sets CR3, identify ONLY-LAM-bits-toggling cases,
> > where it is
> > unnecessary to make new pgd, but just make request of load pgd,
> > then new
> > CR3.LAM bits configuration will be melt in (above point 3). To be
> > conservative, this case still do TLB flush.
> > 5. For nested VM entry, allow the 2 CR3 bits set in corresponding
> > VMCS host/guest fields.
> 
> isn't this already covered by item #1 above?

Ah, it is to address your comments on the last version. Repeating/emphasizing
it again doesn't harm, does it? ;)
> 
(...)
> > 
> > +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
> > +{
> > +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 |
> > X86_CR3_LAM_U57);
> > +}
> 
> I think it is better to define a mask (like reserved_gpa_bits):
> 
> kvm_vcpu_arch {
> 	...
> 
> 	/*
> 	 * Bits in CR3 used to enable certain features. These bits
> don't
> 	 * participate in page table walking. They should be masked to
> 	 * get the base address of page table. When shadow paging is
> 	 * used, these bits should be kept as is in the shadow CR3.
> 	 */
> 	u64 cr3_control_bits;
> 

I don't strongly object this. But per SDM, CR3.bit[63:MAXPHYADDR] are
reserved; and MAXPHYADDR is at most 52 [1]. So can we assert and simply
define the MASK bit[63:52]? (I did this in v3 and prior)

[1] CPUID.80000008H:EAX[7:0] reports the physical-address width
supported by the processor. (For processors
that do not support CPUID function 80000008H, the width is generally 36
if CPUID.01H:EDX.PAE [bit 6] = 1
and 32 otherwise.) This width is referred to as MAXPHYADDR. MAXPHYADDR
is at most 52. (SDM 4.1.4 Enumeration of Paging Features by CPUID)

> and initialize the mask in kvm_vcpu_after_set_cpuid():
> 
> 	if (guest_cpuid_has(X86_FEATURE_LAM))
> 		vcpu->arch.cr3_control_bits = X86_CR3_LAM_U48 |
> X86_CR3_LAM_U57;
> 
> then add helpers to extract/mask control bits.
> 
> It is cleaner and can avoid looking up guest CPUID at runtime.
>  And if
> AMD has a similar feature (e.g., some CR3 bits are used as control
> bits),
> it is easy to support that feature.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-03 14:23     ` Robert Hoo
@ 2023-03-03 15:53       ` Chao Gao
  2023-03-05  1:31         ` Robert Hoo
  0 siblings, 1 reply; 33+ messages in thread
From: Chao Gao @ 2023-03-03 15:53 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Fri, Mar 03, 2023 at 10:23:50PM +0800, Robert Hoo wrote:
>On Fri, 2023-03-03 at 14:21 +0800, Chao Gao wrote:
>> On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
>> > LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61
>> > for
>> > LAM_U57) to enable/config LAM feature for user mode addresses. The
>> > LAM
>> > masking is done before legacy canonical checks.
>> > 
>> > To virtualize LAM CR3 bits usage, this patch:
>> > 1. Don't reserve these 2 bits when LAM is enable on the vCPU.
>> > Previously
>> > when validate CR3, kvm uses kvm_vcpu_is_legal_gpa(), now define
>> > kvm_vcpu_is_valid_cr3() which is actually kvm_vcpu_is_legal_gpa()
>> > + CR3.LAM bits validation. Substitutes
>> > kvm_vcpu_is_legal/illegal_gpa()
>> > with kvm_vcpu_is_valid_cr3() in call sites where is validating CR3
>> > rather
>> > than pure GPA.
>> > 2. mmu::get_guest_pgd(), its implementation is get_cr3() which
>> > returns
>> > whole guest CR3 value. Strip LAM bits in those call sites that need
>> > pure
>> > PGD value, e.g. mmu_alloc_shadow_roots(),
>> > FNAME(walk_addr_generic)().
>> > 3. When form a new guest CR3 (vmx_load_mmu_pgd()), melt in LAM bit
>> > (kvm_get_active_lam()).
>> > 4. When guest sets CR3, identify ONLY-LAM-bits-toggling cases,
>> > where it is
>> > unnecessary to make new pgd, but just make request of load pgd,
>> > then new
>> > CR3.LAM bits configuration will be melt in (above point 3). To be
>> > conservative, this case still do TLB flush.
>> > 5. For nested VM entry, allow the 2 CR3 bits set in corresponding
>> > VMCS host/guest fields.
>> 
>> isn't this already covered by item #1 above?
>
>Ah, it is to address your comments on last version. To repeat/emphasize
>again, doesn't harm, does it?;) 

It is confusing. Here is an attempt to merge #5 into #1:

If LAM is supported, bits 62:61 (LAM_U48 and LAM_U57) are not reserved
in CR3. VM entry also allows the two bits to be set in the CR3 field in
the guest-state and host-state areas of the VMCS. Previously ...

>> 
>(...)
>> > 
>> > +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
>> > +{
>> > +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 |
>> > X86_CR3_LAM_U57);
>> > +}
>> 
>> I think it is better to define a mask (like reserved_gpa_bits):
>> 
>> kvm_vcpu_arch {
>> 	...
>> 
>> 	/*
>> 	 * Bits in CR3 used to enable certain features. These bits
>> don't
>> 	 * participate in page table walking. They should be masked to
>> 	 * get the base address of page table. When shadow paging is
>> 	 * used, these bits should be kept as is in the shadow CR3.
>> 	 */
>> 	u64 cr3_control_bits;
>> 
>
>I don't strongly object this. But per SDM, CR3.bit[63:MAXPHYADDR] are
>reserved; and MAXPHYADDR is at most 52 [1]. So can we assert and simply
>define the MASK bit[63:52]? (I did this in v3 and prior)

No. Setting any bit in 60:52 should be rejected. And setting bit 62 or
61 should be allowed if LAM is supported by the vCPU. I don't see how
your proposal can distinguish these two cases.
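
For concreteness, the CR3 check being discussed is roughly of this shape (a
sketch only, not the exact code from the series):

static bool kvm_vcpu_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
	/*
	 * The LAM bits (62:61) are legal only if the vCPU supports LAM; any
	 * other bit above the guest MAXPHYADDR is still rejected.
	 */
	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
}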

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-03 15:53       ` Chao Gao
@ 2023-03-05  1:31         ` Robert Hoo
  0 siblings, 0 replies; 33+ messages in thread
From: Robert Hoo @ 2023-03-05  1:31 UTC (permalink / raw)
  To: Chao Gao; +Cc: seanjc, pbonzini, binbin.wu, kvm

On Fri, 2023-03-03 at 23:53 +0800, Chao Gao wrote:
> On Fri, Mar 03, 2023 at 10:23:50PM +0800, Robert Hoo wrote:
> > On Fri, 2023-03-03 at 14:21 +0800, Chao Gao wrote:
> > > On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
> > > > LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit
> > > > 61
> > > > for
> > > > LAM_U57) to enable/config LAM feature for user mode addresses.
> > > > The
> > > > LAM
> > > > masking is done before legacy canonical checks.
> > > > 
> > > > To virtualize LAM CR3 bits usage, this patch:
> > > > 1. Don't reserve these 2 bits when LAM is enable on the vCPU.
> > > > Previously
> > > > when validate CR3, kvm uses kvm_vcpu_is_legal_gpa(), now define
> > > > kvm_vcpu_is_valid_cr3() which is actually
> > > > kvm_vcpu_is_legal_gpa()
> > > > + CR3.LAM bits validation. Substitutes
> > > > kvm_vcpu_is_legal/illegal_gpa()
> > > > with kvm_vcpu_is_valid_cr3() in call sites where is validating
> > > > CR3
> > > > rather
> > > > than pure GPA.
> > > > 2. mmu::get_guest_pgd(), its implementation is get_cr3() which
> > > > returns
> > > > whole guest CR3 value. Strip LAM bits in those call sites that
> > > > need
> > > > pure
> > > > PGD value, e.g. mmu_alloc_shadow_roots(),
> > > > FNAME(walk_addr_generic)().
> > > > 3. When form a new guest CR3 (vmx_load_mmu_pgd()), melt in LAM
> > > > bit
> > > > (kvm_get_active_lam()).
> > > > 4. When guest sets CR3, identify ONLY-LAM-bits-toggling cases,
> > > > where it is
> > > > unnecessary to make new pgd, but just make request of load pgd,
> > > > then new
> > > > CR3.LAM bits configuration will be melt in (above point 3). To
> > > > be
> > > > conservative, this case still do TLB flush.
> > > > 5. For nested VM entry, allow the 2 CR3 bits set in
> > > > corresponding
> > > > VMCS host/guest fields.
> > > 
> > > isn't this already covered by item #1 above?
> > 
> > Ah, it is to address your comments on last version. To
> > repeat/emphasize
> > again, doesn't harm, does it?;) 
> 
> It is confusing. Trying to merge #5 to #1:

Well this is kind of subjective. I don't have any bias on this.
> 
> If LAM is supported, bits 62:61 (LAM_U48 and LAM_U57) are not
> reserved
> in CR3. VM entry also allows the two bits to be set in CR3 field in
> guest-state and host-state area of the VMCS. Previously ...
> 
> > > 
> > 
> > (...)
> > > > 
> > > > +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 |
> > > > X86_CR3_LAM_U57);
> > > > +}
> > > 
> > > I think it is better to define a mask (like reserved_gpa_bits):
> > > 
> > > kvm_vcpu_arch {
> > > 	...
> > > 
> > > 	/*
> > > 	 * Bits in CR3 used to enable certain features. These bits
> > > don't
> > > 	 * participate in page table walking. They should be masked to
> > > 	 * get the base address of page table. When shadow paging is
> > > 	 * used, these bits should be kept as is in the shadow CR3.
> > > 	 */
> > > 	u64 cr3_control_bits;
> > > 
> > 
> > I don't strongly object this. But per SDM, CR3.bit[63:MAXPHYADDR]
> > are
> > reserved; and MAXPHYADDR is at most 52 [1]. So can we assert and
> > simply
> > define the MASK bit[63:52]? (I did this in v3 and prior)
> 
> No. Setting any bit in 60:52 should be rejected. And setting bit 62
> or
> 61 should be allowed if LAM is supported by the vCPU. I don't see how
> your proposal can distinguish these two cases.

No, you didn't get my point.
Perhaps you can take a look at the v3 patch and prior versions:
https://lore.kernel.org/kvm/20221209044557.1496580-4-robert.hu@linux.intel.com/

They define CR3_HIGH_RSVD_MASK, given that "MAXPHYADDR is at most 52" is
stated in the SDM.
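
(For reference, that v3 idea boils down to a fixed high-bits mask, roughly:

#define CR3_HIGH_RSVD_MASK	GENMASK_ULL(63, 52)

— illustrative only; see the linked v3 patch for the exact definition.)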


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-02-27  8:45 ` [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Robert Hoo
  2023-03-03  6:21   ` Chao Gao
@ 2023-03-10 20:12   ` Sean Christopherson
  2023-03-20  6:57     ` Binbin Wu
  1 sibling, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2023-03-10 20:12 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, chao.gao, binbin.wu, kvm

On Mon, Feb 27, 2023, Robert Hoo wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 835426254e76..3efec7f8d8c6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3699,7 +3699,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	int quadrant, i, r;
>  	hpa_t root;
>  
> -	root_pgd = mmu->get_guest_pgd(vcpu);
> +	/*
> +	 * Omit guest_cpuid_has(X86_FEATURE_LAM) check but can unconditionally
> +	 * strip CR3 LAM bits because they resides in high reserved bits,
> +	 * with LAM or not, those high bits should be striped anyway when
> +	 * interpreted to pgd.
> +	 */

This misses the most important part: why it's safe to ignore LAM bits when reusing
a root.

> +	root_pgd = mmu->get_guest_pgd(vcpu) &
> +		   ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

Unconditionally clearing LAM bits is unsafe.  At some point the EPTP may define
bits in the same area that must NOT be omitted from the root cache, e.g. the PWL
bits in the EPTP _need_ to be incorporated into is_root_usable().

For simplicity, I'm very, very tempted to say we should just leave the LAM bits
in root.pgd, i.e. force a new root for a CR3+LAM combination.  First and foremost,
that only matters for shadow paging.  Second, unless a guest kernel allows per-thread
LAM settings, the extra checks will be benign.  And AIUI, the proposed kernel
implementation is to apply LAM on a per-MM basis.

And I would much prefer to solve the GFN calculation generically.  E.g. it really
should be something like this

	root_pgd = mmu->get_guest_pgd(vcpu);
	root_gfn = mmu->gpte_to_gfn(root_pgd);

but having to set gpte_to_gfn() in the MMU is quite unfortunate, and gpte_to_gfn()
is technically insufficient for PAE since it relies on previous checks to prevent
consuming a 64-bit CR3.

I was going to suggest extracting the maximal base addr mask and use that, e.g.

	#define __PT_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))

Maybe this?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8ebe542c565..8b2d2a6081b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3732,7 +3732,12 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
        hpa_t root;
 
        root_pgd = mmu->get_guest_pgd(vcpu);
-       root_gfn = root_pgd >> PAGE_SHIFT;
+
+       /*
+        * The guest PGD has already been checked for validity, unconditionally
+        * strip non-address bits when computing the GFN.
+        */
+       root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
        if (mmu_check_root(vcpu, root_gfn))
                return 1;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cc58631e2336..c0479cbc2ca3 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -21,6 +21,7 @@ extern bool dbg;
 #endif
 
 /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
+#define __PT_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
 #define __PT_LEVEL_SHIFT(level, bits_per_level)        \
        (PAGE_SHIFT + ((level) - 1) * (bits_per_level))
 #define __PT_INDEX(address, level, bits_per_level) \
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..0583bfce3b52 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -62,7 +62,7 @@
 #endif
 
 /* Common logic, but per-type values.  These also need to be undefined. */
-#define PT_BASE_ADDR_MASK      ((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
+#define PT_BASE_ADDR_MASK      ((pt_element_t)__PT_BASE_ADDR_MASK)
 #define PT_LVL_ADDR_MASK(lvl)  __PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_LVL_OFFSET_MASK(lvl)        __PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_INDEX(addr, lvl)    __PT_INDEX(addr, lvl, PT_LEVEL_BITS)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1279db2eab44..777f7d443e3b 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -36,7 +36,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 #define SPTE_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
 #else
-#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define SPTE_BASE_ADDR_MASK __PT_BASE_ADDR_MASK
 #endif
 
 #define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \

>  	root_gfn = root_pgd >> PAGE_SHIFT;
>  
>  	if (mmu_check_root(vcpu, root_gfn))
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 0f6455072055..57f39c7492ed 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	trace_kvm_mmu_pagetable_walk(addr, access);
>  retry_walk:
>  	walker->level = mmu->cpu_role.base.level;
> -	pte           = mmu->get_guest_pgd(vcpu);
> +	pte           = mmu->get_guest_pgd(vcpu) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

This should be unnecessary, gpte_to_gfn() is supposed to strip non-address bits.

>  	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
>  
>  #ifdef CONFIG_X86_64
>  	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>  
> @@ -1254,14 +1265,26 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
>  	 * the current vCPU mode is accurate.
>  	 */
> -	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
> +	if (!kvm_vcpu_is_valid_cr3(vcpu, cr3))
>  		return 1;
>  
>  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>  		return 1;
>  
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +	old_cr3 = kvm_read_cr3(vcpu);
> +	if (cr3 != old_cr3) {
> +		if ((cr3 ^ old_cr3) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57)) {
> +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> +					X86_CR3_LAM_U57));

As above, no change is needed here if LAM is tracked in the PGD.

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-03  3:23     ` Robert Hoo
@ 2023-03-10 20:22       ` Sean Christopherson
  2023-03-20 12:05         ` Binbin Wu
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2023-03-10 20:22 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Chao Gao, pbonzini, binbin.wu, kvm

As Chao pointed out, this does not belong in the LAM series.  And FWIW, I highly
recommend NOT tagging things as Trivial.  If you're wrong and the patch _isn't_
trivial, it only slows things down.  And if you're right, then expediting the
patch can't possibly be necessary.

On Fri, Mar 03, 2023, Robert Hoo wrote:
> On Thu, 2023-03-02 at 15:24 +0800, Chao Gao wrote:
> > > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > > 
> > > 	if (pcid_enabled) {
> > > 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > 
> > pcid_enabled is used only once. You can drop it, i.e.,
> > 
> > 	if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
> > 
> Emm, that's actually another point.
> Though I won't object so, wouldn't this be compiler optimized?
> 
> And my point was: honor bool type, though in C implemention it's 0 and
> !0, it has its own type value: true, false.
> Implicit type casting always isn't good habit.

I don't disagree, but I also don't particularly want to "fix" one case while
ignoring the many others, e.g. kvm_handle_invpcid() has the exact same "buggy"
pattern.

I would be supportive of a patch that adds helpers and then converts all of the
relevant CR0/CR4 checks though...

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 4c91f626c058..6e3cb958afdd 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -157,6 +157,14 @@ static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
        return vcpu->arch.cr0 & mask;
 }
 
+static __always_inline bool kvm_is_cr0_bit_set(struct kvm_vcpu *vcpu,
+                                              unsigned long cr0_bit)
+{
+       BUILD_BUG_ON(!is_power_of_2(cr0_bit));
+
+       return !!kvm_read_cr0_bits(vcpu, cr0_bit);
+}
+
 static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu)
 {
        return kvm_read_cr0_bits(vcpu, ~0UL);
@@ -178,6 +186,14 @@ static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
        return vcpu->arch.cr3;
 }
 
+static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
+                                              unsigned long cr4_bit)
+{
+       BUILD_BUG_ON(!is_power_of_2(cr4_bit));
+
+       return !!kvm_read_cr4_bits(vcpu, cr4_bit);
+}
+
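
With such helpers in place, the kvm_set_cr3() hunk quoted above could then
read, roughly:

	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
		cr3 &= ~X86_CR3_PCID_NOFLUSH;
	}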

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
  2023-03-02  6:41   ` Binbin Wu
  2023-03-02  8:55   ` Chao Gao
@ 2023-03-10 20:23   ` Sean Christopherson
  2 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2023-03-10 20:23 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, chao.gao, binbin.wu, kvm

On Mon, Feb 27, 2023, Robert Hoo wrote:
> Emulate HW LAM masking when doing data access under 64-bit mode.
> 
> kvm_lam_untag_addr() implements this: per the CR4/CR3 LAM bits configuration,
> first check that the linear address conforms to the LAM canonical rule, i.e.
> the highest address bit matches bit 63, then mask out the metadata bits per
> the LAM configuration. If either step fails, emulate #GP to the guest.
> 
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>  arch/x86/kvm/emulate.c | 13 ++++++++
>  arch/x86/kvm/x86.h     | 70 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5cc3efa0e21c..77bd13f40711 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -700,6 +700,19 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
>  	*max_size = 0;
>  	switch (mode) {
>  	case X86EMUL_MODE_PROT64:
> +		/* LAM applies only on data access */
> +		if (!fetch && guest_cpuid_has(ctxt->vcpu, X86_FEATURE_LAM)) {

Derefencing ctxt->vcpu in the emulator is not allowed.

> +			enum lam_type type;
> +
> +			type = kvm_vcpu_lam_type(la, ctxt->vcpu);
> +			if (type == LAM_ILLEGAL) {
> +				*linear = la;
> +				goto bad;
> +			} else {
> +				la = kvm_lam_untag_addr(la, type);
> +			}
> +		}

This is wildly over-engineered.  Just do the untagging and let __is_canonical_address()
catch any non-canonical bits that weren't stripped.
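
(i.e. conceptually something like the below in __linearize()'s 64-bit case --
a sketch only; ctxt_untag_addr() is a hypothetical emulator-side helper that
would go through ctxt->ops rather than dereferencing ctxt->vcpu, and the
existing *max_size handling is omitted:)

	case X86EMUL_MODE_PROT64:
		/* LAM applies only to data accesses. */
		if (!fetch)
			la = ctxt_untag_addr(ctxt, la);	/* hypothetical helper */
		*linear = la;
		if (!__is_canonical_address(la, ctxt_virt_addr_bits(ctxt)))
			goto bad;
		break;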

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode
  2023-03-03  1:08       ` Binbin Wu
  2023-03-03  3:16         ` Robert Hoo
@ 2023-03-10 20:26         ` Sean Christopherson
  1 sibling, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2023-03-10 20:26 UTC (permalink / raw)
  To: Binbin Wu; +Cc: Robert Hoo, pbonzini, chao.gao, kvm

On Fri, Mar 03, 2023, Binbin Wu wrote:
> 
> On 3/2/2023 9:16 PM, Robert Hoo wrote:
> > On Thu, 2023-03-02 at 14:41 +0800, Binbin Wu wrote:
> > > __linearize is not the only path the modified LAM canonical check
> > > needed, also some vmexits path should be taken care of, like VMX,
> > > SGX
> > > ENCLS.
> > > 
> > SGX isn't in this version's implementation's scope, like nested LAM.
> 
> LAM in SGX enclave mode is not the scope of the this version.

I'm not merging half-baked support.  Not supporting nested LAM _may_ be ok, because
KVM can _prevent_ exposing LAM to L2.  I say "may" because I would still _very_
strongly prefer that nested support be added in the initial series.

But omitting architectural interactions just because is not going to happen.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}
  2023-03-10 20:12   ` Sean Christopherson
@ 2023-03-20  6:57     ` Binbin Wu
  0 siblings, 0 replies; 33+ messages in thread
From: Binbin Wu @ 2023-03-20  6:57 UTC (permalink / raw)
  To: Sean Christopherson, Robert Hoo; +Cc: pbonzini, chao.gao, kvm


On 3/11/2023 4:12 AM, Sean Christopherson wrote:
> On Mon, Feb 27, 2023, Robert Hoo wrote:
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 835426254e76..3efec7f8d8c6 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3699,7 +3699,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>   	int quadrant, i, r;
>>   	hpa_t root;
>>   
>> -	root_pgd = mmu->get_guest_pgd(vcpu);
>> +	/*
>> +	 * Omit guest_cpuid_has(X86_FEATURE_LAM) check but can unconditionally
>> +	 * strip CR3 LAM bits because they resides in high reserved bits,
>> +	 * with LAM or not, those high bits should be striped anyway when
>> +	 * interpreted to pgd.
>> +	 */
> This misses the most important part: why it's safe to ignore LAM bits when reusing
> a root.
>
>> +	root_pgd = mmu->get_guest_pgd(vcpu) &
>> +		   ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> Unconditionally clearing LAM bits is unsafe.  At some point the EPTP may define
> bits in the same area that must NOT be omitted from the root cache, e.g. the PWL
> bits in the EPTP _need_ to be incorporated into is_root_usable().

Sean, sorry that I missed the mail when I cooked & sent out the v6 patch 
of the series.

You are right that mmu->get_guest_pgd() could return an EPTP, and I do agree
it is not reasonable to unconditionally ignore LAM bits when the value is an
EPTP, even though the PWL bits are not in the high reserved bits.


>
> For simplicity, I'm very, very tempted to say we should just leave the LAM bits
> in root.pgd, i.e. force a new root for a CR3+LAM combination.
Thanks for the suggestion.

One concern with forcing a new root for a CR3+LAM combination: if the LAM bits are part of
root.pgd, then toggling the CR3 LAM bit(s) will trigger a KVM MMU reload. That means for a
process that is created without the LAM bits set and later calls prctl to enable LAM, KVM
will have to do the MMU load twice. Do you think that would be a problem?


> First and foremost,
> that only matters for shadow paging.  Second, unless a guest kernel allows per-thread
> LAM settings, the extra checks will be benign.  And AIUI, the proposed kernel
> implementation is to apply LAM on a per-MM basis.

Yes, the proposed Linux kernel implementation is to apply LAM on a per-MM basis, and
currently no interface is provided to disable LAM after enabling it. However, for KVM,
guests are not limited to Linux, and it is not clear how the LAM feature is enabled in
other guest OSes.


>
> And I would much prefer to solve the GFN calculation generically.  E.g. it really
> should be something like this
>
> 	root_pgd = mmu->get_guest_pgd(vcpu);
> 	root_gfn = mmu->gpte_to_gfn(root_pgd);
>
> but having to set gpte_to_gfn() in the MMU is quite unfortunate, and gpte_to_gfn()
> is technically insufficient for PAE since it relies on previous checks to prevent
> consuming a 64-bit CR3.
>
> I was going to suggest extracting the maximal base addr mask and use that, e.g.
>
> 	#define __PT_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
>
> Maybe this?
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8ebe542c565..8b2d2a6081b3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3732,7 +3732,12 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>          hpa_t root;
>   
>          root_pgd = mmu->get_guest_pgd(vcpu);
> -       root_gfn = root_pgd >> PAGE_SHIFT;
> +
> +       /*
> +        * The guest PGD has already been checked for validity, unconditionally
> +        * strip non-address bits when computing the GFN.
> +        */
> +       root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>   
>          if (mmu_check_root(vcpu, root_gfn))
>                  return 1;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index cc58631e2336..c0479cbc2ca3 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -21,6 +21,7 @@ extern bool dbg;
>   #endif
>   
>   /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
> +#define __PT_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
>   #define __PT_LEVEL_SHIFT(level, bits_per_level)        \
>          (PAGE_SHIFT + ((level) - 1) * (bits_per_level))
>   #define __PT_INDEX(address, level, bits_per_level) \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 57f0b75c80f9..0583bfce3b52 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -62,7 +62,7 @@
>   #endif
>   
>   /* Common logic, but per-type values.  These also need to be undefined. */
> -#define PT_BASE_ADDR_MASK      ((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
> +#define PT_BASE_ADDR_MASK      ((pt_element_t)__PT_BASE_ADDR_MASK)
>   #define PT_LVL_ADDR_MASK(lvl)  __PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>   #define PT_LVL_OFFSET_MASK(lvl)        __PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>   #define PT_INDEX(addr, lvl)    __PT_INDEX(addr, lvl, PT_LEVEL_BITS)
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 1279db2eab44..777f7d443e3b 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -36,7 +36,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
>   #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
>   #define SPTE_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
>   #else
> -#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
> +#define SPTE_BASE_ADDR_MASK __PT_BASE_ADDR_MASK
>   #endif
>   
>   #define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
>
>>   	root_gfn = root_pgd >> PAGE_SHIFT;
>>   
>>   	if (mmu_check_root(vcpu, root_gfn))
>> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
>> index 0f6455072055..57f39c7492ed 100644
>> --- a/arch/x86/kvm/mmu/paging_tmpl.h
>> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
>> @@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>>   	trace_kvm_mmu_pagetable_walk(addr, access);
>>   retry_walk:
>>   	walker->level = mmu->cpu_role.base.level;
>> -	pte           = mmu->get_guest_pgd(vcpu);
>> +	pte           = mmu->get_guest_pgd(vcpu) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> This should be unnecessary, gpte_to_gfn() is supposed to strip non-address bits.
>
>>   	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
>>   
>>   #ifdef CONFIG_X86_64
>>   	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>   
>> @@ -1254,14 +1265,26 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>   	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
>>   	 * the current vCPU mode is accurate.
>>   	 */
>> -	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
>> +	if (!kvm_vcpu_is_valid_cr3(vcpu, cr3))
>>   		return 1;
>>   
>>   	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>>   		return 1;
>>   
>> -	if (cr3 != kvm_read_cr3(vcpu))
>> -		kvm_mmu_new_pgd(vcpu, cr3);
>> +	old_cr3 = kvm_read_cr3(vcpu);
>> +	if (cr3 != old_cr3) {
>> +		if ((cr3 ^ old_cr3) & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57)) {
>> +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
>> +					X86_CR3_LAM_U57));
> As above, no change is needed here if LAM is tracked in the PGD.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-10 20:22       ` Sean Christopherson
@ 2023-03-20 12:05         ` Binbin Wu
  2023-03-20 13:56           ` Binbin Wu
  0 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2023-03-20 12:05 UTC (permalink / raw)
  To: Sean Christopherson, Robert Hoo; +Cc: Chao Gao, pbonzini, kvm


On 3/11/2023 4:22 AM, Sean Christopherson wrote:
> As Chao pointed out, this does not belong in the LAM series.  And FWIW, I highly
> recommend NOT tagging things as Trivial.  If you're wrong and the patch _isn't_
> trivial, it only slows things down.  And if you're right, then expediting the
> patch can't possibly be necessary.
>
> On Fri, Mar 03, 2023, Robert Hoo wrote:
>> On Thu, 2023-03-02 at 15:24 +0800, Chao Gao wrote:
>>>> -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>>> +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>>>
>>>> 	if (pcid_enabled) {
>>>> 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
>>> pcid_enabled is used only once. You can drop it, i.e.,
>>>
>>> 	if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
>>>
>> Emm, that's actually another point.
>> Though I won't object so, wouldn't this be compiler optimized?
>>
>> And my point was: honor bool type, though in C implemention it's 0 and
>> !0, it has its own type value: true, false.
>> Implicit type casting always isn't good habit.
> I don't disagree, but I also don't particularly want to "fix" one case while
> ignoring the many others, e.g. kvm_handle_invpcid() has the exact same "buggy"
> pattern.
>
> I would be supportive of a patch that adds helpers and then converts all of the
> relevant CR0/CR4 checks though...

Hi Sean, I can cook a patch per your suggestion and send it out separately.


>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 4c91f626c058..6e3cb958afdd 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -157,6 +157,14 @@ static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
>          return vcpu->arch.cr0 & mask;
>   }
>   
> +static __always_inline bool kvm_is_cr0_bit_set(struct kvm_vcpu *vcpu,
> +                                              unsigned long cr0_bit)
> +{
> +       BUILD_BUG_ON(!is_power_of_2(cr0_bit));
> +
> +       return !!kvm_read_cr0_bits(vcpu, cr0_bit);
> +}
> +
>   static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu)
>   {
>          return kvm_read_cr0_bits(vcpu, ~0UL);
> @@ -178,6 +186,14 @@ static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
>          return vcpu->arch.cr3;
>   }
>   
> +static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
> +                                              unsigned long cr4_bit)
> +{
> +       BUILD_BUG_ON(!is_power_of_2(cr4_bit));
> +
> +       return !!kvm_read_cr4_bits(vcpu, cr4_bit);
> +}
> +

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-20 12:05         ` Binbin Wu
@ 2023-03-20 13:56           ` Binbin Wu
  2023-03-21 16:03             ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: Binbin Wu @ 2023-03-20 13:56 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Chao Gao, Robert Hoo, pbonzini, kvm


On 3/20/2023 8:05 PM, Binbin Wu wrote:
>
> On 3/11/2023 4:22 AM, Sean Christopherson wrote:
>> As Chao pointed out, this does not belong in the LAM series.  And 
>> FWIW, I highly
>> recommend NOT tagging things as Trivial.  If you're wrong and the 
>> patch _isn't_
>> trivial, it only slows things down.  And if you're right, then 
>> expediting the
>> patch can't possibly be necessary.
>>
>> On Fri, Mar 03, 2023, Robert Hoo wrote:
>>> On Thu, 2023-03-02 at 15:24 +0800, Chao Gao wrote:
>>>>> -    bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>>>> +    bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>>>>>
>>>>>     if (pcid_enabled) {
>>>>>         skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
>>>> pcid_enabled is used only once. You can drop it, i.e.,
>>>>
>>>>     if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
>>>>
>>> Emm, that's actually another point.
>>> Though I won't object so, wouldn't this be compiler optimized?
>>>
>>> And my point was: honor bool type, though in C implemention it's 0 and
>>> !0, it has its own type value: true, false.
>>> Implicit type casting always isn't good habit.
>> I don't disagree, but I also don't particularly want to "fix" one 
>> case while
>> ignoring the many others, e.g. kvm_handle_invpcid() has the exact 
>> same "buggy"
>> pattern.
>>
>> I would be supportive of a patch that adds helpers and then converts 
>> all of the
>> relevant CR0/CR4 checks though...
>
> Hi Sean, I can cook a patch per your suggestion and send it out
> separately.

Sean, besides the calls to kvm_read_cr0_bits() and kvm_read_cr4_bits(),
there are also a lot of checks in if statements like
if ( cr4 & X86_CR4_XXX )
or
if ( cr0 & X86_CR0_XXX )
I suppose these usages are OK, right?


>
>
>>
>> diff --git a/arch/x86/kvm/kvm_cache_regs.h 
>> b/arch/x86/kvm/kvm_cache_regs.h
>> index 4c91f626c058..6e3cb958afdd 100644
>> --- a/arch/x86/kvm/kvm_cache_regs.h
>> +++ b/arch/x86/kvm/kvm_cache_regs.h
>> @@ -157,6 +157,14 @@ static inline ulong kvm_read_cr0_bits(struct 
>> kvm_vcpu *vcpu, ulong mask)
>>          return vcpu->arch.cr0 & mask;
>>   }
>>   +static __always_inline bool kvm_is_cr0_bit_set(struct kvm_vcpu *vcpu,
>> +                                              unsigned long cr0_bit)
>> +{
>> +       BUILD_BUG_ON(!is_power_of_2(cr0_bit));
>> +
>> +       return !!kvm_read_cr0_bits(vcpu, cr0_bit);
>> +}
>> +
>>   static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu)
>>   {
>>          return kvm_read_cr0_bits(vcpu, ~0UL);
>> @@ -178,6 +186,14 @@ static inline ulong kvm_read_cr3(struct kvm_vcpu 
>> *vcpu)
>>          return vcpu->arch.cr3;
>>   }
>>   +static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
>> +                                              unsigned long cr4_bit)
>> +{
>> +       BUILD_BUG_ON(!is_power_of_2(cr4_bit));
>> +
>> +       return !!kvm_read_cr4_bits(vcpu, cr4_bit);
>> +}
>> +

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3()
  2023-03-20 13:56           ` Binbin Wu
@ 2023-03-21 16:03             ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2023-03-21 16:03 UTC (permalink / raw)
  To: Binbin Wu; +Cc: Chao Gao, Robert Hoo, pbonzini, kvm

On Mon, Mar 20, 2023, Binbin Wu wrote:
> 
> On 3/20/2023 8:05 PM, Binbin Wu wrote:
> > 
> > On 3/11/2023 4:22 AM, Sean Christopherson wrote:
> > > As Chao pointed out, this does not belong in the LAM series.  And
> > > FWIW, I highly
> > > recommend NOT tagging things as Trivial.  If you're wrong and the
> > > patch _isn't_
> > > trivial, it only slows things down.  And if you're right, then
> > > expediting the
> > > patch can't possibly be necessary.
> > > 
> > > On Fri, Mar 03, 2023, Robert Hoo wrote:
> > > > On Thu, 2023-03-02 at 15:24 +0800, Chao Gao wrote:
> > > > > > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > > > > > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > > > > > 
> > > > > > 	if (pcid_enabled) {
> > > > > > 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > > > > pcid_enabled is used only once. You can drop it, i.e.,
> > > > > 
> > > > > 	if (kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) {
> > > > > 
> > > > Emm, that's actually another point.
> > > > Though I won't object so, wouldn't this be compiler optimized?
> > > > 
> > > > And my point was: honor bool type, though in C implemention it's 0 and
> > > > !0, it has its own type value: true, false.
> > > > Implicit type casting always isn't good habit.
> > > I don't disagree, but I also don't particularly want to "fix" one
> > > case while
> > > ignoring the many others, e.g. kvm_handle_invpcid() has the exact
> > > same "buggy"
> > > pattern.
> > > 
> > > I would be supportive of a patch that adds helpers and then converts
> > > all of the
> > > relevant CR0/CR4 checks though...
> > 
> > Hi Sean, I can cook a patch by your suggesion and sent out the patch
> > seperately.
> 
> Sean, besides the calls to kvm_read_cr0_bits() and kvm_read_cr4_bits(), there
> are also a lot of checks in if statements like
> if ( cr4 & X86_CR4_XXX )
> or
> if ( cr0 & X86_CR0_XXX )
> I suppose these usages are OK, right?

Generally speaking, yes.  Most flows of that nature use a local variable for very
good reasons.  The only one I would probably convert is this code in
svm_can_emulate_instruction().  

	cr4 = kvm_read_cr4(vcpu);
	smep = cr4 & X86_CR4_SMEP;
	smap = cr4 & X86_CR4_SMAP;
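
which, with the proposed helpers, would become roughly:

	smep = kvm_is_cr4_bit_set(vcpu, X86_CR4_SMEP);
	smap = kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP);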

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread

Thread overview: 33+ messages
2023-02-27  8:45 [PATCH v5 0/5] Linear Address Masking (LAM) KVM Enabling Robert Hoo
2023-02-27  8:45 ` [PATCH v5 1/5] KVM: x86: Virtualize CR4.LAM_SUP Robert Hoo
2023-03-02  7:17   ` Chao Gao
2023-03-02 12:03     ` Binbin Wu
2023-03-02 13:00     ` Robert Hoo
2023-02-27  8:45 ` [PATCH v5 2/5] [Trivial]KVM: x86: Explicitly cast ulong to bool in kvm_set_cr3() Robert Hoo
2023-03-02  7:24   ` Chao Gao
2023-03-03  3:23     ` Robert Hoo
2023-03-10 20:22       ` Sean Christopherson
2023-03-20 12:05         ` Binbin Wu
2023-03-20 13:56           ` Binbin Wu
2023-03-21 16:03             ` Sean Christopherson
2023-02-27  8:45 ` [PATCH v5 3/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Robert Hoo
2023-03-03  6:21   ` Chao Gao
2023-03-03 14:23     ` Robert Hoo
2023-03-03 15:53       ` Chao Gao
2023-03-05  1:31         ` Robert Hoo
2023-03-10 20:12   ` Sean Christopherson
2023-03-20  6:57     ` Binbin Wu
2023-02-27  8:45 ` [PATCH v5 4/5] KVM: x86: emulation: Apply LAM mask when emulating data access in 64-bit mode Robert Hoo
2023-03-02  6:41   ` Binbin Wu
2023-03-02 13:16     ` Robert Hoo
2023-03-03  1:08       ` Binbin Wu
2023-03-03  3:16         ` Robert Hoo
2023-03-03  3:35           ` Binbin Wu
2023-03-03  9:00             ` Robert Hoo
2023-03-03 10:18               ` Binbin Wu
2023-03-10 20:26         ` Sean Christopherson
2023-03-02  8:55   ` Chao Gao
2023-03-02 11:31     ` Binbin Wu
2023-03-10 20:23   ` Sean Christopherson
2023-02-27  8:45 ` [PATCH v5 5/5] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
2023-03-03  6:46   ` Chao Gao
