* [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling
@ 2022-10-17  7:04 Robert Hoo
  2022-10-17  7:04 ` [PATCH 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo

===Feature Introduction===

Linear-address masking (LAM) [1] modifies the checking that is applied to
*64-bit* linear addresses, allowing software to use the untranslated
(upper) address bits for metadata.
As for which upper bits of the linear address can be borrowed, LAM has 2
modes: LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57,
i.e. LAM width of 6), controlled by these new bits: CR3[62] (LAM_U48),
CR3[61] (LAM_U57), and CR4[28] (LAM_SUP).
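
For reference, the new control bits correspond to definitions like the
following (illustrative only; the names follow the kernel enabling series):

/* CR3 bits: LAM for user pointers, per address space */
#define X86_CR3_LAM_U57		BIT_ULL(61)	/* mask bits 62:57 */
#define X86_CR3_LAM_U48		BIT_ULL(62)	/* mask bits 62:48 */
/* CR4 bit: LAM for supervisor pointers, width follows the paging mode */
#define X86_CR4_LAM_SUP		BIT_ULL(28)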

* The LAM_U48 and LAM_U57 bits control LAM for user mode addresses, i.e. if
  CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and CR3.LAM_U57 = 0,
  LAM48 is applied.
* The LAM_SUP bit, combined with the paging mode (4-level or 5-level),
  determines LAM for supervisor mode addresses, i.e. when CR4.LAM_SUP = 1,
  4-level paging applies LAM48 to supervisor mode addresses while 5-level
  paging applies LAM57.

Note:
1. LAM applies only to data addresses, not to instruction fetches.
2. LAM identification of an address as user or supervisor is based solely on
   the value of pointer bit 63 and does not, for the purposes of LAM, depend
   on the CPL.
3. For user mode addresses, it is possible that 5-level paging and LAM_U48
   are both set; in this case, the effective usable linear address width is
   48, i.e. bits 56:47 are reserved by LAM. [2]
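
In effect, untagging an address reduces to a sign extension from the highest
effective address bit (bit 47 for LAM48, bit 56 for LAM57). A minimal sketch
(illustrative only, not code from this series):

static inline u64 lam_canonicalize(u64 addr, unsigned int va_bits)
{
	/* va_bits is 48 for LAM48, 57 for LAM57 */
	return (u64)((s64)(addr << (64 - va_bits)) >> (64 - va_bits));
}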


===LAM KVM Design===

Put CR4.LAM_SUP under guest control (guest-owned bit).

Under EPT mode, CR3 is fully under guest control, so guest LAM is transparent
to KVM; nothing more needs to be done.

For shadow paging (EPT = off), KVM needs to handle guest CR3.LAM_U48 and
CR3.LAM_U57 toggles.
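
Roughly, that means stripping the LAM bits before the guest CR3 is used as a
pgd, and carrying them over when KVM rebuilds the guest CR3. A simplified
sketch (the actual changes are in patches 3 and 5):

/* Guest pgd is CR3 without the reserved/LAM high bits */
static unsigned long get_pgd(struct kvm_vcpu *vcpu)
{
	return kvm_read_cr3(vcpu) & ~CR3_HIGH_RSVD_MASK;
}

/* When rebuilding guest CR3 for shadow paging, keep the active LAM bits */
guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) | kvm_get_active_lam(vcpu);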

Patch 1 -- Mostly independent from LAM enabling; it just renames the CR4
           reserved-bit variables for better readability, esp. for beginners.

Patches 2, 9 -- Common parts for enabling in both EPT and Shadow Paging modes.

Patches 3 ~ 8 -- LAM enabling for Shadow Paging mode.


This patch set is based on Kirill's latest LAM kernel enabling series
(commit e3e52d2898d66c34eefbe09cbeae0d3ba53fb989):
https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam

Unit tested with selftest tools on both the host and in a VM; all tests passed.

[1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368 (Section 10.6 VMX interaction)
[2] Thus the current kernel enabling patches only enable the LAM57 mode. https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/


Robert Hoo (9):
  KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  KVM: x86: Add CR4.LAM_SUP in guest owned bits
  KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for
    pgd
  [Trivial] KVM: x86: MMU: Comments update
  KVM: x86: MMU: Integrate LAM bits when build guest CR3
  KVM: x86: Untag LAM bits when applicable
  KVM: x86: When judging setting CR3 valid or not, consider LAM bits
  KVM: x86: When guest set CR3, handle LAM bits semantics
  KVM: x86: LAM: Expose LAM CPUID to user space VMM

 arch/x86/include/asm/kvm_host.h        |  7 ++--
 arch/x86/include/asm/processor-flags.h |  1 +
 arch/x86/kvm/cpuid.c                   |  6 +--
 arch/x86/kvm/kvm_cache_regs.h          |  3 +-
 arch/x86/kvm/mmu.h                     |  5 +++
 arch/x86/kvm/mmu/mmu.c                 | 16 +++++---
 arch/x86/kvm/vmx/vmx.c                 |  8 +++-
 arch/x86/kvm/x86.c                     | 53 ++++++++++++++++++++------
 arch/x86/kvm/x86.h                     | 43 ++++++++++++++++++++-
 9 files changed, 114 insertions(+), 28 deletions(-)

-- 
2.31.1



* [PATCH 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo

kvm_vcpu_arch::cr4_guest_owned_bits and kvm_vcpu_arch::cr4_guest_rsvd_bits
look confusing. Rename the latter to cr4_host_rsvd_bits, because it in fact
describes the effective host-reserved CR4 bits from the vCPU's perspective.

Meanwhile, rename other related variables/macros to be more descriptive:
* CR4_RESERVED_BITS --> CR4_HOST_RESERVED_BITS, which describes the bare
metal host's CR4 reserved bits.

* cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes the KVM-level
CR4 reserved bits, i.e. CR4_HOST_RESERVED_BITS plus the bits for features
KVM doesn't support (!kvm_cpu_cap_has()).

* __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which calculates the
effective CR4 reserved bits at KVM or VM level, depending on the
corresponding x_cpu_has() callback passed in.

With these renames, the hierarchical relations of those reserved CR4 bits
are clearer.

Just renames, no functional changes intended.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/cpuid.c            |  4 ++--
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              | 12 ++++++------
 arch/x86/kvm/x86.h              |  4 ++--
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ffa578cafe1..4858436c64ef 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -114,7 +114,7 @@
 			  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
 			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
 
-#define CR4_RESERVED_BITS                                               \
+#define CR4_HOST_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
@@ -654,7 +654,7 @@ struct kvm_vcpu_arch {
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr4_guest_owned_bits;
-	unsigned long cr4_guest_rsvd_bits;
+	unsigned long cr4_host_rsvd_bits;
 	unsigned long cr8;
 	u32 host_pkru;
 	u32 pkru;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 75dcf7a72605..b935b3b04a7e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -338,8 +338,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
 
 	kvm_pmu_refresh(vcpu);
-	vcpu->arch.cr4_guest_rsvd_bits =
-	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
+	vcpu->arch.cr4_host_rsvd_bits =
+	    __cr4_calc_reserved_bits(guest_cpuid_has, vcpu);
 
 	kvm_hv_set_cpuid(vcpu);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7f8331d6f7e..97a2b8759ce8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4294,7 +4294,7 @@ void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
 	struct kvm_vcpu *vcpu = &vmx->vcpu;
 
 	vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
-					  ~vcpu->arch.cr4_guest_rsvd_bits;
+					  ~vcpu->arch.cr4_host_rsvd_bits;
 	if (!enable_ept) {
 		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_TLBFLUSH_BITS;
 		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PDPTR_BITS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 205ebdc2b11b..05a40ab7cda2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -108,7 +108,7 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
 static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #endif
 
-static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
+static u64 __read_mostly cr4_kvm_reserved_bits = CR4_HOST_RESERVED_BITS;
 
 #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
 
@@ -1082,10 +1082,10 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	if (cr4 & cr4_reserved_bits)
+	if (cr4 & cr4_kvm_reserved_bits)
 		return false;
 
-	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
+	if (cr4 & vcpu->arch.cr4_host_rsvd_bits)
 		return false;
 
 	return true;
@@ -11965,7 +11965,7 @@ int kvm_arch_hardware_setup(void *opaque)
 		kvm_caps.supported_xss = 0;
 
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
-	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
+	cr4_kvm_reserved_bits = __cr4_calc_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
 #undef __kvm_cpu_cap_has
 
 	if (kvm_caps.has_tsc_control) {
@@ -11998,8 +11998,8 @@ int kvm_arch_check_processor_compat(void *opaque)
 
 	WARN_ON(!irqs_disabled());
 
-	if (__cr4_reserved_bits(cpu_has, c) !=
-	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
+	if (__cr4_calc_reserved_bits(cpu_has, c) !=
+	    __cr4_calc_reserved_bits(cpu_has, &boot_cpu_data))
 		return -EIO;
 
 	return ops->check_processor_compatibility();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1926d2cb8e79..4473bc0ba0f1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -448,9 +448,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
 #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
 
-#define __cr4_reserved_bits(__cpu_has, __c)             \
+#define __cr4_calc_reserved_bits(__cpu_has, __c)             \
 ({                                                      \
-	u64 __reserved_bits = CR4_RESERVED_BITS;        \
+	u64 __reserved_bits = CR4_HOST_RESERVED_BITS;        \
                                                         \
 	if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
 		__reserved_bits |= X86_CR4_OSXSAVE;     \
-- 
2.31.1



* [PATCH 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2022-10-17  7:04 ` [PATCH 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

If LAM is enabled, CR4.LAM_SUP is guest-owned; otherwise it is reserved.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/kvm_cache_regs.h   | 3 ++-
 arch/x86/kvm/x86.h              | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4858436c64ef..e961fbd12833 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -120,7 +120,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_LAM_SUP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 3febc342360c..917f1b770839 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_LAM_SUP)
 
 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 4473bc0ba0f1..c55d9e517d01 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -470,6 +470,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
+		__reserved_bits |= X86_CR4_LAM_SUP;	\
 	__reserved_bits;                                \
 })
 
-- 
2.31.1



* [PATCH 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2022-10-17  7:04 ` [PATCH 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
  2022-10-17  7:04 ` [PATCH 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 4/9] [Trivial] KVM: x86: MMU: Comments update Robert Hoo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

get_cr3() is the implementation of kvm_mmu::get_guest_pgd(), but CR3 is not
naturally equivalent to the pgd: the SDM says CR3's high bits are reserved
and must be zero. And now, with the LAM feature's introduction, bits 61 ~ 62
are used.
So, rename get_cr3() --> get_pgd() to better indicate the function's purpose,
and filter out the CR3 high bits in it.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/include/asm/processor-flags.h |  1 +
 arch/x86/kvm/mmu/mmu.c                 | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index d8cccadc83a6..bb0f8dd16956 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -38,6 +38,7 @@
 #ifdef CONFIG_X86_64
 /* Mask off the address space ID and SME encryption bits. */
 #define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
+#define CR3_HIGH_RSVD_MASK	GENMASK_ULL(63, 52)
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eccddb136954..385a1a9b1ac4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4473,9 +4473,9 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
-static unsigned long get_cr3(struct kvm_vcpu *vcpu)
+static unsigned long get_pgd(struct kvm_vcpu *vcpu)
 {
-	return kvm_read_cr3(vcpu);
+	return kvm_read_cr3(vcpu) & ~CR3_HIGH_RSVD_MASK;
 }
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
@@ -5028,7 +5028,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
 	context->page_fault = kvm_tdp_page_fault;
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = NULL;
-	context->get_guest_pgd = get_cr3;
+	context->get_guest_pgd = get_pgd;
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 
@@ -5178,7 +5178,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu,
 
 	kvm_init_shadow_mmu(vcpu, cpu_role);
 
-	context->get_guest_pgd     = get_cr3;
+	context->get_guest_pgd     = get_pgd;
 	context->get_pdptr         = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 }
@@ -5192,7 +5192,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
 		return;
 
 	g_context->cpu_role.as_u64   = new_mode.as_u64;
-	g_context->get_guest_pgd     = get_cr3;
+	g_context->get_guest_pgd     = get_pgd;
 	g_context->get_pdptr         = kvm_pdptr_read;
 	g_context->inject_page_fault = kvm_inject_page_fault;
 
-- 
2.31.1



* [PATCH 4/9] [Trivial] KVM: x86: MMU: Comments update
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (2 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo

The comment referring to kvm_mmu_ensure_valid_pgd() is stale. Update the
comments according to the latest code.

No functional changes.

P.S. Sean first noticed this in
https://lore.kernel.org/kvm/Yg%2FguAXFLJBmDflh@google.com/.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 385a1a9b1ac4..315456f30964 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4437,8 +4437,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role new_role = mmu->root_role;
 
+	/*
+	 * If no root is found in the cache, the currently active root.hpa will
+	 * be set to INVALID_PAGE; a new root will be set up during
+	 * vcpu_enter_guest() --> kvm_mmu_reload().
+	 */
 	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
-		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
 		return;
 	}
 
-- 
2.31.1



* [PATCH 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (3 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 4/9] [Trivial] KVM: x86: MMU: Comments update Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

When calculating the new guest CR3 value, take the LAM bits in.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/mmu.h     | 5 +++++
 arch/x86/kvm/vmx/vmx.c | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..866f2b7cb509 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }
 
+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
 	u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 97a2b8759ce8..ffb82daee1d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3305,7 +3305,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			update_guest_cr3 = false;
 		vmx_ept_load_pdptrs(vcpu);
 	} else {
-		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+			    kvm_get_active_lam(vcpu);
 	}
 
 	if (update_guest_cr3)
-- 
2.31.1



* [PATCH 6/9] KVM: x86: Untag LAM bits when applicable
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (4 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

Define kvm_untagged_addr() per the LAM feature spec: address high bits are
sign extended from the highest effective address bit.
Note that LAM_U48 and LA57 have some overlapping effective bits. This patch
WARN()s on that case.
Currently, the only applicable case in which addresses passed down from the
VM carry LAM bits is the MPX MSRs.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/vmx/vmx.c |  3 +++
 arch/x86/kvm/x86.c     |  5 +++++
 arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ffb82daee1d3..76c9f4b8b340 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2116,6 +2116,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (!msr_info->host_initiated &&
 		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
 			return 1;
+
+		data = kvm_untagged_addr(data, vcpu);
+
 		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
 			return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a40ab7cda2..3fa532cd1911 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1780,6 +1780,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 	case MSR_KERNEL_GS_BASE:
 	case MSR_CSTAR:
 	case MSR_LSTAR:
+		/*
+		 * LAM applies only to addresses used for data accesses.
+		 * A tagged address should never reach here.
+		 * The strict canonical check still applies here.
+		 */
 		if (is_noncanonical_address(data, vcpu))
 			return 1;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c55d9e517d01..f01a2ed9d3c0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -187,11 +187,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
 }
 
+static inline u64 get_canonical(u64 la, u8 vaddr_bits)
+{
+	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
+}
+
 static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
 {
 	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
 }
 
+#ifdef CONFIG_X86_64
+/* untag addr for guest, according to vCPU CR3 and CR4 settings */
+static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
+{
+	if (addr >> 63 == 0) {
+		/* User pointers */
+		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
+			addr = get_canonical(addr, 57);
+		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
+			/*
+			 * If the guest enabled 5-level paging and LAM_U48,
+			 * bit 47 should be 0 and bits 48:56 contain metadata,
+			 * although bits 47:56 are valid 5-level address
+			 * bits.
+			 * With LAM_U48 and 4-level paging, bit 47 is 0.
+			 */
+			WARN_ON(addr & _BITUL(47));
+			addr = get_canonical(addr, 48);
+		}
+	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
+		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
+			addr = get_canonical(addr, 57);
+		else
+			addr = get_canonical(addr, 48);
+	}
+
+	return addr;
+}
+#else
+#define kvm_untagged_addr(addr, vcpu)	(addr)
+#endif
+
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
-- 
2.31.1



* [PATCH 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (5 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-17  7:04 ` [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
  2022-10-17  7:04 ` [PATCH 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

Before applying the kvm_vcpu_is_illegal_gpa() check, clear the LAM bits if
they are valid.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/x86.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3fa532cd1911..e9b465bff8d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1217,6 +1217,14 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
 }
 
+static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+
+	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
+}
+
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
@@ -1240,7 +1248,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
 	 * the current vCPU mode is accurate.
 	 */
-	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+	if (!kvm_is_valid_cr3(vcpu, cr3))
 		return 1;
 
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
-- 
2.31.1



* [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (6 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  2022-10-31  2:59   ` Kirill A. Shutemov
  2022-10-17  7:04 ` [PATCH 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
  8 siblings, 1 reply; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo

When only the LAM bits change, ask the next vcpu run to load the MMU pgd, so
that it builds the new CR3 with the LAM bit updates. No TLB flush is needed
in this case.
When the effective address changes, whether the LAM bits change or not, go
through the normal pgd update process.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/x86.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9b465bff8d3..fb779f88ae88 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1228,9 +1228,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
-	unsigned long pcid = 0;
+	unsigned long pcid = 0, old_cr3;
 #ifdef CONFIG_X86_64
-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
 	if (pcid_enabled) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
@@ -1243,6 +1243,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
 		goto handle_tlb_flush;
 
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
+	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
+		return	1;
+
 	/*
 	 * Do not condition the GPA check on long mode, this helper is used to
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
@@ -1254,8 +1258,22 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
 		return 1;
 
-	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3);
+	old_cr3 = kvm_read_cr3(vcpu);
+	if (cr3 != old_cr3) {
+		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
+			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
+					X86_CR3_LAM_U57));
+		} else {
+			/* Only LAM conf changes, no tlb flush needed */
+			skip_tlb_flush = true;
+			/*
+			 * Though effective addr no change, mark the
+			 * request so that LAM bits will take effect
+			 * when enter guest.
+			 */
+			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+		}
+	}
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
-- 
2.31.1



* [PATCH 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM
  2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (7 preceding siblings ...)
  2022-10-17  7:04 ` [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
@ 2022-10-17  7:04 ` Robert Hoo
  8 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-10-17  7:04 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, Robert Hoo, Jingqi Liu

The LAM feature is enumerated by CPUID.(EAX=07H, ECX=01H):EAX.LAM[bit 26].
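
For reference, user space can probe the bit on the host as below (an
illustrative sketch, not part of this patch); with this patch, the same
leaf/bit is also reported through KVM_GET_SUPPORTED_CPUID:

#include <cpuid.h>
#include <stdbool.h>

static bool cpu_has_lam(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
		return false;

	return eax & (1u << 26);	/* CPUID.(EAX=07H, ECX=01H):EAX[26] */
}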

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b935b3b04a7e..89bcac3e4ac4 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -636,7 +636,7 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
-		F(AVX_VNNI) | F(AVX512_BF16)
+		F(AVX_VNNI) | F(AVX512_BF16) | F(LAM)
 	);
 
 	kvm_cpu_cap_mask(CPUID_D_1_EAX,
-- 
2.31.1



* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-10-17  7:04 ` [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
@ 2022-10-31  2:59   ` Kirill A. Shutemov
  2022-11-01  1:46     ` Robert Hoo
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2022-10-31  2:59 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, kvm

On Mon, Oct 17, 2022 at 03:04:49PM +0800, Robert Hoo wrote:
> When only changes LAM bits, ask next vcpu run to load mmu pgd, so that it
> will build new CR3 with LAM bits updates. No TLB flush needed on this case.
> When changes on effective addresses, no matter LAM bits changes or not, go
> through normal pgd update process.
> 
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>  arch/x86/kvm/x86.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e9b465bff8d3..fb779f88ae88 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1228,9 +1228,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
>  	bool skip_tlb_flush = false;
> -	unsigned long pcid = 0;
> +	unsigned long pcid = 0, old_cr3;
>  #ifdef CONFIG_X86_64
> -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>  
>  	if (pcid_enabled) {
>  		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> @@ -1243,6 +1243,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
>  		goto handle_tlb_flush;
>  
> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> +		return	1;
> +
>  	/*
>  	 * Do not condition the GPA check on long mode, this helper is used to
>  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
> @@ -1254,8 +1258,22 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>  		return 1;
>  
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +	old_cr3 = kvm_read_cr3(vcpu);
> +	if (cr3 != old_cr3) {
> +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> +					X86_CR3_LAM_U57));
> +		} else {
> +			/* Only LAM conf changes, no tlb flush needed */
> +			skip_tlb_flush = true;

I'm not sure about this.

Consider case when LAM_U48 gets enabled on 5-level paging machines. We may
have valid TLB entries for addresses above 47-bit. It's kinda broken case,
but seems valid from architectural PoV, no?

I guess after enabling LAM, these entries will never match. But if LAM
gets disabled again they will become active. Hm?

Maybe just flush?

> +			/*
> +			 * Though effective addr no change, mark the
> +			 * request so that LAM bits will take effect
> +			 * when enter guest.
> +			 */
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		}
> +	}
>  
>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> -- 
> 2.31.1
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-10-31  2:59   ` Kirill A. Shutemov
@ 2022-11-01  1:46     ` Robert Hoo
  2022-11-01  2:04       ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Hoo @ 2022-11-01  1:46 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: seanjc, pbonzini, kvm

On Mon, 2022-10-31 at 05:59 +0300, Kirill A. Shutemov wrote:
> On Mon, Oct 17, 2022 at 03:04:49PM +0800, Robert Hoo wrote:
> > When only changes LAM bits, ask next vcpu run to load mmu pgd, so
> > that it
> > will build new CR3 with LAM bits updates. No TLB flush needed on
> > this case.
> > When changes on effective addresses, no matter LAM bits changes or
> > not, go
> > through normal pgd update process.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> >  arch/x86/kvm/x86.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index e9b465bff8d3..fb779f88ae88 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1228,9 +1228,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu
> > *vcpu, unsigned long cr3)
> >  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  {
> >  	bool skip_tlb_flush = false;
> > -	unsigned long pcid = 0;
> > +	unsigned long pcid = 0, old_cr3;
> >  #ifdef CONFIG_X86_64
> > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> >  
> >  	if (pcid_enabled) {
> >  		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > @@ -1243,6 +1243,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >  	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
> >  		goto handle_tlb_flush;
> >  
> > +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> > +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> > +		return	1;
> > +
> >  	/*
> >  	 * Do not condition the GPA check on long mode, this helper is
> > used to
> >  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee
> > that
> > @@ -1254,8 +1258,22 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> >  		return 1;
> >  
> > -	if (cr3 != kvm_read_cr3(vcpu))
> > -		kvm_mmu_new_pgd(vcpu, cr3);
> > +	old_cr3 = kvm_read_cr3(vcpu);
> > +	if (cr3 != old_cr3) {
> > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> > +					X86_CR3_LAM_U57));
> > +		} else {
> > +			/* Only LAM conf changes, no tlb flush needed
> > */
> > +			skip_tlb_flush = true;
> 
> I'm not sure about this.
> 
> Consider case when LAM_U48 gets enabled on 5-level paging machines.
> We may
> have valid TLB entries for addresses above 47-bit. It's kinda broken
> case,
> but seems valid from architectural PoV, no?

You're right, thanks Kirill.

I noticed in your Kernel enabling, because of this LAM_U48 and LA_57
overlapping, you enabled LAM_U57 only for simplicity at this moment. I
thought at that time, that this trickiness will be contained in Kernel
layer, but now it turns out at least non-EPT KVM MMU is not spared.
> 
> I guess after enabling LAM, these entries will never match. But if
> LAM
> gets disabled again they will become active. Hm?
> 
> Maybe just flush?

Now we have 2 options
1. as you suggested, just flush
2. more precisely identify the case Guest.LA57 && (CR3.bit[62:61] 00
-->10 switching), flush. (LAM_U57 bit take precedence over LAM_U48,
from spec.)

Considering CR3 change is relatively hot path, and tlb flush is heavy,
I lean towards option 2. Your opinion? 
 
> 
> > +			/*
> > +			 * Though effective addr no change, mark the
> > +			 * request so that LAM bits will take effect
> > +			 * when enter guest.
> > +			 */
> > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > +		}
> > +	}
> >  
> >  	vcpu->arch.cr3 = cr3;
> >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > -- 
> > 2.31.1
> > 
> 
> 



* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-01  1:46     ` Robert Hoo
@ 2022-11-01  2:04       ` Kirill A. Shutemov
  2022-11-01  2:26         ` Robert Hoo
  2022-11-02  7:29         ` Robert Hoo
  0 siblings, 2 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2022-11-01  2:04 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, kvm

On Tue, Nov 01, 2022 at 09:46:39AM +0800, Robert Hoo wrote:
> On Mon, 2022-10-31 at 05:59 +0300, Kirill A. Shutemov wrote:
> > On Mon, Oct 17, 2022 at 03:04:49PM +0800, Robert Hoo wrote:
> > > When only changes LAM bits, ask next vcpu run to load mmu pgd, so
> > > that it
> > > will build new CR3 with LAM bits updates. No TLB flush needed on
> > > this case.
> > > When changes on effective addresses, no matter LAM bits changes or
> > > not, go
> > > through normal pgd update process.
> > > 
> > > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > > ---
> > >  arch/x86/kvm/x86.c | 26 ++++++++++++++++++++++----
> > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index e9b465bff8d3..fb779f88ae88 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1228,9 +1228,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu
> > > *vcpu, unsigned long cr3)
> > >  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > >  {
> > >  	bool skip_tlb_flush = false;
> > > -	unsigned long pcid = 0;
> > > +	unsigned long pcid = 0, old_cr3;
> > >  #ifdef CONFIG_X86_64
> > > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > >  
> > >  	if (pcid_enabled) {
> > >  		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > > @@ -1243,6 +1243,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > > unsigned long cr3)
> > >  	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
> > >  		goto handle_tlb_flush;
> > >  
> > > +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> > > +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> > > +		return	1;
> > > +
> > >  	/*
> > >  	 * Do not condition the GPA check on long mode, this helper is
> > > used to
> > >  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee
> > > that
> > > @@ -1254,8 +1258,22 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > > unsigned long cr3)
> > >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> > >  		return 1;
> > >  
> > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > +	old_cr3 = kvm_read_cr3(vcpu);
> > > +	if (cr3 != old_cr3) {
> > > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > > +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> > > +					X86_CR3_LAM_U57));
> > > +		} else {
> > > +			/* Only LAM conf changes, no tlb flush needed
> > > */
> > > +			skip_tlb_flush = true;
> > 
> > I'm not sure about this.
> > 
> > Consider case when LAM_U48 gets enabled on 5-level paging machines.
> > We may
> > have valid TLB entries for addresses above 47-bit. It's kinda broken
> > case,
> > but seems valid from architectural PoV, no?
> 
> You're right, thanks Kirill.
> 
> I noticed in your Kernel enabling, because of this LAM_U48 and LA_57
> overlapping, you enabled LAM_U57 only for simplicity at this moment. I
> thought at that time, that this trickiness will be contained in Kernel
> layer, but now it turns out at least non-EPT KVM MMU is not spared.
> > 
> > I guess after enabling LAM, these entries will never match. But if
> > LAM
> > gets disabled again they will become active. Hm?
> > 
> > Maybe just flush?
> 
> Now we have 2 options
> 1. as you suggested, just flush
> 2. more precisely identify the case Guest.LA57 && (CR3.bit[62:61] 00
> -->10 switching), flush. (LAM_U57 bit take precedence over LAM_U48,
> from spec.)
> 
> Considering CR3 change is relatively hot path, and tlb flush is heavy,
> I lean towards option 2. Your opinion? 

11 in bits [62:61] is also considered LAM_U57. So your option 2 is broken.

And I don't buy argument about hot path: the case we talking about is
about enabling/disabling LAM with constant PGD. It's not hot path by any
mean.

Let's not be fancy. Just flush TLB.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-01  2:04       ` Kirill A. Shutemov
@ 2022-11-01  2:26         ` Robert Hoo
  2022-11-02  7:29         ` Robert Hoo
  1 sibling, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-11-01  2:26 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: seanjc, pbonzini, kvm

On Tue, 2022-11-01 at 05:04 +0300, Kirill A. Shutemov wrote:
> > > > @@ -1254,8 +1258,22 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > > > unsigned long cr3)
> > > >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> > > >  		return 1;
> > > >  
> > > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > > +	old_cr3 = kvm_read_cr3(vcpu);
> > > > +	if (cr3 != old_cr3) {
> > > > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > > > +			kvm_mmu_new_pgd(vcpu, cr3 &
> > > > ~(X86_CR3_LAM_U48 |
> > > > +					X86_CR3_LAM_U57));
> > > > +		} else {
> > > > +			/* Only LAM conf changes, no tlb flush
> > > > needed
> > > > */
> > > > +			skip_tlb_flush = true;
> > > 
> > > I'm not sure about this.
> > > 
> > > Consider case when LAM_U48 gets enabled on 5-level paging
> > > machines.
> > > We may
> > > have valid TLB entries for addresses above 47-bit. It's kinda
> > > broken
> > > case,
> > > but seems valid from architectural PoV, no?
> > 
> > You're right, thanks Kirill.
> > 
> > I noticed in your Kernel enabling, because of this LAM_U48 and
> > LA_57
> > overlapping, you enabled LAM_U57 only for simplicity at this
> > moment. I
> > thought at that time, that this trickiness will be contained in
> > Kernel
> > layer, but now it turns out at least non-EPT KVM MMU is not spared.
> > > 
> > > I guess after enabling LAM, these entries will never match. But
> > > if
> > > LAM
> > > gets disabled again they will become active. Hm?
> > > 
> > > Maybe just flush?
> > 
> > Now we have 2 options
> > 1. as you suggested, just flush
> > 2. more precisely identify the case Guest.LA57 && (CR3.bit[62:61]
> > 00
> > -->10 switching), flush. (LAM_U57 bit take precedence over LAM_U48,
> > from spec.)
> > 
> > Considering CR3 change is relatively hot path, and tlb flush is
> > heavy,
> > I lean towards option 2. Your opinion? 
> 
> 11 in bits [62:61] is also considered LAM_U57. So your option 2 is
> broken.

I don't get you here. Yes, 11 is also LAM_U57, but LAM_U57 is safe (6
bits).
Anyway, flush, since the hot-path argument below doesn't hold.
> 
> And I don't buy argument about hot path: the case we talking about is
> about enabling/disabling LAM with constant PGD. 

Right.

> It's not hot path by any
> mean.
> 
> Let's not be fancy. Just flush TLB.
> 



* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-01  2:04       ` Kirill A. Shutemov
  2022-11-01  2:26         ` Robert Hoo
@ 2022-11-02  7:29         ` Robert Hoo
  2022-11-02 21:05           ` Kirill A. Shutemov
  1 sibling, 1 reply; 19+ messages in thread
From: Robert Hoo @ 2022-11-02  7:29 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: seanjc, pbonzini, kvm

On Tue, 2022-11-01 at 05:04 +0300, Kirill A. Shutemov wrote:
...
> > > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > > +	old_cr3 = kvm_read_cr3(vcpu);
> > > > +	if (cr3 != old_cr3) {
> > > > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > > > +			kvm_mmu_new_pgd(vcpu, cr3 &
> > > > ~(X86_CR3_LAM_U48 |
> > > > +					X86_CR3_LAM_U57));
> > > > +		} else {
> > > > +			/* Only LAM conf changes, no tlb flush
> > > > needed
> > > > */
> > > > +			skip_tlb_flush = true;
> > > 
> > > I'm not sure about this.
> > > 
> > > Consider case when LAM_U48 gets enabled on 5-level paging
> > > machines.
> > > We may
> > > have valid TLB entries for addresses above 47-bit. It's kinda
> > > broken
> > > case,
> > > but seems valid from architectural PoV, no?
> > 
> > You're right, thanks Kirill.
> > 
> > I noticed in your Kernel enabling, because of this LAM_U48 and
> > LA_57
> > overlapping, you enabled LAM_U57 only for simplicity at this
> > moment. I
> > thought at that time, that this trickiness will be contained in
> > Kernel
> > layer, but now it turns out at least non-EPT KVM MMU is not spared.
> > > 
> > > I guess after enabling LAM, these entries will never match. But
> > > if
> > > LAM
> > > gets disabled again they will become active. Hm?
> > > 
> > > Maybe just flush?
> > 
> > Now we have 2 options
> > 1. as you suggested, just flush
> > 2. more precisely identify the case Guest.LA57 && (CR3.bit[62:61]
> > 00
> > -->10 switching), flush. (LAM_U57 bit take precedence over LAM_U48,
> > from spec.)
> > 
> > Considering CR3 change is relatively hot path, and tlb flush is
> > heavy,
> > I lean towards option 2. Your opinion? 
> 
> 11 in bits [62:61] is also considered LAM_U57. So your option 2 is
> broken.

Hi Kirill,

When I came to cook v2 per your suggestion, i.e. leave it just flush, I
pondered on the necessity on all the cases of the 2 bits (LAM_U48,
LAM_U57) flips.
Hold this: LAM_U57 (bit61) takes precedence over LAM_U48 (bit62).

(0,0) --> {(0,1), (1,0), (1,1)}
(0,1) --> {(0,0), (1,0), (1,1)}
(1,0) --> {(0,0), (0,1), (1,1)}
(1,1) --> {(0,0), (0,1), (1,0)}

Among all the 12 cases, only (0,0) --> (1,0) && 5-level paging on, has
to flush tlb. Am I right? if so, would you still prefer unconditionally
flush, just for 1/12 necessity? (if include 5-level/4-level variations,
1/24)

> 
> And I don't buy argument about hot path: the case we talking about is
> about enabling/disabling LAM with constant PGD. It's not hot path by
> any
> mean.
> 
> Let's not be fancy. Just flush TLB.
> 



* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-02  7:29         ` Robert Hoo
@ 2022-11-02 21:05           ` Kirill A. Shutemov
  2022-11-03  1:04             ` Robert Hoo
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2022-11-02 21:05 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, kvm

On Wed, Nov 02, 2022 at 03:29:10PM +0800, Robert Hoo wrote:
> On Tue, 2022-11-01 at 05:04 +0300, Kirill A. Shutemov wrote:
> ...
> > > > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > > > +	old_cr3 = kvm_read_cr3(vcpu);
> > > > > +	if (cr3 != old_cr3) {
> > > > > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > > > > +			kvm_mmu_new_pgd(vcpu, cr3 &
> > > > > ~(X86_CR3_LAM_U48 |
> > > > > +					X86_CR3_LAM_U57));
> > > > > +		} else {
> > > > > +			/* Only LAM conf changes, no tlb flush
> > > > > needed
> > > > > */
> > > > > +			skip_tlb_flush = true;
> > > > 
> > > > I'm not sure about this.
> > > > 
> > > > Consider case when LAM_U48 gets enabled on 5-level paging
> > > > machines.
> > > > We may
> > > > have valid TLB entries for addresses above 47-bit. It's kinda
> > > > broken
> > > > case,
> > > > but seems valid from architectural PoV, no?
> > > 
> > > You're right, thanks Kirill.
> > > 
> > > I noticed in your Kernel enabling, because of this LAM_U48 and
> > > LA_57
> > > overlapping, you enabled LAM_U57 only for simplicity at this
> > > moment. I
> > > thought at that time, that this trickiness will be contained in
> > > Kernel
> > > layer, but now it turns out at least non-EPT KVM MMU is not spared.
> > > > 
> > > > I guess after enabling LAM, these entries will never match. But
> > > > if
> > > > LAM
> > > > gets disabled again they will become active. Hm?
> > > > 
> > > > Maybe just flush?
> > > 
> > > Now we have 2 options
> > > 1. as you suggested, just flush
> > > 2. more precisely identify the case Guest.LA57 && (CR3.bit[62:61]
> > > 00
> > > -->10 switching), flush. (LAM_U57 bit take precedence over LAM_U48,
> > > from spec.)
> > > 
> > > Considering CR3 change is relatively hot path, and tlb flush is
> > > heavy,
> > > I lean towards option 2. Your opinion? 
> > 
> > 11 in bits [62:61] is also considered LAM_U57. So your option 2 is
> > broken.
> 
> Hi Kirill,
> 
> When I came to cook v2 per your suggestion, i.e. leave it just flush, I
> pondered on the necessity on all the cases of the 2 bits (LAM_U48,
> LAM_U57) flips.
> Hold this: LAM_U57 (bit61) takes precedence over LAM_U48 (bit62).
> 
> (0,0) --> {(0,1), (1,0), (1,1)}
> (0,1) --> {(0,0), (1,0), (1,1)}
> (1,0) --> {(0,0), (0,1), (1,1)}
> (1,1) --> {(0,0), (0,1), (1,0)}
> 
> Among all the 12 cases, only (0,0) --> (1,0) && 5-level paging on, has
> to flush tlb. Am I right? if so, would you still prefer unconditionally
> flush, just for 1/12 necessity? (if include 5-level/4-level variations,
> 1/24)

I would keep it simple. We can always add optimization later if there's
a workload that actually benefit from it. But I cannot imagine situation
where enabling LAM is a hot path.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-02 21:05           ` Kirill A. Shutemov
@ 2022-11-03  1:04             ` Robert Hoo
  2022-11-03  2:40               ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Hoo @ 2022-11-03  1:04 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: seanjc, pbonzini, kvm

On Thu, 2022-11-03 at 00:05 +0300, Kirill A. Shutemov wrote:
> On Wed, Nov 02, 2022 at 03:29:10PM +0800, Robert Hoo wrote:
> > On Tue, 2022-11-01 at 05:04 +0300, Kirill A. Shutemov wrote:
> > ...
> > > > > > -	if (cr3 != kvm_read_cr3(vcpu))
> > > > > > -		kvm_mmu_new_pgd(vcpu, cr3);
> > > > > > +	old_cr3 = kvm_read_cr3(vcpu);
> > > > > > +	if (cr3 != old_cr3) {
> > > > > > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > > > > > +			kvm_mmu_new_pgd(vcpu, cr3 &
> > > > > > ~(X86_CR3_LAM_U48 |
> > > > > > +					X86_CR3_LAM_U57));
> > > > > > +		} else {
> > > > > > +			/* Only LAM conf changes, no tlb flush
> > > > > > needed
> > > > > > */
> > > > > > +			skip_tlb_flush = true;
> > > > > 
> > > > > I'm not sure about this.
> > > > > 
> > > > > Consider case when LAM_U48 gets enabled on 5-level paging
> > > > > machines.
> > > > > We may
> > > > > have valid TLB entries for addresses above 47-bit. It's kinda
> > > > > broken
> > > > > case,
> > > > > but seems valid from architectural PoV, no?
> > > > 
> > > > You're right, thanks Kirill.
> > > > 
> > > > I noticed in your Kernel enabling, because of this LAM_U48 and
> > > > LA_57
> > > > overlapping, you enabled LAM_U57 only for simplicity at this
> > > > moment. I
> > > > thought at that time, that this trickiness will be contained in
> > > > Kernel
> > > > layer, but now it turns out at least non-EPT KVM MMU is not
> > > > spared.
> > > > > 
> > > > > I guess after enabling LAM, these entries will never match.
> > > > > But
> > > > > if
> > > > > LAM
> > > > > gets disabled again they will become active. Hm?
> > > > > 
> > > > > Maybe just flush?
> > > > 
> > > > Now we have 2 options
> > > > 1. as you suggested, just flush
> > > > 2. more precisely identify the case Guest.LA57 &&
> > > > (CR3.bit[62:61]
> > > > 00
> > > > -->10 switching), flush. (LAM_U57 bit take precedence over
> > > > LAM_U48,
> > > > from spec.)
> > > > 
> > > > Considering CR3 change is relatively hot path, and tlb flush is
> > > > heavy,
> > > > I lean towards option 2. Your opinion? 
> > > 
> > > 11 in bits [62:61] is also considered LAM_U57. So your option 2
> > > is
> > > broken.
> > 
> > Hi Kirill,
> > 
> > When I came to cook v2 per your suggestion, i.e. leave it just
> > flush, I
> > pondered on the necessity on all the cases of the 2 bits (LAM_U48,
> > LAM_U57) flips.
> > Hold this: LAM_U57 (bit61) takes precedence over LAM_U48 (bit62).
> > 
> > (0,0) --> {(0,1), (1,0), (1,1)}
> > (0,1) --> {(0,0), (1,0), (1,1)}
> > (1,0) --> {(0,0), (0,1), (1,1)}
> > (1,1) --> {(0,0), (0,1), (1,0)}
> > 
> > Among all the 12 cases, only (0,0) --> (1,0) && 5-level paging on,
> > has
> > to flush tlb. Am I right? if so, would you still prefer
> > unconditionally
> > flush, just for 1/12 necessity? (if include 5-level/4-level
> > variations,
> > 1/24)
> 
> I would keep it simple. We can always add optimization later if
> there's
> a workload that actually benefit from it. But I cannot imagine
> situation
> where enabling LAM is a hot path.
> 
OK, I'm open to this.

I also notice that skip_tlb_flush is set when pcid_enabled && (CR3
& X86_CR3_PCID_NOFLUSH). Under this condition, do you think (0,0) -->
(1,0) need to flip it back to false?

int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
	bool skip_tlb_flush = false;
	unsigned long pcid = 0, old_cr3;
#ifdef CONFIG_X86_64
	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);

	if (pcid_enabled) {
		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;




* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-03  1:04             ` Robert Hoo
@ 2022-11-03  2:40               ` Kirill A. Shutemov
  2022-11-03  8:07                 ` Robert Hoo
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2022-11-03  2:40 UTC (permalink / raw)
  To: Robert Hoo; +Cc: seanjc, pbonzini, kvm

On Thu, Nov 03, 2022 at 09:04:23AM +0800, Robert Hoo wrote:
> I also notice that skip_tlb_flush is set when pcid_enabled && (CR3
> & X86_CR3_PCID_NOFLUSH). Under this condition, do you think (0,0) -->
> (1,0) need to flip it back to false?

Yes, I think we should. We know it is a safe choice.

It also would be nice to get LAM documentation updated on the expected
behaviour. It is not clear from current documentation if enabling LAM
causes flush. We can only guess that it should at least for some
scenarios.

Phantom TLB entries that resurface after LAM gets disabled would be fun to
debug.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-11-03  2:40               ` Kirill A. Shutemov
@ 2022-11-03  8:07                 ` Robert Hoo
  0 siblings, 0 replies; 19+ messages in thread
From: Robert Hoo @ 2022-11-03  8:07 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: seanjc, pbonzini, kvm

On Thu, 2022-11-03 at 05:40 +0300, Kirill A. Shutemov wrote:
> On Thu, Nov 03, 2022 at 09:04:23AM +0800, Robert Hoo wrote:
> > I also notice that skip_tlb_flush is set when pcid_enabled && (CR3
> > & X86_CR3_PCID_NOFLUSH). Under this condition, do you think (0,0)
> > -->
> > (1,0) need to flip it back to false?
> 
> Yes, I think we should. We know it is a safe choice.

If so, then judging the (0,0) --> (1,0) case in the else{} branch is
inevitable, isn't it?

Or we could remove the skip_tlb_flush logic from this function entirely,
but that would break the existing logic. You won't like it.
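
FWIW, if we simply flush whenever only the LAM bits change, the else{}
branch would look something like this (sketch only, reusing the names
already in this patch):

		} else {
			/* Only LAM bits changed: rebuild CR3 on next entry, and flush */
			skip_tlb_flush = false;
			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
		}
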
> 
> It also would be nice to get LAM documentation updated on the
> expected
> behaviour. It is not clear from current documentation if enabling LAM
> causes flush. We can only guess that it should at least for some
> scenarios.
> 
> Phantom TLB entries that resurface after LAM gets disabled would be
> fun to
> debug.
> 
Agreed, and I echo your caution here.



end of thread

Thread overview: 19+ messages
2022-10-17  7:04 [PATCH 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
2022-10-17  7:04 ` [PATCH 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
2022-10-17  7:04 ` [PATCH 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
2022-10-17  7:04 ` [PATCH 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
2022-10-17  7:04 ` [PATCH 4/9] [Trivial] KVM: x86: MMU: Comments update Robert Hoo
2022-10-17  7:04 ` [PATCH 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
2022-10-17  7:04 ` [PATCH 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
2022-10-17  7:04 ` [PATCH 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
2022-10-17  7:04 ` [PATCH 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
2022-10-31  2:59   ` Kirill A. Shutemov
2022-11-01  1:46     ` Robert Hoo
2022-11-01  2:04       ` Kirill A. Shutemov
2022-11-01  2:26         ` Robert Hoo
2022-11-02  7:29         ` Robert Hoo
2022-11-02 21:05           ` Kirill A. Shutemov
2022-11-03  1:04             ` Robert Hoo
2022-11-03  2:40               ` Kirill A. Shutemov
2022-11-03  8:07                 ` Robert Hoo
2022-10-17  7:04 ` [PATCH 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
