* [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
@ 2022-12-09  4:45 Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
                   ` (11 more replies)
  0 siblings, 12 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo

===Feature Introduction===

Linear-address masking (LAM) [1] modifies the checking that is applied to
*64-bit* linear addresses, allowing software to use the untranslated
address (upper) bits for metadata.
As for which upper bits of the linear address can be borrowed, LAM has 2 modes:
LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57, i.e. LAM
width of 6), controlled by these new bits: CR3[62] (LAM_U48), CR3[61]
(LAM_U57), and CR4[28] (LAM_SUP).

* The LAM_U48 and LAM_U57 bits control LAM for user mode addresses. I.e. if
  CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and CR3.LAM_U57 = 0,
  LAM48 is applied.
* The LAM_SUP bit, combined with paging mode (4-level or 5-level), determines
  LAM status for supervisor mode addresses. I.e. when CR4.LAM_SUP = 1, 4-level
  paging mode will have LAM48 for supervisor mode addresses while 5-level paging
  will have LAM57.

Note:
1. LAM applies only to data addresses, not to instruction fetches.
2. LAM identification of an address as user or supervisor is based solely on the
   value of pointer bit 63 and does not, for the purposes of LAM, depend on the CPL.
3. For user mode addresses, it is possible that 5-level paging and LAM_U48 are both
   set; in this case, the effective usable linear address width is 48, i.e. bits
   56:47 are reserved by LAM. [2]
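
To make the masking rules above concrete, here is a small stand-alone sketch
(illustration only, not part of this series; the helper name lam_untag() is
made up) of how software can stash metadata in bits 62:57 of a LAM57 user
pointer, and how untagging sign-extends from the highest effective address
bit (bit 56 for LAM57):

#include <stdint.h>
#include <stdio.h>

/* Sign-extend from the highest effective address bit (bit 56 for LAM57). */
static uint64_t lam_untag(uint64_t ptr, unsigned int effective_bits)
{
	unsigned int shift = 64 - effective_bits;

	return (uint64_t)((int64_t)(ptr << shift) >> shift);
}

int main(void)
{
	uint64_t addr   = 0x00007f1234560000ULL;	/* canonical user address */
	uint64_t meta   = 0x2a;				/* 6 bits of metadata */
	uint64_t tagged = addr | (meta << 57);		/* metadata in bits 62:57 */

	/* With CR3.LAM_U57 = 1, hardware masks the metadata before translation. */
	printf("tagged   %#llx\n", (unsigned long long)tagged);
	printf("untagged %#llx\n", (unsigned long long)lam_untag(tagged, 57));
	return 0;
}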


===LAM KVM Design===

Pass CR4.LAM_SUP under guest control.

Under EPT mode, CR3 is fully under guest control, so guest LAM is transparent to
KVM; nothing more needs to be done.

For shadow paging (EPT = off), KVM needs to handle guest CR3.LAM_U48 and
CR3.LAM_U57 toggles.
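
For shadow paging, the essence is that the LAM control bits must be stripped
from the guest CR3 before it is used as a guest pgd, and kept so they can be
folded back into the CR3 value built for the guest. A minimal sketch of that
split (illustration only; the helper names are made up, the bit positions are
the architectural ones):

#include <stdint.h>
#include <stdio.h>

#define CR3_LAM_U57	(1ULL << 61)
#define CR3_LAM_U48	(1ULL << 62)
#define CR3_LAM_BITS	(CR3_LAM_U57 | CR3_LAM_U48)

/* Address/PCID part, used for MMU walks and root caching. */
static uint64_t guest_pgd_of(uint64_t guest_cr3)
{
	return guest_cr3 & ~CR3_LAM_BITS;
}

/* LAM control part, preserved in the CR3 the guest observes. */
static uint64_t guest_lam_of(uint64_t guest_cr3)
{
	return guest_cr3 & CR3_LAM_BITS;
}

int main(void)
{
	uint64_t guest_cr3 = 0x12345000ULL | CR3_LAM_U57;

	printf("pgd %#llx, lam %#llx\n",
	       (unsigned long long)guest_pgd_of(guest_cr3),
	       (unsigned long long)guest_lam_of(guest_cr3));
	return 0;
}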

Patch 1 -- This patch is mostly independent of LAM enabling. It just renames
           the CR4 reserved bit variables for better understanding, esp. for beginners.

Patches 2, 9 -- Common part for enabling both EPT and shadow paging modes.

Patches 3 ~ 8 -- LAM enabling for shadow paging mode.

[1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368 (Section 10.6 VMX interaction)
[2] Thus currently, Kernel enabling patch only enables LAM57 mode. https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/

---
Changelog
v2 --> v3:
As the LAM kernel patches are now in the tip tree, rebase onto it.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/

v1 --> v2:
1. Fix the i386-allyesconfig build error in get_pgd(), where
   CR3_HIGH_RSVD_MASK isn't applicable.
   (Reported-by: kernel test robot <lkp@intel.com>)
2. In kvm_set_cr3(), be conservative about skipping the TLB flush when only
   the LAM bits toggle. (Kirill)

Robert Hoo (9):
  KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  KVM: x86: Add CR4.LAM_SUP in guest owned bits
  KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for
    pgd
  KVM: x86: MMU: Commets update
  KVM: x86: MMU: Integrate LAM bits when build guest CR3
  KVM: x86: Untag LAM bits when applicable
  KVM: x86: When judging setting CR3 valid or not, consider LAM bits
  KVM: x86: When guest set CR3, handle LAM bits semantics
  KVM: x86: LAM: Expose LAM CPUID to user space VMM

 arch/x86/include/asm/kvm_host.h        |  7 ++--
 arch/x86/include/asm/processor-flags.h |  1 +
 arch/x86/kvm/cpuid.c                   |  6 +--
 arch/x86/kvm/kvm_cache_regs.h          |  3 +-
 arch/x86/kvm/mmu.h                     |  5 +++
 arch/x86/kvm/mmu/mmu.c                 | 18 ++++++---
 arch/x86/kvm/vmx/vmx.c                 |  8 +++-
 arch/x86/kvm/x86.c                     | 51 ++++++++++++++++++++------
 arch/x86/kvm/x86.h                     | 43 +++++++++++++++++++++-
 9 files changed, 115 insertions(+), 27 deletions(-)


base-commit: a5dadcb601b4954c60494d797b4dd1e03a4b1ebe
-- 
2.31.1


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-28  3:37   ` Binbin Wu
  2022-12-09  4:45 ` [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo

kvm_vcpu_arch::cr4_guest_owned_bits and kvm_vcpu_arch::cr4_guest_rsvd_bits
look confusing. Rename the latter to cr4_host_rsvd_bits, because it in fact
describes the effective host-reserved CR4 bits from the vCPU's perspective.

Meanwhile, rename other related variables/macros to be more descriptive:
* CR4_RESERVED_BITS --> CR4_HOST_RESERVED_BITS, which describes the host
bare-metal CR4 reserved bits.

* cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes
CR4_HOST_RESERVED_BITS + !kvm_cap_has() = KVM-level CR4 reserved bits.

* __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which calculates the
effective CR4 reserved bits at the KVM or VM level, depending on the
x_cpu_has() input.

Thus, with these renames, the hierarchical relations of those reserved CR4
bits are clearer.

Just renames, no functional changes intended.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/cpuid.c            |  4 ++--
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              | 12 ++++++------
 arch/x86/kvm/x86.h              |  4 ++--
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f05ebaa26f0f..3c736e00b6b1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -114,7 +114,7 @@
 			  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
 			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
 
-#define CR4_RESERVED_BITS                                               \
+#define CR4_HOST_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
@@ -671,7 +671,7 @@ struct kvm_vcpu_arch {
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr4_guest_owned_bits;
-	unsigned long cr4_guest_rsvd_bits;
+	unsigned long cr4_host_rsvd_bits;
 	unsigned long cr8;
 	u32 host_pkru;
 	u32 pkru;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c92c49a0b35b..01e2b93ef563 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -352,8 +352,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
 
 	kvm_pmu_refresh(vcpu);
-	vcpu->arch.cr4_guest_rsvd_bits =
-	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
+	vcpu->arch.cr4_host_rsvd_bits =
+	    __cr4_calc_reserved_bits(guest_cpuid_has, vcpu);
 
 	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
 						    vcpu->arch.cpuid_nent));
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63247c57c72c..cfa06c7c062e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4250,7 +4250,7 @@ void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
 	struct kvm_vcpu *vcpu = &vmx->vcpu;
 
 	vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
-					  ~vcpu->arch.cr4_guest_rsvd_bits;
+					  ~vcpu->arch.cr4_host_rsvd_bits;
 	if (!enable_ept) {
 		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_TLBFLUSH_BITS;
 		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PDPTR_BITS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 69227f77b201..eb1f2c20e19e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -108,7 +108,7 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
 static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #endif
 
-static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
+static u64 __read_mostly cr4_kvm_reserved_bits = CR4_HOST_RESERVED_BITS;
 
 #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
 
@@ -1102,10 +1102,10 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	if (cr4 & cr4_reserved_bits)
+	if (cr4 & cr4_kvm_reserved_bits)
 		return false;
 
-	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
+	if (cr4 & vcpu->arch.cr4_host_rsvd_bits)
 		return false;
 
 	return true;
@@ -12290,7 +12290,7 @@ int kvm_arch_hardware_setup(void *opaque)
 		kvm_caps.supported_xss = 0;
 
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
-	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
+	cr4_kvm_reserved_bits = __cr4_calc_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
 #undef __kvm_cpu_cap_has
 
 	if (kvm_caps.has_tsc_control) {
@@ -12323,8 +12323,8 @@ int kvm_arch_check_processor_compat(void *opaque)
 
 	WARN_ON(!irqs_disabled());
 
-	if (__cr4_reserved_bits(cpu_has, c) !=
-	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
+	if (__cr4_calc_reserved_bits(cpu_has, c) !=
+	    __cr4_calc_reserved_bits(cpu_has, &boot_cpu_data))
 		return -EIO;
 
 	return ops->check_processor_compatibility();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 829d3134c1eb..d92e580768e5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -452,9 +452,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
 #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
 
-#define __cr4_reserved_bits(__cpu_has, __c)             \
+#define __cr4_calc_reserved_bits(__cpu_has, __c)             \
 ({                                                      \
-	u64 __reserved_bits = CR4_RESERVED_BITS;        \
+	u64 __reserved_bits = CR4_HOST_RESERVED_BITS;        \
                                                         \
 	if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
 		__reserved_bits |= X86_CR4_OSXSAVE;     \
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2023-01-07  0:38   ` Sean Christopherson
  2022-12-09  4:45 ` [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

If LAM is enabled, CR4.LAM_SUP is owned by the guest; otherwise, it is reserved.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/kvm_cache_regs.h   | 3 ++-
 arch/x86/kvm/x86.h              | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3c736e00b6b1..275a6b2337b1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -120,7 +120,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_LAM_SUP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 3febc342360c..917f1b770839 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_LAM_SUP)
 
 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d92e580768e5..6c1fbe27616f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -474,6 +474,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
+		__reserved_bits |= X86_CR4_LAM_SUP;	\
 	__reserved_bits;                                \
 })
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-19  6:44   ` Yuan Yao
  2023-01-07  0:45   ` Sean Christopherson
  2022-12-09  4:45 ` [PATCH v3 4/9] KVM: x86: MMU: Commets update Robert Hoo
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

get_cr3() is the implementation of kvm_mmu::get_guest_pgd(), but CR3 is not
naturally equivalent to a pgd: the SDM says the CR3 high bits are reserved and
must be zero, and now, with the introduction of the LAM feature, bits 61 ~ 62
are used.
So, rename get_cr3() --> get_pgd() to better indicate the function's purpose,
and filter out the CR3 high bits in it.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/include/asm/processor-flags.h |  1 +
 arch/x86/kvm/mmu/mmu.c                 | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index d8cccadc83a6..bb0f8dd16956 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -38,6 +38,7 @@
 #ifdef CONFIG_X86_64
 /* Mask off the address space ID and SME encryption bits. */
 #define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
+#define CR3_HIGH_RSVD_MASK	GENMASK_ULL(63, 52)
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b6f96d47e596..d433c8923b18 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4488,9 +4488,13 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
-static unsigned long get_cr3(struct kvm_vcpu *vcpu)
+static unsigned long get_pgd(struct kvm_vcpu *vcpu)
 {
+#ifdef CONFIG_X86_64
+	return kvm_read_cr3(vcpu) & ~CR3_HIGH_RSVD_MASK;
+#else
 	return kvm_read_cr3(vcpu);
+#endif
 }
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
@@ -5043,7 +5047,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
 	context->page_fault = kvm_tdp_page_fault;
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = NULL;
-	context->get_guest_pgd = get_cr3;
+	context->get_guest_pgd = get_pgd;
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 
@@ -5193,7 +5197,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu,
 
 	kvm_init_shadow_mmu(vcpu, cpu_role);
 
-	context->get_guest_pgd     = get_cr3;
+	context->get_guest_pgd     = get_pgd;
 	context->get_pdptr         = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
 }
@@ -5207,7 +5211,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
 		return;
 
 	g_context->cpu_role.as_u64   = new_mode.as_u64;
-	g_context->get_guest_pgd     = get_cr3;
+	g_context->get_guest_pgd     = get_pgd;
 	g_context->get_pdptr         = kvm_pdptr_read;
 	g_context->inject_page_fault = kvm_inject_page_fault;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 4/9] KVM: x86: MMU: Commets update
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (2 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo

The comment mentioning kvm_mmu_ensure_valid_pgd() is stale. Update the comment
according to the latest code.

No functional changes.

P.S. Sean firstly noticed this in
https://lore.kernel.org/kvm/Yg%2FguAXFLJBmDflh@google.com/.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d433c8923b18..450500086932 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4452,8 +4452,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role new_role = mmu->root_role;
 
+	/*
+	 * If no root is found in cache, current active root.hpa will be (set)
+	 * INVALID_PAGE, a new root will be set up during vcpu_enter_guest()
+	 * --> kvm_mmu_reload().
+	 */
 	if (!fast_pgd_switch(vcpu->kvm, mmu, new_pgd, new_role)) {
-		/* kvm_mmu_ensure_valid_pgd will set up a new root.  */
 		return;
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (3 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 4/9] KVM: x86: MMU: Commets update Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-19  6:53   ` Yuan Yao
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

When calculating the new guest CR3 value, take the LAM bits in.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/mmu.h     | 5 +++++
 arch/x86/kvm/vmx/vmx.c | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..866f2b7cb509 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }
 
+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
 	u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cfa06c7c062e..9985dbb63e7b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3261,7 +3261,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			update_guest_cr3 = false;
 		vmx_ept_load_pdptrs(vcpu);
 	} else {
-		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+			    kvm_get_active_lam(vcpu);
 	}
 
 	if (update_guest_cr3)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (4 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-19  7:32   ` Yuan Yao
                     ` (5 more replies)
  2022-12-09  4:45 ` [PATCH v3 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
                   ` (5 subsequent siblings)
  11 siblings, 6 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

Define kvm_untagged_addr() per the LAM feature spec: the address high bits are
sign extended from the highest effective address bit.
Note that LAM_U48 and LA57 have some overlapping effective bits. This patch
issues a WARN() in that case.

Currently, the only applicable case where addresses passed down from the VM may
carry LAM bits is the MPX MSRs.
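
For illustration only (not part of the patch; the values below are made up),
the supervisor-pointer case can be checked with a stand-alone program whose
get_canonical() behaves like the helper added by this patch:

#include <stdint.h>
#include <stdio.h>

/* Sign-extend from the highest effective address bit. */
static uint64_t get_canonical(uint64_t la, uint8_t vaddr_bits)
{
	unsigned int shift = 64 - vaddr_bits;

	return (uint64_t)((int64_t)(la << shift) >> shift);
}

int main(void)
{
	/* bit 63 = 1 (supervisor), metadata 0x15 in bits 62:57, bit 56 = 1 */
	uint64_t tagged = 0xabff880000002000ULL;

	/* CR4.LAM_SUP = 1 with LA57: untag by sign-extending from bit 56. */
	printf("%#llx -> %#llx\n", (unsigned long long)tagged,
	       (unsigned long long)get_canonical(tagged, 57));
	/* With 4-level paging, 48 would be used instead of 57. */
	return 0;
}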

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/vmx/vmx.c |  3 +++
 arch/x86/kvm/x86.c     |  5 +++++
 arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9985dbb63e7b..16ddd3fcd3cb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (!msr_info->host_initiated &&
 		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
 			return 1;
+
+		data = kvm_untagged_addr(data, vcpu);
+
 		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
 			return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb1f2c20e19e..0a446b45e3d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 	case MSR_KERNEL_GS_BASE:
 	case MSR_CSTAR:
 	case MSR_LSTAR:
+		/*
+		 * LAM applies only addresses used for data accesses.
+		 * Tagged address should never reach here.
+		 * Strict canonical check still applies here.
+		 */
 		if (is_noncanonical_address(data, vcpu))
 			return 1;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6c1fbe27616f..f5a2a15783c6 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
 }
 
+static inline u64 get_canonical(u64 la, u8 vaddr_bits)
+{
+	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
+}
+
 static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
 {
 	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
 }
 
+#ifdef CONFIG_X86_64
+/* untag addr for guest, according to vCPU CR3 and CR4 settings */
+static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
+{
+	if (addr >> 63 == 0) {
+		/* User pointers */
+		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
+			addr = get_canonical(addr, 57);
+		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
+			/*
+			 * If guest enabled 5-level paging and LAM_U48,
+			 * bit 47 should be 0, bit 48:56 contains meta data
+			 * although bit 47:56 are valid 5-level address
+			 * bits.
+			 * If LAM_U48 and 4-level paging, bit47 is 0.
+			 */
+			WARN_ON(addr & _BITUL(47));
+			addr = get_canonical(addr, 48);
+		}
+	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
+		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
+			addr = get_canonical(addr, 57);
+		else
+			addr = get_canonical(addr, 48);
+	}
+
+	return addr;
+}
+#else
+#define kvm_untagged_addr(addr, vcpu)	(addr)
+#endif
+
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (5 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-09  4:45 ` [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

Before applying the kvm_vcpu_is_legal_gpa() check, clear the LAM bits if they
are valid (i.e. the guest has LAM).

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/x86.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a446b45e3d6..48a2ad1e4cd6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1237,6 +1237,14 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
 }
 
+static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+
+	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
+}
+
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
@@ -1260,7 +1268,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
 	 * the current vCPU mode is accurate.
 	 */
-	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+	if (!kvm_is_valid_cr3(vcpu, cr3))
 		return 1;
 
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (6 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-20  9:10   ` Liu, Jingqi
  2022-12-21  8:30   ` Yu Zhang
  2022-12-09  4:45 ` [PATCH v3 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo

When only the LAM bits change, ask the next vCPU run to load the MMU pgd, so
that it will build a new CR3 with the updated LAM bits. No TLB flush is needed
in this case.
When the effective address changes, no matter whether the LAM bits change or
not, go through the normal pgd update process.
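
For illustration only (not part of the patch; CR3_ADDR_MASK below is
simplified and ignores SME), the pgd-switch decision can be exercised
stand-alone:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CR3_LAM_U57	(1ULL << 61)
#define CR3_LAM_U48	(1ULL << 62)
#define CR3_ADDR_MASK	0x000ffffffffff000ULL	/* simplified, ignores SME */

/* true: full pgd switch needed; false: only LAM bits toggled, just reload CR3. */
static bool needs_new_pgd(uint64_t old_cr3, uint64_t new_cr3)
{
	return (old_cr3 ^ new_cr3) & CR3_ADDR_MASK;
}

int main(void)
{
	uint64_t old_cr3 = 0x1000;

	printf("LAM-only toggle -> new pgd? %d\n",
	       needs_new_pgd(old_cr3, old_cr3 | CR3_LAM_U57));
	printf("address change  -> new pgd? %d\n",
	       needs_new_pgd(old_cr3, 0x2000 | CR3_LAM_U48));
	return 0;
}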

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a2ad1e4cd6..6fbe8dd36b1e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1248,9 +1248,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
-	unsigned long pcid = 0;
+	unsigned long pcid = 0, old_cr3;
 #ifdef CONFIG_X86_64
-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
 
 	if (pcid_enabled) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
@@ -1263,6 +1263,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
 		goto handle_tlb_flush;
 
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
+	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
+		return	1;
+
 	/*
 	 * Do not condition the GPA check on long mode, this helper is used to
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
@@ -1274,8 +1278,20 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
 		return 1;
 
-	if (cr3 != kvm_read_cr3(vcpu))
-		kvm_mmu_new_pgd(vcpu, cr3);
+	old_cr3 = kvm_read_cr3(vcpu);
+	if (cr3 != old_cr3) {
+		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
+			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
+					X86_CR3_LAM_U57));
+		} else {
+			/*
+			 * Though effective addr no change, mark the
+			 * request so that LAM bits will take effect
+			 * when enter guest.
+			 */
+			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+		}
+	}
 
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (7 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
@ 2022-12-09  4:45 ` Robert Hoo
  2022-12-19  6:12 ` [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-09  4:45 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Robert Hoo, Jingqi Liu

The LAM feature is enumerated by CPUID.(EAX=07H, ECX=01H):EAX.LAM[bit 26].
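
For reference (illustration only, not part of the patch), the enumeration bit
can be checked from user space with the compiler's cpuid helper:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=07H, ECX=01H):EAX.LAM[bit 26] */
	if (__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
		printf("LAM %ssupported\n", (eax & (1u << 26)) ? "" : "not ");
	else
		printf("CPUID leaf 7 not available\n");
	return 0;
}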

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 01e2b93ef563..1e7c7f9d756b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -657,7 +657,7 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
-		F(AVX_VNNI) | F(AVX512_BF16)
+		F(AVX_VNNI) | F(AVX512_BF16) | F(LAM)
 	);
 
 	kvm_cpu_cap_mask(CPUID_D_1_EAX,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (8 preceding siblings ...)
  2022-12-09  4:45 ` [PATCH v3 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
@ 2022-12-19  6:12 ` Robert Hoo
  2022-12-19  8:09 ` Yuan Yao
  2022-12-20  9:20 ` Liu, Jingqi
  11 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-19  6:12 UTC (permalink / raw)
  To: pbonzini, seanjc, kirill.shutemov, kvm

A gentle ping ... asking for review, thanks in advance :-)


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2022-12-09  4:45 ` [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
@ 2022-12-19  6:44   ` Yuan Yao
  2022-12-20 14:07     ` Robert Hoo
  2023-01-07  0:45   ` Sean Christopherson
  1 sibling, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-19  6:44 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022 at 12:45:51PM +0800, Robert Hoo wrote:
> The get_cr3() is the implementation of kvm_mmu::get_guest_pgd(), well, CR3
> cannot be naturally equivalent to pgd, SDM says CR3 high bits are reserved,
> must be zero.
> And now, with LAM feature's introduction, bit 61 ~ 62 are used.
> So, rename get_cr3() --> get_pgd() to better indicate function purpose and
> in it, filtered out CR3 high bits.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>  arch/x86/include/asm/processor-flags.h |  1 +
>  arch/x86/kvm/mmu/mmu.c                 | 12 ++++++++----
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
> index d8cccadc83a6..bb0f8dd16956 100644
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -38,6 +38,7 @@
>  #ifdef CONFIG_X86_64
>  /* Mask off the address space ID and SME encryption bits. */
>  #define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
> +#define CR3_HIGH_RSVD_MASK	GENMASK_ULL(63, 52)
>  #define CR3_PCID_MASK	0xFFFull
>  #define CR3_NOFLUSH	BIT_ULL(63)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b6f96d47e596..d433c8923b18 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4488,9 +4488,13 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
>
> -static unsigned long get_cr3(struct kvm_vcpu *vcpu)
> +static unsigned long get_pgd(struct kvm_vcpu *vcpu)
>  {
> +#ifdef CONFIG_X86_64
> +	return kvm_read_cr3(vcpu) & ~CR3_HIGH_RSVD_MASK;

CR3_HIGH_RSVD_MASK is used to extract the guest pgd; it may need to use the
guest's MAXPHYADDR rather than hard coding 52.
Or, more simply, just mask out the LAM bits.

> +#else
>  	return kvm_read_cr3(vcpu);
> +#endif
>  }
>
>  static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
> @@ -5043,7 +5047,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
>  	context->page_fault = kvm_tdp_page_fault;
>  	context->sync_page = nonpaging_sync_page;
>  	context->invlpg = NULL;
> -	context->get_guest_pgd = get_cr3;
> +	context->get_guest_pgd = get_pgd;
>  	context->get_pdptr = kvm_pdptr_read;
>  	context->inject_page_fault = kvm_inject_page_fault;
>
> @@ -5193,7 +5197,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu,
>
>  	kvm_init_shadow_mmu(vcpu, cpu_role);
>
> -	context->get_guest_pgd     = get_cr3;
> +	context->get_guest_pgd     = get_pgd;
>  	context->get_pdptr         = kvm_pdptr_read;
>  	context->inject_page_fault = kvm_inject_page_fault;
>  }
> @@ -5207,7 +5211,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
>  		return;
>
>  	g_context->cpu_role.as_u64   = new_mode.as_u64;
> -	g_context->get_guest_pgd     = get_cr3;
> +	g_context->get_guest_pgd     = get_pgd;
>  	g_context->get_pdptr         = kvm_pdptr_read;
>  	g_context->inject_page_fault = kvm_inject_page_fault;
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-09  4:45 ` [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
@ 2022-12-19  6:53   ` Yuan Yao
  2022-12-20 14:07     ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-19  6:53 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022 at 12:45:53PM +0800, Robert Hoo wrote:
> When calc the new CR3 value, take LAM bits in.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>  arch/x86/kvm/mmu.h     | 5 +++++
>  arch/x86/kvm/vmx/vmx.c | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 6bdaacb6faa0..866f2b7cb509 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>  	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>  }
>
> +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
> +{

Unlike the PCID bits, the LAM bits in CR3 are not shared with other features
(e.g. PCID vs. non-PCID on bits 0:11), so not checking CR4[28] here should
be fine; otherwise, following kvm_get_pcid() would look better.

> +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> +}
> +
>  static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  {
>  	u64 root_hpa = vcpu->arch.mmu->root.hpa;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cfa06c7c062e..9985dbb63e7b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3261,7 +3261,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			update_guest_cr3 = false;
>  		vmx_ept_load_pdptrs(vcpu);
>  	} else {
> -		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
> +		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
> +			    kvm_get_active_lam(vcpu);
>  	}
>
>  	if (update_guest_cr3)
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
@ 2022-12-19  7:32   ` Yuan Yao
  2022-12-20 14:07     ` Robert Hoo
  2022-12-19  9:45   ` Yuan Yao
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-19  7:32 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> Define kvm_untagged_addr() per LAM feature spec: Address high bits are sign
> extended, from highest effective address bit.
> Note that LAM_U48 and LA57 has some effective bits overlap. This patch
> gives a WARN() on that case.
>
> Now the only applicable possible case that addresses passed down from VM
> with LAM bits is those for MPX MSRs.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |  3 +++
>  arch/x86/kvm/x86.c     |  5 +++++
>  arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9985dbb63e7b..16ddd3fcd3cb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		    (!msr_info->host_initiated &&
>  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
> +
> +		data = kvm_untagged_addr(data, vcpu);
> +
>  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>  		    (data & MSR_IA32_BNDCFGS_RSVD))
>  			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb1f2c20e19e..0a446b45e3d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  	case MSR_KERNEL_GS_BASE:
>  	case MSR_CSTAR:
>  	case MSR_LSTAR:
> +		/*
> +		 * LAM applies only addresses used for data accesses.

This is confusing because MSR_KERNEL_GS_BASE is also used for data accesses.
How about adding the following:
Strict canonical checking is still applied to MSR writes even when
LAM is enabled.

> +		 * Tagged address should never reach here.
> +		 * Strict canonical check still applies here.
> +		 */
>  		if (is_noncanonical_address(data, vcpu))
>  			return 1;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6c1fbe27616f..f5a2a15783c6 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>  	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
>  }
>
> +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> +{
> +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> +}
> +
>  static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
>  {
>  	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
>  }
>
> +#ifdef CONFIG_X86_64
> +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	if (addr >> 63 == 0) {
> +		/* User pointers */
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			addr = get_canonical(addr, 57);
> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> +			/*
> +			 * If guest enabled 5-level paging and LAM_U48,
> +			 * bit 47 should be 0, bit 48:56 contains meta data
> +			 * although bit 47:56 are valid 5-level address

This should still read 48:56.

> +			 * bits.
> +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> +			 */
> +			WARN_ON(addr & _BITUL(47));
> +			addr = get_canonical(addr, 48);
> +		}
> +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
> +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> +			addr = get_canonical(addr, 57);
> +		else
> +			addr = get_canonical(addr, 48);
> +	}
> +
> +	return addr;
> +}
> +#else
> +#define kvm_untagged_addr(addr, vcpu)	(addr)
> +#endif
> +
>  static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>  					gva_t gva, gfn_t gfn, unsigned access)
>  {
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (9 preceding siblings ...)
  2022-12-19  6:12 ` [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
@ 2022-12-19  8:09 ` Yuan Yao
  2022-12-20 14:06   ` Robert Hoo
  2022-12-20  9:20 ` Liu, Jingqi
  11 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-19  8:09 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm

On Fri, Dec 09, 2022 at 12:45:48PM +0800, Robert Hoo wrote:
> ===Feature Introduction===
>
> Linear-address masking (LAM) [1], modifies the checking that is applied to
> *64-bit* linear addresses, allowing software to use of the untranslated
> address (upper) bits for metadata.
> As for which upper bits of linear address can be borrowed, LAM has 2 modes:
> LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57, i.e. LAM
> width of 6), controlled by these new bits: CR3[62] (LAM_U48), CR3[61]
> (LAM_U57), and CR4[28] (LAM_SUP).
>
> * LAM_U48 and LAM_U57 bits controls LAM for user mode address. I.e. if
>   CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and CR3.LAM_U57 = 0,
>   LAM48 is applied.
> * LAM_SUP bit, combined with paging mode (4-level or 5-level), determines
>   LAM status for supervisor mode address. I.e. when CR4.LAM_SUP =1, 4-level
>   paging mode will have LAM48 for supervisor mode address while 5-level paging
>   will have LAM57.
>
> Note:
> 1. LAM applies to only data address, not to instructions.
> 2. LAM identification of an address as user or supervisor is based solely on the
>    value of pointer bit 63 and does not, for the purposes of LAM, depend on the CPL.
> 3. For user mode address, it is possible that 5-level paging and LAM_U48 are both
>    set, in this case, the effective usable linear address width is 48, i.e. bit
>    56:47 is reserved by LAM. [2]

It's worth highlighting that the VMX exit Guest Linear Address field is always
filled without the LAM metadata part, so it can be used as a linear address
directly. I think this explains why no modification to paging_tmpl.h is needed
for shadow paging.

>
>
> ===LAM KVM Design===
>
> Pass CR4.LAM_SUP under guest control.
>
> Under EPT mode, CR3 is fully under guest control, guest LAM is thus transparent to
> KVM. Nothing more need to do.
>
> For Shadow paging (EPT = off), KVM need to handle guest CR3.LAM_U48 and CR3.LAM_U57
> toggles.
>
> Patch 1 -- This patch can be mostly independent from LAM enabling. It just renames
>            CR4 reserved bits for better understanding, esp. for beginners.
>
> Patch 2, 9 -- Common part for both EPT and Shadow Paging modes enabling.
>
> Patch 3 ~ 8 -- For Shadow Paging mode LAM enabling.
>
> [1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368 (Section 10.6 VMX interaction)
> [2] Thus currently, Kernel enabling patch only enables LAM57 mode. https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/
>
> ---
> Changelog
> v2 --> v3:
> As LAM Kernel patches are in tip tree now, rebase to it.
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
>
> v1 --> v2:
> 1. Fixes i386-allyesconfig build error on get_pgd(), where
>    CR3_HIGH_RSVD_MASK isn't applicable.
>    (Reported-by: kernel test robot <lkp@intel.com>)
> 2. In kvm_set_cr3(), be conservative on skip tlb flush when only LAM bits
>    toggles. (Kirill)
>
> Robert Hoo (9):
>   KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
>   KVM: x86: Add CR4.LAM_SUP in guest owned bits
>   KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for
>     pgd
>   KVM: x86: MMU: Commets update
>   KVM: x86: MMU: Integrate LAM bits when build guest CR3
>   KVM: x86: Untag LAM bits when applicable
>   KVM: x86: When judging setting CR3 valid or not, consider LAM bits
>   KVM: x86: When guest set CR3, handle LAM bits semantics
>   KVM: x86: LAM: Expose LAM CPUID to user space VMM
>
>  arch/x86/include/asm/kvm_host.h        |  7 ++--
>  arch/x86/include/asm/processor-flags.h |  1 +
>  arch/x86/kvm/cpuid.c                   |  6 +--
>  arch/x86/kvm/kvm_cache_regs.h          |  3 +-
>  arch/x86/kvm/mmu.h                     |  5 +++
>  arch/x86/kvm/mmu/mmu.c                 | 18 ++++++---
>  arch/x86/kvm/vmx/vmx.c                 |  8 +++-
>  arch/x86/kvm/x86.c                     | 51 ++++++++++++++++++++------
>  arch/x86/kvm/x86.h                     | 43 +++++++++++++++++++++-
>  9 files changed, 115 insertions(+), 27 deletions(-)
>
>
> base-commit: a5dadcb601b4954c60494d797b4dd1e03a4b1ebe
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
  2022-12-19  7:32   ` Yuan Yao
@ 2022-12-19  9:45   ` Yuan Yao
  2022-12-20 14:07     ` Robert Hoo
  2022-12-21  0:35   ` Yang, Weijiang
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-19  9:45 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> Define kvm_untagged_addr() per LAM feature spec: Address high bits are sign
> extended, from highest effective address bit.
> Note that LAM_U48 and LA57 has some effective bits overlap. This patch
> gives a WARN() on that case.
>
> Now the only applicable possible case that addresses passed down from VM
> with LAM bits is those for MPX MSRs.

What about the instruction emulation case, e.g. KVM accessing a linear
address on behalf of the CPU? In that case kvm_untagged_addr() should also
be used to untag the linear address; otherwise an unexpected #GP (or other
exception) will be injected into the guest.

Please see all callers of __is_canonical_address().

>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |  3 +++
>  arch/x86/kvm/x86.c     |  5 +++++
>  arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9985dbb63e7b..16ddd3fcd3cb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		    (!msr_info->host_initiated &&
>  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
> +
> +		data = kvm_untagged_addr(data, vcpu);
> +
>  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>  		    (data & MSR_IA32_BNDCFGS_RSVD))
>  			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb1f2c20e19e..0a446b45e3d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  	case MSR_KERNEL_GS_BASE:
>  	case MSR_CSTAR:
>  	case MSR_LSTAR:
> +		/*
> +		 * LAM applies only addresses used for data accesses.
> +		 * Tagged address should never reach here.
> +		 * Strict canonical check still applies here.
> +		 */
>  		if (is_noncanonical_address(data, vcpu))
>  			return 1;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6c1fbe27616f..f5a2a15783c6 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>  	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
>  }
>
> +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> +{
> +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> +}
> +
>  static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
>  {
>  	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
>  }
>
> +#ifdef CONFIG_X86_64
> +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	if (addr >> 63 == 0) {
> +		/* User pointers */
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			addr = get_canonical(addr, 57);
> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> +			/*
> +			 * If guest enabled 5-level paging and LAM_U48,
> +			 * bit 47 should be 0, bit 48:56 contains meta data
> +			 * although bit 47:56 are valid 5-level address
> +			 * bits.
> +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> +			 */
> +			WARN_ON(addr & _BITUL(47));
> +			addr = get_canonical(addr, 48);
> +		}
> +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
> +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> +			addr = get_canonical(addr, 57);
> +		else
> +			addr = get_canonical(addr, 48);
> +	}
> +
> +	return addr;
> +}
> +#else
> +#define kvm_untagged_addr(addr, vcpu)	(addr)
> +#endif
> +
>  static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>  					gva_t gva, gfn_t gfn, unsigned access)
>  {
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-12-09  4:45 ` [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
@ 2022-12-20  9:10   ` Liu, Jingqi
  2022-12-20 14:16     ` Robert Hoo
  2022-12-21  8:30   ` Yu Zhang
  1 sibling, 1 reply; 63+ messages in thread
From: Liu, Jingqi @ 2022-12-20  9:10 UTC (permalink / raw)
  To: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm

On 12/9/2022 12:45 PM, Robert Hoo wrote:
> When only changes LAM bits, ask next vcpu run to load mmu pgd, so that it
> will build new CR3 with LAM bits updates. No TLB flush needed on this case.
> When changes on effective addresses, no matter LAM bits changes or not, go
> through normal pgd update process.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>   arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
>   1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a2ad1e4cd6..6fbe8dd36b1e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1248,9 +1248,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   {
>   	bool skip_tlb_flush = false;
> -	unsigned long pcid = 0;
> +	unsigned long pcid = 0, old_cr3;
>   #ifdef CONFIG_X86_64
> -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>   
>   	if (pcid_enabled) {
>   		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> @@ -1263,6 +1263,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
>   		goto handle_tlb_flush;
>   
> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> +		return	1;
> +
>   	/*
>   	 * Do not condition the GPA check on long mode, this helper is used to
>   	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
> @@ -1274,8 +1278,20 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>   		return 1;
>   
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +	old_cr3 = kvm_read_cr3(vcpu);
> +	if (cr3 != old_cr3) {
> +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> +					X86_CR3_LAM_U57));
"CR3_ADDR_MASK" should not contain "X86_CR3_LAM_U48 | X86_CR3_LAM_U57"
But seems it is not defined explicitly.
Besides this, looks good for me.
Reviewed-by: Jingqi Liu<jingqi.liu@intel.com>
> +		} else {
> +			/*
> +			 * Though effective addr no change, mark the
> +			 * request so that LAM bits will take effect
> +			 * when enter guest.
> +			 */
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		}
> +	}
>   
>   	vcpu->arch.cr3 = cr3;
>   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);


* Re: [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
  2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
                   ` (10 preceding siblings ...)
  2022-12-19  8:09 ` Yuan Yao
@ 2022-12-20  9:20 ` Liu, Jingqi
  2022-12-20 14:19   ` Robert Hoo
  11 siblings, 1 reply; 63+ messages in thread
From: Liu, Jingqi @ 2022-12-20  9:20 UTC (permalink / raw)
  To: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm

On 12/9/2022 12:45 PM, Robert Hoo wrote:
> ===Feature Introduction===
>
> Linear-address masking (LAM) [1], modifies the checking that is applied to
> *64-bit* linear addresses, allowing software to use of the untranslated
> address (upper) bits for metadata.
> As for which upper bits of linear address can be borrowed, LAM has 2 modes:
> LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57, i.e. LAM
> width of 6), controlled by these new bits: CR3[62] (LAM_U48), CR3[61]
> (LAM_U57), and CR4[28] (LAM_SUP).
>
> * LAM_U48 and LAM_U57 bits controls LAM for user mode address. I.e. if
>    CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and CR3.LAM_U57 = 0,
>    LAM48 is applied.
> * LAM_SUP bit, combined with paging mode (4-level or 5-level), determines
>    LAM status for supervisor mode address. I.e. when CR4.LAM_SUP =1, 4-level
>    paging mode will have LAM48 for supervisor mode address while 5-level paging
>    will have LAM57.
>
> Note:
> 1. LAM applies to only data address, not to instructions.
> 2. LAM identification of an address as user or supervisor is based solely on the
>     value of pointer bit 63 and does not, for the purposes of LAM, depend on the CPL.
> 3. For user mode address, it is possible that 5-level paging and LAM_U48 are both
>     set, in this case, the effective usable linear address width is 48, i.e. bit
>     56:47 is reserved by LAM. [2]
>
>
> ===LAM KVM Design===
>
> Pass CR4.LAM_SUP under guest control.
>
> Under EPT mode, CR3 is fully under guest control, guest LAM is thus transparent to
> KVM. Nothing more need to do.
>
> For Shadow paging (EPT = off), KVM need to handle guest CR3.LAM_U48 and CR3.LAM_U57
> toggles.
>
> Patch 1 -- This patch can be mostly independent from LAM enabling. It just renames
>             CR4 reserved bits for better understanding, esp. for beginners.
> 	
> Patch 2, 9 -- Common part for both EPT and Shadow Paging modes enabling.
>
> Patch 3 ~ 8 -- For Shadow Paging mode LAM enabling.
>
> [1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368 (Section 10.6 VMX interaction)
> [2] Thus currently, Kernel enabling patch only enables LAM57 mode. https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/
>
> ---
> Changelog
> v2 --> v3:
> As LAM Kernel patches are in tip tree now, rebase to it.
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
>
> v1 --> v2:
> 1. Fixes i386-allyesconfig build error on get_pgd(), where
>     CR3_HIGH_RSVD_MASK isn't applicable.
>     (Reported-by: kernel test robot <lkp@intel.com>)
> 2. In kvm_set_cr3(), be conservative on skip tlb flush when only LAM bits
>     toggles. (Kirill)
>
> Robert Hoo (9):
>    KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
>    KVM: x86: Add CR4.LAM_SUP in guest owned bits
>    KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for
>      pgd
>    KVM: x86: MMU: Commets update
>    KVM: x86: MMU: Integrate LAM bits when build guest CR3
>    KVM: x86: Untag LAM bits when applicable
>    KVM: x86: When judging setting CR3 valid or not, consider LAM bits
>    KVM: x86: When guest set CR3, handle LAM bits semantics
>    KVM: x86: LAM: Expose LAM CPUID to user space VMM
>
>   arch/x86/include/asm/kvm_host.h        |  7 ++--
>   arch/x86/include/asm/processor-flags.h |  1 +
>   arch/x86/kvm/cpuid.c                   |  6 +--
>   arch/x86/kvm/kvm_cache_regs.h          |  3 +-
>   arch/x86/kvm/mmu.h                     |  5 +++
>   arch/x86/kvm/mmu/mmu.c                 | 18 ++++++---
>   arch/x86/kvm/vmx/vmx.c                 |  8 +++-
>   arch/x86/kvm/x86.c                     | 51 ++++++++++++++++++++------
>   arch/x86/kvm/x86.h                     | 43 +++++++++++++++++++++-
>   9 files changed, 115 insertions(+), 27 deletions(-)
>
>
> base-commit: a5dadcb601b4954c60494d797b4dd1e03a4b1ebe

It would be better if you could provide a URL that makes this base-commit
easy to reach.

Thanks,
Jingqi


* Re: [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
  2022-12-19  8:09 ` Yuan Yao
@ 2022-12-20 14:06   ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:06 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm

On Mon, 2022-12-19 at 16:09 +0800, Yuan Yao wrote:
> On Fri, Dec 09, 2022 at 12:45:48PM +0800, Robert Hoo wrote:
> > ===Feature Introduction===
> > 
> > Linear-address masking (LAM) [1], modifies the checking that is
> > applied to
> > *64-bit* linear addresses, allowing software to use of the
> > untranslated
> > address (upper) bits for metadata.
> > As for which upper bits of linear address can be borrowed, LAM has
> > 2 modes:
> > LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57,
> > i.e. LAM
> > width of 6), controlled by these new bits: CR3[62] (LAM_U48),
> > CR3[61]
> > (LAM_U57), and CR4[28] (LAM_SUP).
> > 
> > * LAM_U48 and LAM_U57 bits controls LAM for user mode address. I.e.
> > if
> >   CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and
> > CR3.LAM_U57 = 0,
> >   LAM48 is applied.
> > * LAM_SUP bit, combined with paging mode (4-level or 5-level),
> > determines
> >   LAM status for supervisor mode address. I.e. when CR4.LAM_SUP =1,
> > 4-level
> >   paging mode will have LAM48 for supervisor mode address while 5-
> > level paging
> >   will have LAM57.
> > 
> > Note:
> > 1. LAM applies to only data address, not to instructions.
> > 2. LAM identification of an address as user or supervisor is based
> > solely on the
> >    value of pointer bit 63 and does not, for the purposes of LAM,
> > depend on the CPL.
> > 3. For user mode address, it is possible that 5-level paging and
> > LAM_U48 are both
> >    set, in this case, the effective usable linear address width is
> > 48, i.e. bit
> >    56:47 is reserved by LAM. [2]
> 
> It's worth to higlight that vmx exit Guest Linear Address field is
> always filled
> without the LAM metadata part, it can be used as linear address
> directly. 

OK

> I think
> this explains reason of no modification on paging_tmpl.h for shadow
> paging.
> 
> > 
> > 
> > ===LAM KVM Design===
> > 
> > Pass CR4.LAM_SUP under guest control.
> > 
> > Under EPT mode, CR3 is fully under guest control, guest LAM is thus
> > transparent to
> > KVM. Nothing more need to do.
> > 
> > For Shadow paging (EPT = off), KVM need to handle guest CR3.LAM_U48
> > and CR3.LAM_U57
> > toggles.
> > 
> > Patch 1 -- This patch can be mostly independent from LAM enabling.
> > It just renames
> >            CR4 reserved bits for better understanding, esp. for
> > beginners.
> > 
> > Patch 2, 9 -- Common part for both EPT and Shadow Paging modes
> > enabling.
> > 
> > Patch 3 ~ 8 -- For Shadow Paging mode LAM enabling.
> > 
> > [1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368
> > (Section 10.6 VMX interaction)
> > [2] Thus currently, Kernel enabling patch only enables LAM57 mode. 
> > https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/
> > 
> > ---
> > Changelog
> > v2 --> v3:
> > As LAM Kernel patches are in tip tree now, rebase to it.
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
> > 
> > v1 --> v2:
> > 1. Fixes i386-allyesconfig build error on get_pgd(), where
> >    CR3_HIGH_RSVD_MASK isn't applicable.
> >    (Reported-by: kernel test robot <lkp@intel.com>)
> > 2. In kvm_set_cr3(), be conservative on skip tlb flush when only
> > LAM bits
> >    toggles. (Kirill)
> > 
> > Robert Hoo (9):
> >   KVM: x86: Rename cr4_reserved/rsvd_* variables to be more
> > readable
> >   KVM: x86: Add CR4.LAM_SUP in guest owned bits
> >   KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits
> > for
> >     pgd
> >   KVM: x86: MMU: Commets update
> >   KVM: x86: MMU: Integrate LAM bits when build guest CR3
> >   KVM: x86: Untag LAM bits when applicable
> >   KVM: x86: When judging setting CR3 valid or not, consider LAM
> > bits
> >   KVM: x86: When guest set CR3, handle LAM bits semantics
> >   KVM: x86: LAM: Expose LAM CPUID to user space VMM
> > 
> >  arch/x86/include/asm/kvm_host.h        |  7 ++--
> >  arch/x86/include/asm/processor-flags.h |  1 +
> >  arch/x86/kvm/cpuid.c                   |  6 +--
> >  arch/x86/kvm/kvm_cache_regs.h          |  3 +-
> >  arch/x86/kvm/mmu.h                     |  5 +++
> >  arch/x86/kvm/mmu/mmu.c                 | 18 ++++++---
> >  arch/x86/kvm/vmx/vmx.c                 |  8 +++-
> >  arch/x86/kvm/x86.c                     | 51 ++++++++++++++++++++
> > ------
> >  arch/x86/kvm/x86.h                     | 43 +++++++++++++++++++++-
> >  9 files changed, 115 insertions(+), 27 deletions(-)
> > 
> > 
> > base-commit: a5dadcb601b4954c60494d797b4dd1e03a4b1ebe
> > --
> > 2.31.1
> > 



* Re: [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2022-12-19  6:44   ` Yuan Yao
@ 2022-12-20 14:07     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:07 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Mon, 2022-12-19 at 14:44 +0800, Yuan Yao wrote:
> On Fri, Dec 09, 2022 at 12:45:51PM +0800, Robert Hoo wrote:
> > The get_cr3() is the implementation of kvm_mmu::get_guest_pgd(),
> > well, CR3
> > cannot be naturally equivalent to pgd, SDM says CR3 high bits are
> > reserved,
> > must be zero.
> > And now, with LAM feature's introduction, bit 61 ~ 62 are used.
> > So, rename get_cr3() --> get_pgd() to better indicate function
> > purpose and
> > in it, filtered out CR3 high bits.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > ---
> >  arch/x86/include/asm/processor-flags.h |  1 +
> >  arch/x86/kvm/mmu/mmu.c                 | 12 ++++++++----
> >  2 files changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/processor-flags.h
> > b/arch/x86/include/asm/processor-flags.h
> > index d8cccadc83a6..bb0f8dd16956 100644
> > --- a/arch/x86/include/asm/processor-flags.h
> > +++ b/arch/x86/include/asm/processor-flags.h
> > @@ -38,6 +38,7 @@
> >  #ifdef CONFIG_X86_64
> >  /* Mask off the address space ID and SME encryption bits. */
> >  #define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
> > +#define CR3_HIGH_RSVD_MASK	GENMASK_ULL(63, 52)
> >  #define CR3_PCID_MASK	0xFFFull
> >  #define CR3_NOFLUSH	BIT_ULL(63)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index b6f96d47e596..d433c8923b18 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4488,9 +4488,13 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu,
> > gpa_t new_pgd)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
> > 
> > -static unsigned long get_cr3(struct kvm_vcpu *vcpu)
> > +static unsigned long get_pgd(struct kvm_vcpu *vcpu)
> >  {
> > +#ifdef CONFIG_X86_64
> > +	return kvm_read_cr3(vcpu) & ~CR3_HIGH_RSVD_MASK;
> 
> CR3_HIGH_RSVD_MASK is used to extract the guest pgd, may
> need to use guest's MAXPHYADDR but not hard code to 52.
> Or easily, just mask out the LAM bits.
> 
I defined CR3_HIGH_RSVD_MASK to strip the possible feature-control bits in
[63:52]; we already have the LAM bits (bits 61 and 62) and the PCID no-flush
bit (bit 63) as examples. These bits, along with possible future ones, won't
go below bit 52, since that is the maximum MAXPHYADDR defined by the current
SDM. As for bits [51:guest's actual MAXPHYADDR], I think other code should
already guarantee they are reserved-as-0 to conform to the SDM. Given that,
I chose the conservative constant for simplicity.

However, your point also makes sense: since this function is get_pgd(),
literally returning kvm_read_cr3(vcpu) & ~vcpu->arch.reserved_gpa_bits is
more correct. I'll take this in the next version. Thanks.
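
For illustration only, a minimal sketch of that direction (reusing the
existing vcpu->arch.reserved_gpa_bits bookkeeping; this is not the actual
next-version patch):

/*
 * Sketch: derive the guest pgd from CR3 by clearing whatever KVM already
 * tracks as reserved GPA bits, which covers the LAM and PCID no-flush
 * control bits above the guest's MAXPHYADDR.
 */
static unsigned long get_pgd(struct kvm_vcpu *vcpu)
{
	return kvm_read_cr3(vcpu) & ~vcpu->arch.reserved_gpa_bits;
}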



* Re: [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-19  6:53   ` Yuan Yao
@ 2022-12-20 14:07     ` Robert Hoo
  2022-12-21  2:12       ` Yuan Yao
  2022-12-21  7:50       ` Yu Zhang
  0 siblings, 2 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:07 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Mon, 2022-12-19 at 14:53 +0800, Yuan Yao wrote:
> On Fri, Dec 09, 2022 at 12:45:53PM +0800, Robert Hoo wrote:
> > When calc the new CR3 value, take LAM bits in.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > ---
> >  arch/x86/kvm/mmu.h     | 5 +++++
> >  arch/x86/kvm/vmx/vmx.c | 3 ++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 6bdaacb6faa0..866f2b7cb509 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -142,6 +142,11 @@ static inline unsigned long
> > kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> >  	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> >  }
> > 
> > +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
> > +{
> 
> Unlike the PCIDs, LAM bits in CR3 are  not sharing with other
> features,
> (e.g. PCID vs non-PCIN on bit 0:11) so not check CR4[28] here should
> be fine, otherwise follows kvm_get_pcid() looks better.
> 
No. CR4.LAM_SUP isn't an enablement switch over CR3.LAM_U{48,57}; the two are
independent. CR4.LAM_SUP controls whether supervisor-mode addresses get LAM,
while CR3.LAM_U{48,57} control LAM for user-mode addresses (see the short
sketch below the quoted line).

> > +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 |
> > X86_CR3_LAM_U57);
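
To illustrate, a rough sketch of that relationship (X86_CR4_LAM_SUP comes
from the kernel LAM series; the helper name here is made up):

/*
 * Sketch only: the two controls are independent of each other.
 * CR4.LAM_SUP governs supervisor pointers (bit 63 set), while
 * CR3.LAM_U48/LAM_U57 govern user pointers (bit 63 clear).
 */
static inline bool kvm_lam_active_for_addr(struct kvm_vcpu *vcpu, u64 addr)
{
	if (addr >> 63)
		return kvm_read_cr4_bits(vcpu, X86_CR4_LAM_SUP);

	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
}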




* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-19  7:32   ` Yuan Yao
@ 2022-12-20 14:07     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:07 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Mon, 2022-12-19 at 15:32 +0800, Yuan Yao wrote:
> On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> > Define kvm_untagged_addr() per LAM feature spec: Address high bits
> > are sign
> > extended, from highest effective address bit.
> > Note that LAM_U48 and LA57 has some effective bits overlap. This
> > patch
> > gives a WARN() on that case.
> > 
> > Now the only applicable possible case that addresses passed down
> > from VM
> > with LAM bits is those for MPX MSRs.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c |  3 +++
> >  arch/x86/kvm/x86.c     |  5 +++++
> >  arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
> >  3 files changed, 45 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 9985dbb63e7b..16ddd3fcd3cb 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> > struct msr_data *msr_info)
> >  		    (!msr_info->host_initiated &&
> >  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
> >  			return 1;
> > +
> > +		data = kvm_untagged_addr(data, vcpu);
> > +
> >  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
> >  		    (data & MSR_IA32_BNDCFGS_RSVD))
> >  			return 1;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index eb1f2c20e19e..0a446b45e3d6 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu
> > *vcpu, u32 index, u64 data,
> >  	case MSR_KERNEL_GS_BASE:
> >  	case MSR_CSTAR:
> >  	case MSR_LSTAR:
> > +		/*
> > +		 * LAM applies only addresses used for data accesses.
> 
> Confused due to the MSR_KERNEL_GS_BASE also used for data accessing,
> how about add below:
> The strict canonical checking is sitll appplied to MSR writing even
> LAM is enabled.

OK

...
> > +#ifdef CONFIG_X86_64
> > +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> > +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu
> > *vcpu)
> > +{
> > +	if (addr >> 63 == 0) {
> > +		/* User pointers */
> > +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> > +			addr = get_canonical(addr, 57);
> > +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> > +			/*
> > +			 * If guest enabled 5-level paging and LAM_U48,
> > +			 * bit 47 should be 0, bit 48:56 contains meta
> > data
> > +			 * although bit 47:56 are valid 5-level address
> 
> Still 48:56.

OK



* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-19  9:45   ` Yuan Yao
@ 2022-12-20 14:07     ` Robert Hoo
  2022-12-21  2:38       ` Yuan Yao
  2022-12-21  8:02       ` Yu Zhang
  0 siblings, 2 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:07 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Mon, 2022-12-19 at 17:45 +0800, Yuan Yao wrote:
> On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> > Define kvm_untagged_addr() per LAM feature spec: Address high bits
> > are sign
> > extended, from highest effective address bit.
> > Note that LAM_U48 and LA57 has some effective bits overlap. This
> > patch
> > gives a WARN() on that case.
> > 
> > Now the only applicable possible case that addresses passed down
> > from VM
> > with LAM bits is those for MPX MSRs.
> 
> How about the instruction emulation case ? e.g. KVM on behalf of CPU
> to do linear address accessing ? In this case the kvm_untagged_addr()
> should also be used to mask out the linear address, otherwise
> unexpected
> #GP(or other exception) will be injected into guest.
> 
> Please see all callers of __is_canonical_address()
> 
Emm, I took a look at the callers; they appear to be segment registers and
MSRs. Per the spec (ISE 10.4), processors that support LAM continue to
require that addresses written to control registers or MSRs be legacy
canonical. So, like the handling of your last comment on this patch, that
situation needs no changes, i.e. the legacy canonical check still applies.
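
To make the two rules concrete, a rough illustration mirroring the hunks
quoted earlier in this thread (not new code):

	/* MSR_LSTAR and friends: strict legacy canonical check, no
	 * untagging, a tagged value is simply rejected. */
	if (is_noncanonical_address(data, vcpu))
		return 1;

	/* MSR_IA32_BNDCFGS base, a data address: strip any LAM metadata
	 * first, then apply the canonical check. */
	data = kvm_untagged_addr(data, vcpu);
	if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
	    (data & MSR_IA32_BNDCFGS_RSVD))
		return 1;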



* Re: [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-12-20  9:10   ` Liu, Jingqi
@ 2022-12-20 14:16     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:16 UTC (permalink / raw)
  To: Liu, Jingqi, pbonzini, seanjc, kirill.shutemov, kvm

On Tue, 2022-12-20 at 17:10 +0800, Liu, Jingqi wrote:
> On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > When only changes LAM bits, ask next vcpu run to load mmu pgd, so
> > that it
> > will build new CR3 with LAM bits updates. No TLB flush needed on
> > this case.
> > When changes on effective addresses, no matter LAM bits changes or
> > not, go
> > through normal pgd update process.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> >   arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
> >   1 file changed, 20 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 48a2ad1e4cd6..6fbe8dd36b1e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1248,9 +1248,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu
> > *vcpu, unsigned long cr3)
> >   int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >   {
> >   	bool skip_tlb_flush = false;
> > -	unsigned long pcid = 0;
> > +	unsigned long pcid = 0, old_cr3;
> >   #ifdef CONFIG_X86_64
> > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> >   
> >   	if (pcid_enabled) {
> >   		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > @@ -1263,6 +1263,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >   	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
> >   		goto handle_tlb_flush;
> >   
> > +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> > +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> > +		return	1;
> > +
> >   	/*
> >   	 * Do not condition the GPA check on long mode, this helper is
> > used to
> >   	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee
> > that
> > @@ -1274,8 +1278,20 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >   	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> >   		return 1;
> >   
> > -	if (cr3 != kvm_read_cr3(vcpu))
> > -		kvm_mmu_new_pgd(vcpu, cr3);
> > +	old_cr3 = kvm_read_cr3(vcpu);
> > +	if (cr3 != old_cr3) {
> > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> > +					X86_CR3_LAM_U57));
> 
> "CR3_ADDR_MASK" should not contain "X86_CR3_LAM_U48 |
> X86_CR3_LAM_U57"

Yes, you're right. CR3_ADDR_MASK covers the effective pgd address bits.
What this hunk does is check whether any changed bits fall within the
effective address; if so, a new pgd needs to be loaded, no matter whether
the LAM bits changed or not (a small illustrative sketch follows at the end
of this message).

> But seems it is not defined explicitly.
> Besides this, looks good for me.
> Reviewed-by: Jingqi Liu<jingqi.liu@intel.com>
> > +		} else {
> > +			/*
> > +			 * Though effective addr no change, mark the
> > +			 * request so that LAM bits will take effect
> > +			 * when enter guest.
> > +			 */
> > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > +		}
> > +	}
> >   
> >   	vcpu->arch.cr3 = cr3;
> >   	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
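
For reference, a small sketch of the decision described above; the
X86_CR3_LAM_BITS name is made up here purely to make the intent explicit:

#define X86_CR3_LAM_BITS	(X86_CR3_LAM_U48 | X86_CR3_LAM_U57)

	old_cr3 = kvm_read_cr3(vcpu);
	if (cr3 != old_cr3) {
		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK)
			/* The effective pgd address changed: load a new pgd. */
			kvm_mmu_new_pgd(vcpu, cr3 & ~X86_CR3_LAM_BITS);
		else
			/* Only LAM bits changed: rebuild CR3 on next entry. */
			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
	}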



* Re: [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling
  2022-12-20  9:20 ` Liu, Jingqi
@ 2022-12-20 14:19   ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-20 14:19 UTC (permalink / raw)
  To: Liu, Jingqi, pbonzini, seanjc, kirill.shutemov, kvm

On Tue, 2022-12-20 at 17:20 +0800, Liu, Jingqi wrote:
> On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > ===Feature Introduction===
> > 
> > Linear-address masking (LAM) [1], modifies the checking that is
> > applied to
> > *64-bit* linear addresses, allowing software to use of the
> > untranslated
> > address (upper) bits for metadata.
> > As for which upper bits of linear address can be borrowed, LAM has
> > 2 modes:
> > LAM_48 (bits 62:48, i.e. LAM width of 15) and LAM_57 (bits 62:57,
> > i.e. LAM
> > width of 6), controlled by these new bits: CR3[62] (LAM_U48),
> > CR3[61]
> > (LAM_U57), and CR4[28] (LAM_SUP).
> > 
> > * LAM_U48 and LAM_U57 bits controls LAM for user mode address. I.e.
> > if
> >    CR3.LAM_U57 = 1, LAM57 is applied; if CR3.LAM_U48 = 1 and
> > CR3.LAM_U57 = 0,
> >    LAM48 is applied.
> > * LAM_SUP bit, combined with paging mode (4-level or 5-level),
> > determines
> >    LAM status for supervisor mode address. I.e. when CR4.LAM_SUP
> > =1, 4-level
> >    paging mode will have LAM48 for supervisor mode address while 5-
> > level paging
> >    will have LAM57.
> > 
> > Note:
> > 1. LAM applies to only data address, not to instructions.
> > 2. LAM identification of an address as user or supervisor is based
> > solely on the
> >     value of pointer bit 63 and does not, for the purposes of LAM,
> > depend on the CPL.
> > 3. For user mode address, it is possible that 5-level paging and
> > LAM_U48 are both
> >     set, in this case, the effective usable linear address width is
> > 48, i.e. bit
> >     56:47 is reserved by LAM. [2]
> > 
> > 
> > ===LAM KVM Design===
> > 
> > Pass CR4.LAM_SUP under guest control.
> > 
> > Under EPT mode, CR3 is fully under guest control, guest LAM is thus
> > transparent to
> > KVM. Nothing more need to do.
> > 
> > For Shadow paging (EPT = off), KVM need to handle guest CR3.LAM_U48
> > and CR3.LAM_U57
> > toggles.
> > 
> > Patch 1 -- This patch can be mostly independent from LAM enabling.
> > It just renames
> >             CR4 reserved bits for better understanding, esp. for
> > beginners.
> > 	
> > Patch 2, 9 -- Common part for both EPT and Shadow Paging modes
> > enabling.
> > 
> > Patch 3 ~ 8 -- For Shadow Paging mode LAM enabling.
> > 
> > [1] ISE Chap10 https://cdrdv2.intel.com/v1/dl/getContent/671368
> > (Section 10.6 VMX interaction)
> > [2] Thus currently, Kernel enabling patch only enables LAM57 mode. 
> > https://lore.kernel.org/lkml/20220815041803.17954-1-kirill.shutemov@linux.intel.com/
> > 
> > ---
> > Changelog
> > v2 --> v3:
> > As LAM Kernel patches are in tip tree now, rebase to it.
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
> > 
> > v1 --> v2:
> > 1. Fixes i386-allyesconfig build error on get_pgd(), where
> >     CR3_HIGH_RSVD_MASK isn't applicable.
> >     (Reported-by: kernel test robot <lkp@intel.com>)
> > 2. In kvm_set_cr3(), be conservative on skip tlb flush when only
> > LAM bits
> >     toggles. (Kirill)
> > 
> > Robert Hoo (9):
> >    KVM: x86: Rename cr4_reserved/rsvd_* variables to be more
> > readable
> >    KVM: x86: Add CR4.LAM_SUP in guest owned bits
> >    KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high
> > bits for
> >      pgd
> >    KVM: x86: MMU: Commets update
> >    KVM: x86: MMU: Integrate LAM bits when build guest CR3
> >    KVM: x86: Untag LAM bits when applicable
> >    KVM: x86: When judging setting CR3 valid or not, consider LAM
> > bits
> >    KVM: x86: When guest set CR3, handle LAM bits semantics
> >    KVM: x86: LAM: Expose LAM CPUID to user space VMM
> > 
> >   arch/x86/include/asm/kvm_host.h        |  7 ++--
> >   arch/x86/include/asm/processor-flags.h |  1 +
> >   arch/x86/kvm/cpuid.c                   |  6 +--
> >   arch/x86/kvm/kvm_cache_regs.h          |  3 +-
> >   arch/x86/kvm/mmu.h                     |  5 +++
> >   arch/x86/kvm/mmu/mmu.c                 | 18 ++++++---
> >   arch/x86/kvm/vmx/vmx.c                 |  8 +++-
> >   arch/x86/kvm/x86.c                     | 51 ++++++++++++++++++++-
> > -----
> >   arch/x86/kvm/x86.h                     | 43
> > +++++++++++++++++++++-
> >   9 files changed, 115 insertions(+), 27 deletions(-)
> > 
> > 
> > base-commit: a5dadcb601b4954c60494d797b4dd1e03a4b1ebe
> 
> It would be better if you can provide a URL link to easily reach
> this 
> base-commit.

The URL of the tip tree is in the changelog above.
I'll move it here for easier association. Thanks.

> 
> Thanks,
> Jingqi



* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
  2022-12-19  7:32   ` Yuan Yao
  2022-12-19  9:45   ` Yuan Yao
@ 2022-12-21  0:35   ` Yang, Weijiang
  2022-12-21  1:38     ` Robert Hoo
  2022-12-21  2:55   ` Yuan Yao
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Yang, Weijiang @ 2022-12-21  0:35 UTC (permalink / raw)
  To: Robert Hoo
  Cc: Liu, Jingqi, pbonzini, Christopherson,, Sean, kirill.shutemov, kvm


On 12/9/2022 12:45 PM, Robert Hoo wrote:
> Define kvm_untagged_addr() per LAM feature spec: Address high bits are sign
> extended, from highest effective address bit.
> Note that LAM_U48 and LA57 has some effective bits overlap. This patch
> gives a WARN() on that case.
>
> Now the only applicable possible case that addresses passed down from VM
> with LAM bits is those for MPX MSRs.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>   arch/x86/kvm/vmx/vmx.c |  3 +++
>   arch/x86/kvm/x86.c     |  5 +++++
>   arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
>   3 files changed, 45 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9985dbb63e7b..16ddd3fcd3cb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		    (!msr_info->host_initiated &&
>   		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
>   			return 1;
> +
> +		data = kvm_untagged_addr(data, vcpu);
> +
>   		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>   		    (data & MSR_IA32_BNDCFGS_RSVD))
>   			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb1f2c20e19e..0a446b45e3d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>   	case MSR_KERNEL_GS_BASE:
>   	case MSR_CSTAR:
>   	case MSR_LSTAR:
> +		/*
> +		 * LAM applies only addresses used for data accesses.
> +		 * Tagged address should never reach here.
> +		 * Strict canonical check still applies here.
> +		 */
>   		if (is_noncanonical_address(data, vcpu))
>   			return 1;
>   		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6c1fbe27616f..f5a2a15783c6 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>   	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
>   }
>   
> +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> +{
> +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> +}
> +


There's already a helper for this calculation: __canonical_address(), and
KVM already uses it before setting MSR_IA32_SYSENTER_ESP/MSR_IA32_SYSENTER_EIP.
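
A rough sketch of that suggestion (assuming __canonical_address() from
arch/x86/include/asm/page.h; this is not the actual follow-up patch), which
lets the open-coded get_canonical() go away:

/* Sketch: reuse the existing sign-extension helper for untagging. */
static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
{
	if (!(addr >> 63)) {
		/* User pointers */
		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
			addr = __canonical_address(addr, 57);
		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48)
			addr = __canonical_address(addr, 48);
	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) {
		/* Supervisor pointers */
		addr = __canonical_address(addr,
				kvm_read_cr4(vcpu) & X86_CR4_LA57 ? 57 : 48);
	}

	return addr;
}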


>   static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
>   {
>   	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
>   }
>   
> +#ifdef CONFIG_X86_64
> +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	if (addr >> 63 == 0) {
> +		/* User pointers */
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			addr = get_canonical(addr, 57);
> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> +			/*
> +			 * If guest enabled 5-level paging and LAM_U48,
> +			 * bit 47 should be 0, bit 48:56 contains meta data
> +			 * although bit 47:56 are valid 5-level address
> +			 * bits.
> +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> +			 */
> +			WARN_ON(addr & _BITUL(47));
> +			addr = get_canonical(addr, 48);
> +		}
> +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
> +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> +			addr = get_canonical(addr, 57);
> +		else
> +			addr = get_canonical(addr, 48);
> +	}
> +
> +	return addr;
> +}
> +#else
> +#define kvm_untagged_addr(addr, vcpu)	(addr)
> +#endif
> +
>   static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>   					gva_t gva, gfn_t gfn, unsigned access)
>   {


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  0:35   ` Yang, Weijiang
@ 2022-12-21  1:38     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-21  1:38 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Liu, Jingqi, pbonzini, Christopherson,, Sean, kirill.shutemov, kvm

On Wed, 2022-12-21 at 08:35 +0800, Yang, Weijiang wrote: 
> > +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> > +{
> > +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> > +}
> > +
> 
> 
> There's already a helper for the calculation: __canonical_address(),
> and 
> it's used in KVM
> 
> before set MSR_IA32_SYSENTER_ESP/MSR_IA32_SYSENTER_EIP.
> 
Nice, thanks Weijiang.



* Re: [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-20 14:07     ` Robert Hoo
@ 2022-12-21  2:12       ` Yuan Yao
  2022-12-21  7:50       ` Yu Zhang
  1 sibling, 0 replies; 63+ messages in thread
From: Yuan Yao @ 2022-12-21  2:12 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Tue, Dec 20, 2022 at 10:07:33PM +0800, Robert Hoo wrote:
> On Mon, 2022-12-19 at 14:53 +0800, Yuan Yao wrote:
> > On Fri, Dec 09, 2022 at 12:45:53PM +0800, Robert Hoo wrote:
> > > When calc the new CR3 value, take LAM bits in.
> > >
> > > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > > ---
> > >  arch/x86/kvm/mmu.h     | 5 +++++
> > >  arch/x86/kvm/vmx/vmx.c | 3 ++-
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > index 6bdaacb6faa0..866f2b7cb509 100644
> > > --- a/arch/x86/kvm/mmu.h
> > > +++ b/arch/x86/kvm/mmu.h
> > > @@ -142,6 +142,11 @@ static inline unsigned long
> > > kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> > >  	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> > >  }
> > >
> > > +static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
> > > +{
> >
> > Unlike the PCIDs, LAM bits in CR3 are  not sharing with other
> > features,
> > (e.g. PCID vs non-PCIN on bit 0:11) so not check CR4[28] here should
> > be fine, otherwise follows kvm_get_pcid() looks better.
> >
> No. CR4.LAM_SUP isn't an enablement switch over CR3.LAM_U{48,57},
> they're parallel relationship, CR4.LAM_SUP controls supervisor mode
> addresses has LAM or not while CR3.LAM_U controls user mode address's
> LAM enablement.

That's right, I didn't realize this at the time, thanks. :-)

>
> > > +	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 |
> > > X86_CR3_LAM_U57);
>
>


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-20 14:07     ` Robert Hoo
@ 2022-12-21  2:38       ` Yuan Yao
  2022-12-21  8:02       ` Yu Zhang
  1 sibling, 0 replies; 63+ messages in thread
From: Yuan Yao @ 2022-12-21  2:38 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Tue, Dec 20, 2022 at 10:07:57PM +0800, Robert Hoo wrote:
> On Mon, 2022-12-19 at 17:45 +0800, Yuan Yao wrote:
> > On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> > > Define kvm_untagged_addr() per LAM feature spec: Address high bits
> > > are sign
> > > extended, from highest effective address bit.
> > > Note that LAM_U48 and LA57 has some effective bits overlap. This
> > > patch
> > > gives a WARN() on that case.
> > >
> > > Now the only applicable possible case that addresses passed down
> > > from VM
> > > with LAM bits is those for MPX MSRs.
> >
> > How about the instruction emulation case ? e.g. KVM on behalf of CPU
> > to do linear address accessing ? In this case the kvm_untagged_addr()
> > should also be used to mask out the linear address, otherwise
> > unexpected
> > #GP(or other exception) will be injected into guest.
> >
> > Please see all callers of __is_canonical_address()
> >
> Emm, I take a look at the callers, looks like they're segment registers
> and MSRs. Per spec (ISE 10.4): processors that support LAM continue to

__linearize() uses __is_canonical_address() in 64-bit mode, and that is the
code path for memory read/write emulation. What will happen if a LAM-enabled
guest applied metadata to the address and KVM emulates the memory access
for it?

> require the addresses written to control registers or MSRs be legacy
> canonical. So, like the handling on your last commented point on this
> patch, such situation needs no changes, i.e. legacy canonical still
> applied.
>


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
                     ` (2 preceding siblings ...)
  2022-12-21  0:35   ` Yang, Weijiang
@ 2022-12-21  2:55   ` Yuan Yao
  2022-12-21  8:22     ` Robert Hoo
  2022-12-21  8:14   ` Yu Zhang
  2022-12-28  8:32   ` Binbin Wu
  5 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-21  2:55 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022 at 12:45:54PM +0800, Robert Hoo wrote:
> Define kvm_untagged_addr() per LAM feature spec: Address high bits are sign
> extended, from highest effective address bit.
> Note that LAM_U48 and LA57 has some effective bits overlap. This patch
> gives a WARN() on that case.
>
> Now the only applicable possible case that addresses passed down from VM
> with LAM bits is those for MPX MSRs.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |  3 +++
>  arch/x86/kvm/x86.c     |  5 +++++
>  arch/x86/kvm/x86.h     | 37 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9985dbb63e7b..16ddd3fcd3cb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		    (!msr_info->host_initiated &&
>  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
> +
> +		data = kvm_untagged_addr(data, vcpu);
> +
>  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>  		    (data & MSR_IA32_BNDCFGS_RSVD))
>  			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb1f2c20e19e..0a446b45e3d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  	case MSR_KERNEL_GS_BASE:
>  	case MSR_CSTAR:
>  	case MSR_LSTAR:
> +		/*
> +		 * LAM applies only addresses used for data accesses.
> +		 * Tagged address should never reach here.
> +		 * Strict canonical check still applies here.
> +		 */
>  		if (is_noncanonical_address(data, vcpu))
>  			return 1;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6c1fbe27616f..f5a2a15783c6 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>  	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
>  }
>
> +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> +{
> +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> +}
> +
>  static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
>  {
>  	return !__is_canonical_address(la, vcpu_virt_addr_bits(vcpu));
>  }
>
> +#ifdef CONFIG_X86_64
> +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	if (addr >> 63 == 0) {
> +		/* User pointers */
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			addr = get_canonical(addr, 57);
> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> +			/*
> +			 * If guest enabled 5-level paging and LAM_U48,
> +			 * bit 47 should be 0, bit 48:56 contains meta data
> +			 * although bit 47:56 are valid 5-level address
> +			 * bits.
> +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> +			 */
> +			WARN_ON(addr & _BITUL(47));

IIUC, the LAM_U48 user-space canonical checking rule requires "bit 63 ==
bit 47 == 0" before sign-extending the address.

If so, it looks like it is the guest's fault for not following the LAM
canonical checking rule. What's the behavior of such a violation on bare
metal, a #GP? The behavior should be the same for the guest, instead of a
WARN_ON() on the host; the host does nothing wrong.

> +			addr = get_canonical(addr, 48);
> +		}
> +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
> +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> +			addr = get_canonical(addr, 57);
> +		else
> +			addr = get_canonical(addr, 48);
> +	}
> +
> +	return addr;
> +}
> +#else
> +#define kvm_untagged_addr(addr, vcpu)	(addr)
> +#endif
> +
>  static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>  					gva_t gva, gfn_t gfn, unsigned access)
>  {
> --
> 2.31.1
>


* Re: [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-20 14:07     ` Robert Hoo
  2022-12-21  2:12       ` Yuan Yao
@ 2022-12-21  7:50       ` Yu Zhang
  2022-12-21  8:55         ` Robert Hoo
  1 sibling, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21  7:50 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Yuan Yao, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

> No. CR4.LAM_SUP isn't an enablement switch over CR3.LAM_U{48,57},
> they're parallel relationship, CR4.LAM_SUP controls supervisor mode
> addresses has LAM or not while CR3.LAM_U controls user mode address's
> LAM enablement.

Unfortunately, the spec (the one in your cover letter) has a bug in
"10.1 ENUMERATION, ENABLING, AND CONFIGURATION":

CR4.LAM_SUP enables and configures LAM for supervisor pointers:
• If CR3.LAM_SUP = 0, LAM is not enabled for supervisor pointers.
• If CR3.LAM_SUP = 1, LAM is enabled for supervisor pointers with a width determined by the paging mode:

Based on the context, I think "CR3.LAM_SUP" should be "CR4.LAM_SUP".

I believe it could just be a typo. But it is confusing enough.

B.R.
Yu


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-20 14:07     ` Robert Hoo
  2022-12-21  2:38       ` Yuan Yao
@ 2022-12-21  8:02       ` Yu Zhang
  2022-12-21  8:49         ` Robert Hoo
  1 sibling, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21  8:02 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Yuan Yao, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

> Emm, I take a look at the callers, looks like they're segment registers
> and MSRs. Per spec (ISE 10.4): processors that support LAM continue to
> require the addresses written to control registers or MSRs be legacy
> canonical. So, like the handling on your last commented point on this
> patch, such situation needs no changes, i.e. legacy canonical still
> applied.
> 
Well, it's not about control register or MSR emulation. It is about the
instruction decoder, which may encounter an instruction whose memory operand
has the LAM bits occupied.

B.R.
Yu


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
                     ` (3 preceding siblings ...)
  2022-12-21  2:55   ` Yuan Yao
@ 2022-12-21  8:14   ` Yu Zhang
  2022-12-21  8:37     ` Yu Zhang
  2022-12-28  8:32   ` Binbin Wu
  5 siblings, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21  8:14 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9985dbb63e7b..16ddd3fcd3cb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		    (!msr_info->host_initiated &&
>  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
> +
> +		data = kvm_untagged_addr(data, vcpu);

Do we really need to go to the trouble of calling kvm_untagged_addr()
unconditionally? I mean, LAM may not be enabled by the guest, or may not
even be exposed to the guest at all.
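
Presumably a coarse guard like the sketch below (illustrative only; note the
per-thread observation later in this thread means the CR3/CR4 checks inside
the helper are still needed even with such a guard):

		if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
			data = kvm_untagged_addr(data, vcpu);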

> +
>  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>  		    (data & MSR_IA32_BNDCFGS_RSVD))
>  			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb1f2c20e19e..0a446b45e3d6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1812,6 +1812,11 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  	case MSR_KERNEL_GS_BASE:
>  	case MSR_CSTAR:
>  	case MSR_LSTAR:
> +		/*
> +		 * LAM applies only addresses used for data accesses.
> +		 * Tagged address should never reach here.
> +		 * Strict canonical check still applies here.
> +		 */
>  		if (is_noncanonical_address(data, vcpu))
>  			return 1;
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6c1fbe27616f..f5a2a15783c6 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -195,11 +195,48 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
>  	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
>  }
>  
> +static inline u64 get_canonical(u64 la, u8 vaddr_bits)
> +{
> +	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
> +}
> +

We already have a __canonical_address() in linux, no need to re-invent
another one. :)

B.R.
Yu


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  2:55   ` Yuan Yao
@ 2022-12-21  8:22     ` Robert Hoo
  2022-12-21  9:35       ` Yuan Yao
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-21  8:22 UTC (permalink / raw)
  To: Yuan Yao; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, 2022-12-21 at 10:55 +0800, Yuan Yao wrote:
> > +#ifdef CONFIG_X86_64
> > +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> > +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu
> > *vcpu)
> > +{
> > +	if (addr >> 63 == 0) {
> > +		/* User pointers */
> > +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> > +			addr = get_canonical(addr, 57);
> > +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> > +			/*
> > +			 * If guest enabled 5-level paging and LAM_U48,
> > +			 * bit 47 should be 0, bit 48:56 contains meta
> > data
> > +			 * although bit 47:56 are valid 5-level address
> > +			 * bits.
> > +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> > +			 */
> > +			WARN_ON(addr & _BITUL(47));
> 
> IIUC, LAM_47 userspace canonical checking rule requests "bit 63 ==
> bit 47 == 0"
> before sign-extened the address.
> 
> if so looks it's guest's fault to not follow the LAM canonical
> checking rule,
> what's the behavior of such violation on bare metal, #GP ? 

The spec (ISE 10.2) doesn't mention a #GP for this case. IIUC, those
overlapping bits are zeroed.
The current native kernel LAM enabling only uses LAM_U57.

> The behavior
> shuld be same for guest instead of WARN_ON() on host, host does
> nothing wrong.
> 
Agreed that the host did nothing wrong. So remove the WARN_ON() and use a
pr_warn() instead? I meant it to warn when such a case happens.
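
A sketch of that alternative (illustrative only; whether the host should
report anything at all is still being discussed below):

		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
			if (addr & BIT_ULL(47))
				pr_warn_once("kvm: LAM_U48 pointer with bit 47 set\n");
			addr = get_canonical(addr, 48);
		}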




* Re: [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-12-09  4:45 ` [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
  2022-12-20  9:10   ` Liu, Jingqi
@ 2022-12-21  8:30   ` Yu Zhang
  2022-12-21 12:52     ` Robert Hoo
  1 sibling, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21  8:30 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm

On Fri, Dec 09, 2022 at 12:45:56PM +0800, Robert Hoo wrote:
> When only changes LAM bits, ask next vcpu run to load mmu pgd, so that it
> will build new CR3 with LAM bits updates. No TLB flush needed on this case.
> When changes on effective addresses, no matter LAM bits changes or not, go
> through normal pgd update process.

Sorry, may I ask why this is related to effective address changes?
This patch is only about the CR3 updates...

> 
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>  arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a2ad1e4cd6..6fbe8dd36b1e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1248,9 +1248,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
>  	bool skip_tlb_flush = false;
> -	unsigned long pcid = 0;
> +	unsigned long pcid = 0, old_cr3;
>  #ifdef CONFIG_X86_64
> -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);

This may qualify as a separate patch. :)

>  
>  	if (pcid_enabled) {
>  		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> @@ -1263,6 +1263,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
>  		goto handle_tlb_flush;
>  
> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> +		return	1;
> +
>  	/*
>  	 * Do not condition the GPA check on long mode, this helper is used to
>  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
> @@ -1274,8 +1278,20 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
>  		return 1;
>  
> -	if (cr3 != kvm_read_cr3(vcpu))
> -		kvm_mmu_new_pgd(vcpu, cr3);
> +	old_cr3 = kvm_read_cr3(vcpu);
> +	if (cr3 != old_cr3) {
> +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> +					X86_CR3_LAM_U57));
> +		} else {
> +			/*
> +			 * Though effective addr no change, mark the
Same question here.

> +			 * request so that LAM bits will take effect
> +			 * when enter guest.
> +			 */
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		}
> +	}
>  
>  	vcpu->arch.cr3 = cr3;
>  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> -- 
B.R.
Yu


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  8:14   ` Yu Zhang
@ 2022-12-21  8:37     ` Yu Zhang
  0 siblings, 0 replies; 63+ messages in thread
From: Yu Zhang @ 2022-12-21  8:37 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 04:14:39PM +0800, Yu Zhang wrote:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 9985dbb63e7b..16ddd3fcd3cb 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2134,6 +2134,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		    (!msr_info->host_initiated &&
> >  		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
> >  			return 1;
> > +
> > +		data = kvm_untagged_addr(data, vcpu);
> 
> Do we really need to take pains to trigger the kvm_untagged_addr()
> unconditionally? I mean, LAM may not be enabled by the guest or even
> not exposed to the guest at all.
> 

Ouch... I just realized that, unlike the paging mode, LAM can be enabled
per-thread rather than per-VM...

B.R.
Yu


* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  8:02       ` Yu Zhang
@ 2022-12-21  8:49         ` Robert Hoo
  2022-12-21 10:10           ` Yu Zhang
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-21  8:49 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Yuan Yao, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, 2022-12-21 at 16:02 +0800, Yu Zhang wrote:
> > Emm, I take a look at the callers, looks like they're segment
> > registers
> > and MSRs. Per spec (ISE 10.4): processors that support LAM continue
> > to
> > require the addresses written to control registers or MSRs be
> > legacy
> > canonical. So, like the handling on your last commented point on
> > this
> > patch, such situation needs no changes, i.e. legacy canonical still
> > applied.
> > 
> 
> Well, it's not about the control register or MSR emulation. It is
> about
> the instruction decoder, which may encounter an instruction with a
> memory
> operand with LAM bits occupied. 
> 
OK, let me combine my reply to your and Yuan's comments here.
So you're talking about the case where KVM emulates an instruction, that
instruction accesses memory, and the memory address can be LAM-tagged.
I think instruction emulation and memory access should be treated
separately, and the LAM rules should apply in the memory-access phase. But
frankly speaking, I haven't looked into such a case yet. Can you name an
example of such an emulated instruction? I can take a look, hoping that the
emulated memory access falls into the same code path as page-fault handling.

> B.R.
> Yu



* Re: [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3
  2022-12-21  7:50       ` Yu Zhang
@ 2022-12-21  8:55         ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-21  8:55 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Yuan Yao, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, 2022-12-21 at 15:50 +0800, Yu Zhang wrote:
> > No. CR4.LAM_SUP isn't an enablement switch over CR3.LAM_U{48,57},
> > they're parallel relationship, CR4.LAM_SUP controls supervisor mode
> > addresses has LAM or not while CR3.LAM_U controls user mode
> > address's
> > LAM enablement.
> 
> Unfortunately, the spec(the one in your cover letter) has a bug in
> "10.1 ENUMERATION, ENABLING, AND CONFIGURATION":
> 
> CR4.LAM_SUP enables and configures LAM for supervisor pointers:
> • If CR3.LAM_SUP = 0, LAM is not enabled for supervisor pointers.
> • If CR3.LAM_SUP = 1, LAM is enabled for supervisor pointers with a
> width determined by the paging mode:
> 
> Based on the context, I think "CR3.LAM_SUP" should be "CR4.LAM_SUP".
> 
> I believe it could just be a typo. 

Ah, right, I share the same belief. We can report it to the ISE author ;-)

> But it is confusing enough.
> 
> B.R.
> Yu



* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  8:22     ` Robert Hoo
@ 2022-12-21  9:35       ` Yuan Yao
  2022-12-21 10:22         ` Yu Zhang
  0 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-21  9:35 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 04:22:33PM +0800, Robert Hoo wrote:
> On Wed, 2022-12-21 at 10:55 +0800, Yuan Yao wrote:
> > > +#ifdef CONFIG_X86_64
> > > +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> > > +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu
> > > *vcpu)
> > > +{
> > > +	if (addr >> 63 == 0) {
> > > +		/* User pointers */
> > > +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> > > +			addr = get_canonical(addr, 57);
> > > +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> > > +			/*
> > > +			 * If guest enabled 5-level paging and LAM_U48,
> > > +			 * bit 47 should be 0, bit 48:56 contains meta
> > > data
> > > +			 * although bit 47:56 are valid 5-level address
> > > +			 * bits.
> > > +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> > > +			 */
> > > +			WARN_ON(addr & _BITUL(47));
> >
> > IIUC, LAM_47 userspace canonical checking rule requests "bit 63 ==
> > bit 47 == 0"
> > before sign-extened the address.
> >
> > if so looks it's guest's fault to not follow the LAM canonical
> > checking rule,
> > what's the behavior of such violation on bare metal, #GP ?
>
> Spec (ISE 10.2) doesn't mention a #GP for this case. IIUC, those
> overlap bits are zeroed.

I mean the behavior when the "bit 63 == bit 47 == 0" rule is violated.
Yes, nothing in ISE 10.2/10.3 describes the behavior for such a violation,
but do you know more details about it, or have you done any experiments on
hardware/Simics?

> Current native Kernel LAM only enables LAM_U57.
>
> > The behavior
> > shuld be same for guest instead of WARN_ON() on host, host does
> > nothing wrong.
> >
> Agree that host did nothing wrong. So remove WARN_ON(), use a pr_warn()
> instead? I here meant to warn that such case happens.

I personally think that's not necessary if no hardware behavior is defined
for such violation, KVM doesn't need to warn such guest violation on host,
like KVM already done for many other cases (e.g #GP is injected for
restricted canonical violation but not WARN_ON)
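
In code terms, that alternative would look roughly like the below in the
untag helper -- just a sketch, assuming the "bit 63 == bit 47 == 0" rule is
confirmed to #GP; untested, and the early-return plumbing is hand-waved:

	else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
		if (addr & _BITUL(47)) {
			/* guest broke the "bit 63 == bit 47 == 0" rule */
			kvm_inject_gp(vcpu, 0);
			return addr;
		}
		addr = get_canonical(addr, 48);
	}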

>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  8:49         ` Robert Hoo
@ 2022-12-21 10:10           ` Yu Zhang
  2022-12-21 10:30             ` Yuan Yao
  0 siblings, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21 10:10 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Yuan Yao, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 04:49:26PM +0800, Robert Hoo wrote:
> On Wed, 2022-12-21 at 16:02 +0800, Yu Zhang wrote:
> > > Emm, I take a look at the callers, looks like they're segment
> > > registers
> > > and MSRs. Per spec (ISE 10.4): processors that support LAM continue
> > > to
> > > require the addresses written to control registers or MSRs be
> > > legacy
> > > canonical. So, like the handling on your last commented point on
> > > this
> > > patch, such situation needs no changes, i.e. legacy canonical still
> > > applied.
> > > 
> > 
> > Well, it's not about the control register or MSR emulation. It is
> > about
> > the instruction decoder, which may encounter an instruction with a
> > memory
> > operand with LAM bits occupied. 
> > 
> OK, combine reply to you and Yuan's comments here.
> So you're talking about when KVM emulates an instruction, and that
> instruction is accessing memory, and the address for the memory can be
> LAM tagged.
> I think instruction emulation and memory access should be separated,
> and LAM rules should apply to memory access phase. But frankly
> speaking, I haven't looked into such case yet. Can you name an example
> of such emulated instruction? I can take a look, hoping that the
> emulation accessing memory falls into same code path as page fault
> handling.

I do not know the usage case of LAM. According to the spec, LAM does
not apply to instruction fetches, so guest rip and target addresses
in instructions such as jump, call etc. do not need special treatment.
But the spec does not say if LAM can be used to MMIO addresses... 

B.R.
Yu


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21  9:35       ` Yuan Yao
@ 2022-12-21 10:22         ` Yu Zhang
  2022-12-21 10:33           ` Yuan Yao
  0 siblings, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21 10:22 UTC (permalink / raw)
  To: Yuan Yao; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

> > >
> > > IIUC, LAM_47 userspace canonical checking rule requests "bit 63 ==
> > > bit 47 == 0"
> > > before sign-extending the address.
> > >
> > > if so looks it's guest's fault to not follow the LAM canonical
> > > checking rule,
> > > what's the behavior of such violation on bare metal, #GP ?
> >
> > Spec (ISE 10.2) doesn't mention a #GP for this case. IIUC, those
> > overlap bits are zeroed.
> 
> I mean the behavior on violation of the "bit 63 == bit 47 == 0" rule.
> Yes, no words in ISE 10.2/3 describe the behavior of such a violation
> case, but do you know more details about this, or have you done some
> experiments on hardware/Simics?

Yes, the ISE is vague. But I do believe a #GP will be generated for
such a violation, and KVM shall inject one if the guest does not follow
the requirement, because such a check is called (by the spec) a "modified
canonicality check".

Anyway, we'd better confirm with the spec owner, instead of making
assumptions by ourselves. :)

B.R.
Yu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21 10:10           ` Yu Zhang
@ 2022-12-21 10:30             ` Yuan Yao
  2022-12-21 12:40               ` Yu Zhang
  0 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-21 10:30 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 06:10:32PM +0800, Yu Zhang wrote:
> On Wed, Dec 21, 2022 at 04:49:26PM +0800, Robert Hoo wrote:
> > On Wed, 2022-12-21 at 16:02 +0800, Yu Zhang wrote:
> > > > Emm, I take a look at the callers, looks like they're segment
> > > > registers
> > > > and MSRs. Per spec (ISE 10.4): processors that support LAM continue
> > > > to
> > > > require the addresses written to control registers or MSRs be
> > > > legacy
> > > > canonical. So, like the handling on your last commented point on
> > > > this
> > > > patch, such situation needs no changes, i.e. legacy canonical still
> > > > applied.
> > > >
> > >
> > > Well, it's not about the control register or MSR emulation. It is
> > > about
> > > the instruction decoder, which may encounter an instruction with a
> > > memory
> > > operand with LAM bits occupied.
> > >
> > OK, combine reply to you and Yuan's comments here.
> > So you're talking about when KVM emulates an instruction, and that
> > instruction is accessing memory, and the address for the memory can be
> > LAM tagged.
> > I think instruction emulation and memory access should be separated,
> > and LAM rules should apply to memory access phase. But frankly
> > speaking, I haven't looked into such case yet. Can you name an example
> > of such emulated instruction? I can take a look, hoping that the
> > emulation accessing memory falls into same code path as page fault
> > handling.
>
> I do not know the usage case of LAM. According to the spec, LAM does
> not apply to instruction fetches, so guest rip and target addresses
> in instructions such as jump, call etc. do not need special treatment.
> But the spec does not say if LAM can be used to MMIO addresses...

MMIO accesses in the guest are also via GVA, so any emulated
device MMIO access hits this case. KVM checks the GVA first, even in the
TDP case (where KVM already has the GPA in hand), before it starts the
"real" access to the GPA:

segmented_read/write() -> linearize() -> __linearize()
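
A rough sketch of where an untag step could sit on that path, heavily
simplified from the real __linearize() in emulate.c (parameters trimmed,
and how the emulator context reaches the vcpu is hand-waved; the
kvm_untagged_addr() call is the hypothetical addition):

	static int __linearize(struct x86_emulate_ctxt *ctxt, u64 la,
			       bool fetch, ulong *linear)
	{
		if (ctxt->mode == X86EMUL_MODE_PROT64 && !fetch)
			/* LAM applies to data accesses only, not fetches */
			la = kvm_untagged_addr(la, emul_to_vcpu(ctxt));

		if (ctxt->mode == X86EMUL_MODE_PROT64 &&
		    emul_is_noncanonical_address(la, ctxt))
			return emulate_gp(ctxt, 0);

		*linear = la;
		return X86EMUL_CONTINUE;
	}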

>
> B.R.
> Yu
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21 10:22         ` Yu Zhang
@ 2022-12-21 10:33           ` Yuan Yao
  0 siblings, 0 replies; 63+ messages in thread
From: Yuan Yao @ 2022-12-21 10:33 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 06:22:47PM +0800, Yu Zhang wrote:
> > > >
> > > > IIUC, LAM_47 userspace canonical checking rule requests "bit 63 ==
> > > > bit 47 == 0"
> > > > before sign-extending the address.
> > > >
> > > > if so looks it's guest's fault to not follow the LAM canonical
> > > > checking rule,
> > > > what's the behavior of such violation on bare metal, #GP ?
> > >
> > > Spec (ISE 10.2) doesn't mention a #GP for this case. IIUC, those
> > > overlap bits are zeroed.
> >
> > I mean the behavior on violation of the "bit 63 == bit 47 == 0" rule.
> > Yes, no words in ISE 10.2/3 describe the behavior of such a violation
> > case, but do you know more details about this, or have you done some
> > experiments on hardware/Simics?
>
> Yes, the ISE is vague. But I do believe a #GP will be generated for
> such a violation, and KVM shall inject one if the guest does not follow
> the requirement, because such a check is called (by the spec) a "modified
> canonicality check".

Me too, and that's why I replied here :-)

>
> Anyway, we'd better confirm with the spec owner, instead of making
> assumptions by ourselves. :)

Agree!

>
> B.R.
> Yu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21 10:30             ` Yuan Yao
@ 2022-12-21 12:40               ` Yu Zhang
  2022-12-22  8:21                 ` Yu Zhang
  0 siblings, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-21 12:40 UTC (permalink / raw)
  To: Yuan Yao; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Wed, Dec 21, 2022 at 06:30:31PM +0800, Yuan Yao wrote:
> On Wed, Dec 21, 2022 at 06:10:32PM +0800, Yu Zhang wrote:
> > On Wed, Dec 21, 2022 at 04:49:26PM +0800, Robert Hoo wrote:
> > > On Wed, 2022-12-21 at 16:02 +0800, Yu Zhang wrote:
> > > > > Emm, I take a look at the callers, looks like they're segment
> > > > > registers
> > > > > and MSRs. Per spec (ISE 10.4): processors that support LAM continue
> > > > > to
> > > > > require the addresses written to control registers or MSRs be
> > > > > legacy
> > > > > canonical. So, like the handling on your last commented point on
> > > > > this
> > > > > patch, such situation needs no changes, i.e. legacy canonical still
> > > > > applied.
> > > > >
> > > >
> > > > Well, it's not about the control register or MSR emulation. It is
> > > > about
> > > > the instruction decoder, which may encounter an instruction with a
> > > > memory
> > > > operand with LAM bits occupied.
> > > >
> > > OK, combine reply to you and Yuan's comments here.
> > > So you're talking about when KVM emulates an instruction, and that
> > > instruction is accessing memory, and the address for the memory can be
> > > LAM tagged.
> > > I think instruction emulation and memory access should be separated,
> > > and LAM rules should apply to memory access phase. But frankly
> > > speaking, I haven't looked into such case yet. Can you name an example
> > > of such emulated instruction? I can take a look, hoping that the
> > > emulation accessing memory falls into same code path as page fault
> > > handling.
> >
> > I do not know the usage case of LAM. According to the spec, LAM does
> > not apply to instruction fetches, so guest rip and target addresses
> > in instructions such as jump, call etc. do not need special treatment.
> > But the spec does not say if LAM can be used to MMIO addresses...
> 
> The MMIO accessing in guest is also via GVA, so any emulated
> device MMIO accessing hits this case. KVM checks GVA firstly even in TDP

Yes. And sorry, I meant the spec does not say LAM can not be used
to MMIO addresses.

> case(which KVM already has GPA in hand) before start to "real"
> accessing the GPA:
> 
> segmented_read/write() -> linearize() -> __linearize()
> 

B.R.
Yu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics
  2022-12-21  8:30   ` Yu Zhang
@ 2022-12-21 12:52     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-21 12:52 UTC (permalink / raw)
  To: Yu Zhang; +Cc: pbonzini, seanjc, kirill.shutemov, kvm

On Wed, 2022-12-21 at 16:30 +0800, Yu Zhang wrote:
> On Fri, Dec 09, 2022 at 12:45:56PM +0800, Robert Hoo wrote:
> > When only the LAM bits change, ask the next vcpu run to load the mmu
> > pgd, so that it will build a new CR3 with the LAM bit updates. No TLB
> > flush is needed in this case.
> > When the effective address changes, no matter whether the LAM bits
> > change or not, go through the normal pgd update process.
> 
> Sorry, may I ask why this is related to effective address changes?
> This patch is only about the CR3 updates...
> 
Not sure if we mean the same thing. By effective address here I mean
CR3 & CR3_ADDR_MASK, i.e. the pgd part.
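
Roughly, the two cases in the hunk below can be summarized as (sketch only,
mirroring the patch, not separate code):

	bool pgd_changed = (cr3 ^ old_cr3) & CR3_ADDR_MASK;
	bool lam_changed = (cr3 ^ old_cr3) &
			   (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

	/*
	 * pgd_changed                 -> kvm_mmu_new_pgd() with the LAM
	 *                                bits stripped from the new pgd
	 * !pgd_changed && lam_changed -> only KVM_REQ_LOAD_MMU_PGD, and no
	 *                                TLB flush for the LAM-only toggle
	 */
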
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> >  arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 48a2ad1e4cd6..6fbe8dd36b1e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1248,9 +1248,9 @@ static bool kvm_is_valid_cr3(struct kvm_vcpu
> > *vcpu, unsigned long cr3)
> >  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  {
> >  	bool skip_tlb_flush = false;
> > -	unsigned long pcid = 0;
> > +	unsigned long pcid = 0, old_cr3;
> >  #ifdef CONFIG_X86_64
> > -	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> > +	bool pcid_enabled = !!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> 
> This may qualify as a separate patch. :)

I had thought of this as well, but it is so trivial, and literally we
cannot say the original code is wrong.
> 
> >  
> >  	if (pcid_enabled) {
> >  		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
> > @@ -1263,6 +1263,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >  	if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
> >  		goto handle_tlb_flush;
> >  
> > +	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM) &&
> > +	    (cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)))
> > +		return	1;
> > +
> >  	/*
> >  	 * Do not condition the GPA check on long mode, this helper is
> > used to
> >  	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee
> > that
> > @@ -1274,8 +1278,20 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu,
> > unsigned long cr3)
> >  	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
> >  		return 1;
> >  
> > -	if (cr3 != kvm_read_cr3(vcpu))
> > -		kvm_mmu_new_pgd(vcpu, cr3);
> > +	old_cr3 = kvm_read_cr3(vcpu);
> > +	if (cr3 != old_cr3) {
> > +		if ((cr3 ^ old_cr3) & CR3_ADDR_MASK) {
> > +			kvm_mmu_new_pgd(vcpu, cr3 & ~(X86_CR3_LAM_U48 |
> > +					X86_CR3_LAM_U57));
> > +		} else {
> > +			/*
> > +			 * Though effective addr no change, mark the
> 
> Same question here.

This is the case where the CR3 update touches only the LAM bits, with no
change to the PGD part.
> 
> > +			 * request so that LAM bits will take effect
> > +			 * when enter guest.
> > +			 */
> > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > +		}
> > +	}
> >  
> >  	vcpu->arch.cr3 = cr3;
> >  	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
> > -- 
> 
> B.R.
> Yu


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-21 12:40               ` Yu Zhang
@ 2022-12-22  8:21                 ` Yu Zhang
  2022-12-23  2:36                   ` Yuan Yao
  0 siblings, 1 reply; 63+ messages in thread
From: Yu Zhang @ 2022-12-22  8:21 UTC (permalink / raw)
  To: Yuan Yao; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

> > > > >
> > > > > Well, it's not about the control register or MSR emulation. It is
> > > > > about
> > > > > the instruction decoder, which may encounter an instruction with a
> > > > > memory
> > > > > operand with LAM bits occupied.
> > > > >
> > > > OK, combine reply to you and Yuan's comments here.
> > > > So you're talking about when KVM emulates an instruction, and that
> > > > instruction is accessing memory, and the address for the memory can be
> > > > LAM tagged.
> > > > I think instruction emulation and memory access should be separated,
> > > > and LAM rules should apply to memory access phase. But frankly
> > > > speaking, I haven't looked into such case yet. Can you name an example
> > > > of such emulated instruction? I can take a look, hoping that the
> > > > emulation accessing memory falls into same code path as page fault
> > > > handling.
> > >
> > > I do not know the usage case of LAM. According to the spec, LAM does
> > > not apply to instruction fetches, so guest rip and target addresses
> > > in instructions such as jump, call etc. do not need special treatment.
> > > But the spec does not say if LAM can be used to MMIO addresses...
> > 
> > The MMIO accessing in guest is also via GVA, so any emulated
> > device MMIO accessing hits this case. KVM checks GVA firstly even in TDP
> 
> Yes. And sorry, I meant the spec does not say LAM can not be used
> to MMIO addresses.
> 
BTW, it is not just about MMIO. Normal memory address can also trigger the
linearize(), e.g., memory operand of io instructions, though I still have
no idea if this could be one of the usage cases of LAM.

B.R.
Yu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-22  8:21                 ` Yu Zhang
@ 2022-12-23  2:36                   ` Yuan Yao
  2022-12-23  3:55                     ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Yuan Yao @ 2022-12-23  2:36 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Thu, Dec 22, 2022 at 04:21:32PM +0800, Yu Zhang wrote:
> > > > > >
> > > > > > Well, it's not about the control register or MSR emulation. It is
> > > > > > about
> > > > > > the instruction decoder, which may encounter an instruction with a
> > > > > > memory
> > > > > > operand with LAM bits occupied.
> > > > > >
> > > > > OK, combine reply to you and Yuan's comments here.
> > > > > So you're talking about when KVM emulates an instruction, and that
> > > > > instruction is accessing memory, and the address for the memory can be
> > > > > LAM tagged.
> > > > > I think instruction emulation and memory access should be separated,
> > > > > and LAM rules should apply to memory access phase. But frankly
> > > > > speaking, I haven't looked into such case yet. Can you name an example
> > > > > of such emulated instruction? I can take a look, hoping that the
> > > > > emulation accessing memory falls into same code path as page fault
> > > > > handling.
> > > >
> > > > I do not know the usage case of LAM. According to the spec, LAM does
> > > > not apply to instruction fetches, so guest rip and target addresses
> > > > in instructions such as jump, call etc. do not need special treatment.
> > > > But the spec does not say if LAM can be used to MMIO addresses...
> > >
> > > The MMIO accessing in guest is also via GVA, so any emulated
> > > device MMIO accessing hits this case. KVM checks GVA firstly even in TDP
> >
> > Yes. And sorry, I meant the spec does not say LAM can not be used
> > to MMIO addresses.
> >
> BTW, it is not just about MMIO. Normal memory address can also trigger the
> linearize(), e.g., memory operand of io instructions, though I still have
> no idea if this could be one of the usage cases of LAM.

Yes, you are right, emulated normal memory accesses should also be
considered.

Emm... to me the INS/OUTS instruction family should be part of the LAM
usage cases, but there is no explicit description of this in the ISE yet...
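
For instance (purely illustrative, the real_buf/port/len names are made up),
a guest doing string I/O from a LAM_U57-tagged user buffer would land on
exactly this emulation path when the port access exits to KVM:

	/* metadata in bits 62:57 of an otherwise ordinary user pointer */
	char *buf = (char *)((unsigned long)real_buf | (0x2aUL << 57));

	asm volatile("rep outsb"
		     : "+S" (buf), "+c" (len)
		     : "d" (port)
		     : "memory");	/* DS:RSI is the memory operand KVM must untag */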

>
> B.R.
> Yu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-23  2:36                   ` Yuan Yao
@ 2022-12-23  3:55                     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-23  3:55 UTC (permalink / raw)
  To: Yuan Yao, Yu Zhang; +Cc: pbonzini, seanjc, kirill.shutemov, kvm, Jingqi Liu

On Fri, 2022-12-23 at 10:36 +0800, Yuan Yao wrote:
> On Thu, Dec 22, 2022 at 04:21:32PM +0800, Yu Zhang wrote:
> > > > > > > 
> > > > > > > Well, it's not about the control register or MSR
> > > > > > > emulation. It is
> > > > > > > about
> > > > > > > the instruction decoder, which may encounter an
> > > > > > > instruction with a
> > > > > > > memory
> > > > > > > operand with LAM bits occupied.
> > > > > > > 
> > > > > > 
> > > > > > OK, combine reply to you and Yuan's comments here.
> > > > > > So you're talking about when KVM emulates an instruction,
> > > > > > and that
> > > > > > instruction is accessing memory, and the address for the
> > > > > > memory can be
> > > > > > LAM tagged.
> > > > > > I think instruction emulation and memory access should be
> > > > > > separated,
> > > > > > and LAM rules should apply to memory access phase. But
> > > > > > frankly
> > > > > > speaking, I haven't looked into such case yet. Can you name
> > > > > > an example
> > > > > > of such emulated instruction? I can take a look, hoping
> > > > > > that the
> > > > > > emulation accessing memory falls into same code path as
> > > > > > page fault
> > > > > > handling.
> > > > > 
> > > > > I do not know the usage case of LAM. According to the spec,
> > > > > LAM does
> > > > > not apply to instruction fetches, so guest rip and target
> > > > > addresses
> > > > > in instructions such as jump, call etc. do not need special
> > > > > treatment.
> > > > > But the spec does not say if LAM can be used to MMIO
> > > > > addresses...
> > > > 
> > > > The MMIO accessing in guest is also via GVA, so any emulated
> > > > device MMIO accessing hits this case. KVM checks GVA firstly
> > > > even in TDP
> > > 
> > > Yes. And sorry, I meant the spec does not say LAM can not be used
> > > to MMIO addresses.
> > > 
> > 
> > BTW, it is not just about MMIO. Normal memory address can also
> > trigger the
> > linearize(), e.g., memory operand of io instructions, though I
> > still have
> > no idea if this could be one of the usage cases of LAM.
> 
> Yes you are right, the emulated normal memory accessing should also
> be
> considered.
> 
> Emm... to me I think the IOS/OUTS instruction family should be part
> of
> LAM usage case, but yet no such explicity description about this in
> ISE...
> 
Which instructions will KVM emulate now? I don't think KVM will emulate
everything that would otherwise #UD.
For callers on the page fault handling path, the address has already been
untagged by HW before the exit to KVM.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2022-12-09  4:45 ` [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
@ 2022-12-28  3:37   ` Binbin Wu
  2022-12-29  1:42     ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Binbin Wu @ 2022-12-28  3:37 UTC (permalink / raw)
  To: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm


On 12/9/2022 12:45 PM, Robert Hoo wrote:
> kvm_vcpu_arch::cr4_guest_owned_bits and kvm_vcpu_arch::cr4_guest_rsvd_bits
> looks confusing. Rename latter to cr4_host_rsvd_bits, because it in fact
> describes the effective host reserved cr4 bits from the vcpu's perspective.

IMO, the current name cr4_guest_rsvd_bits is OK because it shows that
these bits are reserved bits from the point of view of the guest.

Changing to *host* is OK, but it doesn't seem easier to understand.


>
> Meanwhile, rename other related variables/macros to be better descriptive:
> * CR4_RESERVED_BITS --> CR4_HOST_RESERVED_BITS, which describes host bare
> metal CR4 reserved bits.
>
> * cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes
> CR4_HOST_RESERVED_BITS + !kvm_cap_has() = kvm level cr4 reserved bits.
>
> * __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which to calc
> effective cr4 reserved bits for kvm or vm level, by corresponding
> x_cpu_has() input.
>
> Thus, by these renames, the hierarchical relations of those reserved CR4
> bits is more clear.
>
> Just renames, no functional changes intended.
>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  4 ++--
>   arch/x86/kvm/cpuid.c            |  4 ++--
>   arch/x86/kvm/vmx/vmx.c          |  2 +-
>   arch/x86/kvm/x86.c              | 12 ++++++------
>   arch/x86/kvm/x86.h              |  4 ++--
>   5 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f05ebaa26f0f..3c736e00b6b1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -114,7 +114,7 @@
>   			  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
>   			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
>   
> -#define CR4_RESERVED_BITS                                               \
> +#define CR4_HOST_RESERVED_BITS                                               \
>   	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
>   			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
>   			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> @@ -671,7 +671,7 @@ struct kvm_vcpu_arch {
>   	unsigned long cr3;
>   	unsigned long cr4;
>   	unsigned long cr4_guest_owned_bits;
> -	unsigned long cr4_guest_rsvd_bits;
> +	unsigned long cr4_host_rsvd_bits;
>   	unsigned long cr8;
>   	u32 host_pkru;
>   	u32 pkru;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c92c49a0b35b..01e2b93ef563 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -352,8 +352,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
>   
>   	kvm_pmu_refresh(vcpu);
> -	vcpu->arch.cr4_guest_rsvd_bits =
> -	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
> +	vcpu->arch.cr4_host_rsvd_bits =
> +	    __cr4_calc_reserved_bits(guest_cpuid_has, vcpu);
>   
>   	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
>   						    vcpu->arch.cpuid_nent));
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 63247c57c72c..cfa06c7c062e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4250,7 +4250,7 @@ void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
>   	struct kvm_vcpu *vcpu = &vmx->vcpu;
>   
>   	vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
> -					  ~vcpu->arch.cr4_guest_rsvd_bits;
> +					  ~vcpu->arch.cr4_host_rsvd_bits;
>   	if (!enable_ept) {
>   		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_TLBFLUSH_BITS;
>   		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PDPTR_BITS;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 69227f77b201..eb1f2c20e19e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -108,7 +108,7 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
>   static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>   #endif
>   
> -static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
> +static u64 __read_mostly cr4_kvm_reserved_bits = CR4_HOST_RESERVED_BITS;
>   
>   #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
>   
> @@ -1102,10 +1102,10 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
>   
>   bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>   {
> -	if (cr4 & cr4_reserved_bits)
> +	if (cr4 & cr4_kvm_reserved_bits)
>   		return false;
>   
> -	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
> +	if (cr4 & vcpu->arch.cr4_host_rsvd_bits)
>   		return false;
>   
>   	return true;
> @@ -12290,7 +12290,7 @@ int kvm_arch_hardware_setup(void *opaque)
>   		kvm_caps.supported_xss = 0;
>   
>   #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> -	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> +	cr4_kvm_reserved_bits = __cr4_calc_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
>   #undef __kvm_cpu_cap_has
>   
>   	if (kvm_caps.has_tsc_control) {
> @@ -12323,8 +12323,8 @@ int kvm_arch_check_processor_compat(void *opaque)
>   
>   	WARN_ON(!irqs_disabled());
>   
> -	if (__cr4_reserved_bits(cpu_has, c) !=
> -	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> +	if (__cr4_calc_reserved_bits(cpu_has, c) !=
> +	    __cr4_calc_reserved_bits(cpu_has, &boot_cpu_data))
>   		return -EIO;
>   
>   	return ops->check_processor_compatibility();
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 829d3134c1eb..d92e580768e5 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -452,9 +452,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
>   #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
>   #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
>   
> -#define __cr4_reserved_bits(__cpu_has, __c)             \
> +#define __cr4_calc_reserved_bits(__cpu_has, __c)             \
>   ({                                                      \
> -	u64 __reserved_bits = CR4_RESERVED_BITS;        \
> +	u64 __reserved_bits = CR4_HOST_RESERVED_BITS;        \
>                                                           \
>   	if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
>   		__reserved_bits |= X86_CR4_OSXSAVE;     \

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
                     ` (4 preceding siblings ...)
  2022-12-21  8:14   ` Yu Zhang
@ 2022-12-28  8:32   ` Binbin Wu
  2022-12-29  0:41     ` Robert Hoo
  5 siblings, 1 reply; 63+ messages in thread
From: Binbin Wu @ 2022-12-28  8:32 UTC (permalink / raw)
  To: Robert Hoo, pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Jingqi Liu


On 12/9/2022 12:45 PM, Robert Hoo wrote
> +#ifdef CONFIG_X86_64
> +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu *vcpu)
> +{
> +	if (addr >> 63 == 0) {
> +		/* User pointers */
> +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> +			addr = get_canonical(addr, 57);

According to the spec, LAM_U57/LAM_SUP also performs a modified 
canonicality check.

Why is the check only done for LAM_U48, but not for the LAM_U57 and
LAM_SUP cases?


> +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> +			/*
> +			 * If guest enabled 5-level paging and LAM_U48,
> +			 * bit 47 should be 0, bit 48:56 contains meta data
> +			 * although bit 47:56 are valid 5-level address
> +			 * bits.
> +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> +			 */
> +			WARN_ON(addr & _BITUL(47));
> +			addr = get_canonical(addr, 48);
> +		}
> +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /* Supervisor pointers */
> +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> +			addr = get_canonical(addr, 57);
> +		else
> +			addr = get_canonical(addr, 48);
> +	}
> +
> +	return addr;
> +}
> +#else
> +#define kvm_untagged_addr(addr, vcpu)	(addr)
> +#endif
> +
>   static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>   					gva_t gva, gfn_t gfn, unsigned access)
>   {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable
  2022-12-28  8:32   ` Binbin Wu
@ 2022-12-29  0:41     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2022-12-29  0:41 UTC (permalink / raw)
  To: Binbin Wu, pbonzini, seanjc, kirill.shutemov, kvm; +Cc: Jingqi Liu

On Wed, 2022-12-28 at 16:32 +0800, Binbin Wu wrote:
> On 12/9/2022 12:45 PM, Robert Hoo wrote
> > +#ifdef CONFIG_X86_64
> > +/* untag addr for guest, according to vCPU CR3 and CR4 settings */
> > +static inline u64 kvm_untagged_addr(u64 addr, struct kvm_vcpu
> > *vcpu)
> > +{
> > +	if (addr >> 63 == 0) {
> > +		/* User pointers */
> > +		if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U57)
> > +			addr = get_canonical(addr, 57);
> 
> According to the spec, LAM_U57/LAM_SUP also performs a modified 
> canonicality check.
> 
> Why is the check only done for LAM_U48, but not for the LAM_U57 and
> LAM_SUP cases?
> 
Doesn't this check for LAM_U57?
And the else-if branch below checks LAM_U48.
And the outer else-if branch below checks CR4.LAM_SUP.
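
For context, get_canonical() used above is (roughly) the existing
sign-extension helper in arch/x86/kvm/x86.h, so "handling" each of those
branches means overwriting the metadata bits with the sign bit of the
given width:

	static inline u64 get_canonical(u64 la, u8 vaddr_bits)
	{
		/* shift the metadata bits out, then sign-extend back */
		return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
	}
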
> 
> > +		else if (kvm_read_cr3(vcpu) & X86_CR3_LAM_U48) {
> > +			/*
> > +			 * If guest enabled 5-level paging and LAM_U48,
> > +			 * bit 47 should be 0, bit 48:56 contains meta
> > data
> > +			 * although bit 47:56 are valid 5-level address
> > +			 * bits.
> > +			 * If LAM_U48 and 4-level paging, bit47 is 0.
> > +			 */
> > +			WARN_ON(addr & _BITUL(47));
> > +			addr = get_canonical(addr, 48);
> > +		}
> > +	} else if (kvm_read_cr4(vcpu) & X86_CR4_LAM_SUP) { /*
> > Supervisor pointers */
> > +		if (kvm_read_cr4(vcpu) & X86_CR4_LA57)
> > +			addr = get_canonical(addr, 57);
> > +		else
> > +			addr = get_canonical(addr, 48);
> > +	}
> > +
> > +	return addr;
> > +}
...


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2022-12-28  3:37   ` Binbin Wu
@ 2022-12-29  1:42     ` Robert Hoo
  2023-01-07  0:35       ` Sean Christopherson
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2022-12-29  1:42 UTC (permalink / raw)
  To: Binbin Wu, pbonzini, seanjc, kirill.shutemov, kvm

On Wed, 2022-12-28 at 11:37 +0800, Binbin Wu wrote:
> On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > kvm_vcpu_arch::cr4_guest_owned_bits and
> > kvm_vcpu_arch::cr4_guest_rsvd_bits
> > looks confusing. Rename latter to cr4_host_rsvd_bits, because it in
> > fact
> > decribes the effective host reserved cr4 bits from the vcpu's
> > perspective.
> 
> IMO, the current name cr4_guest_rsvd_bits is OK becuase it shows
> that 
> these bits are reserved bits from the pointview of guest.

Actually, it's cr4_guest_owned_bits that is from the perspective of the guest.

> 
cr4_guest_owned_bits and cr4_guest_rsvd_bits together look quite
confusing. The first reads like "guest owns these CR4 bits"; the latter
reads like "guest reserved these bits". My first reaction on seeing them
is: hey, what's the difference between guest owns and guest reserves?

Then, taking a look at their calculations, we'll find that
cr4_guest_owned_bits comes from cr4_guest_rsvd_bits, roughly in a "~"
relationship.

vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
			~vcpu->arch.cr4_guest_rsvd_bits;

The set bits in cr4_guest_owned_bits mean the guest owns them;
the set bits in cr4_guest_rsvd_bits mean the HOST owns them. So the ~
calculation is naturally explained.

The more hierarchical relationships among these macros/structure
elements surrounding CR4 virtualization, can be found below.

(P.S. more backstory: this isn't the first time I've dug into this code. I
had gotten clear on their relationship long ago, but when I came to it
this time, I got confused by the names again. Therefore, I would rather
cook this patch now to avoid a next time ;-))

> Change to *host* is OK, but seems not easier to understand.
> 
Perhaps "host_reserved" isn't the best name, welcome other suggestions.
But guest_own v.s. guest_rsvd, is really confusing, or even misleading.
> 
> > 
> > Meanwhile, rename other related variables/macros to be better
> > descriptive:
> > * CR4_RESERVED_BITS --> CR4_HOST_RESERVED_BITS, which describes
> > host bare
> > metal CR4 reserved bits.
> > 
> > * cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes
> > CR4_HOST_RESERVED_BITS + !kvm_cap_has() = kvm level cr4 reserved
> > bits.
> > 
> > * __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which to
> > calc
> > effective cr4 reserved bits for kvm or vm level, by corresponding
> > x_cpu_has() input.
> > 
> > Thus, by these renames, the hierarchical relations of those
> > reserved CR4
> > bits is more clear.
> > 
> > Just renames, no functional changes intended.
> > 
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h |  4 ++--
> >   arch/x86/kvm/cpuid.c            |  4 ++--
> >   arch/x86/kvm/vmx/vmx.c          |  2 +-
> >   arch/x86/kvm/x86.c              | 12 ++++++------
> >   arch/x86/kvm/x86.h              |  4 ++--
> >   5 files changed, 13 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index f05ebaa26f0f..3c736e00b6b1 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -114,7 +114,7 @@
> >   			  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
> > X86_CR0_AM \
> >   			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
> >   
> > -#define
> > CR4_RESERVED_BITS                                               \
> > +#define
> > CR4_HOST_RESERVED_BITS                                             
> >   \
> >   	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |
> > X86_CR4_DE\
> >   			  | X86_CR4_PSE | X86_CR4_PAE |
> > X86_CR4_MCE     \
> >   			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR
> > | X86_CR4_PCIDE \
> > @@ -671,7 +671,7 @@ struct kvm_vcpu_arch {
> >   	unsigned long cr3;
> >   	unsigned long cr4;
> >   	unsigned long cr4_guest_owned_bits;
> > -	unsigned long cr4_guest_rsvd_bits;
> > +	unsigned long cr4_host_rsvd_bits;
> >   	unsigned long cr8;
> >   	u32 host_pkru;
> >   	u32 pkru;
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index c92c49a0b35b..01e2b93ef563 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -352,8 +352,8 @@ static void kvm_vcpu_after_set_cpuid(struct
> > kvm_vcpu *vcpu)
> >   	vcpu->arch.reserved_gpa_bits =
> > kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> >   
> >   	kvm_pmu_refresh(vcpu);
> > -	vcpu->arch.cr4_guest_rsvd_bits =
> > -	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > +	vcpu->arch.cr4_host_rsvd_bits =
> > +	    __cr4_calc_reserved_bits(guest_cpuid_has, vcpu);
> >   
> >   	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu-
> > >arch.cpuid_entries,
> >   						    vcpu-
> > >arch.cpuid_nent));
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 63247c57c72c..cfa06c7c062e 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4250,7 +4250,7 @@ void set_cr4_guest_host_mask(struct vcpu_vmx
> > *vmx)
> >   	struct kvm_vcpu *vcpu = &vmx->vcpu;
> >   
> >   	vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
> > -					  ~vcpu-
> > >arch.cr4_guest_rsvd_bits;
> > +					  ~vcpu-
> > >arch.cr4_host_rsvd_bits;
> >   	if (!enable_ept) {
> >   		vcpu->arch.cr4_guest_owned_bits &=
> > ~X86_CR4_TLBFLUSH_BITS;
> >   		vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PDPTR_BITS;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 69227f77b201..eb1f2c20e19e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -108,7 +108,7 @@ u64 __read_mostly efer_reserved_bits =
> > ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
> >   static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
> >   #endif
> >   
> > -static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
> > +static u64 __read_mostly cr4_kvm_reserved_bits =
> > CR4_HOST_RESERVED_BITS;
> >   
> >   #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
> >   
> > @@ -1102,10 +1102,10 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
> >   
> >   bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >   {
> > -	if (cr4 & cr4_reserved_bits)
> > +	if (cr4 & cr4_kvm_reserved_bits)
> >   		return false;
> >   
> > -	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
> > +	if (cr4 & vcpu->arch.cr4_host_rsvd_bits)
> >   		return false;
> >   
> >   	return true;
> > @@ -12290,7 +12290,7 @@ int kvm_arch_hardware_setup(void *opaque)
> >   		kvm_caps.supported_xss = 0;
> >   
> >   #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> > -	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has,
> > UNUSED_);
> > +	cr4_kvm_reserved_bits =
> > __cr4_calc_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> >   #undef __kvm_cpu_cap_has
> >   
> >   	if (kvm_caps.has_tsc_control) {
> > @@ -12323,8 +12323,8 @@ int kvm_arch_check_processor_compat(void
> > *opaque)
> >   
> >   	WARN_ON(!irqs_disabled());
> >   
> > -	if (__cr4_reserved_bits(cpu_has, c) !=
> > -	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> > +	if (__cr4_calc_reserved_bits(cpu_has, c) !=
> > +	    __cr4_calc_reserved_bits(cpu_has, &boot_cpu_data))
> >   		return -EIO;
> >   
> >   	return ops->check_processor_compatibility();
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index 829d3134c1eb..d92e580768e5 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -452,9 +452,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32
> > index, u32 type);
> >   #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation
> > #GP condition */
> >   #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR
> > filter */
> >   
> > -#define __cr4_reserved_bits(__cpu_has, __c)             \
> > +#define __cr4_calc_reserved_bits(__cpu_has, __c)             \
> >   ({                                                      \
> > -	u64 __reserved_bits = CR4_RESERVED_BITS;        \
> > +	u64 __reserved_bits = CR4_HOST_RESERVED_BITS;        \
> >                                                           \
> >   	if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
> >   		__reserved_bits |= X86_CR4_OSXSAVE;     \


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2022-12-29  1:42     ` Robert Hoo
@ 2023-01-07  0:35       ` Sean Christopherson
  2023-01-07 13:30         ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2023-01-07  0:35 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Binbin Wu, pbonzini, kirill.shutemov, kvm

On Thu, Dec 29, 2022, Robert Hoo wrote:
> On Wed, 2022-12-28 at 11:37 +0800, Binbin Wu wrote:
> > On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > > kvm_vcpu_arch::cr4_guest_owned_bits and
> > > kvm_vcpu_arch::cr4_guest_rsvd_bits
> > > looks confusing. Rename latter to cr4_host_rsvd_bits, because it in
> > > fact describes the effective host reserved cr4 bits from the vcpu's
> > > perspective.
> > 
> > IMO, the current name cr4_guest_rsvd_bits is OK because it shows that these
> > bits are reserved bits from the point of view of the guest.
> 
> Actually, it's cr4_guest_owned_bits that from the perspective of guest.

No, cr4_guest_owned_bits is KVM's view of things.  It tracks which bits have
effectively been passed through to the guest and so need to be read out of the
VMCS after running the vCPU.

> cr4_guest_owned_bits and cr4_guest_rsvd_bits together looks quite
> confusing.

I disagree, KVM (and the SDM and the APM) uses "reserved" or "rsvd" all over the
place to indicate reserved bits/values/fields.

> > > * cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes

Hard no.  They aren't just KVM reserved, many of those bits are reserved by
hardware, which is 100% dependent on the host.

> > > CR4_HOST_RESERVED_BITS + !kvm_cap_has() = kvm level cr4 reserved
> > > bits.
> > > 
> > > * __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which to
> > > calc
> > > effective cr4 reserved bits for kvm or vm level, by corresponding
> > > x_cpu_has() input.
> > > 
> > > Thus, by these renames, the hierarchical relations of those reserved CR4
> > > bits is more clear.

Sorry, but I don't like any of the changes in this patch.  At best, some of the
changes are a wash (neither better nor worse), and in that case the churn, however
minor isn't worth it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2022-12-09  4:45 ` [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
@ 2023-01-07  0:38   ` Sean Christopherson
  2023-01-07 13:32     ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2023-01-07  0:38 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022, Robert Hoo wrote:
> If LAM enabled, CR4.LAM_SUP is owned by guest; otherwise, reserved.

Why is it passed through to the guest?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2022-12-09  4:45 ` [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
  2022-12-19  6:44   ` Yuan Yao
@ 2023-01-07  0:45   ` Sean Christopherson
  2023-01-07 13:36     ` Robert Hoo
  1 sibling, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2023-01-07  0:45 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Fri, Dec 09, 2022, Robert Hoo wrote:
> The get_cr3() is the implementation of kvm_mmu::get_guest_pgd(), well, CR3
> cannot be naturally equivalent to pgd, SDM says CR3 high bits are reserved,
> must be zero.
> And now, with LAM feature's introduction, bit 61 ~ 62 are used.
> So, rename get_cr3() --> get_pgd() to better indicate function purpose and
> in it, filtered out CR3 high bits.

Depends on one's interpreation of "PGD".  KVM says it's the full thing, e.g. the
nEPT hook returns the full EPTP, not EP4TA (or EP5TA).  I don't think stripping
bits in get_cr3() is the right approach, e.g. the user might want the full thing
for comparison.  E.g. the PCID bits are left as is.

Changing get_cr3() but not nested_svm_get_tdp_cr3() and nested_ept_get_eptp() is
also weird.

I think my preference would be to strip the LAM bits in the few places that want
the physical address and keep get_cr3() as is.
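
E.g., something along these lines at the point of use (just a sketch of the
idea, not an actual diff from this series):

	/* strip the LAM control bits only where the pgd address is consumed */
	unsigned long pgd = kvm_read_cr3(vcpu) &
			    ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);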

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2023-01-07  0:35       ` Sean Christopherson
@ 2023-01-07 13:30         ` Robert Hoo
  2023-01-08 14:18           ` Xiaoyao Li
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2023-01-07 13:30 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Binbin Wu, pbonzini, kirill.shutemov, kvm

On Sat, 2023-01-07 at 00:35 +0000, Sean Christopherson wrote:
> On Thu, Dec 29, 2022, Robert Hoo wrote:
> > On Wed, 2022-12-28 at 11:37 +0800, Binbin Wu wrote:
> > > On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > > > kvm_vcpu_arch::cr4_guest_owned_bits and
> > > > kvm_vcpu_arch::cr4_guest_rsvd_bits
> > > > looks confusing. Rename latter to cr4_host_rsvd_bits, because
> > > > it in
> > > > fact describes the effective host reserved cr4 bits from the
> > > > vcpu's
> > > > perspective.
> > > 
> > > IMO, the current name cr4_guest_rsvd_bits is OK because it shows
> > > that these
> > > bits are reserved bits from the point of view of the guest.
> > 
> > Actually, it's cr4_guest_owned_bits that from the perspective of
> > guest.
> 
> No, cr4_guest_owned_bits is KVM's view of things.  

That's all right. Perhaps my expression wasn't very accurate. Perhaps I
should have said "cr4_guest_owned_bits stands on the guest's point of view:
as it reads, the guest owns these (set) bits". Whereas "cr4_guest_rsvd_bits"
isn't literally what the words say; its set bits don't mean "guest
reserved these bits" but the opposite, those set bits are reserved by the
host:

set_cr4_guest_host_mask()
{
...
	vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
					  ~vcpu->arch.cr4_guest_rsvd_bits;
	/*
	 * cr4_guest_owned_bits = (~host owned bits) &
	 * KVM_POSSIBLE_CR4_GUEST_BITS (the filter);
	 * cr4_guest_owned_bits and cr4_guest_rsvd_bits are in a generally
	 * opposite relationship.
	 */
...

	vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.cr4_guest_owned_bits);
	/*
	 * The opposite of the guest owned bits is effectively written to
	 * CR4_GUEST_HOST_MASK.
	 */
}

This code is the implementation of SDM 25.6.6 "Guest/Host Masks and
Read Shadows for CR0 and CR4":
"In general, bits set to 1 in a guest/host mask correspond to bits
“owned” by the host."

> It tracks which bits have
> effectively been passed through to the guest and so need to be read
> out of the
> VMCS after running the vCPU.

Yes, as above, ~cr4_guest_owned_bits is the effective final guest/host CR4
mask that's written to VMCS.CR4_GUEST_HOST_MASK.
> 
> > cr4_guest_owned_bits and cr4_guest_rsvd_bits together looks quite
> > confusing.
> 
> I disagree, KVM (and the SDM and the APM) uses "reserved" or "rsvd"
> all over the
> place to indicate reserved bits/values/fields.

I wouldn't object to the word "reserved". I was objecting that
"cr4_guest_rsvd_bits" reads as guest-reserved; it actually contains the
host-reserved bits. ;)
> 
> > > > * cr4_reserved_bits --> cr4_kvm_reserved_bits, which describes
> 
> Hard no.  They aren't just KVM reserved, many of those bits are
> reserved by
> hardware, which is 100% dependent on the host.

That's right. KVM stands on top of HW, then the host, doesn't it? ;)
My interpretation, which is also the theme of this patch, is that those
xxx_cr4_reserved consts/variables actually form layered relationships.
> 
> > > > CR4_HOST_RESERVED_BITS + !kvm_cap_has() = kvm level cr4
> > > > reserved
> > > > bits.
> > > > 
> > > > * __cr4_reserved_bits() --> __cr4_calc_reserved_bits(), which
> > > > to
> > > > calc
> > > > effective cr4 reserved bits for kvm or vm level, by
> > > > corresponding
> > > > x_cpu_has() input.
> > > > 
> > > > Thus, by these renames, the hierarchical relations of those
> > > > reserved CR4
> > > > bits is more clear.
> 
> Sorry, but I don't like any of the changes in this patch.  At best,
> some of the
> changes are a wash (neither better nor worse), and in that case the
> churn, however
> minor isn't worth it.

That's all right. Regretfully I cannot convince you. This patch just
demonstrates my interpretation from when I got confused by the code, then
dug into it and read the SDM. Anyway, it's not a LAM necessity; I'll abandon
this patch in the next version.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2023-01-07  0:38   ` Sean Christopherson
@ 2023-01-07 13:32     ` Robert Hoo
  2023-01-09 16:29       ` Sean Christopherson
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2023-01-07 13:32 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Sat, 2023-01-07 at 00:38 +0000, Sean Christopherson wrote:
> On Fri, Dec 09, 2022, Robert Hoo wrote:
> > If LAM enabled, CR4.LAM_SUP is owned by guest; otherwise, reserved.
> 
> Why is it passed through to the guest?

I think there is no need to intercept the guest's control over CR4.LAM_SUP,
which controls LAM application to supervisor mode addresses.

The same pass-through happens for CR3.LAM_U{48, 57} when EPT is in effect.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd
  2023-01-07  0:45   ` Sean Christopherson
@ 2023-01-07 13:36     ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2023-01-07 13:36 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Sat, 2023-01-07 at 00:45 +0000, Sean Christopherson wrote:
> On Fri, Dec 09, 2022, Robert Hoo wrote:
> > The get_cr3() is the implementation of kvm_mmu::get_guest_pgd(),
> > well, CR3
> > cannot be naturally equivalent to pgd, SDM says CR3 high bits are
> > reserved,
> > must be zero.
> > And now, with LAM feature's introduction, bit 61 ~ 62 are used.
> > So, rename get_cr3() --> get_pgd() to better indicate function
> > purpose and
> > in it, filtered out CR3 high bits.
> 
> Depends on one's interpreation of "PGD".  KVM says it's the full
> thing, e.g. the
> nEPT hook returns the full EPTP, not EP4TA (or EP5TA).  I don't think
> stripping
> bits in get_cr3() is the right approach, e.g. the user might want the
> full thing
> for comparison.  E.g. the PCID bits are left as is.
> 
> Changing get_cr3() but not nested_svm_get_tdp_cr3() and
> nested_ept_get_eptp() is
> also weird.
> 
> I think my preference would be to strip the LAM bits in the few
> places that want
> the physical address and keep get_cr3() as is.

OK, will do as this in next version. Thanks.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2023-01-07 13:30         ` Robert Hoo
@ 2023-01-08 14:18           ` Xiaoyao Li
  2023-01-09  3:07             ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Xiaoyao Li @ 2023-01-08 14:18 UTC (permalink / raw)
  To: Robert Hoo, Sean Christopherson; +Cc: Binbin Wu, pbonzini, kirill.shutemov, kvm

On 1/7/2023 9:30 PM, Robert Hoo wrote:
> On Sat, 2023-01-07 at 00:35 +0000, Sean Christopherson wrote:
>> On Thu, Dec 29, 2022, Robert Hoo wrote:
>>> On Wed, 2022-12-28 at 11:37 +0800, Binbin Wu wrote:
>>>> On 12/9/2022 12:45 PM, Robert Hoo wrote:
>>>>> kvm_vcpu_arch::cr4_guest_owned_bits and
>>>>> kvm_vcpu_arch::cr4_guest_rsvd_bits
>>>>> looks confusing. Rename latter to cr4_host_rsvd_bits, because
>>>>> it in
>>>>> fact describes the effective host reserved cr4 bits from the
>>>>> vcpu's
>>>>> perspective.
>>>>
>>>> IMO, the current name cr4_guest_rsvd_bits is OK because it shows
>>>> that these
>>>> bits are reserved bits from the point of view of the guest.
>>>
>>> Actually, it's cr4_guest_owned_bits that from the perspective of
>>> guest.
>>
>> No, cr4_guest_owned_bits is KVM's view of things.
> 
> That's all right. Perhaps my expression wasn't very accurate. Perhaps I
> would have said "cr4_guest_owned_bits stands on guest's points, as it
> reads, guest owns these (set) bits". Whereas, "cr4_guest_rsvd_bits"
> doesn't literally as the word reads, its set bits doesn't mean "guest
> reserved these bits" but the opposite, those set bits are reserved by
> host:
>

I think you can interpret guest_rsvd_bits as bits reserved *for* the guest
instead of *by* the guest.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable
  2023-01-08 14:18           ` Xiaoyao Li
@ 2023-01-09  3:07             ` Robert Hoo
  0 siblings, 0 replies; 63+ messages in thread
From: Robert Hoo @ 2023-01-09  3:07 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson; +Cc: Binbin Wu, pbonzini, kirill.shutemov, kvm

On Sun, 2023-01-08 at 22:18 +0800, Xiaoyao Li wrote:
> On 1/7/2023 9:30 PM, Robert Hoo wrote:
> > On Sat, 2023-01-07 at 00:35 +0000, Sean Christopherson wrote:
> > > On Thu, Dec 29, 2022, Robert Hoo wrote:
> > > > On Wed, 2022-12-28 at 11:37 +0800, Binbin Wu wrote:
> > > > > On 12/9/2022 12:45 PM, Robert Hoo wrote:
> > > > > > kvm_vcpu_arch::cr4_guest_owned_bits and
> > > > > > kvm_vcpu_arch::cr4_guest_rsvd_bits
> > > > > > looks confusing. Rename latter to cr4_host_rsvd_bits,
> > > > > > because
> > > > > > it in
> > > > > > fact describes the effective host reserved cr4 bits from the
> > > > > > vcpu's
> > > > > > perspective.
> > > > > 
> > > > > IMO, the current name cr4_guest_rsvd_bits is OK because it
> > > > > shows
> > > > > that these
> > > > > bits are reserved bits from the point of view of the guest.
> > > > 
> > > > Actually, it's cr4_guest_owned_bits that from the perspective
> > > > of
> > > > guest.
> > > 
> > > No, cr4_guest_owned_bits is KVM's view of things.
> > 
> > That's all right. Perhaps my expression wasn't very accurate.
> > Perhaps I
> > would have said "cr4_guest_owned_bits stands on guest's points, as
> > it
> > reads, guest owns these (set) bits". Whereas, "cr4_guest_rsvd_bits"
> > doesn't literally as the word reads, its set bits doesn't mean
> > "guest
> > reserved these bits" but the opposite, those set bits are reserved
> > by
> > host:
> > 
> 
> I think you can interpret guest_rsvd_bits as bits reserved *for*
> guest 
> stead of *by* guest
> 
I think you mean reserved-by guest. OK, I buy in, as well as Binbin's
interpretation.




^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2023-01-07 13:32     ` Robert Hoo
@ 2023-01-09 16:29       ` Sean Christopherson
  2023-01-10  3:56         ` Robert Hoo
  0 siblings, 1 reply; 63+ messages in thread
From: Sean Christopherson @ 2023-01-09 16:29 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Sat, Jan 07, 2023, Robert Hoo wrote:
> On Sat, 2023-01-07 at 00:38 +0000, Sean Christopherson wrote:
> > On Fri, Dec 09, 2022, Robert Hoo wrote:
> > > If LAM enabled, CR4.LAM_SUP is owned by guest; otherwise, reserved.
> > 
> > Why is it passed through to the guest?
> 
> I think there is no need to intercept the guest's control over CR4.LAM_SUP,
> which controls LAM application to supervisor mode addresses.

That's not a sufficient justification.  KVM doesn't strictly need to intercept
most CR4 bits, but not intercepting has performance implications, e.g. KVM needs
to do a VMREAD(GUEST_CR4) to get LAM_SUP if the bit is pass through.  As a base
rule, KVM intercepts CR4 bits unless there's a reason not to, e.g. if the CR4 bit
in question is written frequently by real guests and/or never consumed by KVM.
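
For reference, the read path in question looks roughly like the helper
below, paraphrased from kvm_cache_regs.h of the kernel this series is based
on -- if a CR4 bit is guest-owned, reading it may require a VMREAD to
refresh the cached value:

	static __always_inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu,
						       ulong mask)
	{
		ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;

		/* a pass-through bit may be stale in vcpu->arch.cr4 */
		if ((tmask & vcpu->arch.cr4_guest_owned_bits) &&
		    !kvm_register_is_available(vcpu, VCPU_EXREG_CR4))
			static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR4);

		return vcpu->arch.cr4 & mask;
	}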

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2023-01-09 16:29       ` Sean Christopherson
@ 2023-01-10  3:56         ` Robert Hoo
  2023-01-11 17:35           ` Sean Christopherson
  0 siblings, 1 reply; 63+ messages in thread
From: Robert Hoo @ 2023-01-10  3:56 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Mon, 2023-01-09 at 16:29 +0000, Sean Christopherson wrote:
> On Sat, Jan 07, 2023, Robert Hoo wrote:
> > On Sat, 2023-01-07 at 00:38 +0000, Sean Christopherson wrote:
> > > On Fri, Dec 09, 2022, Robert Hoo wrote:
> > > > If LAM enabled, CR4.LAM_SUP is owned by guest; otherwise,
> > > > reserved.
> > > 
> > > Why is it passed through to the guest?
> > 
> > I think there is no need to intercept the guest's control over
> > CR4.LAM_SUP, which controls LAM application to supervisor mode addresses.
> 
> That's not a sufficient justification.  KVM doesn't strictly need to
> intercept
> most CR4 bits, 

Yeah, that's also my experiential understanding.

> but not intercepting has performance implications, e.g. KVM needs
> to do a VMREAD(GUEST_CR4) to get LAM_SUP if the bit is passed
> through.

Right. On the other hand, intercepting has a performance penalty from the
VM-exit, which is much heavier. So it's a trade-off, right?

> As a base
> rule, KVM intercepts CR4 bits unless there's a reason not to, e.g. if
> the CR4 bit
> in question is written frequently by real guests and/or never
> consumed by KVM.

Judging by these 2 points:
Is CR4.LAM_SUP written frequently by the guest? I'm not sure; the native
kernel enabling patches cover LAM_U57 only so far, so it isn't clear yet
whether its control will be per-process/thread or kernel-wide. If its use
case is something like KASAN, would you expect it to be written frequently
by the guest?

Never consumed by KVM? False, e.g. kvm_untagged_addr() will read this
bit. But not frequently, I think, at least in this patch set.
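
For illustration, a plausible sketch of what such an untag helper boils
down to, assuming the usual LAM rule of replicating the top real address
bit into the metadata bits while preserving bit 63; the name and shape
here are illustrative, not necessarily what kvm_untagged_addr() does in
this series:

/*
 * Illustrative sketch: strip LAM metadata by sign-extending from the
 * highest real address bit (bit 56 for LAM57, bit 47 for LAM48) while
 * preserving bit 63, which selects user vs. supervisor treatment.
 */
static inline u64 lam_untag(u64 va, int lam_bit)
{
        return (sign_extend64(va, lam_bit) & ~BIT_ULL(63)) | (va & BIT_ULL(63));
}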

So in general, what is your suggestion/preference? I'm fine with either
choice.




^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits
  2023-01-10  3:56         ` Robert Hoo
@ 2023-01-11 17:35           ` Sean Christopherson
  0 siblings, 0 replies; 63+ messages in thread
From: Sean Christopherson @ 2023-01-11 17:35 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, kirill.shutemov, kvm, Jingqi Liu

On Tue, Jan 10, 2023, Robert Hoo wrote:
> On Mon, 2023-01-09 at 16:29 +0000, Sean Christopherson wrote:
> > As a base rule, KVM intercepts CR4 bits unless there's a reason not to,
> > e.g. if the CR4 bit in question is written frequently by real guests and/or
> > never consumed by KVM.
> 
> Judging by these 2 points:
> Is CR4.LAM_SUP written frequently by the guest? I'm not sure; the native
> kernel enabling patches cover LAM_U57 only so far, so it isn't clear yet
> whether its control will be per-process/thread or kernel-wide. If its use
> case is something like KASAN, would you expect it to be written frequently
> by the guest?

Controlling a kernel-level knob on a per-process basis would be bizarre.  But
the expected use case definitely needs to be understood.  I assume Kirill, or
whoever is doing the LAM_SUP implementation, can provide answers.

> Never consumed by KVM? False, e.g. kvm_untagged_addr() will read this
> bit. But not frequently, I think, at least in this patch set.

Untagging an address will need to be done any time KVM consumes a guest virtual
address, i.e. performs any kind of emulation.  That's not super high frequency
on modern CPUs, but it's not exactly rare either.

> So in general, what is your suggestion/preference? I'm fine with either
> choice.

Unless guests will be touching CR4.LAM_SUP on context switches, intercepting is
unquestionably the right choice.
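
Concretely, whether a CR4 bit is intercepted is expressed through the
guest/host mask. A simplified sketch modeled on the upstream
set_cr4_guest_host_mask() in arch/x86/kvm/vmx/vmx.c (abridged, not this
series' final diff):

void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
{
        struct kvm_vcpu *vcpu = &vmx->vcpu;

        /*
         * Only bits in KVM_POSSIBLE_CR4_GUEST_BITS may be guest-owned,
         * and never bits that are reserved for this particular guest.
         * Leaving a bit out of the guest-owned set (e.g. keeping LAM_SUP
         * intercepted) means guest writes to that bit will VM-exit.
         */
        vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS &
                                          ~vcpu->arch.cr4_guest_rsvd_bits;

        /*
         * (The real function further restricts the guest-owned set when
         * EPT is disabled and when running a nested guest.)
         */

        /* Bits set in the mask are host-owned, i.e. intercepted. */
        vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.cr4_guest_owned_bits);
}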

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2023-01-11 17:39 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-09  4:45 [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
2022-12-09  4:45 ` [PATCH v3 1/9] KVM: x86: Rename cr4_reserved/rsvd_* variables to be more readable Robert Hoo
2022-12-28  3:37   ` Binbin Wu
2022-12-29  1:42     ` Robert Hoo
2023-01-07  0:35       ` Sean Christopherson
2023-01-07 13:30         ` Robert Hoo
2023-01-08 14:18           ` Xiaoyao Li
2023-01-09  3:07             ` Robert Hoo
2022-12-09  4:45 ` [PATCH v3 2/9] KVM: x86: Add CR4.LAM_SUP in guest owned bits Robert Hoo
2023-01-07  0:38   ` Sean Christopherson
2023-01-07 13:32     ` Robert Hoo
2023-01-09 16:29       ` Sean Christopherson
2023-01-10  3:56         ` Robert Hoo
2023-01-11 17:35           ` Sean Christopherson
2022-12-09  4:45 ` [PATCH v3 3/9] KVM: x86: MMU: Rename get_cr3() --> get_pgd() and clear high bits for pgd Robert Hoo
2022-12-19  6:44   ` Yuan Yao
2022-12-20 14:07     ` Robert Hoo
2023-01-07  0:45   ` Sean Christopherson
2023-01-07 13:36     ` Robert Hoo
2022-12-09  4:45 ` [PATCH v3 4/9] KVM: x86: MMU: Commets update Robert Hoo
2022-12-09  4:45 ` [PATCH v3 5/9] KVM: x86: MMU: Integrate LAM bits when build guest CR3 Robert Hoo
2022-12-19  6:53   ` Yuan Yao
2022-12-20 14:07     ` Robert Hoo
2022-12-21  2:12       ` Yuan Yao
2022-12-21  7:50       ` Yu Zhang
2022-12-21  8:55         ` Robert Hoo
2022-12-09  4:45 ` [PATCH v3 6/9] KVM: x86: Untag LAM bits when applicable Robert Hoo
2022-12-19  7:32   ` Yuan Yao
2022-12-20 14:07     ` Robert Hoo
2022-12-19  9:45   ` Yuan Yao
2022-12-20 14:07     ` Robert Hoo
2022-12-21  2:38       ` Yuan Yao
2022-12-21  8:02       ` Yu Zhang
2022-12-21  8:49         ` Robert Hoo
2022-12-21 10:10           ` Yu Zhang
2022-12-21 10:30             ` Yuan Yao
2022-12-21 12:40               ` Yu Zhang
2022-12-22  8:21                 ` Yu Zhang
2022-12-23  2:36                   ` Yuan Yao
2022-12-23  3:55                     ` Robert Hoo
2022-12-21  0:35   ` Yang, Weijiang
2022-12-21  1:38     ` Robert Hoo
2022-12-21  2:55   ` Yuan Yao
2022-12-21  8:22     ` Robert Hoo
2022-12-21  9:35       ` Yuan Yao
2022-12-21 10:22         ` Yu Zhang
2022-12-21 10:33           ` Yuan Yao
2022-12-21  8:14   ` Yu Zhang
2022-12-21  8:37     ` Yu Zhang
2022-12-28  8:32   ` Binbin Wu
2022-12-29  0:41     ` Robert Hoo
2022-12-09  4:45 ` [PATCH v3 7/9] KVM: x86: When judging setting CR3 valid or not, consider LAM bits Robert Hoo
2022-12-09  4:45 ` [PATCH v3 8/9] KVM: x86: When guest set CR3, handle LAM bits semantics Robert Hoo
2022-12-20  9:10   ` Liu, Jingqi
2022-12-20 14:16     ` Robert Hoo
2022-12-21  8:30   ` Yu Zhang
2022-12-21 12:52     ` Robert Hoo
2022-12-09  4:45 ` [PATCH v3 9/9] KVM: x86: LAM: Expose LAM CPUID to user space VMM Robert Hoo
2022-12-19  6:12 ` [PATCH v3 0/9] Linear Address Masking (LAM) KVM Enabling Robert Hoo
2022-12-19  8:09 ` Yuan Yao
2022-12-20 14:06   ` Robert Hoo
2022-12-20  9:20 ` Liu, Jingqi
2022-12-20 14:19   ` Robert Hoo
