* [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
@ 2021-06-22 17:56 Sean Christopherson
  2021-06-22 17:56 ` [PATCH 01/54] KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT Sean Christopherson
                   ` (54 more replies)
  0 siblings, 55 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

I missed spring by a few days...

This gigantic snowball got rolling when I hit the WARN that guards against
setting bits 63:32 in SPTEs on 32-bit builds (patch 01).  The WARN is a
boneheaded mistake on my part as the whole point of EPT is to avoid the
mess that is IA32 paging.

I added a better variant to WARN if KVM attempts to set _any_ reserved bits
in its SPTEs (patch 48).  Unfortunately, the WARN worked too well and fired
on a variety of configurations.  The patches in between are a mix of bug fixes,
cleanups, and documentation updates to get KVM to the point where the WARN
can be added without causing explosions, and to fix/document the numerous
issues/gotchas I found along the way.

The meat of this series is a big refactoring of the MMU configuration code to
fix a nested NPT bug I discovered after writing a test to exercise the new
reserved bit WARN.  With nested NPT, vCPU state is not guaranteed to
reflect vmcb01 state (though in practice the bug is limited to
KVM_SET_NESTED_STATE, i.e. live migration).  KVM passes in the L1 CR0, CR4,
and EFER values, which the MMU takes into consideration for the mmu_role
and then promptly ignores for all other calculations, e.g. reserved bits.
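
In pseudo-code, the pre-series flow boils down to roughly the sketch below
(a hypothetical, heavily simplified illustration; calc_npt_mmu_role() is a
made-up name, reset_rsvds_bits_mask() is real):

	/* Hypothetical sketch of the bug, not actual KVM code. */
	static void npt_mmu_init_sketch(struct kvm_vcpu *vcpu, unsigned long cr0,
					unsigned long cr4, u64 efer)
	{
		struct kvm_mmu *context = &vcpu->arch.guest_mmu;

		/* The role is computed from the caller-provided L1 values... */
		context->mmu_role = calc_npt_mmu_role(vcpu, cr0, cr4, efer);

		/*
		 * ...but reserved bits and other metadata are derived from
		 * vcpu->arch.cr0/cr4/efer, which reflect L2 while L2 is active.
		 */
		reset_rsvds_bits_mask(vcpu, context);
	}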

The approach for solving the nested NPT mess, and a variety of other minor
bugs of similar nature, is to take "all" state from the MMU context itself
instead of the vCPU.  None of the refactoring patches are particularly
interesting; there's just a lot of them because so much code uses the vCPU
instead of the correct state.
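
For reference, the series ends up snapshotting the relevant registers when an
MMU is configured, roughly along the lines of the sketch below (illustrative of
the idea, not necessarily the exact struct added by the role_regs patches):

	struct kvm_mmu_role_regs {
		const unsigned long cr0;
		const unsigned long cr4;
		const u64 efer;
	};

Role and metadata calculations then consume the snapshot (or the resulting
mmu_role) instead of poking at vcpu->arch directly.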

I have kvm-unit-tests for the SMEP, NX, and LA57 (my personal favorite) bugs
that I'll post separately.  Ditto for a selftest for recomputing the mmu_role
on CPUID updates.

I don't have a standalone test for nested NPT mmu_role changes; adding a
meaningful test mixed with KVM_SET_NESTED_STATE is a bigger lift.  To test
that mess, I randomized vCPU state prior to initializing the nested NPT MMU
and ran kvm-unit-tests.  E.g. without the mmu_role changes, the hack below
causes a number of unit test failures:

	vcpu->arch.cr0 = get_random_long();
	vcpu->arch.cr4 = get_random_long();
	vcpu->arch.efer = get_random_long();

        kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4,
                                svm->vmcb01.ptr->save.efer,
                                svm->nested.ctl.nested_cr3);


Patch 01 is the only patch that is remotely 5.13 worthy, and even then
only because it's about as safe as a patch can be.  Everything else is far
from urgent as these bugs have existed for quite some time.

I labeled the "sections" of this mess in the shortlog below.

P.S. Does anyone know how PKRU interacts with NPT?  I assume/hope NPT
     accesses, which are always "user", ignore PKRU, but the APM doesn't
     say a thing.  If PKRU is ignored, KVM has some fixing to do.  If PKRU
     isn't ignored, AMD has some fixing to do :-)

P.P.S. This series pulled in one patch from my vCPU RESET/INIT series,
       "Properly reset MMU context at vCPU RESET/INIT", as that was needed
       to fix a root_level bug on VMX.  My goal is to get the RESET/INIT
       series refreshed later this week and thoroughly bombard everyone.


Sean Christopherson (54):

 -- bug fixes --
  KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested
    EPT
  KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
  KVM: x86: Properly reset MMU context at vCPU RESET/INIT
  KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT
    walk
  Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified
  KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is
    broken
  Revert "KVM: MMU: record maximum physical address width in
    kvm_mmu_extended_role"
  KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at
    GFN

 -- cleanups --
  KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches
  KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page()
  KVM: x86/mmu: Rename unsync helper and update related comments

 -- bug fixes --
  KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER
  KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU
    state
  KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs
  KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper

 -- nested NPT / mmu_role refactoring --
  KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper
  KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs
  KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from
    regs
  KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context()
  KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role
  KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role
  KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans
  KVM: x86/mmu: Add helpers to query mmu_role bits
  KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0
  KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active
  KVM: x86/mmu: Always Set new mmu_role immediately after checking old
    role
  KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits
  KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits
  KVM: x86/mmu: Drop vCPU param from reserved bits calculator
  KVM: x86/mmu: Use MMU's role to compute permission bitmask
  KVM: x86/mmu: Use MMU's role to compute PKRU bitmask
  KVM: x86/mmu: Use MMU's roles to compute last non-leaf level
  KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk
  KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata
  KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration
  KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Use MMU's role to determine PTTYPE

 -- finally, the new WARN! --
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic
    MMU
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE

 -- more cleanups --
  KVM: x86: Enhance comments for MMU roles and nested transition
    trickiness
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level"
    logic
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault

 -- RFC-ish "fix" --
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP
    is on


 Documentation/virt/kvm/api.rst            |  11 +-
 Documentation/virt/kvm/mmu.rst            |   7 +-
 arch/x86/include/asm/kvm_host.h           |  71 ++-
 arch/x86/kvm/cpuid.c                      |   6 +-
 arch/x86/kvm/mmu.h                        |  18 +-
 arch/x86/kvm/mmu/mmu.c                    | 648 +++++++++++-----------
 arch/x86/kvm/mmu/mmu_internal.h           |   3 +-
 arch/x86/kvm/mmu/mmutrace.h               |   2 +-
 arch/x86/kvm/mmu/paging_tmpl.h            |  68 ++-
 arch/x86/kvm/mmu/spte.c                   |  22 +-
 arch/x86/kvm/mmu/spte.h                   |  32 ++
 arch/x86/kvm/svm/nested.c                 |  10 +-
 arch/x86/kvm/vmx/nested.c                 |   1 +
 arch/x86/kvm/x86.c                        |  26 +-
 arch/x86/kvm/x86.h                        |  10 -
 tools/lib/traceevent/plugins/plugin_kvm.c |   4 +-
 16 files changed, 530 insertions(+), 409 deletions(-)

-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 01/54] KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 02/54] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs Sean Christopherson
                   ` (53 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Remove a misguided WARN that attempts to detect the scenario where using
a special A/D tracking flag will set reserved bits on a non-MMIO spte.
The WARN triggers false positives when using EPT with 32-bit KVM because
of the !64-bit clause, which is just flat out wrong.  The whole A/D
tracking goo is specific to EPT, and one of the big selling points of EPT
is that EPT is decoupled from the host's native paging mode.

Drop the WARN instead of trying to salvage the check.  Keeping a check
specific to A/D tracking bits would essentially regurgitate the same code
that led to KVM needing the tracking bits in the first place.

A better approach would be to add a generic WARN on reserved bits being
set, which would naturally cover the A/D tracking bits, work for all
flavors of paging, and be self-documenting to some extent.
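
For illustration, such a generic check could look roughly like the sketch
below (built on existing helpers, but not the exact code added later in this
series):

	static void warn_on_rsvd_spte(struct kvm_vcpu *vcpu, u64 spte, int level)
	{
		struct rsvd_bits_validate *rsvd = &vcpu->arch.mmu->shadow_zero_check;

		WARN_ONCE(__is_rsvd_bits_set(rsvd, spte, level),
			  "reserved bits set in SPTE 0x%llx at level %d", spte, level);
	}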

Fixes: 8a406c89532c ("KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 66d43cec0c31..8e8e8da740a0 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -102,13 +102,6 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	else if (kvm_vcpu_ad_need_write_protect(vcpu))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
 
-	/*
-	 * Bits 62:52 of PAE SPTEs are reserved.  WARN if said bits are set
-	 * if PAE paging may be employed (shadow paging or any 32-bit KVM).
-	 */
-	WARN_ON_ONCE((!tdp_enabled || !IS_ENABLED(CONFIG_X86_64)) &&
-		     (spte & SPTE_TDP_AD_MASK));
-
 	/*
 	 * For the EPT case, shadow_present_mask is 0 if hardware
 	 * supports exec-only page table entries.  In that case,
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 02/54] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
  2021-06-22 17:56 ` [PATCH 01/54] KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT Sean Christopherson
                   ` (52 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
NX bit for huge SPTEs if the iTLB multi-hit mitigation is enabled.
Checking the mitigation itself is not sufficient as it can be toggled on
at any time and KVM doesn't reset MMU contexts when that happens.  KVM
could reset the contexts, but that would require purging all SPTEs in all
MMUs, for no real benefit.  And, KVM already forces EFER.NX=1 when TDP is
disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
for shadow MMUs.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84d48a33e38b..0db12f461c9d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4221,7 +4221,15 @@ static inline u64 reserved_hpa_bits(void)
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
-	bool uses_nx = context->nx ||
+	/*
+	 * KVM uses NX when TDP is disabled to handle a variety of scenarios,
+	 * notably for huge SPTEs if iTLB multi-hit mitigation is enabled and
+	 * to generate correct permissions for CR0.WP=0/CR4.SMEP=1/EFER.NX=0.
+	 * The iTLB multi-hit workaround can be toggled at any time, so assume
+	 * NX can be used by any non-nested shadow MMU to avoid having to reset
+	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
+	 */
+	bool uses_nx = context->nx || !tdp_enabled ||
 		context->mmu_role.base.smep_andnot_wp;
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
  2021-06-22 17:56 ` [PATCH 01/54] KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT Sean Christopherson
  2021-06-22 17:56 ` [PATCH 02/54] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-23 13:59   ` Paolo Bonzini
  2021-06-23 14:01   ` Paolo Bonzini
  2021-06-22 17:56 ` [PATCH 04/54] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk Sean Christopherson
                   ` (51 subsequent siblings)
  54 siblings, 2 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT.  Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.

This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.

In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c468ed ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.

In SVM, commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset") forced
a MMU reset on every INIT to workaround the flaw in common x86.  Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.

The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.

Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76dae88cf524..42608b515ce4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10735,6 +10735,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
+	unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
 	kvm_lapic_reset(vcpu, init_event);
 
 	vcpu->arch.hflags = 0;
@@ -10803,6 +10805,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.ia32_xss = 0;
 
 	static_call(kvm_x86_vcpu_reset)(vcpu, init_event);
+
+	/*
+	 * Reset the MMU context if paging was enabled prior to INIT (which is
+	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
+	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
+	 * checked because it is unconditionally cleared on INIT and all other
+	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
+	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
+	 */
+	if (old_cr0 & X86_CR0_PG)
+		kvm_mmu_reset_context(vcpu);
 }
 
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 04/54] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" Sean Christopherson
                   ` (50 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest.  When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc...  If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.

Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 error:
 	errcode |= write_fault | user_fault;
-	if (fetch_fault && (mmu->nx ||
-			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+	if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
 		errcode |= PFERR_FETCH_MASK;
 
 	walker->fault.vector = PF_VECTOR;
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 04/54] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-25  8:47   ` Yu Zhang
  2021-06-22 17:56 ` [PATCH 06/54] KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified Sean Christopherson
                   ` (49 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
mmu_role.base.level because that tracks the shadow root level, i.e. TDP
level.  Normally, this is not an issue because LA57 can't be toggled
while long mode is active, i.e. the guest has to first disable paging,
then toggle LA57, then re-enable paging, thus ensuring an MMU
reinitialization.

But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
without having to bounce through an unpaged section.  L1 can also load a
new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
single entry PML5 is sufficient.  Such shenanigans are only problematic
if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.

Note, in the L2 case with nested TDP, even though L1 can switch between
L2s with different LA57 settings, thus bypassing the paging requirement,
in that case KVM's nested_mmu will track LA57 in base.level.

This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea.

Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu/mmu.c          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e11d64aa0bcd..916e0f89fdfc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -320,6 +320,7 @@ union kvm_mmu_extended_role {
 		unsigned int cr4_pke:1;
 		unsigned int cr4_smap:1;
 		unsigned int cr4_smep:1;
+		unsigned int cr4_la57:1;
 		unsigned int maxphyaddr:6;
 	};
 };
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0db12f461c9d..5024318dec45 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4537,6 +4537,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
 	ext.cr4_smap = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 	ext.cr4_pse = !!is_pse(vcpu);
 	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
+	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
 	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
 
 	ext.valid = 1;
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 06/54] KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken Sean Christopherson
                   ` (48 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Invalidate all MMUs' roles after a CPUID update to force reinitialization
of the MMU context/helpers.  Despite the efforts of commit de3ccd26fafc
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role.  E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.

The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.

Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes.  Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...

Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint.  The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.

In other words, resetting the MMU after a CPUID update is largely a
superficial fix.  But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported.  For cases where KVM botches guest behavior, the damage is
limited to that guest.  But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.

Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  6 +++---
 arch/x86/kvm/mmu/mmu.c          | 12 ++++++++++++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 916e0f89fdfc..4ac534766eff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1501,6 +1501,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu);
 void kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b4da665bb892..c42613cfb5ba 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -202,10 +202,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
 
 	/*
-	 * Except for the MMU, which needs to be reset after any vendor
-	 * specific adjustments to the reserved GPA bits.
+	 * Except for the MMU, which needs to do its thing after any vendor
+	 * specific adjustments to the reserved GPA bits.
 	 */
-	kvm_mmu_reset_context(vcpu);
+	kvm_mmu_after_set_cpuid(vcpu);
 }
 
 static int is_efer_nx(void)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5024318dec45..e2668a9b5936 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4903,6 +4903,18 @@ kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
 	return role.base;
 }
 
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Invalidate all MMU roles to force them to reinitialize as CPUID
+	 * information is factored into reserved bit calculations.
+	 */
+	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
+	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
+	vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+	kvm_mmu_reset_context(vcpu);
+}
+
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_unload(vcpu);
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 06/54] KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-23 14:16   ` Paolo Bonzini
  2021-06-22 17:56 ` [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" Sean Christopherson
                   ` (47 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
the vCPU has been run at least once when its CPUID model is changed.

KVM does not correctly handle changes to paging related settings in the
guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
could theoretically zap all shadow pages, but actually making that happen
is a mess due to lock inversion (vcpu->mutex is held).  And even then,
updating paging settings on the fly would only work if all vCPUs are
stopped, updated in concert with identical settings, then restarted.

To support running vCPUs with different vCPU models (that affect paging),
KVM would need to track all relevant information in kvm_mmu_page_role.
Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
isn't sufficient as a vCPU can reuse a shadow page translation that was
created by a vCPU with different settings and thus completely skip the
reserved bit checks (that are tied to CPUID).

Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
it would require doubling gfn_track from a u16 to a u32, i.e. would
increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
would all need to be tracked.

In practice, there is no remotely sane use case for changing any paging
related CPUID entries on the fly, so just sweep it under the rug (after
yelling at userspace).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst  | 11 ++++++++---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 18 ++++++++++++++++++
 arch/x86/kvm/x86.c              |  2 ++
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e328caa35d6c..06e82f07fe54 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -688,9 +688,14 @@ MSRs that have been set successfully.
 Defines the vcpu responses to the cpuid instruction.  Applications
 should use the KVM_SET_CPUID2 ioctl if available.
 
-Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID
-configuration (if there is) is not corrupted. Userspace can get a copy of the
-resulting CPUID configuration through KVM_GET_CPUID2 in case.
+Caveat emptor:
+  - If this IOCTL fails, KVM gives no guarantees that previous valid CPUID
+    configuration (if there is) is not corrupted. Userspace can get a copy
+    of the resulting CPUID configuration through KVM_GET_CPUID2 in case.
+  - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
+    after running the guest, may cause guest instability.
+  - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc...
+    may cause guest instability.
 
 ::
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4ac534766eff..19c88b445ee0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -840,7 +840,7 @@ struct kvm_vcpu_arch {
 	bool l1tf_flush_l1d;
 
 	/* Host CPU on which VM-entry was most recently attempted */
-	unsigned int last_vmentry_cpu;
+	int last_vmentry_cpu;
 
 	/* AMD MSRC001_0015 Hardware Configuration */
 	u64 msr_hwcr;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e2668a9b5936..8d97d21d5241 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4913,6 +4913,24 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
 	kvm_mmu_reset_context(vcpu);
+
+	/*
+	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
+	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
+	 * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
+	 * faults due to reusing SPs/SPTEs.  Alert userspace, but otherwise
+	 * sweep the problem under the rug.
+	 *
+	 * KVM's horrific CPUID ABI makes the problem all but impossible to
+	 * solve, as correctly handling multiple vCPU models (with respect to
+	 * paging and physical address properties) in a single VM would require
+	 * tracking all relevant CPUID information in kvm_mmu_page_role.  That
+	 * is very undesirable as it would double the memory requirements for
+	 * gfn_track (see struct kvm_mmu_page_role comments), and in practice
+	 * no sane VMM mucks with the core vCPU model on the fly.
+	 */
+	if (vcpu->arch.last_vmentry_cpu != -1)
+		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
 }
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42608b515ce4..92b4a9305651 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10583,6 +10583,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	struct page *page;
 	int r;
 
+	vcpu->arch.last_vmentry_cpu = -1;
+
 	if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 	else
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-25  8:52   ` Yu Zhang
  2021-06-22 17:56 ` [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN Sean Christopherson
                   ` (46 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop MAXPHYADDR from mmu_role now that all MMUs have their role
invalidated after a CPUID update.  Invalidating the role forces all MMUs
to re-evaluate the guest's MAXPHYADDR, and the guest's MAXPHYADDR can
only be changed only through a CPUID update.

This reverts commit de3ccd26fafc707b09792d9b633c8b5b48865315.

Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/mmu/mmu.c          | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19c88b445ee0..cdaff399ed94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -321,7 +321,6 @@ union kvm_mmu_extended_role {
 		unsigned int cr4_smap:1;
 		unsigned int cr4_smep:1;
 		unsigned int cr4_la57:1;
-		unsigned int maxphyaddr:6;
 	};
 };
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d97d21d5241..04cab330c445 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4538,7 +4538,6 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
 	ext.cr4_pse = !!is_pse(vcpu);
 	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
 	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
-	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
 
 	ext.valid = 1;
 
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-23 14:36   ` Paolo Bonzini
  2021-06-25  9:51   ` Yu Zhang
  2021-06-22 17:56 ` [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check Sean Christopherson
                   ` (45 subsequent siblings)
  54 siblings, 2 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

When creating a new upper-level shadow page, zap unsync shadow pages at
the same target gfn instead of attempting to sync the pages.  This fixes
a bug where an unsync shadow page could be sync'd with an incompatible
context, e.g. wrong smm, is_guest, etc... flags.  In practice, the bug is
relatively benign as sync_page() is all but guaranteed to fail its check
that the guest's desired gfn (for the to-be-sync'd page) matches the
current gfn associated with the shadow page.  I.e. kvm_sync_page() would
end up zapping the page anyways.

Alternatively, __kvm_sync_page() could be modified to explicitly verify
the mmu_role of the unsync shadow page is compatible with the current MMU
context.  But, except for this specific case, __kvm_sync_page() is called
iff the page is compatible, e.g. the transient sync in kvm_mmu_get_page()
requires an exact role match, and the call from kvm_sync_mmu_roots() is
only synchronizing shadow pages from the current MMU (which better be
compatible or KVM has problems).  And as described above, attempting to
sync shadow pages when creating an upper-level shadow page is unlikely
to succeed, e.g. zero successful syncs were observed when running Linux
guests despite over a million attempts.

Fixes: 9f1a122f970d ("KVM: MMU: allow more page become unsync at getting sp time")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 04cab330c445..99d26859021d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1843,24 +1843,6 @@ static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	return __kvm_sync_page(vcpu, sp, invalid_list);
 }
 
-/* @gfn should be write-protected at the call site */
-static bool kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
-			   struct list_head *invalid_list)
-{
-	struct kvm_mmu_page *s;
-	bool ret = false;
-
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
-		if (!s->unsync)
-			continue;
-
-		WARN_ON(s->role.level != PG_LEVEL_4K);
-		ret |= kvm_sync_page(vcpu, s, invalid_list);
-	}
-
-	return ret;
-}
-
 struct mmu_page_path {
 	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
 	unsigned int idx[PT64_ROOT_MAX_LEVEL];
@@ -1990,8 +1972,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	struct hlist_head *sp_list;
 	unsigned quadrant;
 	struct kvm_mmu_page *sp;
-	bool need_sync = false;
-	bool flush = false;
 	int collisions = 0;
 	LIST_HEAD(invalid_list);
 
@@ -2014,11 +1994,21 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			continue;
 		}
 
-		if (!need_sync && sp->unsync)
-			need_sync = true;
-
-		if (sp->role.word != role.word)
+		if (sp->role.word != role.word) {
+			/*
+			 * If the guest is creating an upper-level page, zap
+			 * unsync pages for the same gfn.  While it's possible
+			 * the guest is using recursive page tables, in all
+			 * likelihood the guest has stopped using the unsync
+			 * page and is installing a completely unrelated page.
+			 * Unsync pages must not be left as is, because the new
+			 * upper-level page will be write-protected.
+			 */
+			if (level > PG_LEVEL_4K && sp->unsync)
+				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
+							 &invalid_list);
 			continue;
+		}
 
 		if (direct_mmu)
 			goto trace_get_page;
@@ -2052,22 +2042,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	sp->role = role;
 	hlist_add_head(&sp->hash_link, sp_list);
 	if (!direct) {
-		/*
-		 * we should do write protection before syncing pages
-		 * otherwise the content of the synced shadow page may
-		 * be inconsistent with guest page table.
-		 */
 		account_shadowed(vcpu->kvm, sp);
 		if (level == PG_LEVEL_4K && rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
-
-		if (level > PG_LEVEL_4K && need_sync)
-			flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
 	}
 	trace_kvm_mmu_get_page(sp, true);
-
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 out:
+	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+
 	if (collisions > vcpu->kvm->stat.max_mmu_page_hash_collisions)
 		vcpu->kvm->stat.max_mmu_page_hash_collisions = collisions;
 	return sp;
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-23 15:49   ` Paolo Bonzini
  2021-06-22 17:56 ` [PATCH 11/54] KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches Sean Christopherson
                   ` (44 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Replace the hack to identify nested EPT shadow pages with a simple check
that the size of the guest PTEs associated with the shadow page and the
current MMU match, which is the intent of the "8 bytes == PAE" test.
The nested EPT hack existed to avoid a false negative due to the is_pae()
check not matching for 32-bit L2 guests; checking the MMU role directly
avoids the indirect calculation of the guest PTE size entirely.

Note, this should be a glorified nop now that __kvm_sync_page() is called
if and only if the role is an exact match (kvm_mmu_get_page()) or is part
of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
convert the likely-pointless check into a meaningful WARN to enforce that
the mmu_roles of the current context and the shadow page are compatible.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/mmu.rst |  3 ---
 arch/x86/kvm/mmu/mmu.c         | 16 +++-------------
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 20d85daed395..ddbb23998742 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -192,9 +192,6 @@ Shadow pages contain the following information:
     Contains the value of cr4.smap && !cr0.wp for which the page is valid
     (pages for which this is true are different from other pages; see the
     treatment of cr0.wp=0 below).
-  role.ept_sp:
-    This is a virtual flag to denote a shadowed nested EPT page.  ept_sp
-    is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination.
   role.smm:
     Is 1 if the page is valid in system management mode.  This field
     determines which of the kvm_memslots array was used to build this
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 99d26859021d..9f277c5bab76 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1780,16 +1780,13 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
 		if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
 
-static inline bool is_ept_sp(struct kvm_mmu_page *sp)
-{
-	return sp->role.cr0_wp && sp->role.smap_andnot_wp;
-}
-
 /* @sp->gfn should be write-protected at the call site */
 static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			    struct list_head *invalid_list)
 {
-	if ((!is_ept_sp(sp) && sp->role.gpte_is_8_bytes != !!is_pae(vcpu)) ||
+	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
+
+	if (sp->role.gpte_is_8_bytes != mmu_role.gpte_is_8_bytes ||
 	    vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return false;
@@ -4721,13 +4718,6 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 	role.base.guest_mode = true;
 	role.base.access = ACC_ALL;
 
-	/*
-	 * WP=1 and NOT_WP=1 is an impossible combination, use WP and the
-	 * SMAP variation to denote shadow EPT entries.
-	 */
-	role.base.cr0_wp = true;
-	role.base.smap_andnot_wp = true;
-
 	role.ext = kvm_calc_mmu_role_ext(vcpu);
 	role.ext.execonly = execonly;
 
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 11/54] KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (9 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page() Sean Christopherson
                   ` (43 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

When synchronizing a shadow page, WARN and zap the page if its mmu role
isn't compatible with the current MMU context, where "compatible" is an
exact match sans the bits that have no meaning in the overall MMU context
or will be explicitly overwritten during the sync.  Many of the helpers
used by sync_page() are specific to the current context, updating a SMM
vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2
PTEs might work but would be extremely bizarre, and so on and so forth.

Drop the guard with respect to 8-byte vs. 4-byte PTEs in
__kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped
trying to sync shadow pages irrespective of the current MMU context.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c         |  5 +----
 arch/x86/kvm/mmu/paging_tmpl.h | 27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9f277c5bab76..2e2d66319325 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1784,10 +1784,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			    struct list_head *invalid_list)
 {
-	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
-
-	if (sp->role.gpte_is_8_bytes != mmu_role.gpte_is_8_bytes ||
-	    vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
+	if (vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return false;
 	}
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 52fffd68b522..b632606a87d6 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1030,13 +1030,36 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr,
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
+	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
 	int i, nr_present = 0;
 	bool host_writable;
 	gpa_t first_pte_gpa;
 	int set_spte_ret = 0;
 
-	/* direct kvm_mmu_page can not be unsync. */
-	BUG_ON(sp->role.direct);
+	/*
+	 * Ignore various flags when verifying that it's safe to sync a shadow
+	 * page using the current MMU context.
+	 *
+	 *  - level: not part of the overall MMU role and will never match as the MMU's
+	 *           level tracks the root level
+	 *  - access: updated based on the new guest PTE
+	 *  - quadrant: not part of the overall MMU role (similar to level)
+	 */
+	const union kvm_mmu_page_role sync_role_ign = {
+		.level = 0xf,
+		.access = 0x7,
+		.quadrant = 0x3,
+	};
+
+	/*
+	 * Direct pages can never be unsync, and KVM should never attempt to
+	 * sync a shadow page for a different MMU context, e.g. if the role
+	 * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
+	 * reserved bits checks will be wrong, etc...
+	 */
+	if (WARN_ON_ONCE(sp->role.direct ||
+			 (sp->role.word ^ mmu_role.word) & ~sync_role_ign.word))
+		return 0;
 
 	first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
 
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page()
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 11/54] KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-23 16:54   ` Paolo Bonzini
  2021-06-22 17:56 ` [PATCH 13/54] KVM: x86/mmu: Rename unsync helper and update related comments Sean Christopherson
                   ` (42 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Move the kvm_unlink_unsync_page() call out of kvm_sync_page() and into
its sole caller, and fold __kvm_sync_page() into kvm_sync_page() since
the latter becomes a pure pass-through.  There really should be no reason
for code to do a complete sync of a shadow page outside of the full
kvm_mmu_sync_roots(), e.g. the one use case that crept in turned out to
be flawed and counter-productive.

Update the comment in kvm_mmu_get_page() regarding its sync_page() usage,
which is anything but obvious.

Drop the stale comment about @sp->gfn needing to be write-protected, as
it directly contradicts the kvm_mmu_get_page() usage.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e2d66319325..77296ce6215f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1780,18 +1780,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
 		if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
 
-/* @sp->gfn should be write-protected at the call site */
-static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			    struct list_head *invalid_list)
-{
-	if (vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
-		return false;
-	}
-
-	return true;
-}
-
 static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 					struct list_head *invalid_list,
 					bool remote_flush)
@@ -1833,8 +1821,12 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			 struct list_head *invalid_list)
 {
-	kvm_unlink_unsync_page(vcpu->kvm, sp);
-	return __kvm_sync_page(vcpu, sp, invalid_list);
+	if (vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
+		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+		return false;
+	}
+
+	return true;
 }
 
 struct mmu_page_path {
@@ -1931,6 +1923,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 		}
 
 		for_each_sp(pages, sp, parents, i) {
+			kvm_unlink_unsync_page(vcpu->kvm, sp);
 			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
 			mmu_pages_clear_parents(&parents);
 		}
@@ -2008,10 +2001,19 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			goto trace_get_page;
 
 		if (sp->unsync) {
-			/* The page is good, but __kvm_sync_page might still end
-			 * up zapping it.  If so, break in order to rebuild it.
+			/*
+			 * The page is good, but is stale.  "Sync" the page to
+			 * get the latest guest state, but don't write-protect
+			 * the page and don't mark it synchronized!  KVM needs
+			 * to ensure the mapping is valid, but doesn't need to
+			 * fully sync (write-protect) the page until the guest
+			 * invalidates the TLB mapping.  This allows multiple
+			 * SPs for a single gfn to be unsync.
+			 *
+			 * If the sync fails, the page is zapped.  If so, break
+			 * in order to rebuild it.
 			 */
-			if (!__kvm_sync_page(vcpu, sp, &invalid_list))
+			if (!kvm_sync_page(vcpu, sp, &invalid_list))
 				break;
 
 			WARN_ON(!list_empty(&invalid_list));
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 13/54] KVM: x86/mmu: Rename unsync helper and update related comments
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page() Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:56 ` [PATCH 14/54] KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER Sean Christopherson
                   ` (41 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Rename mmu_need_write_protect() to mmu_try_to_unsync_pages() and update
a variety of related, stale comments.  Add several new comments to call
out subtle details, e.g. that upper-level shadow pages are write-tracked,
and that can_unsync is false iff KVM is in the process of synchronizing
pages.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 34 ++++++++++++++++++++++++---------
 arch/x86/kvm/mmu/mmu_internal.h |  3 +--
 arch/x86/kvm/mmu/spte.c         | 10 ++++++++--
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 77296ce6215f..0171c245ecc7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2458,17 +2458,33 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	kvm_mmu_mark_parents_unsync(sp);
 }
 
-bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-			    bool can_unsync)
+/*
+ * Attempt to unsync any shadow pages that can be reached by the specified gfn,
+ * KVM is creating a writable mapping for said gfn.  Returns 0 if all pages
+ * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
+ * be write-protected.
+ */
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
 {
 	struct kvm_mmu_page *sp;
 
+	/*
+	 * Force write-protection if the page is being tracked.  Note, the page
+	 * track machinery is used to write-protect upper-level shadow pages,
+	 * i.e. this guards the role.level == 4K assertion below!
+	 */
 	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
-		return true;
+		return -EPERM;
 
+	/*
+	 * The page is not write-tracked, mark existing shadow pages unsync
+	 * unless KVM is synchronizing an unsync SP (can_unsync = false).  In
+	 * that case, KVM must complete emulation of the guest TLB flush before
+	 * allowing shadow pages to become unsync (writable by the guest).
+	 */
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (!can_unsync)
-			return true;
+			return -EPERM;
 
 		if (sp->unsync)
 			continue;
@@ -2499,8 +2515,8 @@ bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	 *                      2.2 Guest issues TLB flush.
 	 *                          That causes a VM Exit.
 	 *
-	 *                      2.3 kvm_mmu_sync_pages() reads sp->unsync.
-	 *                          Since it is false, so it just returns.
+	 *                      2.3 Walking of unsync pages sees sp->unsync is
+	 *                          false and skips the page.
 	 *
 	 *                      2.4 Guest accesses GVA X.
 	 *                          Since the mapping in the SP was not updated,
@@ -2516,7 +2532,7 @@ bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	 */
 	smp_wmb();
 
-	return false;
+	return 0;
 }
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -3461,8 +3477,8 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		 * flush strictly after those changes are made. We only need to
 		 * ensure that the other CPU sets these flags before any actual
 		 * changes to the page tables are made. The comments in
-		 * mmu_need_write_protect() describe what could go wrong if this
-		 * requirement isn't satisfied.
+		 * mmu_try_to_unsync_pages() describe what could go wrong if
+		 * this requirement isn't satisfied.
 		 */
 		if (!smp_load_acquire(&sp->unsync) &&
 		    !smp_load_acquire(&sp->unsync_children))
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 18be103df9d5..35567293c1fd 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -122,8 +122,7 @@ static inline bool is_nx_huge_page_enabled(void)
 	return READ_ONCE(nx_huge_pages);
 }
 
-bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-			    bool can_unsync);
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync);
 
 void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 8e8e8da740a0..246e61e0771e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -147,13 +147,19 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
 		 * lookup is unnecessary (and expensive). Write protection
-		 * is responsibility of mmu_get_page / kvm_sync_page.
+		 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
 		 * Same reasoning can be applied to dirty page accounting.
 		 */
 		if (!can_unsync && is_writable_pte(old_spte))
 			goto out;
 
-		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
+		/*
+		 * Unsync shadow pages that are reachable by the new, writable
+		 * SPTE.  Write-protect the SPTE if the page can't be unsync'd,
+		 * e.g. it's write-tracked (upper-level SPs) or has one or more
+		 * shadow pages and unsync'ing pages is not allowed.
+		 */
+		if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			ret |= SET_SPTE_WRITE_PROTECTED_PT;
-- 
2.32.0.288.g62a8d224e6-goog
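
The bool-to-int conversion above follows the usual kernel convention of 0 on
success and a negative errno when the caller must fall back (here, keep the
SPTE write-protected).  A minimal sketch of how a caller consumes that
convention; try_to_unsync() is a made-up stand-in, not the real
mmu_try_to_unsync_pages():

	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-in: succeed only if the gfn is not write-tracked and
	 * unsync'ing is currently allowed. */
	static int try_to_unsync(bool write_tracked, bool can_unsync)
	{
		if (write_tracked)
			return -EPERM;	/* upper-level SP, keep write-protected */
		if (!can_unsync)
			return -EPERM;	/* mid TLB-flush emulation, don't unsync */
		return 0;		/* existing shadow pages marked unsync */
	}

	int main(void)
	{
		/* Mirrors the make_spte() usage: any non-zero return means
		 * "write-protect the new SPTE instead of making it writable". */
		if (try_to_unsync(true, true))
			printf("write-protecting\n");
		if (!try_to_unsync(false, true))
			printf("creating writable SPTE, page left unsync\n");
		return 0;
	}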



* [PATCH 14/54] KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 13/54] KVM: x86/mmu: Rename unsync helper and update related comments Sean Christopherson
@ 2021-06-22 17:56 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Sean Christopherson
                   ` (40 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER
as a u64 in various flows (mostly MMU).  Passing the params as u32s is
functionally ok since all of the affected registers reserve bits 63:32 to
zero (enforced by KVM), but it's technically wrong.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h        |  4 ++--
 arch/x86/kvm/mmu/mmu.c    | 11 ++++++-----
 arch/x86/kvm/svm/nested.c |  2 +-
 arch/x86/kvm/x86.c        |  2 +-
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bc11402df83b..47131b92b990 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -66,8 +66,8 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu);
-void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
-			     gpa_t nested_cr3);
+void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
+			     unsigned long cr4, u64 efer, gpa_t nested_cr3);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty, gpa_t new_eptp);
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0171c245ecc7..96c16a6e0044 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4659,8 +4659,8 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 }
 
 static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
-				    u32 cr0, u32 cr4, u32 efer,
-				    union kvm_mmu_role new_role)
+				    unsigned long cr0, unsigned long cr4,
+				    u64 efer, union kvm_mmu_role new_role)
 {
 	if (!(cr0 & X86_CR0_PG))
 		nonpaging_init_context(vcpu, context);
@@ -4675,7 +4675,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	reset_shadow_zero_bits_mask(vcpu, context);
 }
 
-static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer)
+static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
+				unsigned long cr4, u64 efer)
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 	union kvm_mmu_role new_role =
@@ -4697,8 +4698,8 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu)
 	return role;
 }
 
-void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer,
-			     gpa_t nested_cr3)
+void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
+			     unsigned long cr4, u64 efer, gpa_t nested_cr3)
 {
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dca20f949b63..9f0e7ed672b2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1244,8 +1244,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 		&user_kvm_nested_state->data.svm[0];
 	struct vmcb_control_area *ctl;
 	struct vmcb_save_area *save;
+	unsigned long cr0;
 	int ret;
-	u32 cr0;
 
 	BUILD_BUG_ON(sizeof(struct vmcb_control_area) + sizeof(struct vmcb_save_area) >
 		     KVM_STATE_NESTED_SVM_VMCB_SIZE);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92b4a9305651..2d3b9f10b14a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9076,8 +9076,8 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 {
 	struct kvm_segment cs, ds;
 	struct desc_ptr dt;
+	unsigned long cr0;
 	char buf[512];
-	u32 cr0;
 
 	memset(buf, 0, 512);
 #ifdef CONFIG_X86_64
-- 
2.32.0.288.g62a8d224e6-goog
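
The "technically wrong" part is easy to reproduce outside KVM: passing a
64-bit register value through a u32 parameter silently drops anything at bit
32 or above.  A toy example, using a purely hypothetical high bit (KVM
enforces that bits 63:32 of these registers are clear, which is why the old
signatures happened to work); the init_mmu_* helpers are illustrative only:

	#include <stdint.h>
	#include <stdio.h>

	/* Hypothetical MMU-init stand-ins; only the parameter type differs. */
	static void init_mmu_u32(uint32_t efer)
	{
		printf("u32 param: efer = %#llx\n", (unsigned long long)efer);
	}

	static void init_mmu_u64(uint64_t efer)
	{
		printf("u64 param: efer = %#llx\n", (unsigned long long)efer);
	}

	int main(void)
	{
		/* EFER.NX (bit 11) plus a hypothetical bit 32. */
		uint64_t efer = (1ULL << 11) | (1ULL << 32);

		init_mmu_u32(efer);	/* bit 32 silently dropped: 0x800 */
		init_mmu_u64(efer);	/* preserved: 0x100000800 */
		return 0;
	}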



* [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-06-22 17:56 ` [PATCH 14/54] KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 17:06   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs Sean Christopherson
                   ` (39 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Add a comment in the nested NPT initialization flow to call out that it
intentionally uses vmcb01 instead of current vCPU state to get the
effective hCR4 and hEFER for L1's NPT context.

Note, despite nSVM's efforts to handle the case where vCPU state doesn't
reflect L1 state, the MMU may still do the wrong thing due to pulling
state from the vCPU instead of the passed in CR0/CR4/EFER values.  This
will be addressed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9f0e7ed672b2..33b2f9337e26 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,6 +98,12 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 	WARN_ON(mmu_is_nested(vcpu));
 
 	vcpu->arch.mmu = &vcpu->arch.guest_mmu;
+
+	/*
+	 * L1's CR4 and EFER are stuffed into vmcb01 by the caller.  Note, when
+	 * called via KVM_SET_NESTED_STATE, that state may _not_ match current
+	 * vCPU state.  CR0.WP is explicitly ignored, while CR0.PG is required.
+	 */
 	kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4,
 				svm->vmcb01.ptr->save.efer,
 				svm->nested.ctl.nested_cr3);
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (14 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 17:11   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 17/54] KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper Sean Christopherson
                   ` (38 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop the smep_andnot_wp role check from the "uses NX" calculation now
that all non-nested shadow MMUs treat NX as used via the !TDP check.

The shadow MMU for nested NPT, which shares the helper, does not need to
deal with SMEP (or WP) as NPT walks are always "user" accesses and WP is
explicitly noted as being ignored:

  Table walks for guest page tables are always treated as user writes at
  the nested page table level.

  A table walk for the guest page itself is always treated as a user
  access at the nested page table level

  The host hCR0.WP bit is ignored under nested paging.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 96c16a6e0044..ca7680d1ea24 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4223,8 +4223,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 	 * NX can be used by any non-nested shadow MMU to avoid having to reset
 	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
 	 */
-	bool uses_nx = context->nx || !tdp_enabled ||
-		context->mmu_role.base.smep_andnot_wp;
+	bool uses_nx = context->nx || !tdp_enabled;
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
 
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 17/54] KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (15 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Sean Christopherson
                   ` (37 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Grab all CR0/CR4 MMU role bits from current vCPU state when initializing
a non-nested shadow MMU.  Extract the masks from kvm_post_set_cr{0,4}(),
as the CR0/CR4 update masks must exactly match the mmu_role bits, with
one exception (see below).  The "full" CR0/CR4 will be used by future
commits to initialize the MMU and its role, as opposed to the current
approach of pulling everything from vCPU, which is incorrect for certain
flows, e.g. nested NPT.

CR4.LA57 is an exception, as it can be toggled on VM-Exit (for L1's MMU)
but can't be toggled via MOV CR4 while long mode is active.  I.e. LA57
needs to be in the mmu_role, but technically doesn't need to be checked
by kvm_post_set_cr4().  However, the extra check is completely benign as
the hardware restrictions simply mean LA57 will never be _the_ cause of
a MMU reset during MOV CR4.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h     | 6 ++++++
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 arch/x86/kvm/x86.c     | 9 ++-------
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 47131b92b990..4e926f4935b0 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -44,6 +44,12 @@
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
 
+#define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | \
+			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | \
+			       X86_CR4_LA57)
+
+#define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
+
 static __always_inline u64 rsvd_bits(int s, int e)
 {
 	BUILD_BUG_ON(__builtin_constant_p(e) && __builtin_constant_p(s) && e < s);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ca7680d1ea24..02c54426e7a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4778,8 +4778,8 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 
 	kvm_init_shadow_mmu(vcpu,
-			    kvm_read_cr0_bits(vcpu, X86_CR0_PG),
-			    kvm_read_cr4_bits(vcpu, X86_CR4_PAE),
+			    kvm_read_cr0_bits(vcpu, KVM_MMU_CR0_ROLE_BITS),
+			    kvm_read_cr4_bits(vcpu, KVM_MMU_CR4_ROLE_BITS),
 			    vcpu->arch.efer);
 
 	context->get_guest_pgd     = get_cr3;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d3b9f10b14a..cdce4b134bef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -832,14 +832,12 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
 {
-	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
-
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
 	}
 
-	if ((cr0 ^ old_cr0) & update_bits)
+	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
@@ -1018,10 +1016,7 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
 
 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
 {
-	unsigned long mmu_role_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
-				      X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
-
-	if (((cr4 ^ old_cr4) & mmu_role_bits) ||
+	if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) ||
 	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
 		kvm_mmu_reset_context(vcpu);
 }
-- 
2.32.0.288.g62a8d224e6-goog
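
The reset check itself is just "did any bit covered by the role mask toggle?".
A standalone sketch of that mask-and-XOR test using the architectural CR4 bit
positions; note the real kvm_post_set_cr4() additionally forces a reset when
CR4.PCIDE is cleared, which this sketch omits:

	#include <stdio.h>

	#define X86_CR4_TSD	(1ul << 2)	/* not an MMU role bit */
	#define X86_CR4_PSE	(1ul << 4)
	#define X86_CR4_PAE	(1ul << 5)
	#define X86_CR4_PGE	(1ul << 7)
	#define X86_CR4_LA57	(1ul << 12)
	#define X86_CR4_SMEP	(1ul << 20)
	#define X86_CR4_SMAP	(1ul << 21)
	#define X86_CR4_PKE	(1ul << 22)

	#define MMU_CR4_ROLE_BITS (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | \
				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | \
				   X86_CR4_LA57)

	static void post_set_cr4(unsigned long old_cr4, unsigned long cr4)
	{
		/* XOR exposes toggled bits; the mask keeps only role bits. */
		if ((cr4 ^ old_cr4) & MMU_CR4_ROLE_BITS)
			printf("role bit toggled -> reset MMU context\n");
		else
			printf("no role bit changed -> keep current MMU\n");
	}

	int main(void)
	{
		post_set_cr4(X86_CR4_PAE, X86_CR4_PAE | X86_CR4_SMEP); /* reset */
		post_set_cr4(X86_CR4_PAE, X86_CR4_PAE | X86_CR4_TSD);  /* no reset */
		return 0;
	}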



* [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (16 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 17/54] KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 17:13   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 19/54] KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs Sean Christopherson
                   ` (36 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the
MMU proper and unexport said function.  Aside from dropping an export,
this is a baby step toward eliminating the call entirely by fixing the
shadow_root_level confusion.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h        |  3 ---
 arch/x86/kvm/mmu/mmu.c    | 11 ++++++++---
 arch/x86/kvm/svm/nested.c |  1 -
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 4e926f4935b0..62844bacd13f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -68,9 +68,6 @@ static __always_inline u64 rsvd_bits(int s, int e)
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
-void
-reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
-
 void kvm_init_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 			     unsigned long cr4, u64 efer, gpa_t nested_cr3);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 02c54426e7a2..5a46a87b23b0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,8 +4212,8 @@ static inline u64 reserved_hpa_bits(void)
  * table in guest or amd nested guest, its mmu features completely
  * follow the features in guest.
  */
-void
-reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
+					struct kvm_mmu *context)
 {
 	/*
 	 * KVM uses NX when TDP is disabled to handle a variety of scenarios,
@@ -4247,7 +4247,6 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 	}
 
 }
-EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
 static inline bool boot_cpu_is_amd(void)
 {
@@ -4714,6 +4713,12 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 		 */
 		context->shadow_root_level = new_role.base.level;
 	}
+
+	/*
+	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
+	 * (above) may use the wrong shadow_root_level.
+	 */
+	reset_shadow_zero_bits_mask(vcpu, context);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 33b2f9337e26..927e545591c3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -110,7 +110,6 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu->get_guest_pgd     = nested_svm_get_tdp_cr3;
 	vcpu->arch.mmu->get_pdptr         = nested_svm_get_tdp_pdptr;
 	vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit;
-	reset_shadow_zero_bits_mask(vcpu, vcpu->arch.mmu);
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
 }
 
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 19/54] KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (17 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Sean Christopherson
                   ` (35 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the mmu_role to initialize shadow root level instead of assuming the
level of KVM's shadow root (host) is the same as that of the guest root,
or in the case of 32-bit non-PAE paging where KVM forces PAE paging.
For nested NPT, the shadow root level cannot be adapted to L1's NPT root
level and is instead always the TDP root level because NPT uses the
current host CR0/CR4/EFER, e.g. 64-bit KVM can't drop into 32-bit PAE to
shadow L1's NPT.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5a46a87b23b0..5e3ee4aba2ff 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3898,7 +3898,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = NULL;
 	context->root_level = 0;
-	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->direct_map = true;
 	context->nx = false;
 }
@@ -4466,10 +4465,10 @@ static void update_last_nonleaf_level(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 
 static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 					 struct kvm_mmu *context,
-					 int level)
+					 int root_level)
 {
 	context->nx = is_nx(vcpu);
-	context->root_level = level;
+	context->root_level = root_level;
 
 	reset_rsvds_bits_mask(vcpu, context);
 	update_permission_bitmask(vcpu, context, false);
@@ -4481,7 +4480,6 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->sync_page = paging64_sync_page;
 	context->invlpg = paging64_invlpg;
-	context->shadow_root_level = level;
 	context->direct_map = false;
 }
 
@@ -4509,7 +4507,6 @@ static void paging32_init_context(struct kvm_vcpu *vcpu,
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->sync_page = paging32_sync_page;
 	context->invlpg = paging32_invlpg;
-	context->shadow_root_level = PT32E_ROOT_LEVEL;
 	context->direct_map = false;
 }
 
@@ -4669,6 +4666,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	else
 		paging32_init_context(vcpu, context);
 
+	context->shadow_root_level = new_role.base.level;
+
 	context->mmu_role.as_u64 = new_role.as_u64;
 	reset_shadow_zero_bits_mask(vcpu, context);
 }
@@ -4704,16 +4703,9 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
-	if (new_role.as_u64 != context->mmu_role.as_u64) {
+	if (new_role.as_u64 != context->mmu_role.as_u64)
 		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
 
-		/*
-		 * Override the level set by the common init helper, nested TDP
-		 * always uses the host's TDP configuration.
-		 */
-		context->shadow_root_level = new_role.base.level;
-	}
-
 	/*
 	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
 	 * (above) may use the wrong shadow_root_level.
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (18 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 19/54] KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23  1:58   ` kernel test robot
  2021-06-23 17:18   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 21/54] KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context() Sean Christopherson
                   ` (34 subsequent siblings)
  54 siblings, 2 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Introduce "struct kvm_mmu_role_regs" to hold the register state that is
incorporated into the mmu_role.  For nested TDP, the register state that
is factored into the MMU isn't vCPU state; the dedicated struct will be
used to propagate the correct state throughout the flows without having
to pass multiple params, and also provides helpers for the various flag
accessors.

Intentionally make the new helpers cumbersome/ugly by prepending four
underscores.  In the not-too-distant future, it will be preferable to use
the mmu_role to query bits as the mmu_role can drop irrelevant bits
without creating contradictions, e.g. clearing CR4 bits when CR0.PG=0.
Reserve the clean helper names (no underscores) for the mmu_role.

Add a helper for vCPU conversion, which is the common case.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 66 +++++++++++++++++++++++++++++++++---------
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e3ee4aba2ff..3616c3b7618e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -176,9 +176,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
 
+struct kvm_mmu_role_regs {
+	const unsigned long cr0;
+	const unsigned long cr4;
+	const u64 efer;
+};
+
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
+/*
+ * Yes, lots of underscores.  They're a hint that you probably shouldn't be
+ * reading from the role_regs.  Once the mmu_role is constructed, it becomes
+ * the single source of truth for the MMU's state.
+ */
+#define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)			\
+static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
+{									\
+	return !!(regs->reg & flag);					\
+}
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, pg, X86_CR0_PG);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pse, X86_CR4_PSE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pae, X86_CR4_PAE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
+BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
+BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
+
+static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu_role_regs regs = {
+		.cr0 = kvm_read_cr0_bits(vcpu, KVM_MMU_CR0_ROLE_BITS),
+		.cr4 = kvm_read_cr4_bits(vcpu, KVM_MMU_CR4_ROLE_BITS),
+		.efer = vcpu->arch.efer,
+	};
+
+	return regs;
+}
 
 static inline bool kvm_available_flush_tlb_with_range(void)
 {
@@ -4654,14 +4691,14 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 }
 
 static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
-				    unsigned long cr0, unsigned long cr4,
-				    u64 efer, union kvm_mmu_role new_role)
+				    struct kvm_mmu_role_regs *regs,
+				    union kvm_mmu_role new_role)
 {
-	if (!(cr0 & X86_CR0_PG))
+	if (!____is_cr0_pg(regs))
 		nonpaging_init_context(vcpu, context);
-	else if (efer & EFER_LMA)
+	else if (____is_efer_lma(regs))
 		paging64_init_context(vcpu, context);
-	else if (cr4 & X86_CR4_PAE)
+	else if (____is_cr4_pae(regs))
 		paging32E_init_context(vcpu, context);
 	else
 		paging32_init_context(vcpu, context);
@@ -4672,15 +4709,15 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	reset_shadow_zero_bits_mask(vcpu, context);
 }
 
-static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
-				unsigned long cr4, u64 efer)
+static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
+				struct kvm_mmu_role_regs *regs)
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 	union kvm_mmu_role new_role =
 		kvm_calc_shadow_mmu_root_page_role(vcpu, false);
 
 	if (new_role.as_u64 != context->mmu_role.as_u64)
-		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
+		shadow_mmu_init_context(vcpu, context, regs, new_role);
 }
 
 static union kvm_mmu_role
@@ -4699,12 +4736,17 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 			     unsigned long cr4, u64 efer, gpa_t nested_cr3)
 {
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
+	struct kvm_mmu_role_regs regs = {
+		.cr0 = cr0,
+		.cr4 = cr4,
+		.efer = efer,
+	};
 	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
 
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
 	if (new_role.as_u64 != context->mmu_role.as_u64)
-		shadow_mmu_init_context(vcpu, context, cr0, cr4, efer, new_role);
+		shadow_mmu_init_context(vcpu, context, &regs, new_role);
 
 	/*
 	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
@@ -4773,11 +4815,9 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
+	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
 
-	kvm_init_shadow_mmu(vcpu,
-			    kvm_read_cr0_bits(vcpu, KVM_MMU_CR0_ROLE_BITS),
-			    kvm_read_cr4_bits(vcpu, KVM_MMU_CR4_ROLE_BITS),
-			    vcpu->arch.efer);
+	kvm_init_shadow_mmu(vcpu, &regs);
 
 	context->get_guest_pgd     = get_cr3;
 	context->get_pdptr         = kvm_pdptr_read;
-- 
2.32.0.288.g62a8d224e6-goog
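
The accessor builder is plain token pasting: one macro invocation per
(register, bit) pair stamps out a ____is_<reg>_<bit>() helper.  A trimmed-down,
compilable sketch of the same pattern, with a toy struct standing in for the
real kvm_mmu_role_regs:

	#include <stdbool.h>
	#include <stdio.h>

	#define X86_CR0_WP	(1ul << 16)
	#define X86_CR0_PG	(1ul << 31)
	#define EFER_NX		(1ull << 11)

	/* Toy stand-in for kvm_mmu_role_regs. */
	struct toy_role_regs {
		unsigned long cr0;
		unsigned long cr4;
		unsigned long long efer;
	};

	/* Token pasting builds one ____is_<reg>_<bit>() helper per invocation. */
	#define BUILD_REGS_ACCESSOR(reg, name, flag)				\
	static inline bool ____is_##reg##_##name(const struct toy_role_regs *regs) \
	{									\
		return !!(regs->reg & (flag));					\
	}
	BUILD_REGS_ACCESSOR(cr0, pg, X86_CR0_PG)
	BUILD_REGS_ACCESSOR(cr0, wp, X86_CR0_WP)
	BUILD_REGS_ACCESSOR(efer, nx, EFER_NX)

	int main(void)
	{
		struct toy_role_regs regs = { .cr0 = X86_CR0_PG, .efer = EFER_NX };

		printf("cr0.pg=%d cr0.wp=%d efer.nx=%d\n",
		       ____is_cr0_pg(&regs), ____is_cr0_wp(&regs),
		       ____is_efer_nx(&regs));
		return 0;
	}

The parameter name shows up both pasted into the function name and as the
struct member being read, which is why the <reg>_<bit> naming convention
matters for the later macro work.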



* [PATCH 21/54] KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context()
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (19 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 22/54] KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role Sean Christopherson
                   ` (33 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Consolidate the MMU metadata update calls to deduplicate code, and to
prep for future cleanup.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3616c3b7618e..241408e6576d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4507,11 +4507,6 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->nx = is_nx(vcpu);
 	context->root_level = root_level;
 
-	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context, false);
-	update_pkru_bitmask(vcpu, context, false);
-	update_last_nonleaf_level(vcpu, context);
-
 	MMU_WARN_ON(!is_pae(vcpu));
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
@@ -4534,12 +4529,6 @@ static void paging32_init_context(struct kvm_vcpu *vcpu,
 {
 	context->nx = false;
 	context->root_level = PT32_ROOT_LEVEL;
-
-	reset_rsvds_bits_mask(vcpu, context);
-	update_permission_bitmask(vcpu, context, false);
-	update_pkru_bitmask(vcpu, context, false);
-	update_last_nonleaf_level(vcpu, context);
-
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->sync_page = paging32_sync_page;
@@ -4703,6 +4692,12 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	else
 		paging32_init_context(vcpu, context);
 
+	if (____is_cr0_pg(regs)) {
+		reset_rsvds_bits_mask(vcpu, context);
+		update_permission_bitmask(vcpu, context, false);
+		update_pkru_bitmask(vcpu, context, false);
+		update_last_nonleaf_level(vcpu, context);
+	}
 	context->shadow_root_level = new_role.base.level;
 
 	context->mmu_role.as_u64 = new_role.as_u64;
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 22/54] KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (20 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 21/54] KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context() Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 23/54] KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role Sean Christopherson
                   ` (32 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Do not incorporate CR0/CR4 bits into the role for the nested EPT MMU, as
EPT behavior is not influenced by CR0/CR4.  Note, this is the guest_mmu
(L1's EPT), not the nested_mmu (L2's IA32 paging); the nested_mmu does need
CR0/CR4, and is initialized in a separate flow.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 241408e6576d..84a40488eba7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4767,8 +4767,10 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 	role.base.guest_mode = true;
 	role.base.access = ACC_ALL;
 
-	role.ext = kvm_calc_mmu_role_ext(vcpu);
+	/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
+	role.ext.word = 0;
 	role.ext.execonly = execonly;
+	role.ext.valid = 1;
 
 	return role;
 }
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 23/54] KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (21 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 22/54] KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 24/54] KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans Sean Christopherson
                   ` (31 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the provided role_regs to calculate the mmu_role instead of pulling
bits from current vCPU state.  For some flows, e.g. nested TDP, the vCPU
state may not be correct (or relevant).

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 92 ++++++++++++++++++++++++------------------
 1 file changed, 52 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84a40488eba7..896e92eac28b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4542,17 +4542,18 @@ static void paging32E_init_context(struct kvm_vcpu *vcpu,
 	paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
 }
 
-static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
+static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
+							 struct kvm_mmu_role_regs *regs)
 {
 	union kvm_mmu_extended_role ext = {0};
 
-	ext.cr0_pg = !!is_paging(vcpu);
-	ext.cr4_pae = !!is_pae(vcpu);
-	ext.cr4_smep = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
-	ext.cr4_smap = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
-	ext.cr4_pse = !!is_pse(vcpu);
-	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
-	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
+	ext.cr0_pg = ____is_cr0_pg(regs);
+	ext.cr4_pae = ____is_cr4_pae(regs);
+	ext.cr4_smep = ____is_cr4_smep(regs);
+	ext.cr4_smap = ____is_cr4_smap(regs);
+	ext.cr4_pse = ____is_cr4_pse(regs);
+	ext.cr4_pke = ____is_cr4_pke(regs);
+	ext.cr4_la57 = ____is_cr4_la57(regs);
 
 	ext.valid = 1;
 
@@ -4560,20 +4561,21 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
 }
 
 static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
+						   struct kvm_mmu_role_regs *regs,
 						   bool base_only)
 {
 	union kvm_mmu_role role = {0};
 
 	role.base.access = ACC_ALL;
-	role.base.nxe = !!is_nx(vcpu);
-	role.base.cr0_wp = is_write_protection(vcpu);
+	role.base.nxe = ____is_efer_nx(regs);
+	role.base.cr0_wp = ____is_cr0_wp(regs);
 	role.base.smm = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
 
 	if (base_only)
 		return role;
 
-	role.ext = kvm_calc_mmu_role_ext(vcpu);
+	role.ext = kvm_calc_mmu_role_ext(vcpu, regs);
 
 	return role;
 }
@@ -4588,9 +4590,10 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 }
 
 static union kvm_mmu_role
-kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
+kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
+				struct kvm_mmu_role_regs *regs, bool base_only)
 {
-	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, base_only);
+	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, regs, base_only);
 
 	role.base.ad_disabled = (shadow_accessed_mask == 0);
 	role.base.level = kvm_mmu_get_tdp_level(vcpu);
@@ -4603,8 +4606,9 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
+	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
 	union kvm_mmu_role new_role =
-		kvm_calc_tdp_mmu_root_page_role(vcpu, false);
+		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
 
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
@@ -4648,30 +4652,30 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 }
 
 static union kvm_mmu_role
-kvm_calc_shadow_root_page_role_common(struct kvm_vcpu *vcpu, bool base_only)
+kvm_calc_shadow_root_page_role_common(struct kvm_vcpu *vcpu,
+				      struct kvm_mmu_role_regs *regs, bool base_only)
 {
-	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, base_only);
+	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, regs, base_only);
 
-	role.base.smep_andnot_wp = role.ext.cr4_smep &&
-		!is_write_protection(vcpu);
-	role.base.smap_andnot_wp = role.ext.cr4_smap &&
-		!is_write_protection(vcpu);
-	role.base.gpte_is_8_bytes = !!is_pae(vcpu);
+	role.base.smep_andnot_wp = role.ext.cr4_smep && !____is_cr0_wp(regs);
+	role.base.smap_andnot_wp = role.ext.cr4_smap && !____is_cr0_wp(regs);
+	role.base.gpte_is_8_bytes = ____is_cr4_pae(regs);
 
 	return role;
 }
 
 static union kvm_mmu_role
-kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
+kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu,
+				   struct kvm_mmu_role_regs *regs, bool base_only)
 {
 	union kvm_mmu_role role =
-		kvm_calc_shadow_root_page_role_common(vcpu, base_only);
+		kvm_calc_shadow_root_page_role_common(vcpu, regs, base_only);
 
-	role.base.direct = !is_paging(vcpu);
+	role.base.direct = !____is_cr0_pg(regs);
 
-	if (!is_long_mode(vcpu))
+	if (!____is_efer_lma(regs))
 		role.base.level = PT32E_ROOT_LEVEL;
-	else if (is_la57_mode(vcpu))
+	else if (____is_cr4_la57(regs))
 		role.base.level = PT64_ROOT_5LEVEL;
 	else
 		role.base.level = PT64_ROOT_4LEVEL;
@@ -4709,17 +4713,18 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 	union kvm_mmu_role new_role =
-		kvm_calc_shadow_mmu_root_page_role(vcpu, false);
+		kvm_calc_shadow_mmu_root_page_role(vcpu, regs, false);
 
 	if (new_role.as_u64 != context->mmu_role.as_u64)
 		shadow_mmu_init_context(vcpu, context, regs, new_role);
 }
 
 static union kvm_mmu_role
-kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu)
+kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu,
+				   struct kvm_mmu_role_regs *regs)
 {
 	union kvm_mmu_role role =
-		kvm_calc_shadow_root_page_role_common(vcpu, false);
+		kvm_calc_shadow_root_page_role_common(vcpu, regs, false);
 
 	role.base.direct = false;
 	role.base.level = kvm_mmu_get_tdp_level(vcpu);
@@ -4736,7 +4741,9 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 		.cr4 = cr4,
 		.efer = efer,
 	};
-	union kvm_mmu_role new_role = kvm_calc_shadow_npt_root_page_role(vcpu);
+	union kvm_mmu_role new_role;
+
+	new_role = kvm_calc_shadow_npt_root_page_role(vcpu, &regs);
 
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
@@ -4821,9 +4828,12 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	context->inject_page_fault = kvm_inject_page_fault;
 }
 
-static union kvm_mmu_role kvm_calc_nested_mmu_role(struct kvm_vcpu *vcpu)
+static union kvm_mmu_role
+kvm_calc_nested_mmu_role(struct kvm_vcpu *vcpu, struct kvm_mmu_role_regs *regs)
 {
-	union kvm_mmu_role role = kvm_calc_shadow_root_page_role_common(vcpu, false);
+	union kvm_mmu_role role;
+
+	role = kvm_calc_shadow_root_page_role_common(vcpu, regs, false);
 
 	/*
 	 * Nested MMUs are used only for walking L2's gva->gpa, they never have
@@ -4832,12 +4842,12 @@ static union kvm_mmu_role kvm_calc_nested_mmu_role(struct kvm_vcpu *vcpu)
 	 */
 	role.base.direct = true;
 
-	if (!is_paging(vcpu))
+	if (!____is_cr0_pg(regs))
 		role.base.level = 0;
-	else if (is_long_mode(vcpu))
-		role.base.level = is_la57_mode(vcpu) ? PT64_ROOT_5LEVEL :
-						       PT64_ROOT_4LEVEL;
-	else if (is_pae(vcpu))
+	else if (____is_efer_lma(regs))
+		role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL :
+							  PT64_ROOT_4LEVEL;
+	else if (____is_cr4_pae(regs))
 		role.base.level = PT32E_ROOT_LEVEL;
 	else
 		role.base.level = PT32_ROOT_LEVEL;
@@ -4847,7 +4857,8 @@ static union kvm_mmu_role kvm_calc_nested_mmu_role(struct kvm_vcpu *vcpu)
 
 static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 {
-	union kvm_mmu_role new_role = kvm_calc_nested_mmu_role(vcpu);
+	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
+	union kvm_mmu_role new_role = kvm_calc_nested_mmu_role(vcpu, &regs);
 	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;
 
 	if (new_role.as_u64 == g_context->mmu_role.as_u64)
@@ -4913,12 +4924,13 @@ EXPORT_SYMBOL_GPL(kvm_init_mmu);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);
 	union kvm_mmu_role role;
 
 	if (tdp_enabled)
-		role = kvm_calc_tdp_mmu_root_page_role(vcpu, true);
+		role = kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, true);
 	else
-		role = kvm_calc_shadow_mmu_root_page_role(vcpu, true);
+		role = kvm_calc_shadow_mmu_root_page_role(vcpu, &regs, true);
 
 	return role.base;
 }
-- 
2.32.0.288.g62a8d224e6-goog
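
The root-level selection that now keys off the captured regs is a simple
ladder over CR0.PG, EFER.LMA, CR4.LA57 and CR4.PAE.  A sketch of that ladder,
mirroring the nested-MMU case above; the function name and enum are
illustrative only:

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative constants mirroring PT32_ROOT_LEVEL and friends. */
	enum {
		ROOT_NONE = 0,
		PT32_ROOT_LEVEL = 2,
		PT32E_ROOT_LEVEL = 3,
		PT64_ROOT_4LEVEL = 4,
		PT64_ROOT_5LEVEL = 5,
	};

	static int guest_root_level(bool cr0_pg, bool cr4_pae, bool cr4_la57,
				    bool efer_lma)
	{
		if (!cr0_pg)		/* no paging => no guest page tables */
			return ROOT_NONE;
		if (efer_lma)		/* long mode: 4- or 5-level */
			return cr4_la57 ? PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
		if (cr4_pae)		/* 32-bit PAE paging */
			return PT32E_ROOT_LEVEL;
		return PT32_ROOT_LEVEL;	/* legacy 32-bit paging */
	}

	int main(void)
	{
		printf("LMA+LA57: %d\n", guest_root_level(true, true, true, true));
		printf("PAE:      %d\n", guest_root_level(true, true, false, false));
		printf("PG=0:     %d\n", guest_root_level(false, false, false, false));
		return 0;
	}

Feeding the ladder the captured regs rather than live vCPU state is exactly
what keeps nested NPT correct when the two disagree.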



* [PATCH 24/54] KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (22 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 23/54] KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits Sean Christopherson
                   ` (30 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
<reg>_<bit> for all CR0, CR4, and EFER bits that are included in the role.
Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
not the NX bit in the corresponding PTE.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/mmu.rst            | 4 ++--
 arch/x86/include/asm/kvm_host.h           | 4 ++--
 arch/x86/kvm/mmu/mmu.c                    | 2 +-
 arch/x86/kvm/mmu/mmutrace.h               | 2 +-
 tools/lib/traceevent/plugins/plugin_kvm.c | 4 ++--
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index ddbb23998742..f60f5488e121 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -180,8 +180,8 @@ Shadow pages contain the following information:
   role.gpte_is_8_bytes:
     Reflects the size of the guest PTE for which the page is valid, i.e. '1'
     if 64-bit gptes are in use, '0' if 32-bit gptes are in use.
-  role.nxe:
-    Contains the value of efer.nxe for which the page is valid.
+  role.efer_nx:
+    Contains the value of efer.nx for which the page is valid.
   role.cr0_wp:
     Contains the value of cr0.wp for which the page is valid.
   role.smep_andnot_wp:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cdaff399ed94..8aa798c75e9a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -274,7 +274,7 @@ struct kvm_kernel_irq_routing_entry;
  * by indirect shadow page can not be more than 15 bits.
  *
  * Currently, we used 14 bits that are @level, @gpte_is_8_bytes, @quadrant, @access,
- * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
+ * @efer_nx, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
  */
 union kvm_mmu_page_role {
 	u32 word;
@@ -285,7 +285,7 @@ union kvm_mmu_page_role {
 		unsigned direct:1;
 		unsigned access:3;
 		unsigned invalid:1;
-		unsigned nxe:1;
+		unsigned efer_nx:1;
 		unsigned cr0_wp:1;
 		unsigned smep_andnot_wp:1;
 		unsigned smap_andnot_wp:1;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 896e92eac28b..7bc5b1a8fca5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4567,7 +4567,7 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
 	union kvm_mmu_role role = {0};
 
 	role.base.access = ACC_ALL;
-	role.base.nxe = ____is_efer_nx(regs);
+	role.base.efer_nx = ____is_efer_nx(regs);
 	role.base.cr0_wp = ____is_cr0_wp(regs);
 	role.base.smm = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index e798489b56b5..efbad33a0645 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -40,7 +40,7 @@
 			 role.direct ? " direct" : "",			\
 			 access_str[role.access],			\
 			 role.invalid ? " invalid" : "",		\
-			 role.nxe ? "" : "!",				\
+			 role.efer_nx ? "" : "!",			\
 			 role.ad_disabled ? "!" : "",			\
 			 __entry->root_count,				\
 			 __entry->unsync ? "unsync" : "sync", 0);	\
diff --git a/tools/lib/traceevent/plugins/plugin_kvm.c b/tools/lib/traceevent/plugins/plugin_kvm.c
index 51ceeb9147eb..9ce7b4b68e3f 100644
--- a/tools/lib/traceevent/plugins/plugin_kvm.c
+++ b/tools/lib/traceevent/plugins/plugin_kvm.c
@@ -366,7 +366,7 @@ union kvm_mmu_page_role {
 		unsigned direct:1;
 		unsigned access:3;
 		unsigned invalid:1;
-		unsigned nxe:1;
+		unsigned efer_nx:1;
 		unsigned cr0_wp:1;
 		unsigned smep_and_not_wp:1;
 		unsigned smap_and_not_wp:1;
@@ -403,7 +403,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct tep_record *record,
 				 access_str[role.access],
 				 role.invalid ? " invalid" : "",
 				 role.cr4_pae ? "" : "!",
-				 role.nxe ? "" : "!",
+				 role.efer_nx ? "" : "!",
 				 role.cr0_wp ? "" : "!",
 				 role.smep_and_not_wp ? " smep" : "",
 				 role.smap_and_not_wp ? " smap" : "",
-- 
2.32.0.288.g62a8d224e6-goog



* [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (23 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 24/54] KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 20:02   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 26/54] KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0 Sean Christopherson
                   ` (29 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Add helpers via a builder macro for all mmu_role bits that track a CR0,
CR4, or EFER bit.  Digging out the bits manually is not exactly the most
readable code.

Future commits will switch to using mmu_role instead of vCPU state to
configure the MMU, i.e. there are about to be a large number of users.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 21 +++++++++++++++++++++
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7bc5b1a8fca5..be95595b30c7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -206,6 +206,27 @@ BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
 
+/*
+ * The MMU itself (with a valid role) is the single source of truth for the
+ * MMU.  Do not use the regs used to build the MMU/role, nor the vCPU.  The
+ * regs don't account for dependencies, e.g. clearing CR4 bits if CR0.PG=0,
+ * and the vCPU may be incorrect/irrelevant.
+ */
+#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)		\
+static inline bool is_##reg##_##name(struct kvm_mmu *mmu)	\
+{								\
+	return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
+}
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr0, pg);
+BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pae);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
+BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
+
 struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_role_regs regs = {
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b632606a87d6..5cf36eb96ee2 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,7 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 error:
 	errcode |= write_fault | user_fault;
-	if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
+	if (fetch_fault && (mmu->nx || is_cr4_smep(mmu)))
 		errcode |= PFERR_FETCH_MASK;
 
 	walker->fault.vector = PF_VECTOR;
-- 
2.32.0.288.g62a8d224e6-goog
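
To illustrate how the generated accessors read at the call site, here is a
minimal, self-contained sketch of the same token-pasting pattern; the toy_mmu
layout below is invented for the example and is far smaller than the real
mmu_role union:

	#include <stdbool.h>
	#include <stdio.h>

	/* Invented, much smaller stand-in for the real mmu_role union. */
	struct toy_mmu {
		struct {
			struct { unsigned cr0_wp:1; unsigned efer_nx:1; } base;
			struct { unsigned cr4_smep:1; unsigned cr4_pse:1; } ext;
		} mmu_role;
	};

	/* Same token-pasting trick as BUILD_MMU_ROLE_ACCESSOR() above. */
	#define BUILD_TOY_ROLE_ACCESSOR(base_or_ext, reg, name)		\
	static inline bool is_##reg##_##name(struct toy_mmu *mmu)	\
	{								\
		return !!(mmu->mmu_role.base_or_ext.reg##_##name);	\
	}
	BUILD_TOY_ROLE_ACCESSOR(base, cr0, wp);
	BUILD_TOY_ROLE_ACCESSOR(base, efer, nx);
	BUILD_TOY_ROLE_ACCESSOR(ext,  cr4, smep);
	BUILD_TOY_ROLE_ACCESSOR(ext,  cr4, pse);

	int main(void)
	{
		struct toy_mmu mmu = { .mmu_role.ext.cr4_smep = 1 };

		/* Reads as a question about the MMU, not a chain of field accesses. */
		printf("smep=%d wp=%d\n", is_cr4_smep(&mmu), is_cr0_wp(&mmu));
		return 0;
	}

The accessor names match the architectural bits they query, which is what lets
later patches mechanically swap vCPU reads for role reads.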


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 26/54] KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (24 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 27/54] KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active Sean Christopherson
                   ` (28 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Don't set CR0/CR4/EFER bits in the MMU role if paging is disabled; the paging
modifiers are irrelevant if there is no paging in the first place.
Somewhat arbitrarily clear gpte_is_8_bytes for shadow paging if paging is
disabled in the guest.  Again, there are no guest PTEs to process, so the
size is meaningless.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index be95595b30c7..0eb77a45f1ff 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4568,13 +4568,15 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
 {
 	union kvm_mmu_extended_role ext = {0};
 
-	ext.cr0_pg = ____is_cr0_pg(regs);
-	ext.cr4_pae = ____is_cr4_pae(regs);
-	ext.cr4_smep = ____is_cr4_smep(regs);
-	ext.cr4_smap = ____is_cr4_smap(regs);
-	ext.cr4_pse = ____is_cr4_pse(regs);
-	ext.cr4_pke = ____is_cr4_pke(regs);
-	ext.cr4_la57 = ____is_cr4_la57(regs);
+	if (____is_cr0_pg(regs)) {
+		ext.cr0_pg = 1;
+		ext.cr4_pae = ____is_cr4_pae(regs);
+		ext.cr4_smep = ____is_cr4_smep(regs);
+		ext.cr4_smap = ____is_cr4_smap(regs);
+		ext.cr4_pse = ____is_cr4_pse(regs);
+		ext.cr4_pke = ____is_cr4_pke(regs);
+		ext.cr4_la57 = ____is_cr4_la57(regs);
+	}
 
 	ext.valid = 1;
 
@@ -4588,8 +4590,10 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
 	union kvm_mmu_role role = {0};
 
 	role.base.access = ACC_ALL;
-	role.base.efer_nx = ____is_efer_nx(regs);
-	role.base.cr0_wp = ____is_cr0_wp(regs);
+	if (____is_cr0_pg(regs)) {
+		role.base.efer_nx = ____is_efer_nx(regs);
+		role.base.cr0_wp = ____is_cr0_wp(regs);
+	}
 	role.base.smm = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
 
@@ -4680,7 +4684,7 @@ kvm_calc_shadow_root_page_role_common(struct kvm_vcpu *vcpu,
 
 	role.base.smep_andnot_wp = role.ext.cr4_smep && !____is_cr0_wp(regs);
 	role.base.smap_andnot_wp = role.ext.cr4_smap && !____is_cr0_wp(regs);
-	role.base.gpte_is_8_bytes = ____is_cr4_pae(regs);
+	role.base.gpte_is_8_bytes = ____is_cr0_pg(regs) && ____is_cr4_pae(regs);
 
 	return role;
 }
-- 
2.32.0.288.g62a8d224e6-goog
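
A standalone sketch of the effect, using an invented, much-reduced role union:
with CR0.PG=0, register states that differ only in paging modifiers now compute
identical roles, so a cached MMU can be reused instead of being rebuilt.

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative stand-in; the real kvm_mmu_extended_role has many more bits. */
	union toy_ext_role {
		struct {
			unsigned cr0_pg:1;
			unsigned cr4_pae:1;
			unsigned cr4_smep:1;
			unsigned valid:1;
		};
		unsigned int word;
	};

	struct toy_regs {
		bool cr0_pg, cr4_pae, cr4_smep;
	};

	static union toy_ext_role calc_ext_role(const struct toy_regs *regs)
	{
		union toy_ext_role ext = { .word = 0 };

		/*
		 * Paging modifiers are meaningless when CR0.PG=0, so don't let
		 * them perturb the role (and force a pointless reconfiguration).
		 */
		if (regs->cr0_pg) {
			ext.cr0_pg = 1;
			ext.cr4_pae = regs->cr4_pae;
			ext.cr4_smep = regs->cr4_smep;
		}
		ext.valid = 1;
		return ext;
	}

	int main(void)
	{
		struct toy_regs a = { .cr0_pg = false, .cr4_pae = true,  .cr4_smep = true  };
		struct toy_regs b = { .cr0_pg = false, .cr4_pae = false, .cr4_smep = false };

		/* With paging off, both register sets yield the same role. */
		printf("%s\n", calc_ext_role(&a).word == calc_ext_role(&b).word ?
			       "roles match" : "roles differ");
		return 0;
	}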


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 27/54] KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (25 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 26/54] KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0 Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 28/54] KVM: x86/mmu: Always Set new mmu_role immediately after checking old role Sean Christopherson
                   ` (27 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Don't set cr4_pke or cr4_la57 in the MMU role if long mode isn't active;
long mode is required for protection keys and 5-level paging to be fully
enabled.  Ignoring the bits avoids unnecessary reconfiguration on reuse,
and also means consumers of mmu_role don't need to manually check for
long mode.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0eb77a45f1ff..31662283dac7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4574,8 +4574,10 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
 		ext.cr4_smep = ____is_cr4_smep(regs);
 		ext.cr4_smap = ____is_cr4_smap(regs);
 		ext.cr4_pse = ____is_cr4_pse(regs);
-		ext.cr4_pke = ____is_cr4_pke(regs);
-		ext.cr4_la57 = ____is_cr4_la57(regs);
+
+		/* PKEY and LA57 are active iff long mode is active. */
+		ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
+		ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
 	}
 
 	ext.valid = 1;
-- 
2.32.0.288.g62a8d224e6-goog
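
The same idea in isolation, as a sketch with an invented helper name: the role
bit is the logical AND of EFER.LMA and the CR4 bit, so consumers of the role
get the "fully enabled" view for free (cr4_la57 follows the identical pattern).

	#include <stdbool.h>
	#include <stdio.h>

	/* Invented helper; mirrors ext.cr4_pke = ____is_efer_lma() && ____is_cr4_pke(). */
	static bool role_cr4_pke(bool efer_lma, bool cr4_pke)
	{
		return efer_lma && cr4_pke;
	}

	int main(void)
	{
		/* CR4.PKE without long mode does not enable protection keys... */
		printf("LMA=0, PKE=1 -> role.cr4_pke=%d\n", role_cr4_pke(false, true));
		/* ...so only this combination flips the role bit. */
		printf("LMA=1, PKE=1 -> role.cr4_pke=%d\n", role_cr4_pke(true, true));
		return 0;
	}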


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 28/54] KVM: x86/mmu: Always Set new mmu_role immediately after checking old role
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (26 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 27/54] KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 29/54] KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits Sean Christopherson
                   ` (26 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Refactor shadow MMU initialization to immediately set its new mmu_role
after verifying it differs from the old role, and so that all flavors
of MMU initialization share the same check-and-set pattern.  Immediately
setting the role will allow future commits to use mmu_role to configure
the MMU without consuming stale state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 31662283dac7..337a3e571db6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4714,6 +4714,11 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 				    struct kvm_mmu_role_regs *regs,
 				    union kvm_mmu_role new_role)
 {
+	if (new_role.as_u64 == context->mmu_role.as_u64)
+		return;
+
+	context->mmu_role.as_u64 = new_role.as_u64;
+
 	if (!____is_cr0_pg(regs))
 		nonpaging_init_context(vcpu, context);
 	else if (____is_efer_lma(regs))
@@ -4731,7 +4736,6 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	}
 	context->shadow_root_level = new_role.base.level;
 
-	context->mmu_role.as_u64 = new_role.as_u64;
 	reset_shadow_zero_bits_mask(vcpu, context);
 }
 
@@ -4742,8 +4746,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
 	union kvm_mmu_role new_role =
 		kvm_calc_shadow_mmu_root_page_role(vcpu, regs, false);
 
-	if (new_role.as_u64 != context->mmu_role.as_u64)
-		shadow_mmu_init_context(vcpu, context, regs, new_role);
+	shadow_mmu_init_context(vcpu, context, regs, new_role);
 }
 
 static union kvm_mmu_role
@@ -4774,8 +4777,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
-	if (new_role.as_u64 != context->mmu_role.as_u64)
-		shadow_mmu_init_context(vcpu, context, &regs, new_role);
+	shadow_mmu_init_context(vcpu, context, &regs, new_role);
 
 	/*
 	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
@@ -4823,6 +4825,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	if (new_role.as_u64 == context->mmu_role.as_u64)
 		return;
 
+	context->mmu_role.as_u64 = new_role.as_u64;
+
 	context->shadow_root_level = level;
 
 	context->nx = true;
@@ -4833,7 +4837,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->invlpg = ept_invlpg;
 	context->root_level = level;
 	context->direct_map = false;
-	context->mmu_role.as_u64 = new_role.as_u64;
 
 	update_permission_bitmask(vcpu, context, true);
 	update_pkru_bitmask(vcpu, context, true);
-- 
2.32.0.288.g62a8d224e6-goog
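
A sketch of the ordering with made-up types; the only point is that the role is
committed before any derived state is rebuilt, so the rebuild can read the role
itself rather than stale context or vCPU state.

	#include <stdio.h>

	/* Made-up context; only the ordering of operations matters here. */
	struct toy_ctx {
		unsigned long role;
		int shadow_root_level;
	};

	static void toy_init_context(struct toy_ctx *ctx, unsigned long new_role)
	{
		if (new_role == ctx->role)
			return;		/* nothing changed, keep the current setup */

		/* Commit the new role first ... */
		ctx->role = new_role;

		/*
		 * ... so everything below derives state from ctx->role itself,
		 * never from whatever the old role happened to say.
		 */
		ctx->shadow_root_level = (int)(ctx->role & 0xf);
	}

	int main(void)
	{
		struct toy_ctx ctx = { .role = 0 };

		toy_init_context(&ctx, 0x4);
		printf("role=%#lx level=%d\n", ctx.role, ctx.shadow_root_level);
		return 0;
	}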


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 29/54] KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (27 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 28/54] KVM: x86/mmu: Always Set new mmu_role immediately after checking old role Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 30/54] KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits Sean Christopherson
                   ` (25 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Unconditionally pass pse=false when calculating reserved bits for shadow
PTEs.  CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does
not use for shadow paging (including nested NPT).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 337a3e571db6..ffcaede019e4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4281,19 +4281,22 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
 	 */
 	bool uses_nx = context->nx || !tdp_enabled;
+
+	/* @amd adds a check on bit of SPTEs, which KVM shouldn't use anyways. */
+	bool is_amd = true;
+	/* KVM doesn't use 2-level page tables for the shadow MMU. */
+	bool is_pse = false;
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
 
-	/*
-	 * Passing "true" to the last argument is okay; it adds a check
-	 * on bit 8 of the SPTEs which KVM doesn't use anyway.
-	 */
+	WARN_ON_ONCE(context->shadow_root_level < PT32E_ROOT_LEVEL);
+
 	shadow_zero_check = &context->shadow_zero_check;
 	__reset_rsvds_bits_mask(vcpu, shadow_zero_check,
 				reserved_hpa_bits(),
 				context->shadow_root_level, uses_nx,
 				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
-				is_pse(vcpu), true);
+				is_pse, is_amd);
 
 	if (!shadow_me_mask)
 		return;
@@ -4329,7 +4332,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 					reserved_hpa_bits(),
 					context->shadow_root_level, false,
 					boot_cpu_has(X86_FEATURE_GBPAGES),
-					true, true);
+					false, true);
 	else
 		__reset_rsvds_bits_mask_ept(shadow_zero_check,
 					    reserved_hpa_bits(), false);
-- 
2.32.0.288.g62a8d224e6-goog
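
The readability idiom introduced above, reduced to a sketch with a hypothetical
helper: a named local (plus a comment) documents what a bare true/false argument
would otherwise hide at the call site.

	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical low-level helper with several boolean parameters. */
	static void set_rsvd_bits(bool nx, bool gbpages, bool pse, bool amd)
	{
		printf("nx=%d gbpages=%d pse=%d amd=%d\n", nx, gbpages, pse, amd);
	}

	int main(void)
	{
		/* Shadow paging is always PAE or 64-bit, so 32-bit PSE never applies. */
		bool is_pse = false;
		/* The AMD-only check covers a bit this caller never sets anyway. */
		bool is_amd = true;

		set_rsvd_bits(true, true, is_pse, is_amd);	/* vs. (true, true, false, true) */
		return 0;
	}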


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 30/54] KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (28 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 29/54] KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 31/54] KVM: x86/mmu: Drop vCPU param from reserved bits calculator Sean Christopherson
                   ` (24 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role to get CR4.PSE when calculating reserved bits for the
guest's PTEs.  Practically speaking, this is a glorified nop as the role
always comes from vCPU state for the relevant flows, but converting to
the roles will provide consistency once everything else is converted, and
will Just Work if the "always comes from vCPU" behavior were ever to
change (unlikely).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ffcaede019e4..e912d9a83e22 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4216,7 +4216,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 				vcpu->arch.reserved_gpa_bits,
 				context->root_level, context->nx,
 				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
-				is_pse(vcpu),
+				is_cr4_pse(context),
 				guest_cpuid_is_amd_or_hygon(vcpu));
 }
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 31/54] KVM: x86/mmu: Drop vCPU param from reserved bits calculator
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (29 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 30/54] KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 32/54] KVM: x86/mmu: Use MMU's role to compute permission bitmask Sean Christopherson
                   ` (23 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused,
and ideally will remain unused in the future.  Any information that's
needed by the low-level helper should be explicitly provided as it's used
for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be
meaningless or simply wrong.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e912d9a83e22..c3bf5d4186e9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4119,8 +4119,7 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu,
 #undef PTTYPE
 
 static void
-__reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
-			struct rsvd_bits_validate *rsvd_check,
+__reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
 			u64 pa_bits_rsvd, int level, bool nx, bool gbpages,
 			bool pse, bool amd)
 {
@@ -4212,7 +4211,7 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu *context)
 {
-	__reset_rsvds_bits_mask(vcpu, &context->guest_rsvd_check,
+	__reset_rsvds_bits_mask(&context->guest_rsvd_check,
 				vcpu->arch.reserved_gpa_bits,
 				context->root_level, context->nx,
 				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
@@ -4292,8 +4291,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	WARN_ON_ONCE(context->shadow_root_level < PT32E_ROOT_LEVEL);
 
 	shadow_zero_check = &context->shadow_zero_check;
-	__reset_rsvds_bits_mask(vcpu, shadow_zero_check,
-				reserved_hpa_bits(),
+	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 				context->shadow_root_level, uses_nx,
 				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
 				is_pse, is_amd);
@@ -4328,8 +4326,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	shadow_zero_check = &context->shadow_zero_check;
 
 	if (boot_cpu_is_amd())
-		__reset_rsvds_bits_mask(vcpu, shadow_zero_check,
-					reserved_hpa_bits(),
+		__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 					context->shadow_root_level, false,
 					boot_cpu_has(X86_FEATURE_GBPAGES),
 					false, true);
-- 
2.32.0.288.g62a8d224e6-goog
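
A sketch of the rule being enforced, with invented names and a deliberately
simplified mask: a helper shared by the guest-walk and shadow/host paths is a
pure function of explicit arguments, so it cannot quietly consume per-vCPU
state that is wrong for one of its callers.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/*
	 * Invented helper: a pure function of its arguments.  Callers that
	 * describe guest paging and callers that describe host/shadow paging
	 * each pass the values that are correct for *their* context.
	 */
	static uint64_t calc_rsvd_mask(uint64_t pa_bits_rsvd, int level, bool nx)
	{
		uint64_t mask = pa_bits_rsvd;

		if (!nx)
			mask |= 1ull << 63;	/* XD bit is reserved when EFER.NX=0 */
		(void)level;			/* a real version varies the mask per level */
		return mask;
	}

	int main(void)
	{
		/* Guest walk: guest's reserved GPA bits, guest's EFER.NX. */
		printf("guest : %#llx\n",
		       (unsigned long long)calc_rsvd_mask(0xfull << 46, 4, false));
		/* Shadow SPTEs: host's reserved HPA bits, NX always usable. */
		printf("shadow: %#llx\n",
		       (unsigned long long)calc_rsvd_mask(0xfull << 46, 4, true));
		return 0;
	}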


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 32/54] KVM: x86/mmu: Use MMU's role to compute permission bitmask
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (30 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 31/54] KVM: x86/mmu: Drop vCPU param from reserved bits calculator Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 33/54] KVM: x86/mmu: Use MMU's role to compute PKRU bitmask Sean Christopherson
                   ` (22 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role to generate the permission bitmasks for the MMU.
For some flows, the vCPU state may not be correct (or relevant), e.g.
the nested NPT MMU can be initialized with incoherent vCPU state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c3bf5d4186e9..bd412e082356 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4365,8 +4365,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	 (7 & (access) ? 128 : 0))
 
 
-static void update_permission_bitmask(struct kvm_vcpu *vcpu,
-				      struct kvm_mmu *mmu, bool ept)
+static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 {
 	unsigned byte;
 
@@ -4374,9 +4373,9 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 	const u8 w = BYTE_MASK(ACC_WRITE_MASK);
 	const u8 u = BYTE_MASK(ACC_USER_MASK);
 
-	bool cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) != 0;
-	bool cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) != 0;
-	bool cr0_wp = is_write_protection(vcpu);
+	bool cr4_smep = is_cr4_smep(mmu);
+	bool cr4_smap = is_cr4_smap(mmu);
+	bool cr0_wp = is_cr0_wp(mmu);
 
 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
 		unsigned pfec = byte << 1;
@@ -4672,7 +4671,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
-	update_permission_bitmask(vcpu, context, false);
+	update_permission_bitmask(context, false);
 	update_pkru_bitmask(vcpu, context, false);
 	update_last_nonleaf_level(vcpu, context);
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
@@ -4730,7 +4729,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 
 	if (____is_cr0_pg(regs)) {
 		reset_rsvds_bits_mask(vcpu, context);
-		update_permission_bitmask(vcpu, context, false);
+		update_permission_bitmask(context, false);
 		update_pkru_bitmask(vcpu, context, false);
 		update_last_nonleaf_level(vcpu, context);
 	}
@@ -4838,7 +4837,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->root_level = level;
 	context->direct_map = false;
 
-	update_permission_bitmask(vcpu, context, true);
+	update_permission_bitmask(context, true);
 	update_pkru_bitmask(vcpu, context, true);
 	update_last_nonleaf_level(vcpu, context);
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
@@ -4935,7 +4934,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 	}
 
-	update_permission_bitmask(vcpu, g_context, false);
+	update_permission_bitmask(g_context, false);
 	update_pkru_bitmask(vcpu, g_context, false);
 	update_last_nonleaf_level(vcpu, g_context);
 }
-- 
2.32.0.288.g62a8d224e6-goog
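
Not the byte-indexed permissions[] table KVM actually precomputes, but a
self-contained sketch of the same decision written as a plain predicate over
the role bits (protection keys and the implicit-access/EFLAGS.AC subtleties
are omitted):

	#include <stdbool.h>
	#include <stdio.h>

	/* Simplified x86 permission check; the first four inputs come from the role. */
	static bool access_allowed(bool cr0_wp, bool cr4_smep, bool cr4_smap, bool efer_nx,
				   bool user_access, bool write, bool fetch,
				   bool pte_user, bool pte_write, bool pte_nx)
	{
		if (fetch) {
			if (efer_nx && pte_nx)
				return false;
			if (user_access)
				return pte_user;
			/* SMEP: supervisor cannot execute from user pages. */
			return !(cr4_smep && pte_user);
		}

		if (user_access) {
			if (!pte_user)
				return false;
			return !write || pte_write;
		}

		/* Supervisor data access (EFLAGS.AC override ignored for brevity). */
		if (cr4_smap && pte_user)
			return false;
		if (write && cr0_wp && !pte_write)
			return false;
		return true;
	}

	int main(void)
	{
		bool ok = access_allowed(/*cr0_wp=*/true, /*smep=*/true, /*smap=*/false,
					 /*nx=*/true, /*user=*/false, /*write=*/false,
					 /*fetch=*/true, /*pte_user=*/true,
					 /*pte_write=*/true, /*pte_nx=*/false);

		printf("supervisor fetch from user page with SMEP: %s\n",
		       ok ? "allowed" : "faults");
		return 0;
	}

The role supplies the CR0/CR4/EFER inputs, which is why the vCPU parameter can
go away.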


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 33/54] KVM: x86/mmu: Use MMU's role to compute PKRU bitmask
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (31 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 32/54] KVM: x86/mmu: Use MMU's role to compute permission bitmask Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 34/54] KVM: x86/mmu: Use MMU's roles to compute last non-leaf level Sean Christopherson
                   ` (21 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role to calculate the Protection Keys (Restrict Userspace)
bitmask instead of pulling bits from current vCPU state.  For some flows,
the vCPU state may not be correct (or relevant), e.g. EPT doesn't
interact with PKRU.  Case in point, the "ept" param simply disappears.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bd412e082356..dcde7514358b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4460,24 +4460,17 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 * away both AD and WD.  For all reads or if the last condition holds, WD
 * only will be masked away.
 */
-static void update_pkru_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				bool ept)
+static void update_pkru_bitmask(struct kvm_mmu *mmu)
 {
 	unsigned bit;
 	bool wp;
 
-	if (ept) {
+	if (!is_cr4_pke(mmu)) {
 		mmu->pkru_mask = 0;
 		return;
 	}
 
-	/* PKEY is enabled only if CR4.PKE and EFER.LMA are both set. */
-	if (!kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || !is_long_mode(vcpu)) {
-		mmu->pkru_mask = 0;
-		return;
-	}
-
-	wp = is_write_protection(vcpu);
+	wp = is_cr0_wp(mmu);
 
 	for (bit = 0; bit < ARRAY_SIZE(mmu->permissions); ++bit) {
 		unsigned pfec, pkey_bits;
@@ -4672,7 +4665,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	}
 
 	update_permission_bitmask(context, false);
-	update_pkru_bitmask(vcpu, context, false);
+	update_pkru_bitmask(context);
 	update_last_nonleaf_level(vcpu, context);
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
 }
@@ -4730,7 +4723,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	if (____is_cr0_pg(regs)) {
 		reset_rsvds_bits_mask(vcpu, context);
 		update_permission_bitmask(context, false);
-		update_pkru_bitmask(vcpu, context, false);
+		update_pkru_bitmask(context);
 		update_last_nonleaf_level(vcpu, context);
 	}
 	context->shadow_root_level = new_role.base.level;
@@ -4838,8 +4831,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->direct_map = false;
 
 	update_permission_bitmask(context, true);
-	update_pkru_bitmask(vcpu, context, true);
 	update_last_nonleaf_level(vcpu, context);
+	update_pkru_bitmask(context);
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
 	reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
 }
@@ -4935,7 +4928,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	}
 
 	update_permission_bitmask(g_context, false);
-	update_pkru_bitmask(vcpu, g_context, false);
+	update_pkru_bitmask(g_context);
 	update_last_nonleaf_level(vcpu, g_context);
 }
 
-- 
2.32.0.288.g62a8d224e6-goog
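
As a reminder of what the mask encodes, a simplified sketch of the underlying
PKRU check for a single data access to a user-mode page (KVM folds this into
pkru_mask so the fault path is just shifts and ANDs); whether the check exists
at all (CR4.PKE) and the CR0.WP input now come from the role.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Protection keys only apply to data accesses to user-mode (U=1) pages. */
	static bool pkru_blocks(uint32_t pkru, unsigned int pkey, bool write,
				bool user_access, bool cr0_wp)
	{
		bool ad = pkru & (1u << (2 * pkey));		/* access-disable */
		bool wd = pkru & (1u << (2 * pkey + 1));	/* write-disable  */

		if (ad)
			return true;
		if (write && wd)
			return user_access || cr0_wp;	/* supervisor writes bypass WD if CR0.WP=0 */
		return false;
	}

	int main(void)
	{
		uint32_t pkru = 1u << 3;	/* WD set for protection key 1 */

		printf("user write, key 1: %s\n",
		       pkru_blocks(pkru, 1, true, true, true) ? "blocked" : "allowed");
		printf("user read,  key 1: %s\n",
		       pkru_blocks(pkru, 1, false, true, true) ? "blocked" : "allowed");
		return 0;
	}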


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 34/54] KVM: x86/mmu: Use MMU's roles to compute last non-leaf level
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (32 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 33/54] KVM: x86/mmu: Use MMU's role to compute PKRU bitmask Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 35/54] KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk Sean Christopherson
                   ` (20 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role to get CR4.PSE when determining the last level at
which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a
huge page.

Note, the existing logic is arguably wrong when considering 5-level
paging and the case where 1gb pages aren't supported.  In practice, the
logic is confusing but not broken, because except for 32-bit non-PAE
paging, the PAGE_SIZE bit is reserved when a huge page isn't supported at
that level.  I.e. PAGE_SIZE=1 will terminate the guest walk one way or
another.  Furthermore, last_nonleaf_level is only consulted after KVM has
verified there are no reserved bits set.

All that confusion will be addressed in a future patch by dropping
last_nonleaf_level entirely.  For now, massage the code to continue the
march toward using mmu_role for (almost) all MMU computations.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dcde7514358b..67aa19ab628d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4504,12 +4504,12 @@ static void update_pkru_bitmask(struct kvm_mmu *mmu)
 	}
 }
 
-static void update_last_nonleaf_level(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+static void update_last_nonleaf_level(struct kvm_mmu *mmu)
 {
 	unsigned root_level = mmu->root_level;
 
 	mmu->last_nonleaf_level = root_level;
-	if (root_level == PT32_ROOT_LEVEL && is_pse(vcpu))
+	if (root_level == PT32_ROOT_LEVEL && is_cr4_pse(mmu))
 		mmu->last_nonleaf_level++;
 }
 
@@ -4666,7 +4666,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 
 	update_permission_bitmask(context, false);
 	update_pkru_bitmask(context);
-	update_last_nonleaf_level(vcpu, context);
+	update_last_nonleaf_level(context);
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
 }
 
@@ -4724,7 +4724,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 		reset_rsvds_bits_mask(vcpu, context);
 		update_permission_bitmask(context, false);
 		update_pkru_bitmask(context);
-		update_last_nonleaf_level(vcpu, context);
+		update_last_nonleaf_level(context);
 	}
 	context->shadow_root_level = new_role.base.level;
 
@@ -4831,7 +4831,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->direct_map = false;
 
 	update_permission_bitmask(context, true);
-	update_last_nonleaf_level(vcpu, context);
+	update_last_nonleaf_level(context);
 	update_pkru_bitmask(context);
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
 	reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
@@ -4929,7 +4929,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 
 	update_permission_bitmask(g_context, false);
 	update_pkru_bitmask(g_context);
-	update_last_nonleaf_level(vcpu, g_context);
+	update_last_nonleaf_level(g_context);
 }
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu)
-- 
2.32.0.288.g62a8d224e6-goog
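
A sketch of the computation with the level constants spelled out; per the
kvm_mmu comment, the guest can create huge pages at levels
2..last_nonleaf_level-1, and only 32-bit non-PAE paging needs the
CR4.PSE-dependent bump.

	#include <stdbool.h>
	#include <stdio.h>

	/* Paging level constants, mirroring the ones used by KVM. */
	#define PT32_ROOT_LEVEL		2
	#define PT32E_ROOT_LEVEL	3
	#define PT64_ROOT_4LEVEL	4

	static int last_nonleaf_level(int root_level, bool cr4_pse)
	{
		int level = root_level;

		/* 32-bit non-PAE paging only has 4M pages when CR4.PSE=1. */
		if (root_level == PT32_ROOT_LEVEL && cr4_pse)
			level++;
		return level;
	}

	int main(void)
	{
		printf("32-bit, PSE=0: %d\n", last_nonleaf_level(PT32_ROOT_LEVEL, false));
		printf("32-bit, PSE=1: %d\n", last_nonleaf_level(PT32_ROOT_LEVEL, true));
		printf("PAE          : %d\n", last_nonleaf_level(PT32E_ROOT_LEVEL, false));
		printf("4-level      : %d\n", last_nonleaf_level(PT64_ROOT_4LEVEL, false));
		return 0;
	}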


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 35/54] KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (33 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 34/54] KVM: x86/mmu: Use MMU's roles to compute last non-leaf level Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 36/54] KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata Sean Christopherson
                   ` (19 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the NX bit from the MMU's role instead of the MMU itself so that the
redundant, dedicated "nx" flag can be dropped.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5cf36eb96ee2..c92e712607b6 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,7 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 error:
 	errcode |= write_fault | user_fault;
-	if (fetch_fault && (mmu->nx || is_cr4_smep(mmu)))
+	if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu)))
 		errcode |= PFERR_FETCH_MASK;
 
 	walker->fault.vector = PF_VECTOR;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 36/54] KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (34 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 35/54] KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 37/54] KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration Sean Christopherson
                   ` (18 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role and role_regs to calculate the MMU's guest root level
and NX bit.  For some flows, the vCPU state may not be correct (or
relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will
configure the guest_mmu with possibly-stale vCPU state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 67aa19ab628d..30cbc6cdb0db 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3948,8 +3948,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 				 max_level, true);
 }
 
-static void nonpaging_init_context(struct kvm_vcpu *vcpu,
-				   struct kvm_mmu *context)
+static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -4513,14 +4512,13 @@ static void update_last_nonleaf_level(struct kvm_mmu *mmu)
 		mmu->last_nonleaf_level++;
 }
 
-static void paging64_init_context_common(struct kvm_vcpu *vcpu,
-					 struct kvm_mmu *context,
+static void paging64_init_context_common(struct kvm_mmu *context,
 					 int root_level)
 {
-	context->nx = is_nx(vcpu);
+	context->nx = is_efer_nx(context);
 	context->root_level = root_level;
 
-	MMU_WARN_ON(!is_pae(vcpu));
+	WARN_ON_ONCE(!is_cr4_pae(context));
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->sync_page = paging64_sync_page;
@@ -4528,17 +4526,16 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 	context->direct_map = false;
 }
 
-static void paging64_init_context(struct kvm_vcpu *vcpu,
-				  struct kvm_mmu *context)
+static void paging64_init_context(struct kvm_mmu *context,
+				  struct kvm_mmu_role_regs *regs)
 {
-	int root_level = is_la57_mode(vcpu) ?
-			 PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
+	int root_level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL :
+						 PT64_ROOT_4LEVEL;
 
-	paging64_init_context_common(vcpu, context, root_level);
+	paging64_init_context_common(context, root_level);
 }
 
-static void paging32_init_context(struct kvm_vcpu *vcpu,
-				  struct kvm_mmu *context)
+static void paging32_init_context(struct kvm_mmu *context)
 {
 	context->nx = false;
 	context->root_level = PT32_ROOT_LEVEL;
@@ -4549,10 +4546,9 @@ static void paging32_init_context(struct kvm_vcpu *vcpu,
 	context->direct_map = false;
 }
 
-static void paging32E_init_context(struct kvm_vcpu *vcpu,
-				   struct kvm_mmu *context)
+static void paging32E_init_context(struct kvm_mmu *context)
 {
-	paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
+	paging64_init_context_common(context, PT32E_ROOT_LEVEL);
 }
 
 static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
@@ -4712,13 +4708,13 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	context->mmu_role.as_u64 = new_role.as_u64;
 
 	if (!____is_cr0_pg(regs))
-		nonpaging_init_context(vcpu, context);
+		nonpaging_init_context(context);
 	else if (____is_efer_lma(regs))
-		paging64_init_context(vcpu, context);
+		paging64_init_context(context, regs);
 	else if (____is_cr4_pae(regs))
-		paging32E_init_context(vcpu, context);
+		paging32E_init_context(context);
 	else
-		paging32_init_context(vcpu, context);
+		paging32_init_context(context);
 
 	if (____is_cr0_pg(regs)) {
 		reset_rsvds_bits_mask(vcpu, context);
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 37/54] KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (35 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 36/54] KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 38/54] KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers Sean Christopherson
                   ` (17 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Get the MMU's effective EFER.NX from its role instead of using the
one-off, dedicated flag.  This will allow dropping said flag in a
future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 30cbc6cdb0db..eb6386bcc2ef 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 {
 	__reset_rsvds_bits_mask(&context->guest_rsvd_check,
 				vcpu->arch.reserved_gpa_bits,
-				context->root_level, context->nx,
+				context->root_level, is_efer_nx(context),
 				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
 				is_cr4_pse(context),
 				guest_cpuid_is_amd_or_hygon(vcpu));
@@ -4278,7 +4278,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	 * NX can be used by any non-nested shadow MMU to avoid having to reset
 	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
 	 */
-	bool uses_nx = context->nx || !tdp_enabled;
+	bool uses_nx = is_efer_nx(context) || !tdp_enabled;
 
 	/* @amd adds a check on bit of SPTEs, which KVM shouldn't use anyways. */
 	bool is_amd = true;
@@ -4375,6 +4375,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 	bool cr4_smep = is_cr4_smep(mmu);
 	bool cr4_smap = is_cr4_smap(mmu);
 	bool cr0_wp = is_cr0_wp(mmu);
+	bool efer_nx = is_efer_nx(mmu);
 
 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
 		unsigned pfec = byte << 1;
@@ -4400,7 +4401,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 			u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
 
 			/* Not really needed: !nx will cause pte.nx to fault */
-			if (!mmu->nx)
+			if (!efer_nx)
 				ff = 0;
 
 			/* Allow supervisor writes if !cr0.wp */
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 38/54] KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (36 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 37/54] KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 39/54] KVM: x86/mmu: Get nested MMU's root level from the MMU's role Sean Christopherson
                   ` (16 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop kvm_mmu.nx as there are no consumers left.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 --
 arch/x86/kvm/mmu/mmu.c          | 17 -----------------
 2 files changed, 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8aa798c75e9a..be7088fb0594 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -423,8 +423,6 @@ struct kvm_mmu {
 	/* Can have large pages at levels 2..last_nonleaf_level-1. */
 	u8 last_nonleaf_level;
 
-	bool nx;
-
 	u64 pdptrs[4]; /* pae */
 };
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eb6386bcc2ef..6c4655c356b7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -322,11 +322,6 @@ static int is_cpuid_PSE36(void)
 	return 1;
 }
 
-static int is_nx(struct kvm_vcpu *vcpu)
-{
-	return vcpu->arch.efer & EFER_NX;
-}
-
 static gfn_t pse36_gfn_delta(u32 gpte)
 {
 	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
@@ -3956,7 +3951,6 @@ static void nonpaging_init_context(struct kvm_mmu *context)
 	context->invlpg = NULL;
 	context->root_level = 0;
 	context->direct_map = true;
-	context->nx = false;
 }
 
 static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
@@ -4516,7 +4510,6 @@ static void update_last_nonleaf_level(struct kvm_mmu *mmu)
 static void paging64_init_context_common(struct kvm_mmu *context,
 					 int root_level)
 {
-	context->nx = is_efer_nx(context);
 	context->root_level = root_level;
 
 	WARN_ON_ONCE(!is_cr4_pae(context));
@@ -4538,7 +4531,6 @@ static void paging64_init_context(struct kvm_mmu *context,
 
 static void paging32_init_context(struct kvm_mmu *context)
 {
-	context->nx = false;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
@@ -4640,22 +4632,18 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->inject_page_fault = kvm_inject_page_fault;
 
 	if (!is_paging(vcpu)) {
-		context->nx = false;
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
-		context->nx = is_nx(vcpu);
 		context->root_level = is_la57_mode(vcpu) ?
 				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else if (is_pae(vcpu)) {
-		context->nx = is_nx(vcpu);
 		context->root_level = PT32E_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else {
-		context->nx = false;
 		context->root_level = PT32_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging32_gva_to_gpa;
@@ -4818,7 +4806,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 
 	context->shadow_root_level = level;
 
-	context->nx = true;
 	context->ept_ad = accessed_dirty;
 	context->page_fault = ept_page_fault;
 	context->gva_to_gpa = ept_gva_to_gpa;
@@ -4903,22 +4890,18 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	 * the gva_to_gpa functions between mmu and nested_mmu are swapped.
 	 */
 	if (!is_paging(vcpu)) {
-		g_context->nx = false;
 		g_context->root_level = 0;
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
 	} else if (is_long_mode(vcpu)) {
-		g_context->nx = is_nx(vcpu);
 		g_context->root_level = is_la57_mode(vcpu) ?
 					PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else if (is_pae(vcpu)) {
-		g_context->nx = is_nx(vcpu);
 		g_context->root_level = PT32E_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else {
-		g_context->nx = false;
 		g_context->root_level = PT32_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 39/54] KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (37 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 38/54] KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 40/54] KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper Sean Christopherson
                   ` (15 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Initialize the MMU's (guest) root_level using its mmu_role instead of
redoing the calculations.  The role_regs used to calculate the mmu_role
are initialized from the vCPU, i.e. this should be a complete nop.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6c4655c356b7..6418b50d33ca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4874,6 +4874,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	g_context->get_guest_pgd     = get_cr3;
 	g_context->get_pdptr         = kvm_pdptr_read;
 	g_context->inject_page_fault = kvm_inject_page_fault;
+	g_context->root_level        = new_role.base.level;
 
 	/*
 	 * L2 page tables are never shadowed, so there is no need to sync
@@ -4890,19 +4891,14 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	 * the gva_to_gpa functions between mmu and nested_mmu are swapped.
 	 */
 	if (!is_paging(vcpu)) {
-		g_context->root_level = 0;
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
 	} else if (is_long_mode(vcpu)) {
-		g_context->root_level = is_la57_mode(vcpu) ?
-					PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else if (is_pae(vcpu)) {
-		g_context->root_level = PT32E_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else {
-		g_context->root_level = PT32_ROOT_LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 	}
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 40/54] KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (38 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 39/54] KVM: x86/mmu: Get nested MMU's root level from the MMU's role Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls Sean Christopherson
                   ` (14 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Get LA57 from the role_regs, which are initialized from the vCPU even
though TDP is enabled, instead of pulling the value directly from the
vCPU when computing the guest's root_level for TDP MMUs.  Note, the check
is inside an is_long_mode() statement, so that requirement is not lost.

Use role_regs even though the MMU's role is available and arguably
"better".  A future commit will consolidate the guest root level logic,
and it needs access to EFER.LMA, which is not tracked in the role (it
can't be toggled on VM-Exit, unlike LA57).

Drop is_la57_mode() as there are no remaining users, and to discourage
pulling MMU state from the vCPU (in the future).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c |  2 +-
 arch/x86/kvm/x86.h     | 10 ----------
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6418b50d33ca..30557b3e5c37 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4635,7 +4635,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
-		context->root_level = is_la57_mode(vcpu) ?
+		context->root_level = ____is_cr4_la57(&regs) ?
 				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 521f74e5bbf2..44ae10312740 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -157,16 +157,6 @@ static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
 	return cs_l;
 }
 
-static inline bool is_la57_mode(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_X86_64
-	return (vcpu->arch.efer & EFER_LMA) &&
-		 kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
-#else
-	return 0;
-#endif
-}
-
 static inline bool x86_exception_has_error_code(unsigned int vector)
 {
 	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (39 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 40/54] KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 20:07   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 42/54] KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 Sean Christopherson
                   ` (13 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Move calls to reset_rsvds_bits_mask() out of the various mode statements
and under a more generic CR0.PG=1 check.  This will allow for additional
code consolidation in the future.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 30557b3e5c37..52311c2efd5d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4637,18 +4637,18 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	} else if (is_long_mode(vcpu)) {
 		context->root_level = ____is_cr4_la57(&regs) ?
 				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
-		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else if (is_pae(vcpu)) {
 		context->root_level = PT32E_ROOT_LEVEL;
-		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else {
 		context->root_level = PT32_ROOT_LEVEL;
-		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
+	if (is_cr0_pg(context))
+		reset_rsvds_bits_mask(vcpu, context);
+
 	update_permission_bitmask(context, false);
 	update_pkru_bitmask(context);
 	update_last_nonleaf_level(context);
@@ -4890,18 +4890,17 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	 * nested page tables as the second level of translation. Basically
 	 * the gva_to_gpa functions between mmu and nested_mmu are swapped.
 	 */
-	if (!is_paging(vcpu)) {
+	if (!is_paging(vcpu))
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
-	} else if (is_long_mode(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, g_context);
+	else if (is_long_mode(vcpu))
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
-	} else if (is_pae(vcpu)) {
-		reset_rsvds_bits_mask(vcpu, g_context);
+	else if (is_pae(vcpu))
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
-	} else {
-		reset_rsvds_bits_mask(vcpu, g_context);
+	else
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
-	}
+
+	if (is_cr0_pg(g_context))
+		reset_rsvds_bits_mask(vcpu, g_context);
 
 	update_permission_bitmask(g_context, false);
 	update_pkru_bitmask(g_context);
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 42/54] KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (40 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 43/54] KVM: x86/mmu: Add helper to update paging metadata Sean Christopherson
                   ` (12 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Don't bother updating the bitmasks and last non-leaf level information if
paging is disabled, as the metadata will never be used.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 52311c2efd5d..30eb1364fc20 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4646,12 +4646,12 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
-	if (is_cr0_pg(context))
+	if (is_cr0_pg(context)) {
 		reset_rsvds_bits_mask(vcpu, context);
-
-	update_permission_bitmask(context, false);
-	update_pkru_bitmask(context);
-	update_last_nonleaf_level(context);
+		update_permission_bitmask(context, false);
+		update_pkru_bitmask(context);
+		update_last_nonleaf_level(context);
+	}
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
 }
 
@@ -4899,12 +4899,12 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	else
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 
-	if (is_cr0_pg(g_context))
+	if (is_cr0_pg(g_context)) {
 		reset_rsvds_bits_mask(vcpu, g_context);
-
-	update_permission_bitmask(g_context, false);
-	update_pkru_bitmask(g_context);
-	update_last_nonleaf_level(g_context);
+		update_permission_bitmask(g_context, false);
+		update_pkru_bitmask(g_context);
+		update_last_nonleaf_level(g_context);
+	}
 }
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu)
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 43/54] KVM: x86/mmu: Add helper to update paging metadata
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (41 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 42/54] KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 44/54] KVM: x86/mmu: Add a helper to calculate root from role_regs Sean Christopherson
                   ` (11 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Consolidate MMU guest metadata updates into a common helper for TDP,
shadow, and nested MMUs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 30eb1364fc20..a79871fe5b01 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4507,6 +4507,18 @@ static void update_last_nonleaf_level(struct kvm_mmu *mmu)
 		mmu->last_nonleaf_level++;
 }
 
+static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
+					struct kvm_mmu *mmu)
+{
+	if (!is_cr0_pg(mmu))
+		return;
+
+	reset_rsvds_bits_mask(vcpu, mmu);
+	update_permission_bitmask(mmu, false);
+	update_pkru_bitmask(mmu);
+	update_last_nonleaf_level(mmu);
+}
+
 static void paging64_init_context_common(struct kvm_mmu *context,
 					 int root_level)
 {
@@ -4646,12 +4658,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->gva_to_gpa = paging32_gva_to_gpa;
 	}
 
-	if (is_cr0_pg(context)) {
-		reset_rsvds_bits_mask(vcpu, context);
-		update_permission_bitmask(context, false);
-		update_pkru_bitmask(context);
-		update_last_nonleaf_level(context);
-	}
+	reset_guest_paging_metadata(vcpu, context);
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
 }
 
@@ -4705,12 +4712,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	else
 		paging32_init_context(context);
 
-	if (____is_cr0_pg(regs)) {
-		reset_rsvds_bits_mask(vcpu, context);
-		update_permission_bitmask(context, false);
-		update_pkru_bitmask(context);
-		update_last_nonleaf_level(context);
-	}
+	reset_guest_paging_metadata(vcpu, context);
 	context->shadow_root_level = new_role.base.level;
 
 	reset_shadow_zero_bits_mask(vcpu, context);
@@ -4899,12 +4901,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 	else
 		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
 
-	if (is_cr0_pg(g_context)) {
-		reset_rsvds_bits_mask(vcpu, g_context);
-		update_permission_bitmask(g_context, false);
-		update_pkru_bitmask(g_context);
-		update_last_nonleaf_level(g_context);
-	}
+	reset_guest_paging_metadata(vcpu, g_context);
 }
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu)
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 44/54] KVM: x86/mmu: Add a helper to calculate root from role_regs
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (42 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 43/54] KVM: x86/mmu: Add helper to update paging metadata Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 45/54] KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers Sean Christopherson
                   ` (10 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Add a helper to calculate the level for non-EPT page tables from the
MMU's role_regs.
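
For reference, the mapping the new helper encodes (same as the open coded
checks it replaces):

	CR0.PG=0                          -> 0 (paging disabled)
	CR0.PG=1, EFER.LMA=1, CR4.LA57=1  -> PT64_ROOT_5LEVEL
	CR0.PG=1, EFER.LMA=1, CR4.LA57=0  -> PT64_ROOT_4LEVEL
	CR0.PG=1, EFER.LMA=0, CR4.PAE=1   -> PT32E_ROOT_LEVEL
	CR0.PG=1, EFER.LMA=0, CR4.PAE=0   -> PT32_ROOT_LEVEL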

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 60 ++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a79871fe5b01..b83fd635e1f2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -238,6 +238,19 @@ struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
 	return regs;
 }
 
+static int role_regs_to_root_level(struct kvm_mmu_role_regs *regs)
+{
+	if (!____is_cr0_pg(regs))
+		return 0;
+	else if (____is_efer_lma(regs))
+		return ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL :
+					       PT64_ROOT_4LEVEL;
+	else if (____is_cr4_pae(regs))
+		return PT32E_ROOT_LEVEL;
+	else
+		return PT32_ROOT_LEVEL;
+}
+
 static inline bool kvm_available_flush_tlb_with_range(void)
 {
 	return kvm_x86_ops.tlb_remote_flush_with_range;
@@ -3949,7 +3962,6 @@ static void nonpaging_init_context(struct kvm_mmu *context)
 	context->gva_to_gpa = nonpaging_gva_to_gpa;
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = NULL;
-	context->root_level = 0;
 	context->direct_map = true;
 }
 
@@ -4519,11 +4531,8 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
 	update_last_nonleaf_level(mmu);
 }
 
-static void paging64_init_context_common(struct kvm_mmu *context,
-					 int root_level)
+static void paging64_init_context_common(struct kvm_mmu *context)
 {
-	context->root_level = root_level;
-
 	WARN_ON_ONCE(!is_cr4_pae(context));
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
@@ -4532,18 +4541,13 @@ static void paging64_init_context_common(struct kvm_mmu *context,
 	context->direct_map = false;
 }
 
-static void paging64_init_context(struct kvm_mmu *context,
-				  struct kvm_mmu_role_regs *regs)
+static void paging64_init_context(struct kvm_mmu *context)
 {
-	int root_level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL :
-						 PT64_ROOT_4LEVEL;
-
-	paging64_init_context_common(context, root_level);
+	paging64_init_context_common(context);
 }
 
 static void paging32_init_context(struct kvm_mmu *context)
 {
-	context->root_level = PT32_ROOT_LEVEL;
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
 	context->sync_page = paging32_sync_page;
@@ -4553,7 +4557,7 @@ static void paging32_init_context(struct kvm_mmu *context)
 
 static void paging32E_init_context(struct kvm_mmu *context)
 {
-	paging64_init_context_common(context, PT32E_ROOT_LEVEL);
+	paging64_init_context_common(context);
 }
 
 static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
@@ -4642,21 +4646,16 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->get_guest_pgd = get_cr3;
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
+	context->root_level = role_regs_to_root_level(&regs);
 
-	if (!is_paging(vcpu)) {
+	if (!is_paging(vcpu))
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
-		context->root_level = 0;
-	} else if (is_long_mode(vcpu)) {
-		context->root_level = ____is_cr4_la57(&regs) ?
-				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
+	else if (is_long_mode(vcpu))
 		context->gva_to_gpa = paging64_gva_to_gpa;
-	} else if (is_pae(vcpu)) {
-		context->root_level = PT32E_ROOT_LEVEL;
+	else if (is_pae(vcpu))
 		context->gva_to_gpa = paging64_gva_to_gpa;
-	} else {
-		context->root_level = PT32_ROOT_LEVEL;
+	else
 		context->gva_to_gpa = paging32_gva_to_gpa;
-	}
 
 	reset_guest_paging_metadata(vcpu, context);
 	reset_tdp_shadow_zero_bits_mask(vcpu, context);
@@ -4706,11 +4705,12 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 	if (!____is_cr0_pg(regs))
 		nonpaging_init_context(context);
 	else if (____is_efer_lma(regs))
-		paging64_init_context(context, regs);
+		paging64_init_context(context);
 	else if (____is_cr4_pae(regs))
 		paging32E_init_context(context);
 	else
 		paging32_init_context(context);
+	context->root_level = role_regs_to_root_level(regs);
 
 	reset_guest_paging_metadata(vcpu, context);
 	context->shadow_root_level = new_role.base.level;
@@ -4849,17 +4849,7 @@ kvm_calc_nested_mmu_role(struct kvm_vcpu *vcpu, struct kvm_mmu_role_regs *regs)
 	 * to "true" to try to detect bogus usage of the nested MMU.
 	 */
 	role.base.direct = true;
-
-	if (!____is_cr0_pg(regs))
-		role.base.level = 0;
-	else if (____is_efer_lma(regs))
-		role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL :
-							  PT64_ROOT_4LEVEL;
-	else if (____is_cr4_pae(regs))
-		role.base.level = PT32E_ROOT_LEVEL;
-	else
-		role.base.level = PT32_ROOT_LEVEL;
-
+	role.base.level = role_regs_to_root_level(regs);
 	return role;
 }
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 45/54] KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (43 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 44/54] KVM: x86/mmu: Add a helper to calculate root from role_regs Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 46/54] KVM: x86/mmu: Use MMU's role to determine PTTYPE Sean Christopherson
                   ` (9 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Skip paging32E_init_context() and paging64_init_context_common() and go
directly to paging64_init_context() (was the common version) now that
the relevant flows don't need to distinguish between 64-bit paging and
32-bit PAE paging for other reasons.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b83fd635e1f2..4e11cb284006 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4531,9 +4531,8 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
 	update_last_nonleaf_level(mmu);
 }
 
-static void paging64_init_context_common(struct kvm_mmu *context)
+static void paging64_init_context(struct kvm_mmu *context)
 {
-	WARN_ON_ONCE(!is_cr4_pae(context));
 	context->page_fault = paging64_page_fault;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->sync_page = paging64_sync_page;
@@ -4541,11 +4540,6 @@ static void paging64_init_context_common(struct kvm_mmu *context)
 	context->direct_map = false;
 }
 
-static void paging64_init_context(struct kvm_mmu *context)
-{
-	paging64_init_context_common(context);
-}
-
 static void paging32_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = paging32_page_fault;
@@ -4555,11 +4549,6 @@ static void paging32_init_context(struct kvm_mmu *context)
 	context->direct_map = false;
 }
 
-static void paging32E_init_context(struct kvm_mmu *context)
-{
-	paging64_init_context_common(context);
-}
-
 static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
 							 struct kvm_mmu_role_regs *regs)
 {
@@ -4650,8 +4639,6 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 
 	if (!is_paging(vcpu))
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
-	else if (is_long_mode(vcpu))
-		context->gva_to_gpa = paging64_gva_to_gpa;
 	else if (is_pae(vcpu))
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	else
@@ -4704,10 +4691,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 
 	if (!____is_cr0_pg(regs))
 		nonpaging_init_context(context);
-	else if (____is_efer_lma(regs))
+	else if (____is_cr4_pae(regs))
 		paging64_init_context(context);
-	else if (____is_cr4_pae(regs))
-		paging32E_init_context(context);
 	else
 		paging32_init_context(context);
 	context->root_level = role_regs_to_root_level(regs);
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 46/54] KVM: x86/mmu: Use MMU's role to determine PTTYPE
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (44 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 45/54] KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU Sean Christopherson
                   ` (8 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the MMU's role instead of vCPU state or role_regs to determine the
PTTYPE, i.e. which helpers to wire up.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e11cb284006..92260cf48d5e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4637,9 +4637,9 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->inject_page_fault = kvm_inject_page_fault;
 	context->root_level = role_regs_to_root_level(&regs);
 
-	if (!is_paging(vcpu))
+	if (!is_cr0_pg(context))
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
-	else if (is_pae(vcpu))
+	else if (is_cr4_pae(context))
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	else
 		context->gva_to_gpa = paging32_gva_to_gpa;
@@ -4689,9 +4689,9 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 
 	context->mmu_role.as_u64 = new_role.as_u64;
 
-	if (!____is_cr0_pg(regs))
+	if (!is_cr0_pg(context))
 		nonpaging_init_context(context);
-	else if (____is_cr4_pae(regs))
+	else if (is_cr4_pae(context))
 		paging64_init_context(context);
 	else
 		paging32_init_context(context);
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (45 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 46/54] KVM: x86/mmu: Use MMU's role to determine PTTYPE Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 20:13   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 48/54] KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE Sean Christopherson
                   ` (7 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Extract the reserved SPTE check and print helpers in get_mmio_spte() to
new helpers so that KVM can also WARN on reserved badness when making a
SPTE.

Tag the checking helper with __always_inline to improve the probability
of the compiler generating optimal code for the checking loop, e.g. gcc
appears to avoid using %rbp when the helper is tagged with a vanilla
"inline".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c  | 23 ++---------------------
 arch/x86/kvm/mmu/spte.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 92260cf48d5e..34e7a489e71b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3594,19 +3594,6 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gpa_t vaddr,
 	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception);
 }
 
-static bool
-__is_rsvd_bits_set(struct rsvd_bits_validate *rsvd_check, u64 pte, int level)
-{
-	int bit7 = (pte >> 7) & 1;
-
-	return pte & rsvd_check->rsvd_bits_mask[bit7][level-1];
-}
-
-static bool __is_bad_mt_xwr(struct rsvd_bits_validate *rsvd_check, u64 pte)
-{
-	return rsvd_check->bad_mt_xwr & BIT_ULL(pte & 0x3f);
-}
-
 static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 {
 	/*
@@ -3684,13 +3671,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 	rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
 
 	for (level = root; level >= leaf; level--)
-		/*
-		 * Use a bitwise-OR instead of a logical-OR to aggregate the
-		 * reserved bit and EPT's invalid memtype/XWR checks to avoid
-		 * adding a Jcc in the loop.
-		 */
-		reserved |= __is_bad_mt_xwr(rsvd_check, sptes[level]) |
-			    __is_rsvd_bits_set(rsvd_check, sptes[level], level);
+		reserved |= is_rsvd_spte(rsvd_check, sptes[level], level);
 
 	if (reserved) {
 		pr_err("%s: reserved bits set on MMU-present spte, addr 0x%llx, hierarchy:\n",
@@ -3698,7 +3679,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 		for (level = root; level >= leaf; level--)
 			pr_err("------ spte = 0x%llx level = %d, rsvd bits = 0x%llx",
 			       sptes[level], level,
-			       rsvd_check->rsvd_bits_mask[(sptes[level] >> 7) & 1][level-1]);
+			       get_rsvd_bits(rsvd_check, sptes[level], level));
 	}
 
 	return reserved;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index bca0ba11cccf..47e10dd9352d 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -293,6 +293,38 @@ static inline bool is_dirty_spte(u64 spte)
 	return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
 }
 
+static inline u64 get_rsvd_bits(struct rsvd_bits_validate *rsvd_check, u64 pte,
+				int level)
+{
+	int bit7 = (pte >> 7) & 1;
+
+	return rsvd_check->rsvd_bits_mask[bit7][level-1];
+}
+
+static inline bool __is_rsvd_bits_set(struct rsvd_bits_validate *rsvd_check,
+				      u64 pte, int level)
+{
+	return pte & get_rsvd_bits(rsvd_check, pte, level);
+}
+
+static inline bool __is_bad_mt_xwr(struct rsvd_bits_validate *rsvd_check,
+				   u64 pte)
+{
+	return rsvd_check->bad_mt_xwr & BIT_ULL(pte & 0x3f);
+}
+
+static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
+					 u64 spte, int level)
+{
+	/*
+	 * Use a bitwise-OR instead of a logical-OR to aggregate the reserved
+	 * bits and EPT's invalid memtype/XWR checks to avoid an extra Jcc
+	 * (this is used in hot paths).
+	 */
+	return __is_bad_mt_xwr(rsvd_check, spte) |
+	       __is_rsvd_bits_set(rsvd_check, spte, level);
+}
+
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
 	return (spte & shadow_host_writable_mask) &&
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 48/54] KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (46 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 49/54] KVM: x86: Enhance comments for MMU roles and nested transition trickiness Sean Christopherson
                   ` (6 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Replace make_spte()'s WARN on a collision with the magic MMIO value with
a generic WARN on reserved bits being set (including EPT's reserved WX
combination).  Warning on any reserved bits covers MMIO, A/D tracking
bits with PAE paging, and in theory any future goofs that are introduced.

Opportunistically convert to ONCE behavior to avoid spamming the kernel
log; odds are very good that if KVM screws up one SPTE, it will botch all
SPTEs for the same MMU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 246e61e0771e..3e97cdb13eb7 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -175,7 +175,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		spte = mark_spte_for_access_track(spte);
 
 out:
-	WARN_ON(is_mmio_spte(spte));
+	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
+		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
+		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+
 	*new_spte = spte;
 	return ret;
 }
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 49/54] KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (47 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 48/54] KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic Sean Christopherson
                   ` (5 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Expand the comments for the MMU roles.  The interactions with gfn_track
and PGD reuse in particular are hairy.
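
As a rough sense of scale for the "2 bytes per gfn instead of 4 bytes per
gfn" point in the new comment (back-of-the-envelope numbers, purely for
illustration): a 16 GiB memslot spans 4M gfns, i.e. ~8 MiB of gfn_track
data at 2 bytes per gfn versus ~16 MiB at 4 bytes per gfn.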

Regarding PGD reuse, add comments in the nested virtualization flows to
call out why kvm_init_mmu() is unconditionally called even when nested
TDP is used.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 59 +++++++++++++++++++++++++++------
 arch/x86/kvm/svm/nested.c       |  1 +
 arch/x86/kvm/vmx/nested.c       |  1 +
 3 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be7088fb0594..2da8b5ddbd6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -269,12 +269,36 @@ enum x86_intercept_stage;
 struct kvm_kernel_irq_routing_entry;
 
 /*
- * the pages used as guest page table on soft mmu are tracked by
- * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
- * by indirect shadow page can not be more than 15 bits.
+ * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
+ * also includes TDP pages) to determine whether or not a page can be used in
+ * the given MMU context.  This is a subset of the overall kvm_mmu_role to
+ * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
+ * 2 bytes per gfn instead of 4 bytes per gfn.
  *
- * Currently, we used 14 bits that are @level, @gpte_is_8_bytes, @quadrant, @access,
- * @efer_nx, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
+ * Indirect upper-level shadow pages are tracked for write-protection via
+ * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
+ * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
+ * gfn_track will overflow and explosions will ensue.
+ *
+ * A unique shadow page (SP) for a gfn is created if and only if an existing SP
+ * cannot be reused.  The ability to reuse a SP is tracked by its role, which
+ * incorporates various mode bits and properties of the SP.  Roughly speaking,
+ * the number of unique SPs that can theoretically be created is 2^n, where n
+ * is the number of bits that are used to compute the role.
+ *
+ * But, even though there are 18 bits in the mask below, not all combinations
+ * of modes and flags are possible.  The maximum number of possible upper-level
+ * shadow pages for a single gfn is in the neighborhood of 2^13.
+ *
+ *   - invalid shadow pages are not accounted.
+ *   - level is effectively limited to four combinations, not 16 as the number
+ *     of bits would imply, as 4k SPs are not tracked (allowed to go unsync).
+ *   - level is effectively unused for non-PAE paging because there is exactly
+ *     one upper level (see 4k SP exception above).
+ *   - quadrant is used only for non-PAE paging and is exclusive with
+ *     gpte_is_8_bytes.
+ *   - execonly and ad_disabled are used only for nested EPT, which makes it
+ *     exclusive with quadrant.
  */
 union kvm_mmu_page_role {
 	u32 word;
@@ -303,13 +327,26 @@ union kvm_mmu_page_role {
 	};
 };
 
+/*
+ * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
+ * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
+ * including on nested transitions, if nothing in the full role changes then
+ * MMU re-configuration can be skipped. @valid bit is set on first usage so we
+ * don't treat all-zero structure as valid data.
+ *
+ * The properties that are tracked in the extended role but not the page role
+ * are for things that either (a) do not affect the validity of the shadow page
+ * or (b) are indirectly reflected in the shadow page's role.  For example,
+ * CR4.PKE only affects permission checks for software walks of the guest page
+ * tables (because KVM doesn't support Protection Keys with shadow paging), and
+ * CR0.PG, CR4.PAE, and CR4.PSE are indirectly reflected in role.level.
+ *
+ * Note, SMEP and SMAP are not redundant with sm*p_andnot_wp in the page role.
+ * If CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMEP and
+ * SMAP, but the MMU's permission checks for software walks need to be SMEP and
+ * SMAP aware regardless of CR0.WP.
+ */
 union kvm_mmu_extended_role {
-/*
- * This structure complements kvm_mmu_page_role caching everything needed for
- * MMU configuration. If nothing in both these structures changed, MMU
- * re-configuration can be skipped. @valid bit is set on first usage so we don't
- * treat all-zero structure as valid data.
- */
 	u32 word;
 	struct {
 		unsigned int valid:1;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 927e545591c3..94389f974ba9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -424,6 +424,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
+	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 183fd9d62fc5..77fc51a852cf 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1098,6 +1098,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
+	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
 	return 0;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (48 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 49/54] KVM: x86: Enhance comments for MMU roles and nested transition trickiness Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 20:22   ` Paolo Bonzini
  2021-06-22 17:57 ` [PATCH 51/54] KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT Sean Christopherson
                   ` (4 subsequent siblings)
  54 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
best confusing.  Per the comment:

  Can have large pages at levels 2..last_nonleaf_level-1.

the intent of the variable would appear to be to track what levels can
_legally_ have large pages, but that intent doesn't align with reality.
The computed value will be wrong for 5-level paging, or if 1gb pages are
not supported.

The flawed code is not a problem in practice, because except for 32-bit
non-PAE paging (where bit 7 is ignored if CR4.PSE=0), bit 7 is reserved
if large pages aren't supported at the
level.  Take advantage of this invariant and simply omit the level magic
math for 64-bit page tables (including PAE).

For 32-bit paging (non-PAE), the adjustments are needed purely because
bit 7 is ignored if PSE=0.  Retain that logic as is, but make
is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.

Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
not PTEs and will blow up all the other gpte helpers.
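
For reference, a worked example of the resulting 32-bit math, assuming the
usual KVM constants (PT32_ROOT_LEVEL == 2, PG_LEVEL_4K == 1, and
PT_PAGE_SIZE_MASK == bit 7):

	CR4.PSE=1, level=2:  gpte &= 2 - (2 + 1), i.e. &= -1, so bit 7
	                     survives and a set PS bit terminates the walk.
	CR4.PSE=0, level=2:  gpte &= 2 - (2 + 0), i.e. &= 0, so bit 7 (and
	                     the rest of the local copy) is cleared, matching
	                     hardware ignoring bit 7 when PSE=0.
	any PSE,   level=1:  gpte |= 1 - 1 - 1, i.e. |= -1, forces bit 7, so
	                     the walk always terminates at 4k PTEs.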

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ---
 arch/x86/kvm/mmu/mmu.c          | 31 -------------------------------
 arch/x86/kvm/mmu/paging_tmpl.h  | 31 ++++++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2da8b5ddbd6a..c97b83cf8381 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -457,9 +457,6 @@ struct kvm_mmu {
 
 	struct rsvd_bits_validate guest_rsvd_check;
 
-	/* Can have large pages at levels 2..last_nonleaf_level-1. */
-	u8 last_nonleaf_level;
-
 	u64 pdptrs[4]; /* pae */
 };
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 34e7a489e71b..7849f53fd874 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4071,26 +4071,6 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 	return false;
 }
 
-static inline bool is_last_gpte(struct kvm_mmu *mmu,
-				unsigned level, unsigned gpte)
-{
-	/*
-	 * The RHS has bit 7 set iff level < mmu->last_nonleaf_level.
-	 * If it is clear, there are no large pages at this level, so clear
-	 * PT_PAGE_SIZE_MASK in gpte if that is the case.
-	 */
-	gpte &= level - mmu->last_nonleaf_level;
-
-	/*
-	 * PG_LEVEL_4K always terminates.  The RHS has bit 7 set
-	 * iff level <= PG_LEVEL_4K, which for our purpose means
-	 * level == PG_LEVEL_4K; set PT_PAGE_SIZE_MASK in gpte then.
-	 */
-	gpte |= level - PG_LEVEL_4K - 1;
-
-	return gpte & PT_PAGE_SIZE_MASK;
-}
-
 #define PTTYPE_EPT 18 /* arbitrary */
 #define PTTYPE PTTYPE_EPT
 #include "paging_tmpl.h"
@@ -4491,15 +4471,6 @@ static void update_pkru_bitmask(struct kvm_mmu *mmu)
 	}
 }
 
-static void update_last_nonleaf_level(struct kvm_mmu *mmu)
-{
-	unsigned root_level = mmu->root_level;
-
-	mmu->last_nonleaf_level = root_level;
-	if (root_level == PT32_ROOT_LEVEL && is_cr4_pse(mmu))
-		mmu->last_nonleaf_level++;
-}
-
 static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
 					struct kvm_mmu *mmu)
 {
@@ -4509,7 +4480,6 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
 	reset_rsvds_bits_mask(vcpu, mmu);
 	update_permission_bitmask(mmu, false);
 	update_pkru_bitmask(mmu);
-	update_last_nonleaf_level(mmu);
 }
 
 static void paging64_init_context(struct kvm_mmu *context)
@@ -4783,7 +4753,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->direct_map = false;
 
 	update_permission_bitmask(context, true);
-	update_last_nonleaf_level(context);
 	update_pkru_bitmask(context);
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
 	reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c92e712607b6..ec1de57f3572 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -305,6 +305,35 @@ static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte)
 	return pkeys;
 }
 
+static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu,
+				       unsigned int level, unsigned int gpte)
+{
+	/*
+	 * For EPT and PAE paging (both variants), bit 7 is either reserved at
+	 * all levels or indicates a huge page (ignoring CR3/EPTP).  In either
+	 * case, bit 7 being set terminates the walk.
+	 */
+#if PTTYPE == 32
+	/*
+	 * 32-bit paging requires special handling because bit 7 is ignored if
+	 * CR4.PSE=0, not reserved.  Clear bit 7 in the gpte if the level is
+	 * greater than the last level for which bit 7 is the PAGE_SIZE bit.
+	 *
+	 * The RHS has bit 7 set iff level < (2 + PSE).  If it is clear, bit 7
+	 * is not reserved and does not indicate a large page at this level,
+	 * so clear PT_PAGE_SIZE_MASK in gpte if that is the case.
+	 */
+	gpte &= level - (PT32_ROOT_LEVEL + !!mmu->mmu_role.ext.cr4_pse);
+#endif
+	/*
+	 * PG_LEVEL_4K always terminates.  The RHS has bit 7 set
+	 * iff level <= PG_LEVEL_4K, which for our purpose means
+	 * level == PG_LEVEL_4K; set PT_PAGE_SIZE_MASK in gpte then.
+	 */
+	gpte |= level - PG_LEVEL_4K - 1;
+
+	return gpte & PT_PAGE_SIZE_MASK;
+}
 /*
  * Fetch a guest pte for a guest virtual address, or for an L2's GPA.
  */
@@ -421,7 +450,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 		/* Convert to ACC_*_MASK flags for struct guest_walker.  */
 		walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
-	} while (!is_last_gpte(mmu, walker->level, pte));
+	} while (!FNAME(is_last_gpte)(mmu, walker->level, pte));
 
 	pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
 	accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 51/54] KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (49 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 52/54] KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault Sean Christopherson
                   ` (3 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Drop the extra reset of shadow_zero_bits in the nested NPT flow now
that shadow_mmu_init_context computes the correct level for nested NPT.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7849f53fd874..d4969ac98a4b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4693,12 +4693,6 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	__kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base);
 
 	shadow_mmu_init_context(vcpu, context, &regs, new_role);
-
-	/*
-	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
-	 * (above) may use the wrong shadow_root_level.
-	 */
-	reset_shadow_zero_bits_mask(vcpu, context);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 52/54] KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (50 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 51/54] KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 53/54] KVM: x86/mmu: Get CR4.SMEP " Sean Christopherson
                   ` (2 subsequent siblings)
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the current MMU instead of vCPU state to query CR0.WP when handling
a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
whereas the page fault is shadowing L1's NPT.  Practically speaking, this
is a nop a NPT walks are always user faults, but fix it up for
consistency.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h             | 5 -----
 arch/x86/kvm/mmu/paging_tmpl.h | 5 ++---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 62844bacd13f..83e6c6965f1e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -165,11 +165,6 @@ static inline bool is_writable_pte(unsigned long pte)
 	return pte & PT_WRITABLE_MASK;
 }
 
-static inline bool is_write_protection(struct kvm_vcpu *vcpu)
-{
-	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
-}
-
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ec1de57f3572..260a9c06d764 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -795,7 +795,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
 	bool self_changed = false;
 
 	if (!(walker->pte_access & ACC_WRITE_MASK ||
-	      (!is_write_protection(vcpu) && !user_fault)))
+	    (!is_cr0_wp(vcpu->arch.mmu) && !user_fault)))
 		return false;
 
 	for (level = walker->level; level <= walker->max_level; level++) {
@@ -893,8 +893,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	 * we will cache the incorrect access into mmio spte.
 	 */
 	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
-	     !is_write_protection(vcpu) && !user_fault &&
-	      !is_noslot_pfn(pfn)) {
+	    !is_cr0_wp(vcpu->arch.mmu) && !user_fault && !is_noslot_pfn(pfn)) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 53/54] KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (51 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 52/54] KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-22 17:57 ` [PATCH 54/54] KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on Sean Christopherson
  2021-06-23 20:29 ` [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Paolo Bonzini
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Use the current MMU instead of vCPU state to query CR4.SMEP when handling
a page fault.  In the nested NPT case, the current CR4.SMEP reflects L2,
whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
Practically speaking, this is a nop as NPT walks are always user faults,
i.e. this code will never be reached, but fix it up for consistency.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 260a9c06d764..a79353fc6efd 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -903,7 +903,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 		 * then we should prevent the kernel from executing it
 		 * if SMEP is enabled.
 		 */
-		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
+		if (is_cr4_smep(vcpu->arch.mmu))
 			walker.pte_access &= ~ACC_EXEC_MASK;
 	}
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 54/54] KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (52 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 53/54] KVM: x86/mmu: Get CR4.SMEP " Sean Christopherson
@ 2021-06-22 17:57 ` Sean Christopherson
  2021-06-23 20:29 ` [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Paolo Bonzini
  54 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-22 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Yu Zhang, Maxim Levitsky

Let the guest use 1g hugepages if TDP is enabled and the host supports
GBPAGES; KVM can't actively prevent the guest from using 1g pages in this
case since they can't be disabled in the hardware page walker.  While
injecting a page fault if a bogus 1g page is encountered during a
software page walk is perfectly reasonable since KVM is simply honoring
userspace's vCPU model, doing so arguably doesn't provide any meaningful
value, and at worst will be horribly confusing as the guest will see
inconsistent behavior and seemingly spurious page faults.
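
Concretely (an illustrative scenario, not pulled from a bug report): with
TDP enabled on GBPAGES-capable hardware and a vCPU model that hides
GBPAGES, a guest 1g PDPTE is walked just fine by hardware, but the same
entry encountered during a KVM software walk, e.g. for emulation, would
be flagged as reserved and turned into a #PF, i.e. the exact same mapping
either works or faults depending on which walker happens to translate it.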

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d4969ac98a4b..684255defb33 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4174,13 +4174,28 @@ __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
 	}
 }
 
+static bool guest_can_use_gbpages(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
+	 * hardware.  The hardware page walker doesn't let KVM disable GBPAGES,
+	 * i.e. won't treat them as reserved, and KVM doesn't redo the GVA->GPA
+	 * walk for performance and complexity reasons.  Not to mention KVM
+	 * _can't_ solve the problem because GVA->GPA walks aren't visible to
+	 * KVM once a TDP translation is installed.  Mimic hardware behavior so
+	 * that KVM's is at least consistent, i.e. doesn't randomly inject #PF.
+	 */
+	return tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
+			     guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
+}
+
 static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu *context)
 {
 	__reset_rsvds_bits_mask(&context->guest_rsvd_check,
 				vcpu->arch.reserved_gpa_bits,
 				context->root_level, is_efer_nx(context),
-				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
+				guest_can_use_gbpages(vcpu),
 				is_cr4_pse(context),
 				guest_cpuid_is_amd_or_hygon(vcpu));
 }
@@ -4259,8 +4274,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	shadow_zero_check = &context->shadow_zero_check;
 	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 				context->shadow_root_level, uses_nx,
-				guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES),
-				is_pse, is_amd);
+				guest_can_use_gbpages(vcpu), is_pse, is_amd);
 
 	if (!shadow_me_mask)
 		return;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs
  2021-06-22 17:57 ` [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Sean Christopherson
@ 2021-06-23  1:58   ` kernel test robot
  2021-06-23 17:18   ` Paolo Bonzini
  1 sibling, 0 replies; 103+ messages in thread
From: kernel test robot @ 2021-06-23  1:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kbuild-all, clang-built-linux, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

[-- Attachment #1: Type: text/plain, Size: 5598 bytes --]

Hi Sean,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on next-20210622]
[cannot apply to linus/master vhost/linux-next v5.13-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sean-Christopherson/KVM-x86-mmu-Bug-fixes-and-summer-cleaning/20210623-020645
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-a002-20210622 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project b3634d3e88b7f26534a5057bff182b7dced584fc)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/01d7a0135a12b1e0e5134d0575e424fd20d1a90f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sean-Christopherson/KVM-x86-mmu-Bug-fixes-and-summer-cleaning/20210623-020645
        git checkout 01d7a0135a12b1e0e5134d0575e424fd20d1a90f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kvm/mmu/mmu.c:209:26: warning: no previous prototype for function 'vcpu_to_role_regs' [-Wmissing-prototypes]
   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
                            ^
   arch/x86/kvm/mmu/mmu.c:209:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
   ^
   static 
   arch/x86/kvm/mmu/mmu.c:199:1: warning: unused function '____is_cr0_wp' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:58:1: note: expanded from here
   ____is_cr0_wp
   ^
   arch/x86/kvm/mmu/mmu.c:200:1: warning: unused function '____is_cr4_pse' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pse, X86_CR4_PSE);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:62:1: note: expanded from here
   ____is_cr4_pse
   ^
   arch/x86/kvm/mmu/mmu.c:202:1: warning: unused function '____is_cr4_smep' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:70:1: note: expanded from here
   ____is_cr4_smep
   ^
   arch/x86/kvm/mmu/mmu.c:203:1: warning: unused function '____is_cr4_smap' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:74:1: note: expanded from here
   ____is_cr4_smap
   ^
   arch/x86/kvm/mmu/mmu.c:204:1: warning: unused function '____is_cr4_pke' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:78:1: note: expanded from here
   ____is_cr4_pke
   ^
   arch/x86/kvm/mmu/mmu.c:205:1: warning: unused function '____is_cr4_la57' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:82:1: note: expanded from here
   ____is_cr4_la57
   ^
   arch/x86/kvm/mmu/mmu.c:206:1: warning: unused function '____is_efer_nx' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
   ^
   arch/x86/kvm/mmu/mmu.c:194:20: note: expanded from macro 'BUILD_MMU_ROLE_REGS_ACCESSOR'
   static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
                      ^
   <scratch space>:85:1: note: expanded from here
   ____is_efer_nx
   ^
   8 warnings generated.


vim +/vcpu_to_role_regs +209 arch/x86/kvm/mmu/mmu.c

   208	
 > 209	struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
   210	{
   211		struct kvm_mmu_role_regs regs = {
   212			.cr0 = kvm_read_cr0_bits(vcpu, KVM_MMU_CR0_ROLE_BITS),
   213			.cr4 = kvm_read_cr4_bits(vcpu, KVM_MMU_CR4_ROLE_BITS),
   214			.efer = vcpu->arch.efer,
   215		};
   216	
   217		return regs;
   218	}
   219	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43090 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT
  2021-06-22 17:56 ` [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT Sean Christopherson
@ 2021-06-23 13:59   ` Paolo Bonzini
  2021-06-23 14:01   ` Paolo Bonzini
  1 sibling, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 13:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> +	/*
> +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> +	 * checked because it is unconditionally cleared on INIT and all other
> +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> +	 */
> +	if (old_cr0 & X86_CR0_PG)
> +		kvm_mmu_reset_context(vcpu);

Why not just check "if (init_event)", with a simple comment like

	/*
	 * Reset the MMU context in case paging was enabled prior to INIT (CR0
	 * will be '0' prior to RESET).
	 */

?

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT
  2021-06-22 17:56 ` [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT Sean Christopherson
  2021-06-23 13:59   ` Paolo Bonzini
@ 2021-06-23 14:01   ` Paolo Bonzini
  2021-06-23 14:50     ` Sean Christopherson
  1 sibling, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 14:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> +	/*
> +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> +	 * checked because it is unconditionally cleared on INIT and all other
> +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> +	 */
> +	if (old_cr0 & X86_CR0_PG)
> +		kvm_mmu_reset_context(vcpu);

Hmm, I'll answer myself, is it because of the plan to add a vCPU reset 
ioctl?

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-22 17:56 ` [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken Sean Christopherson
@ 2021-06-23 14:16   ` Paolo Bonzini
  2021-06-23 17:00     ` Jim Mattson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 14:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> +	/*
> +	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> +	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> +	 * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
> +	 * faults due to reusing SPs/SPTEs.  Alert userspace, but otherwise
> +	 * sweep the problem under the rug.
> +	 *
> +	 * KVM's horrific CPUID ABI makes the problem all but impossible to
> +	 * solve, as correctly handling multiple vCPU models (with respect to
> +	 * paging and physical address properties) in a single VM would require
> +	 * tracking all relevant CPUID information in kvm_mmu_page_role.  That
> +	 * is very undesirable as it would double the memory requirements for
> +	 * gfn_track (see struct kvm_mmu_page_role comments), and in practice
> +	 * no sane VMM mucks with the core vCPU model on the fly.
> +	 */
> +	if (vcpu->arch.last_vmentry_cpu != -1)
> +		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");

Let's make this even stronger and promise to break it in 5.16.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-22 17:56 ` [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN Sean Christopherson
@ 2021-06-23 14:36   ` Paolo Bonzini
  2021-06-23 15:08     ` Sean Christopherson
  2021-06-25  9:51   ` Yu Zhang
  1 sibling, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 14:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> When creating a new upper-level shadow page, zap unsync shadow pages at
> the same target gfn instead of attempting to sync the pages.  This fixes
> a bug where an unsync shadow page could be sync'd with an incompatible
> context, e.g. wrong smm, is_guest, etc... flags.  In practice, the bug is
> relatively benign as sync_page() is all but guaranteed to fail its check
> that the guest's desired gfn (for the to-be-sync'd page) matches the
> current gfn associated with the shadow page.  I.e. kvm_sync_page() would
> end up zapping the page anyways.
> 
> Alternatively, __kvm_sync_page() could be modified to explicitly verify
> the mmu_role of the unsync shadow page is compatible with the current MMU
> context.  But, except for this specific case, __kvm_sync_page() is called
> iff the page is compatible, e.g. the transient sync in kvm_mmu_get_page()
> requires an exact role match, and the call from kvm_sync_mmu_roots() is
> only synchronizing shadow pages from the current MMU (which better be
> compatible or KVM has problems).  And as described above, attempting to
> sync shadow pages when creating an upper-level shadow page is unlikely
> to succeed, e.g. zero successful syncs were observed when running Linux
> guests despite over a million attempts.

One issue, this WARN_ON may now trigger:

                         WARN_ON(!list_empty(&invalid_list));

due to a kvm_mmu_prepare_zap_page that could have happened on an earlier 
iteration of the for_each_valid_sp.  Before your change, __kvm_sync_page 
would always be called before kvm_sync_pages could add anything to
invalid_list.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT
  2021-06-23 14:01   ` Paolo Bonzini
@ 2021-06-23 14:50     ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 14:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > +	/*
> > +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> > +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> > +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> > +	 * checked because it is unconditionally cleared on INIT and all other
> > +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> > +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> > +	 */
> > +	if (old_cr0 & X86_CR0_PG)
> > +		kvm_mmu_reset_context(vcpu);
> 
> Hmm, I'll answer myself, is it because of the plan to add a vCPU reset
> ioctl?

Heh, no, I'm not thinking that far ahead at the moment.

Using "if (init_event)" also resets the MMU when paging was disabled prior to
INIT, which is unnecessary.  "if (init_event && (old_cr0 & X86_CR0_PG))" would
obviously work, but I guess I was feeling clever.

As for why I don't want to unnecessarily reset the MMU, my preference for the MMU
role/context logic is to be as precise as possible to help document "why".  Doing
a MMU reset on any INIT obviously won't break anything, but it doesn't highlight
that the true motivation is CR0.PG being cleared, not simply that INIT occurred.
I.e. the MMU context is a KVM construct, there is no architectural model that
we're trying to follow.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-23 14:36   ` Paolo Bonzini
@ 2021-06-23 15:08     ` Sean Christopherson
  2021-06-23 16:38       ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 15:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > When creating a new upper-level shadow page, zap unsync shadow pages at
> > the same target gfn instead of attempting to sync the pages.  This fixes
> > a bug where an unsync shadow page could be sync'd with an incompatible
> > context, e.g. wrong smm, is_guest, etc... flags.  In practice, the bug is
> > relatively benign as sync_page() is all but guaranteed to fail its check
> > that the guest's desired gfn (for the to-be-sync'd page) matches the
> > current gfn associated with the shadow page.  I.e. kvm_sync_page() would
> > end up zapping the page anyways.
> > 
> > Alternatively, __kvm_sync_page() could be modified to explicitly verify
> > the mmu_role of the unsync shadow page is compatible with the current MMU
> > context.  But, except for this specific case, __kvm_sync_page() is called
> > iff the page is compatible, e.g. the transient sync in kvm_mmu_get_page()
> > requires an exact role match, and the call from kvm_mmu_sync_roots() is
> > only synchronizing shadow pages from the current MMU (which better be
> > compatible or KVM has problems).  And as described above, attempting to
> > sync shadow pages when creating an upper-level shadow page is unlikely
> > to succeed, e.g. zero successful syncs were observed when running Linux
> > guests despite over a million attempts.
> 
> One issue, this WARN_ON may now trigger:
> 
>                         WARN_ON(!list_empty(&invalid_list));
> 
> due to a kvm_mmu_prepare_zap_page that could have happened on an earlier
> iteration of the for_each_valid_sp.  Before your change, __kvm_sync_page
> would always be called before kvm_sync_pages could add anything to
> invalid_list.

Ah, I should have added a comment.  It took me a few minutes of staring to
remember why it can't fire.

The branch at (2), which adds to invalid_list, is taken if and only if the new
page is not a 4k page.

The branch at (3) is taken if and only if the existing page is a 4k page, because
only 4k pages can become unsync.

Because the shadow page's level is incorporated into its role, if the level of
the new page is >4k, the branch at (1) will be taken for all 4k shadow pages.

Maybe something like this for a comment?

			/*
			 * Assert that the page was not zapped if the "sync" was
			 * successful.  Note, this cannot collide with the above
			 * zapping of unsync pages, as this point is reached iff
			 * the new page is a 4k page (only 4k pages can become
			 * unsync and the role check ensures identical levels),
			 * and zapping occurs iff the new page is NOT a 4k page.
			 */
			WARN_ON(!list_empty(&invalid_list));




1)		if (sp->role.word != role.word) {
			/*
			 * If the guest is creating an upper-level page, zap
			 * unsync pages for the same gfn.  While it's possible
			 * the guest is using recursive page tables, in all
			 * likelihood the guest has stopped using the unsync
			 * page and is installing a completely unrelated page.
			 * Unsync pages must not be left as is, because the new
			 * upper-level page will be write-protected.
			 */
2)			if (level > PG_LEVEL_4K && sp->unsync)
				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
							 &invalid_list);
			continue;
		}

		if (direct_mmu)
			goto trace_get_page;

3)		if (sp->unsync) {
			/*
			 * The page is good, but is stale.  "Sync" the page to
			 * get the latest guest state, but don't write-protect
			 * the page and don't mark it synchronized!  KVM needs
			 * to ensure the mapping is valid, but doesn't need to
			 * fully sync (write-protect) the page until the guest
			 * invalidates the TLB mapping.  This allows multiple
			 * SPs for a single gfn to be unsync.
			 *
			 * If the sync fails, the page is zapped.  If so, break
			 * in order to rebuild it.
			 */
			if (!kvm_sync_page(vcpu, sp, &invalid_list))
				break;

			WARN_ON(!list_empty(&invalid_list));
			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
		}

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  2021-06-22 17:56 ` [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check Sean Christopherson
@ 2021-06-23 15:49   ` Paolo Bonzini
  2021-06-23 16:17     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 15:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> Replace the hack to identify nested EPT shadow pages with a simple check
> that the size of the guest PTEs associated with the shadow page and the
> current MMU match, which is the intent of the "8 bytes == PAE" test.
> The nested EPT hack existed to avoid a false negative due to the is_pae()
> check not matching for 32-bit L2 guests; checking the MMU role directly
> avoids the indirect calculation of the guest PTE size entirely.

What the commit message doesn't say is, did we miss this opportunity all
along, or has there been a change since commit 47c42e6b4192 ("KVM: x86:
fix handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28)
that allows this?

I think the only change needed would be making the commit something like
this:

==========
KVM: x86/mmu: Use MMU role to check for matching guest page sizes

Originally, __kvm_sync_page used to check the cr4_pae bit in the role
to avoid zapping 4-byte kvm_mmu_pages when guest page sizes are 8-byte
or the other way round.  However, in commit 47c42e6b4192 ("KVM: x86: fix
handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
was observed that this did not work for nested EPT, where the page table
size would be 8 bytes even if CR4.PAE=0.  (Note that the check still
has to be done for nested *NPT*, so it is not possible to use tdp_enabled
or similar).

Therefore, a hack was introduced to identify nested EPT shadow pages
and unconditionally call __kvm_sync_page() on them.  However, it is
possible to do without the hack to identify nested EPT shadow pages:
if EPT is active, there will be no shadow pages in non-EPT format,
and all of them will have gpte_is_8_bytes set to true; we can just
check the MMU role directly, and the test will always be true.

Even for non-EPT shadow MMUs, this test should really always be true
now that __kvm_sync_page() is called if and only if the role is an
exact match (kvm_mmu_get_page()) or is part of the current MMU context
(kvm_mmu_sync_roots()).  A future commit will convert the likely-pointless
check into a meaningful WARN to enforce that the mmu_roles of the current
context and the shadow page are compatible.
==========


Paolo

> Note, this should be a glorified nop now that __kvm_sync_page() is called
> if and only if the role is an exact match (kvm_mmu_get_page()) or is part
> of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
> convert the likely-pointless check into a meaningful WARN to enforce that
> the mmu_roles of the current context and the shadow page are compatible.
> 
> Cc: Vitaly Kuznetsov<vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson<seanjc@google.com>


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  2021-06-23 15:49   ` Paolo Bonzini
@ 2021-06-23 16:17     ` Sean Christopherson
  2021-06-23 16:41       ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 16:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > Replace the hack to identify nested EPT shadow pages with a simple check
> > that the size of the guest PTEs associated with the shadow page and the
> > current MMU match, which is the intent of the "8 bytes == PAE" test.
> > The nested EPT hack existed to avoid a false negative due to the is_pae()
> > check not matching for 32-bit L2 guests; checking the MMU role directly
> > avoids the indirect calculation of the guest PTE size entirely.
> 
> What the commit message doesn't say is, did we miss this opportunity all
> along, or has there been a change since commit 47c42e6b4192 ("KVM: x86:
> fix handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28)
> that allows this?

The code was wrong from the initial "unsync" commit.  The 4-byte vs. 8-byte check
papered over the real bug, which was that the roles were not checked for
compatibility.  I suspect that the bug only manifested as an observable problem
when the GPTE sizes mismatched, thus the PAE check was added.

So yes, there was an "opportunity" that was missed all along.

> I think the only change needed would be making the commit something like
> this:
> 
> ==========
> KVM: x86/mmu: Use MMU role to check for matching guest page sizes
> 
> Originally, __kvm_sync_page used to check the cr4_pae bit in the role
> to avoid zapping 4-byte kvm_mmu_pages when guest page sizes are 8-byte
> or the other way round.  However, in commit 47c42e6b4192 ("KVM: x86: fix
> handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
> was observed that this did not work for nested EPT, where the page table
> size would be 8 bytes even if CR4.PAE=0.  (Note that the check still
> has to be done for nested *NPT*, so it is not possible to use tdp_enabled
> or similar).
> 
> Therefore, a hack was introduced to identify nested EPT shadow pages
> and unconditionally call __kvm_sync_page() on them.  However, it is
> possible to do without the hack to identify nested EPT shadow pages:
> if EPT is active, there will be no shadow pages in non-EPT format,
> and all of them will have gpte_is_8_bytes set to true; we can just
> check the MMU role directly, and the test will always be true.
> 
> Even for non-EPT shadow MMUs, this test should really always be true
> now that __kvm_sync_page() is called if and only if the role is an
> exact match (kvm_mmu_get_page()) or is part of the current MMU context
> (kvm_mmu_sync_roots()).  A future commit will convert the likely-pointless
> check into a meaningful WARN to enforce that the mmu_roles of the current
> context and the shadow page are compatible.
> ==========
> 
> 
> Paolo
> 
> > Note, this should be a glorified nop now that __kvm_sync_page() is called
> > if and only if the role is an exact match (kvm_mmu_get_page()) or is part
> > of the current MMU context (kvm_mmu_sync_roots()).  A future commit will
> > convert the likely-pointless check into a meaningful WARN to enforce that
> > the mmu_roles of the current context and the shadow page are compatible.
> > 
> > Cc: Vitaly Kuznetsov<vkuznets@redhat.com>
> > Signed-off-by: Sean Christopherson<seanjc@google.com>
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-23 15:08     ` Sean Christopherson
@ 2021-06-23 16:38       ` Paolo Bonzini
  2021-06-23 22:04         ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 16:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 23/06/21 17:08, Sean Christopherson wrote:
> Because the shadow page's level is incorporated into its role, if the level of
> the new page is >4k, the branch at (1) will be taken for all 4k shadow pages.
> 
> Maybe something like this for a comment?

Good, integrated.

Though I also wonder why breaking out of the loop early is okay.  Initially I thought
that zapping only matters if there's no existing page with the desired role,
because otherwise the unsync page would have been zapped already by an earlier
kvm_mmu_get_page, but what if the page was synced at the time of kvm_mmu_get_page
and then both were unsynced?

It may be easier to just split the loop to avoid that additional confusion,
something like:

         /*
          * If the guest is creating an upper-level page, zap unsync pages
          * for the same gfn, because the gfn will be write protected and
          * future syncs of those unsync pages could happen with an incompatible
          * context.  While it's possible the guest is using recursive page
          * tables, in all likelihood the guest has stopped using the unsync
          * page and is installing a completely unrelated page.
          */
         if (level > PG_LEVEL_4K) {
                 for_each_valid_sp(vcpu->kvm, sp, sp_list)
                         if (sp->gfn == gfn && sp->role.word != role.word && sp->unsync)
                                 kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
                                                          &invalid_list);
         }

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  2021-06-23 16:17     ` Sean Christopherson
@ 2021-06-23 16:41       ` Paolo Bonzini
  2021-06-23 16:54         ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 16:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 23/06/21 18:17, Sean Christopherson wrote:
>> What the commit message doesn't say is, did we miss this
>> opportunity all along, or has there been a change since commit
>> 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it
>> to 'gpte_size'", 2019-03-28) that allows this?
>
> The code was wrong from the initial "unsync" commit.  The 4-byte vs.
> 8-byte check papered over the real bug, which was that the roles were
> not checked for compatibility.  I suspect that the bug only
> manifested as an observable problem when the GPTE sizes mismatched,
> thus the PAE check was added.

I meant that we really never needed is_ept_sp, and you could have used 
the simpler check already at the time you introduced gpte_is_8_bytes. 
But anyway I think we're in agreement.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page()
  2021-06-22 17:56 ` [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page() Sean Christopherson
@ 2021-06-23 16:54   ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 16:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> @@ -2008,10 +2001,19 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   			goto trace_get_page;
>   
>   		if (sp->unsync) {
> -			/* The page is good, but __kvm_sync_page might still end
> -			 * up zapping it.  If so, break in order to rebuild it.
> +			/*
> +			 * The page is good, but is stale.  "Sync" the page to
> +			 * get the latest guest state, but don't write-protect
> +			 * the page and don't mark it synchronized!  KVM needs
> +			 * to ensure the mapping is valid, but doesn't need to
> +			 * fully sync (write-protect) the page until the guest
> +			 * invalidates the TLB mapping.  This allows multiple
> +			 * SPs for a single gfn to be unsync.
> +			 *
> +			 * If the sync fails, the page is zapped.  If so, break
> +			 * in order to rebuild it.
>   			 */

This should be a separate patch I think.  In addition it should point out the
place where write protection does happen, which is mmu_unsync_children:

                         /*
                          * The page is good, but is stale.  kvm_sync_page does
                          * get the latest guest state, but (unlike mmu_unsync_children)
                          * it doesn't write-protect the page or mark it synchronized!
                          * This way the validity of the mapping is ensured, but the
                          * overhead of write protection is not incurred until the
                          * guest invalidates the TLB mapping.  This allows multiple
                          * SPs for a single gfn to be unsync.
                          *
                          * If the sync fails, the page is zapped.  If so, break
                          * in order to rebuild it.
                          */

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check
  2021-06-23 16:41       ` Paolo Bonzini
@ 2021-06-23 16:54         ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 16:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 23/06/21 18:17, Sean Christopherson wrote:
> > > What the commit message doesn't say is, did we miss this
> > > opportunity all along, or has there been a change since commit
> > > 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it
> > > to 'gpte_size'", 2019-03-28) that allows this?
> > 
> > The code was wrong from the initial "unsync" commit.  The 4-byte vs.
> > 8-byte check papered over the real bug, which was that the roles were
> > not checked for compatibility.  I suspect that the bug only
> > manifested as an observable problem when the GPTE sizes mismatched,
> > thus the PAE check was added.
> 
> I meant that we really never needed is_ept_sp, and you could have used the
> simpler check already at the time you introduced gpte_is_8_bytes. But anyway
> I think we're in agreement.

Ah, yes, I was too clever :-/

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 14:16   ` Paolo Bonzini
@ 2021-06-23 17:00     ` Jim Mattson
  2021-06-23 17:11       ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Jim Mattson @ 2021-06-23 17:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021 at 7:16 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 22/06/21 19:56, Sean Christopherson wrote:
> > +     /*
> > +      * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> > +      * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > +      * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
> > +      * faults due to reusing SPs/SPTEs.  Alert userspace, but otherwise
> > +      * sweep the problem under the rug.
> > +      *
> > +      * KVM's horrific CPUID ABI makes the problem all but impossible to
> > +      * solve, as correctly handling multiple vCPU models (with respect to
> > +      * paging and physical address properties) in a single VM would require
> > +      * tracking all relevant CPUID information in kvm_mmu_page_role.  That
> > +      * is very undesirable as it would double the memory requirements for
> > +      * gfn_track (see struct kvm_mmu_page_role comments), and in practice
> > +      * no sane VMM mucks with the core vCPU model on the fly.
> > +      */
> > +     if (vcpu->arch.last_vmentry_cpu != -1)
> > +             pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
>
> Let's make this even stronger and promise to break it in 5.16.
>
> Paolo

Doesn't this fall squarely into kvm's philosophy of "we should let
userspace shoot itself in the foot wherever possible"? I thought we
only stepped in when host stability was an issue.

I'm actually delighted if this is a sign that we're rethinking that
philosophy. I'd just like to hear someone say it.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state
  2021-06-22 17:57 ` [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Sean Christopherson
@ 2021-06-23 17:06   ` Paolo Bonzini
  2021-06-23 20:49     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 17:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> +	/*
> +	 * L1's CR4 and EFER are stuffed into vmcb01 by the caller.  Note, when
> +	 * called via KVM_SET_NESTED_STATE, that state may _not_ match current
> +	 * vCPU state.  CR0.WP is explicitly ignored, while CR0.PG is required.
> +	 */

"stuffed into" doesn't really match reality of vmentry, though it works 
for KVM_SET_NESTED_STATE.  What about a more neutral "The NPT format 
depends on L1's CR4 and EFER, which is in vmcb01"?

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 17:00     ` Jim Mattson
@ 2021-06-23 17:11       ` Paolo Bonzini
  2021-06-23 18:11         ` Jim Mattson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 17:11 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On 23/06/21 19:00, Jim Mattson wrote:
> On Wed, Jun 23, 2021 at 7:16 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 22/06/21 19:56, Sean Christopherson wrote:
>>> +     /*
>>> +      * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>>> +      * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
>>> +      * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
>>> +      * faults due to reusing SPs/SPTEs.  Alert userspace, but otherwise
>>> +      * sweep the problem under the rug.
>>> +      *
>>> +      * KVM's horrific CPUID ABI makes the problem all but impossible to
>>> +      * solve, as correctly handling multiple vCPU models (with respect to
>>> +      * paging and physical address properties) in a single VM would require
>>> +      * tracking all relevant CPUID information in kvm_mmu_page_role.  That
>>> +      * is very undesirable as it would double the memory requirements for
>>> +      * gfn_track (see struct kvm_mmu_page_role comments), and in practice
>>> +      * no sane VMM mucks with the core vCPU model on the fly.
>>> +      */
>>> +     if (vcpu->arch.last_vmentry_cpu != -1)
>>> +             pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
>>
>> Let's make this even stronger and promise to break it in 5.16.
>>
>> Paolo
> 
> Doesn't this fall squarely into kvm's philosophy of "we should let
> userspace shoot itself in the foot wherever possible"? I thought we
> only stepped in when host stability was an issue.
> 
> I'm actually delighted if this is a sign that we're rethinking that
> philosophy. I'd just like to hear someone say it.

Nah, that's not the philosophy.  The philosophy is that covering all 
possible ways for userspace to shoot itself in the foot is impossible.

However, here we're talking about 2 lines of code (thanks also to your 
patches that add last_vmentry_cpu for completely unrelated reasons) to 
remove a whole set of bullet/foot encounters.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs
  2021-06-22 17:57 ` [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs Sean Christopherson
@ 2021-06-23 17:11   ` Paolo Bonzini
  2021-06-23 19:36     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 17:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> Drop the smep_andnot_wp role check from the "uses NX" calculation now
> that all non-nested shadow MMUs treat NX as used via the !TDP check.
> 
> The shadow MMU for nested NPT, which shares the helper, does not need to
> deal with SMEP (or WP) as NPT walks are always "user" accesses and WP is
> explicitly noted as being ignored:
> 
>    Table walks for guest page tables are always treated as user writes at
>    the nested page table level.
> 
>    A table walk for the guest page itself is always treated as a user
>    access at the nested page table level
> 
>    The host hCR0.WP bit is ignored under nested paging.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 96c16a6e0044..ca7680d1ea24 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4223,8 +4223,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>   	 * NX can be used by any non-nested shadow MMU to avoid having to reset
>   	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
>   	 */
> -	bool uses_nx = context->nx || !tdp_enabled ||
> -		context->mmu_role.base.smep_andnot_wp;
> +	bool uses_nx = context->nx || !tdp_enabled;
>   	struct rsvd_bits_validate *shadow_zero_check;
>   	int i;
>   
> 

Good idea, but why not squash it into patch 2?

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper
  2021-06-22 17:57 ` [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Sean Christopherson
@ 2021-06-23 17:13   ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 17:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the
> MMU proper and unexport said function.  Aside from dropping an export,
> this is a baby step toward eliminating the call entirely by fixing the
> shadow_root_level confusion.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Extra points for adding a comment about why the heck it's there.

Paolo

> ---
>   arch/x86/kvm/mmu.h        |  3 ---
>   arch/x86/kvm/mmu/mmu.c    | 11 ++++++++---
>   arch/x86/kvm/svm/nested.c |  1 -
>   3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 4e926f4935b0..62844bacd13f 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -68,9 +68,6 @@ static __always_inline u64 rsvd_bits(int s, int e)
>   void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>   void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
>   
> -void
> -reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
> -
>   void kvm_init_mmu(struct kvm_vcpu *vcpu);
>   void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>   			     unsigned long cr4, u64 efer, gpa_t nested_cr3);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 02c54426e7a2..5a46a87b23b0 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4212,8 +4212,8 @@ static inline u64 reserved_hpa_bits(void)
>    * table in guest or amd nested guest, its mmu features completely
>    * follow the features in guest.
>    */
> -void
> -reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> +static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
> +					struct kvm_mmu *context)
>   {
>   	/*
>   	 * KVM uses NX when TDP is disabled to handle a variety of scenarios,
> @@ -4247,7 +4247,6 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>   	}
>   
>   }
> -EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
>   
>   static inline bool boot_cpu_is_amd(void)
>   {
> @@ -4714,6 +4713,12 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>   		 */
>   		context->shadow_root_level = new_role.base.level;
>   	}
> +
> +	/*
> +	 * Redo the shadow bits, the reset done by shadow_mmu_init_context()
> +	 * (above) may use the wrong shadow_root_level.
> +	 */
> +	reset_shadow_zero_bits_mask(vcpu, context);
>   }
>   EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
>   
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 33b2f9337e26..927e545591c3 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -110,7 +110,6 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
>   	vcpu->arch.mmu->get_guest_pgd     = nested_svm_get_tdp_cr3;
>   	vcpu->arch.mmu->get_pdptr         = nested_svm_get_tdp_pdptr;
>   	vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit;
> -	reset_shadow_zero_bits_mask(vcpu, vcpu->arch.mmu);
>   	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
>   }
>   
> 


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs
  2021-06-22 17:57 ` [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Sean Christopherson
  2021-06-23  1:58   ` kernel test robot
@ 2021-06-23 17:18   ` Paolo Bonzini
  1 sibling, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 17:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> +/*
> + * Yes, lots of underscores.  They're a hint that you probably shouldn't be
> + * reading from the role_regs.  Once the mmu_role is constructed, it becomes
> + * the single source of truth for the MMU's state.
> + */
> +#define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)			\
> +static inline bool ____is_##reg##_##name(struct kvm_mmu_role_regs *regs)\
> +{									\
> +	return !!(regs->reg & flag);					\
> +}

Ok, that's a decent reason to have these accessors in the first place. :)
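
(For illustration, a minimal sketch of what one of the invocations from the
quoted patch expands to and how it might be consumed while building the role;
only the expansion follows directly from the macro, the consumer lines are an
assumption:)

	/* BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP) expands to: */
	static inline bool ____is_cr0_wp(struct kvm_mmu_role_regs *regs)
	{
		return !!(regs->cr0 & X86_CR0_WP);
	}

	/* ...read only while the mmu_role is being constructed, e.g.: */
	struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu);

	role.base.cr0_wp = ____is_cr0_wp(&regs);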

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 17:11       ` Paolo Bonzini
@ 2021-06-23 18:11         ` Jim Mattson
  2021-06-23 18:49           ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Jim Mattson @ 2021-06-23 18:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021 at 10:11 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 23/06/21 19:00, Jim Mattson wrote:
> > On Wed, Jun 23, 2021 at 7:16 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 22/06/21 19:56, Sean Christopherson wrote:
> >>> +     /*
> >>> +      * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> >>> +      * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> >>> +      * tracked in kvm_mmu_page_role.  As a result, KVM may miss guest page
> >>> +      * faults due to reusing SPs/SPTEs.  Alert userspace, but otherwise
> >>> +      * sweep the problem under the rug.
> >>> +      *
> >>> +      * KVM's horrific CPUID ABI makes the problem all but impossible to
> >>> +      * solve, as correctly handling multiple vCPU models (with respect to
> >>> +      * paging and physical address properties) in a single VM would require
> >>> +      * tracking all relevant CPUID information in kvm_mmu_page_role.  That
> >>> +      * is very undesirable as it would double the memory requirements for
> >>> +      * gfn_track (see struct kvm_mmu_page_role comments), and in practice
> >>> +      * no sane VMM mucks with the core vCPU model on the fly.
> >>> +      */
> >>> +     if (vcpu->arch.last_vmentry_cpu != -1)
> >>> +             pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
> >>
> >> Let's make this even stronger and promise to break it in 5.16.
> >>
> >> Paolo
> >
> > Doesn't this fall squarely into kvm's philosophy of "we should let
> > userspace shoot itself in the foot wherever possible"? I thought we
> > only stepped in when host stability was an issue.
> >
> > I'm actually delighted if this is a sign that we're rethinking that
> > philosophy. I'd just like to hear someone say it.
>
> Nah, that's not the philosophy.  The philosophy is that covering all
> possible ways for userspace to shoot itself in the foot is impossible.
>
> However, here we're talking about 2 lines of code (thanks also to your
> patches that add last_vmentry_cpu for completely unrelated reasons) to
> remove a whole set of bullet/foot encounters.

What about the problems that arise when we have different CPUID tables
for different vCPUs in the same VM? Can we just replace this
hole-in-foot inducing ioctl with a KVM_VM_SET_CPUID ioctl on the VM
level that has to be called before any vCPUs are created?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 18:11         ` Jim Mattson
@ 2021-06-23 18:49           ` Paolo Bonzini
  2021-06-23 19:02             ` Jim Mattson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 18:49 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On 23/06/21 20:11, Jim Mattson wrote:
> On Wed, Jun 23, 2021 at 10:11 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Nah, that's not the philosophy.  The philosophy is that covering all
>> possible ways for userspace to shoot itself in the foot is impossible.
>>
>> However, here we're talking about 2 lines of code (thanks also to your
>> patches that add last_vmentry_cpu for completely unrelated reasons) to
>> remove a whole set of bullet/foot encounters.
> 
> What about the problems that arise when we have different CPUID tables
> for different vCPUs in the same VM? Can we just replace this
> hole-in-foot inducing ioctl with a KVM_VM_SET_CPUID ioctl on the VM
> level that has to be called before any vCPUs are created?

Are there any KVM bugs that this can fix?  The problem is that, unlike 
this case, it would be effectively impossible to deprecate 
KVM_SET_CPUID2 as a vcpu ioctl, so it would be hard to reap any benefits 
in KVM.

BTW, there is actually a theoretical usecase for KVM_SET_CPUID2 after 
KVM_RUN, which is to test OSes against microcode updates that hide, 
totally random example, the RTM bit.  But it's still not worth keeping 
it given 1) the bugs and complications in KVM, 2) if you really wanted 
that kind of testing so hard, the fact that you can just create a new 
vcpu file descriptor from scratch, possibly in cooperation with 
userspace MSR filtering 3) AFAIK no one has done that anyway in 15 years.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 18:49           ` Paolo Bonzini
@ 2021-06-23 19:02             ` Jim Mattson
  2021-06-23 19:53               ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Jim Mattson @ 2021-06-23 19:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021 at 11:49 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 23/06/21 20:11, Jim Mattson wrote:
> > On Wed, Jun 23, 2021 at 10:11 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> Nah, that's not the philosophy.  The philosophy is that covering all
> >> possible ways for userspace to shoot itself in the foot is impossible.
> >>
> >> However, here we're talking about 2 lines of code (thanks also to your
> >> patches that add last_vmentry_cpu for completely unrelated reasons) to
> >> remove a whole set of bullet/foot encounters.
> >
> > What about the problems that arise when we have different CPUID tables
> > for different vCPUs in the same VM? Can we just replace this
> > hole-in-foot inducing ioctl with a KVM_VM_SET_CPUID ioctl on the VM
> > level that has to be called before any vCPUs are created?
>
> Are there any KVM bugs that this can fix?  The problem is that, unlike
> this case, it would be effectively impossible to deprecate
> KVM_SET_CPUID2 as a vcpu ioctl, so it would be hard to reap any benefits
> in KVM.
>
> BTW, there is actually a theoretical usecase for KVM_SET_CPUID2 after
> KVM_RUN, which is to test OSes against microcode updates that hide,
> totally random example, the RTM bit.  But it's still not worth keeping
> it given 1) the bugs and complications in KVM, 2) if you really wanted
> that kind of testing so hard, the fact that you can just create a new
> vcpu file descriptor from scratch, possibly in cooperation with
> userspace MSR filtering 3) AFAIK no one has done that anyway in 15 years.

Though such a usecase may exist, I don't think it actually works
today. For example, kvm_vcpu_after_set_cpuid() potentially changes the
value of the guest IA32_PERF_GLOBAL_CTRL MSR.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs
  2021-06-23 17:11   ` Paolo Bonzini
@ 2021-06-23 19:36     ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 19:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:57, Sean Christopherson wrote:
> > Drop the smep_andnot_wp role check from the "uses NX" calculation now
> > that all non-nested shadow MMUs treat NX as used via the !TDP check.
> > 
> > The shadow MMU for nested NPT, which shares the helper, does not need to
> > deal with SMEP (or WP) as NPT walks are always "user" accesses and WP is
> > explicitly noted as being ignored:
> > 
> >    Table walks for guest page tables are always treated as user writes at
> >    the nested page table level.
> > 
> >    A table walk for the guest page itself is always treated as a user
> >    access at the nested page table level
> > 
> >    The host hCR0.WP bit is ignored under nested paging.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 96c16a6e0044..ca7680d1ea24 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4223,8 +4223,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> >   	 * NX can be used by any non-nested shadow MMU to avoid having to reset
> >   	 * MMU contexts.  Note, KVM forces EFER.NX=1 when TDP is disabled.
> >   	 */
> > -	bool uses_nx = context->nx || !tdp_enabled ||
> > -		context->mmu_role.base.smep_andnot_wp;
> > +	bool uses_nx = context->nx || !tdp_enabled;
> >   	struct rsvd_bits_validate *shadow_zero_check;
> >   	int i;
> > 
> 
> Good idea, but why not squash it into patch 2?

Because that patch is marked for stable and dropping the smep_andnot_wp is not
necessary to fix the bug.  At worst, the too-liberal uses_nx will suppress the
WARN in handle_mmio_page_fault() because this is for checking KVM's SPTEs, not
the guest's SPTEs, i.e. KVM won't miss a guest reserved NX #PF.

That said, I'm not at all opposed to squashing this.  I have a feeling I originally
split the patches because I wasn't super confident about either change, and never
revisited them.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken
  2021-06-23 19:02             ` Jim Mattson
@ 2021-06-23 19:53               ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 19:53 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm list, LKML, Yu Zhang, Maxim Levitsky

On 23/06/21 21:02, Jim Mattson wrote:
>>
>> BTW, there is actually a theoretical usecase for KVM_SET_CPUID2 after
>> KVM_RUN, which is to test OSes against microcode updates that hide,
>> totally random example, the RTM bit.  But it's still not worth keeping
>> it given 1) the bugs and complications in KVM, 2) if you really wanted
>> that kind of testing so hard, the fact that you can just create a new
>> vcpu file descriptor from scratch, possibly in cooperation with
>> userspace MSR filtering 3) AFAIK no one has done that anyway in 15 years.
>
> Though such a usecase may exist, I don't think it actually works
> today. For example, kvm_vcpu_after_set_cpuid() potentially changes the
> value of the guest IA32_PERF_GLOBAL_CTRL MSR.

Yep, and that's why I'm okay with actively deprecating KVM_SET_CPUID2 
and not just "discouraging" it.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits
  2021-06-22 17:57 ` [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits Sean Christopherson
@ 2021-06-23 20:02   ` Paolo Bonzini
  2021-06-23 20:47     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> +static inline bool is_##reg##_##name(struct kvm_mmu *mmu)	\

What do you think about calling these is_mmu_##name?  The point of 
having these helpers is that the register doesn't count, and they return 
the effective value (e.g. false in most EPT cases).

Paolo

> +{								\
> +	return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
> +}
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr0, pg);
> +BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pae);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
> +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
> +BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
> +
>   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  2021-06-22 17:57 ` [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls Sean Christopherson
@ 2021-06-23 20:07   ` Paolo Bonzini
  2021-06-23 20:53     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> Move calls to reset_rsvds_bits_mask() out of the various mode statements
> and under a more generic !CR0.PG check

CR0.PG=1, not =0.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  2021-06-22 17:57 ` [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU Sean Christopherson
@ 2021-06-23 20:13   ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> +	/*
> +	 * Use a bitwise-OR instead of a logical-OR to aggregate the reserved
> +	 * bits and EPT's invalid memtype/XWR checks to avoid an extra Jcc
> +	 * (this is used in hot paths).

Probably s/this is used in hot paths/this is extremely unlikely to be 
short-circuited as true/, while we are at it.

Paolo

> +	 */
> +	return __is_bad_mt_xwr(rsvd_check, spte) |
> +	       __is_rsvd_bits_set(rsvd_check, spte, level);
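
(A standalone sketch of the bitwise-OR vs. logical-OR point being made above;
the helper names are placeholders, not the KVM ones, and whether the compiler
actually emits a branch is of course target/optimization dependent:)

	#include <stdbool.h>

	static bool check_a(int x) { return x & 1; }
	static bool check_b(int x) { return x & 2; }

	bool any_logical(int x)
	{
		/* || short-circuits: check_b() may be skipped, typically via a branch. */
		return check_a(x) || check_b(x);
	}

	bool any_bitwise(int x)
	{
		/* | evaluates both 0/1 values and ORs them, no short-circuit needed. */
		return check_a(x) | check_b(x);
	}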


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  2021-06-22 17:57 ` [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic Sean Christopherson
@ 2021-06-23 20:22   ` Paolo Bonzini
  2021-06-23 20:58     ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:57, Sean Christopherson wrote:
> For 32-bit paging (non-PAE), the adjustments are needed purely because
> bit 7 is ignored if PSE=0.  Retain that logic as is, but make
> is_last_gpte() unique per PTTYPE

... which makes total sense given where it's used, too.

> +#if PTTYPE == 32
> +	/*
> +	 * 32-bit paging requires special handling because bit 7 is ignored if
> +	 * CR4.PSE=0, not reserved.  Clear bit 7 in the gpte if the level is
> +	 * greater than the last level for which bit 7 is the PAGE_SIZE bit.
> +	 *
> +	 * The RHS has bit 7 set iff level < (2 + PSE).  If it is clear, bit 7
> +	 * is not reserved and does not indicate a large page at this level,
> +	 * so clear PT_PAGE_SIZE_MASK in gpte if that is the case.
> +	 */
> +	gpte &= level - (PT32_ROOT_LEVEL + !!mmu->mmu_role.ext.cr4_pse);

!! is not needed and possibly slightly confusing?  (We know it's a 
single bit).

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
  2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
                   ` (53 preceding siblings ...)
  2021-06-22 17:57 ` [PATCH 54/54] KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on Sean Christopherson
@ 2021-06-23 20:29 ` Paolo Bonzini
  2021-06-23 21:06   ` Sean Christopherson
  54 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 22/06/21 19:56, Sean Christopherson wrote:
> Patch 01 is the only patch that is remotely 5.13 worthy, and even then
> only because it's about as safe as a patch can be.  Everything else is far
> from urgent as these bugs have existed for quite some time.

Maybe patch 54 (not sarcastic), but I agree it's not at all necessary.

This is good stuff, I made a few comments but almost all of them (all 
except the last comment on patch 9, "Unconditionally zap unsync SPs") 
are cosmetic and I can resolve them myself.

I'd like your input on renaming is_{cr0,cr4,efer}_* to is_mmu_* (and 
possibly reducing the four underscores to two...).

If I get remarks by tomorrow, I'll get this into 5.14, otherwise 
consider everything but the first eight patches queued only for 5.15.

> I labeled the "sections" of this mess in the shortlog below.
> 
> P.S. Does anyone know how PKRU interacts with NPT?  I assume/hope NPT
>       accesses, which are always "user", ignore PKRU, but the APM doesn't
>       say a thing.  If PKRU is ignored, KVM has some fixing to do.  If PKRU
>       isn't ignored, AMD has some fixing to do:-)
> 
> P.S.S. This series pulled in one patch from my vCPU RESET/INIT series,
>         "Properly reset MMU context at vCPU RESET/INIT", as that was needed
>         to fix a root_level bug on VMX.  My goal is to get the RESET/INIT
>         series refreshed later this week and thoroughly bombard everyone.

Note that it won't get into 5.14 anyway, since I plan to send my first 
pull request to Linus as soon as Friday.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits
  2021-06-23 20:02   ` Paolo Bonzini
@ 2021-06-23 20:47     ` Sean Christopherson
  2021-06-23 20:53       ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:57, Sean Christopherson wrote:
> > +static inline bool is_##reg##_##name(struct kvm_mmu *mmu)	\
> 
> What do you think about calling these is_mmu_##name?  The point of having
> these helpers is that the register doesn't count, and they return the
> effective value (e.g. false in most EPT cases).

I strongly prefer to keep <reg> in the name, both to match the mmu_role bits and
to make it a bit more clear that it's reflective (modified) register state, as
opposed to PTEs or even something else entirely.  E.g. I always struggled to
remember the purpose of mmu->nx flag.

I wouldn't be opposed to is_mmu_##reg##_##name() though.  I omitted the "mmu"
part because it was loosely implied by the "struct kvm_mmu" param, and to keep
line lengths short.  But being explicit is usually a good thing, and looking at
the code I don't see any lines that would wrap if "mmu" were added.
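
(Purely as a sketch of the naming being discussed; the macro body is taken
from the quoted patch, while the parameter list is assumed from the
three-argument invocations below:)

	#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)		\
	static inline bool is_mmu_##reg##_##name(struct kvm_mmu *mmu)	\
	{								\
		return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
	}

	/* e.g. BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp) would yield is_mmu_cr0_wp(mmu). */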

> > +{								\
> > +	return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
> > +}
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr0, pg);
> > +BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pae);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
> > +BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
> > +BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
> > +
> >   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state
  2021-06-23 17:06   ` Paolo Bonzini
@ 2021-06-23 20:49     ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 20:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:57, Sean Christopherson wrote:
> > +	/*
> > +	 * L1's CR4 and EFER are stuffed into vmcb01 by the caller.  Note, when
> > +	 * called via KVM_SET_NESTED_STATE, that state may _not_ match current
> > +	 * vCPU state.  CR0.WP is explicitly ignored, while CR0.PG is required.
> > +	 */
> 
> "stuffed into" doesn't really match reality of vmentry, though it works for
> KVM_SET_NESTED_STATE.  What about a more neutral "The NPT format depends on
> L1's CR4 and EFER, which is in vmcb01"?

Ah, true.  Works for me.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits
  2021-06-23 20:47     ` Sean Christopherson
@ 2021-06-23 20:53       ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 20:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 23/06/21 22:47, Sean Christopherson wrote:
>> What do you think about calling these is_mmu_##name?  The point of having
>> these helpers is that the register doesn't count, and they return the
>> effective value (e.g. false in most EPT cases).
>
> I strongly prefer to keep <reg> in the name, both to match the mmu_role bits and
> to make it a bit more clear that it's reflective (modified) register state, as
> opposed to PTEs or even something else entirely.  E.g. I always struggled to
> remember the purpose of mmu->nx flag.

No problem.  I do disagree that it's register state ("modified" seems to 
be more than a parenthetical remark), but not enough to argue about it 
and even less to do the work to rename the accessors.

Paolo


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  2021-06-23 20:07   ` Paolo Bonzini
@ 2021-06-23 20:53     ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 20:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:57, Sean Christopherson wrote:
> > Move calls to reset_rsvds_bits_mask() out of the various mode statements
> > and under a more generic !CR0.PG check
> 
> CR0.PG=1, not =0.

It's always some mundane detail!

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  2021-06-23 20:22   ` Paolo Bonzini
@ 2021-06-23 20:58     ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 20:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:57, Sean Christopherson wrote:
> > +#if PTTYPE == 32
> > +	/*
> > +	 * 32-bit paging requires special handling because bit 7 is ignored if
> > +	 * CR4.PSE=0, not reserved.  Clear bit 7 in the gpte if the level is
> > +	 * greater than the last level for which bit 7 is the PAGE_SIZE bit.
> > +	 *
> > +	 * The RHS has bit 7 set iff level < (2 + PSE).  If it is clear, bit 7
> > +	 * is not reserved and does not indicate a large page at this level,
> > +	 * so clear PT_PAGE_SIZE_MASK in gpte if that is the case.
> > +	 */
> > +	gpte &= level - (PT32_ROOT_LEVEL + !!mmu->mmu_role.ext.cr4_pse);
> 
> !! is not needed and possibly slightly confusing?  (We know it's a single
> bit).

Ah, I had it backwards.  I misremembered the "!!" logic added around the
mmu_role helpers, but that was to ensure that e.g. kvm_read_cr4_bits() was
squished down into a 0/1 value when setting the mmu_role bit.
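
(For anyone else following along, a self-contained sketch of how the quoted
expression behaves, assuming PT32_ROOT_LEVEL == 2 and that PT_PAGE_SIZE_MASK
is bit 7, per the comment above:)

	#include <stdio.h>

	int main(void)
	{
		const unsigned int PT_PAGE_SIZE_MASK = 1u << 7;
		const unsigned int PT32_ROOT_LEVEL = 2;
		unsigned int level, pse;

		for (level = 1; level <= 2; level++) {
			for (pse = 0; pse <= 1; pse++) {
				/* Start with the PS bit set in the gpte. */
				unsigned int gpte = PT_PAGE_SIZE_MASK;

				gpte &= level - (PT32_ROOT_LEVEL + pse);
				printf("level=%u PSE=%u -> bit 7 %s\n", level, pse,
				       (gpte & PT_PAGE_SIZE_MASK) ? "preserved" : "cleared");
			}
		}
		return 0;
	}

i.e. bit 7 is stripped only for level 2 with CR4.PSE=0, which is exactly the
"ignored, not reserved" case the comment is describing.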

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
  2021-06-23 20:29 ` [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Paolo Bonzini
@ 2021-06-23 21:06   ` Sean Christopherson
  2021-06-23 21:33     ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 21:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > Patch 01 is the only patch that is remotely 5.13 worthy, and even then
> > only because it's about as safe as a patch can be.  Everything else is far
> > from urgent as these bugs have existed for quite some time.
> 
> Maybe patch 54 (not sarcastic), but I agree it's not at all necessary.
> 
> This is good stuff, I made a few comments but almost all of them (all except
> the last comment on patch 9, "Unconditionally zap unsync SPs") are cosmetic
> and I can resolve them myself.

The 0-day bot also reported some warnings.  vcpu_to_role_regs() needs to be
static, and the helpers are added without a user.  I liked the idea of adding the
helpers in one patch, but I can't really defend adding them without a user. :-/

   arch/x86/kvm/mmu/mmu.c:209:26: warning: no previous prototype for function 'vcpu_to_role_regs' [-Wmissing-prototypes]
   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
                            ^
   arch/x86/kvm/mmu/mmu.c:209:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
   ^
   static
   arch/x86/kvm/mmu/mmu.c:199:1: warning: unused function '____is_cr0_wp' [-Wunused-function]
   BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);

> 
> I'd like your input on renaming is_{cr0,cr4,efer}_* to is_mmu_* (and
> possibly reduce the four underscores to two...).
> 
> If I get remarks by tomorrow, I'll get this into 5.14, otherwise consider
> everything but the first eight patches queued only for 5.15.
> 
> > I labeled the "sections" of this mess in the shortlog below.
> > 
> > P.S. Does anyone know how PKRU interacts with NPT?  I assume/hope NPT
> >       accesses, which are always "user", ignore PKRU, but the APM doesn't
> >       say a thing.  If PKRU is ignored, KVM has some fixing to do.  If PKRU
> >       isn't ignored, AMD has some fixing to do :-)
> > 
> > P.S.S. This series pulled in one patch from my vCPU RESET/INIT series,
> >         "Properly reset MMU context at vCPU RESET/INIT", as that was needed
> >         to fix a root_level bug on VMX.  My goal is to get the RESET/INIT
> >         series refreshed later this week and thoroughly bombard everyone.
> 
> Note that it won't get into 5.14 anyway, since I plan to send my first pull
> request to Linus as soon as Friday.

Good to know.  I'll still try to get it out tomorrow as I'll be on vacation
for a few weeks starting Friday, and I'm afraid I'll completely forget what's in
the series :-)


* Re: [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
  2021-06-23 21:06   ` Sean Christopherson
@ 2021-06-23 21:33     ` Paolo Bonzini
  2021-06-23 22:08       ` Sean Christopherson
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 21:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 23/06/21 23:06, Sean Christopherson wrote:
>>
>> This is good stuff, I made a few comments but almost all of them (all except
>> the last comment on patch 9, "Unconditionally zap unsync SPs") are cosmetic
>> and I can resolve them myself.
> The 0-day bot also reported some warnings.  vcpu_to_role_regs() needs to be
> static, and the helpers are added without a user.  I liked the idea of adding the
> helpers in one patch, but I can't really defend adding them without a user. :-/

Yep, I noticed them too.

We can just mark them static inline, which is a good idea anyway and 
enough to shut up the compiler (clang might behave differently in this 
respect for .h and .c files, but again it's just a warning and not a 
bisection breakage).

Paolo

>     arch/x86/kvm/mmu/mmu.c:209:26: warning: no previous prototype for function 'vcpu_to_role_regs' [-Wmissing-prototypes]
>     struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
>                              ^
>     arch/x86/kvm/mmu/mmu.c:209:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
>     struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
>     ^
>     static
>     arch/x86/kvm/mmu/mmu.c:199:1: warning: unused function '____is_cr0_wp' [-Wunused-function]
>     BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);
> 


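A minimal standalone sketch (not kernel code; the struct shape and the
attribute choice are assumptions) of the two 0-day complaints quoted above and
one way to silence them: the role_regs constructor becomes static, and the
macro-generated accessors carry an "unused" attribute so clang's
-Wunused-function stays quiet even while an accessor has no caller yet:

	#include <stdbool.h>

	#define X86_CR0_WP (1UL << 16)

	struct kvm_mmu_role_regs {	/* shape assumed for illustration */
		unsigned long cr0, cr4, efer;
	};

	/* __attribute__((__unused__)) is what the kernel's __maybe_unused expands to. */
	#define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)			\
	static inline bool __attribute__((__unused__))				\
	____is_##reg##_##name(const struct kvm_mmu_role_regs *regs)		\
	{									\
		return !!(regs->reg & flag);					\
	}

	BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);

	/* "static" cures the -Wmissing-prototypes warning on the constructor. */
	static struct kvm_mmu_role_regs make_role_regs(unsigned long cr0,
						       unsigned long cr4,
						       unsigned long efer)
	{
		return (struct kvm_mmu_role_regs){ .cr0 = cr0, .cr4 = cr4, .efer = efer };
	}

	int main(void)
	{
		struct kvm_mmu_role_regs regs = make_role_regs(X86_CR0_WP, 0, 0);

		(void)regs;	/* ____is_cr0_wp() is never called, yet no warning */
		return 0;
	}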

* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-23 16:38       ` Paolo Bonzini
@ 2021-06-23 22:04         ` Sean Christopherson
  0 siblings, 0 replies; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 22:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 23/06/21 17:08, Sean Christopherson wrote:
> > Because the shadow page's level is incorporated into its role, if the level of
> > the new page is >4k, the branch at (1) will be taken for all 4k shadow pages.
> > 
> > Maybe something like this for a comment?
> 
> Good, integrated.
> 
> Though I also wonder why breaking out of the loop early is okay.  Initially I thought
> that zapping only matters if there's no existing page with the desired role,
> because otherwise the unsync page would have been zapped already by an earlier
> kvm_get_mmu_page, but what if the page was synced at the time of kvm_get_mmu_page
> and then both were unsynced?

That can't happen, because the new >4k SP will mark the page for write-tracking
via account_shadowed(), and any attempt to unsync the page will fail and
write-protect the entry.

It would be possible to have both an unsync and a sync SP, e.g. unsync, then INVLPG
only one of the pages.  But as you pointed out, creating the first >4k SP would
be guaranteed to wipe out the unsync SP because no match should exist.

> It may be easier to just split the loop to avoid that additional confusion,
> something like:
> 
>         /*
>          * If the guest is creating an upper-level page, zap unsync pages
>          * for the same gfn, because the gfn will be write protected and
>          * future syncs of those unsync pages could happen with an incompatible
>          * context.

I don't think the part about "future syncs ... with an incompatible context" is
correct.  The unsync walks, i.e. the sync() flows, are done with the current root
and it should be impossible to reach a SP with an invalid context when walking
the child SPs.

I also can't find anything that would break if the SP were left unsync, i.e.
I haven't found any code that assumes a write-protected SP can't be unsync.
E.g. mmu_try_to_unsync_pages() will force write-protection due to write tracking
even if unsync is left true.  Maybe there was a rule/assumption at some point
that has since gone away?  That's why my comment hedged and just said "don't
do it" without explaining why :-)

All that said, I'm definitely not opposed to simplifying/clarifying the code and
ensuring all unsync SPs are zapped in this case.

>	   * While it's possible the guest is using recursive page
>          * tables, in all likelihood the guest has stopped using the unsync
>          * page and is installing a completely unrelated page.
>          */
>         if (level > PG_LEVEL_4K) {

I believe this can be "if (!direct && level > PG_LEVEL_4K)", because the direct
case won't write protect/track anything.

>                 for_each_valid_sp(vcpu->kvm, sp, sp_list)

This can technically be "for_each_gfn_indirect_valid_sp", though I'm not sure it
saves much, if anything.

>                         if (sp->gfn == gfn && sp->role.word != role.word && sp->unsync)
>                                 kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
>                                                          &invalid_list);
>         }
> 
> Paolo
> 

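Folding Sean's two tweaks into the proposed split loop gives roughly the shape
below; this is a sketch of the discussion, not the committed code:

	if (!direct && level > PG_LEVEL_4K) {
		/*
		 * Per the discussion above: a direct SP never write-tracks the
		 * gfn, and unsync SPs are always indirect, so the direct case
		 * and direct SPs can be skipped entirely.
		 */
		for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
			if (sp->role.word != role.word && sp->unsync)
				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
							 &invalid_list);
		}
	}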

* Re: [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
  2021-06-23 21:33     ` Paolo Bonzini
@ 2021-06-23 22:08       ` Sean Christopherson
  2021-06-23 22:12         ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Sean Christopherson @ 2021-06-23 22:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 23/06/21 23:06, Sean Christopherson wrote:
> > > 
> > > This is good stuff, I made a few comments but almost all of them (all except
> > > the last comment on patch 9, "Unconditionally zap unsync SPs") are cosmetic
> > > and I can resolve them myself.
> > The 0-day bot also reported some warnings.  vcpu_to_role_regs() needs to be
> > static, and the helpers are added without a user.  I liked the idea of adding the
> > helpers in one patch, but I can't really defend adding them without a user. :-/
> 
> Yep, I noticed them too.
> 
> We can just mark them static inline, which is a good idea anyway and enough

But they already are static inline :-(

> to shut up the compiler (clang might behave differently in this respect for .h
> and .c files, but again it's just a warning and not a bisection breakage).

I was worried about the CONFIG_KVM_WERROR=y case.


* Re: [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning
  2021-06-23 22:08       ` Sean Christopherson
@ 2021-06-23 22:12         ` Paolo Bonzini
  0 siblings, 0 replies; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-23 22:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yu Zhang, Maxim Levitsky

On 24/06/21 00:08, Sean Christopherson wrote:
>> We can just mark them static inline, which is a good idea anyway and enough
> But they already are static inline :-(

Yep, I noticed later. :/  Probably the clang difference below?

>> to shut up the compiler (clang might behave differently in this respect for .h
>> and .c files, but again it's just a warning and not a bisection breakage).
> 
> I was worried about the CONFIG_KVM_WERROR=y case.

CONFIG_KVM_WERROR can always be disabled.  "Unused" warnings do 
sometimes happen in the middle of large series.

Paolo



* Re: [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-22 17:56 ` [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" Sean Christopherson
@ 2021-06-25  8:47   ` Yu Zhang
  2021-06-25  8:57     ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Yu Zhang @ 2021-06-25  8:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On Tue, Jun 22, 2021 at 10:56:50AM -0700, Sean Christopherson wrote:
> Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
> virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
> mmu_role.base.level because that tracks the shadow root level, i.e. TDP
> level.  Normally, this is not an issue because LA57 can't be toggled
> while long mode is active, i.e. the guest has to first disable paging,
> then toggle LA57, then re-enable paging, thus ensuring an MMU
> reinitialization.
> 
> But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
> without having to bounce through an unpaged section.  L1 can also load a

May I ask how this is done by the guest? Thanks!

> new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
> single entry PML5 is sufficient.  Such shenanigans are only problematic
> if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
> reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
> 
> Note, in the L2 case with nested TDP, even though L1 can switch between
> L2s with different LA57 settings, thus bypassing the paging requirement,
> in that case KVM's nested_mmu will track LA57 in base.level.
> 
> This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea.
> 
> Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/mmu/mmu.c          | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e11d64aa0bcd..916e0f89fdfc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -320,6 +320,7 @@ union kvm_mmu_extended_role {
>  		unsigned int cr4_pke:1;
>  		unsigned int cr4_smap:1;
>  		unsigned int cr4_smep:1;
> +		unsigned int cr4_la57:1;
>  		unsigned int maxphyaddr:6;
>  	};
>  };
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0db12f461c9d..5024318dec45 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4537,6 +4537,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
>  	ext.cr4_smap = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>  	ext.cr4_pse = !!is_pse(vcpu);
>  	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
> +	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
>  	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
>  
>  	ext.valid = 1;
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 

B.R.
Yu


* Re: [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"
  2021-06-22 17:56 ` [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" Sean Christopherson
@ 2021-06-25  8:52   ` Yu Zhang
  0 siblings, 0 replies; 103+ messages in thread
From: Yu Zhang @ 2021-06-25  8:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On Tue, Jun 22, 2021 at 10:56:53AM -0700, Sean Christopherson wrote:
> Drop MAXPHYADDR from mmu_role now that all MMUs have their role
> invalidated after a CPUID update.  Invalidating the role forces all MMUs
> to re-evaluate the guest's MAXPHYADDR, and the guest's MAXPHYADDR can
> be changed only through a CPUID update.
> 
> This reverts commit de3ccd26fafc707b09792d9b633c8b5b48865315.
> 
> Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 -
>  arch/x86/kvm/mmu/mmu.c          | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 19c88b445ee0..cdaff399ed94 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -321,7 +321,6 @@ union kvm_mmu_extended_role {
>  		unsigned int cr4_smap:1;
>  		unsigned int cr4_smep:1;
>  		unsigned int cr4_la57:1;
> -		unsigned int maxphyaddr:6;
>  	};
>  };
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8d97d21d5241..04cab330c445 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4538,7 +4538,6 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
>  	ext.cr4_pse = !!is_pse(vcpu);
>  	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
>  	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
> -	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
>  
>  	ext.valid = 1;
>  
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 

Reviewed-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Thanks
Yu


* Re: [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-25  8:47   ` Yu Zhang
@ 2021-06-25  8:57     ` Paolo Bonzini
  2021-06-25  9:29       ` Yu Zhang
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-25  8:57 UTC (permalink / raw)
  To: Yu Zhang, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Maxim Levitsky

On 25/06/21 10:47, Yu Zhang wrote:
>> But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
>> without having to bounce through an unpaged section.  L1 can also load a
>
> May I ask how this is done by the guest? Thanks!

It can set HOST_CR3 and HOST_CR4 to a value that is different from the 
one on vmentry.

Paolo

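In kvm-unit-tests terms, the trick looks roughly like the fragment below.  The
vmcs_read()/vmcs_write() helpers and the field/flag names are assumed from
that framework, and new_pml5_root stands in for a hypothetical single-entry
PML5 table:

	u64 host_cr4 = vmcs_read(HOST_CR4);

	/* L1, running as a VMX guest, arms its own VM-Exit state. */
	vmcs_write(HOST_CR4, host_cr4 ^ X86_CR4_LA57);	/* toggle LA57 */
	vmcs_write(HOST_CR3, new_pml5_root);		/* 5-level root */

	/*
	 * Any VM-Exit back to L1 now loads the toggled CR4 and the new CR3
	 * without L1 ever passing through an unpaged phase, so KVM has to
	 * notice the LA57 change when it reconfigures L1's MMU.
	 */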


* Re: [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-25  8:57     ` Paolo Bonzini
@ 2021-06-25  9:29       ` Yu Zhang
  2021-06-25 10:25         ` Paolo Bonzini
  0 siblings, 1 reply; 103+ messages in thread
From: Yu Zhang @ 2021-06-25  9:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On Fri, Jun 25, 2021 at 10:57:51AM +0200, Paolo Bonzini wrote:
> On 25/06/21 10:47, Yu Zhang wrote:
> > > But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
> > > without having to bounce through an unpaged section.  L1 can also load a
> > 
> > May I ask how this is done by the guest? Thanks!
> 
> It can set HOST_CR3 and HOST_CR4 to a value that is different from the one
> on vmentry.

Thanks, Paolo.

Do you mean the L1 can modify its paging mode by setting HOST_CR3 as the root of
a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing
GUEST_CR3/4 to be changed in VMCS01, and eventually updating CR3/4 when
L0 injects a VM-Exit from L2?

B.R.
Yu

  

> Paolo
> 


* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-22 17:56 ` [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN Sean Christopherson
  2021-06-23 14:36   ` Paolo Bonzini
@ 2021-06-25  9:51   ` Yu Zhang
  2021-06-25 10:26     ` Paolo Bonzini
  1 sibling, 1 reply; 103+ messages in thread
From: Yu Zhang @ 2021-06-25  9:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Maxim Levitsky

While reading the sync pages code, I just realized that patch
https://lkml.org/lkml/2021/2/9/212 has not been merged upstream
(though it is irrelevant to this one). May I ask the reason? Thanks!

B.R.
Yu



* Re: [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-25  9:29       ` Yu Zhang
@ 2021-06-25 10:25         ` Paolo Bonzini
  2021-06-25 11:23           ` Yu Zhang
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-25 10:25 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On 25/06/21 11:29, Yu Zhang wrote:
> Thanks, Paolo.
> 
> Do you mean the L1 can modify its paging mode by setting HOST_CR3 as the root of
> a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing
> GUEST_CR3/4 to be changed in VMCS01, and eventually updating CR3/4 when
> L0 injects a VM-Exit from L2?

Yes, you can even do that without a "full" vmentry by setting invalid 
guest state in vmcs12. :)

Paolo



* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-25  9:51   ` Yu Zhang
@ 2021-06-25 10:26     ` Paolo Bonzini
  2021-06-25 13:08       ` Yu Zhang
  0 siblings, 1 reply; 103+ messages in thread
From: Paolo Bonzini @ 2021-06-25 10:26 UTC (permalink / raw)
  To: Yu Zhang, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Maxim Levitsky

On 25/06/21 11:51, Yu Zhang wrote:
> While reading the sync pages code, I just realized that patch
> https://lkml.org/lkml/2021/2/9/212 has not been merged upstream
> (though it is irrelevant to this one). May I ask the reason? Thanks!

I hadn't noticed it, thanks for reminding me.

Paolo



* Re: [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"
  2021-06-25 10:25         ` Paolo Bonzini
@ 2021-06-25 11:23           ` Yu Zhang
  0 siblings, 0 replies; 103+ messages in thread
From: Yu Zhang @ 2021-06-25 11:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On Fri, Jun 25, 2021 at 12:25:46PM +0200, Paolo Bonzini wrote:
> On 25/06/21 11:29, Yu Zhang wrote:
> > Thanks, Paolo.
> > 
> > Do you mean the L1 can modify its paging mode by setting HOST_CR3 as root of
> > a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing the
> > GUEST_CR3/4 being changed in VMCS01, and eventually updating the CR3/4 when
> > L0 is injecting a VM Exit from L2?
> 
> Yes, you can even do that without a "full" vmentry by setting invalid guest
> state in vmcs12. :)

Hah.. Interesting. :) I think this is what load_vmcs12_host_state() does. Anyway,
thanks a lot for the explanation!

Also my reviewed-by for this one.

Reviewed-by: Yu Zhang <yu.c.zhang@linux.intel.com>

B.R.
Yu
 


* Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN
  2021-06-25 10:26     ` Paolo Bonzini
@ 2021-06-25 13:08       ` Yu Zhang
  0 siblings, 0 replies; 103+ messages in thread
From: Yu Zhang @ 2021-06-25 13:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Maxim Levitsky

On Fri, Jun 25, 2021 at 12:26:10PM +0200, Paolo Bonzini wrote:
> On 25/06/21 11:51, Yu Zhang wrote:
> > While reading the sync pages code, I just realized that patch
> > https://lkml.org/lkml/2021/2/9/212 has not been merged upstream
> > (though it is irrelevant to this one). May I ask the reason? Thanks!
> 
> I hadn't noticed it, thanks for reminding me.

It's just a cleanup patch. And I forgot it too. :) Thanks!

B.R.
Yu


end of thread

Thread overview: 103+ messages
2021-06-22 17:56 [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Sean Christopherson
2021-06-22 17:56 ` [PATCH 01/54] KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT Sean Christopherson
2021-06-22 17:56 ` [PATCH 02/54] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs Sean Christopherson
2021-06-22 17:56 ` [PATCH 03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT Sean Christopherson
2021-06-23 13:59   ` Paolo Bonzini
2021-06-23 14:01   ` Paolo Bonzini
2021-06-23 14:50     ` Sean Christopherson
2021-06-22 17:56 ` [PATCH 04/54] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk Sean Christopherson
2021-06-22 17:56 ` [PATCH 05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" Sean Christopherson
2021-06-25  8:47   ` Yu Zhang
2021-06-25  8:57     ` Paolo Bonzini
2021-06-25  9:29       ` Yu Zhang
2021-06-25 10:25         ` Paolo Bonzini
2021-06-25 11:23           ` Yu Zhang
2021-06-22 17:56 ` [PATCH 06/54] KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified Sean Christopherson
2021-06-22 17:56 ` [PATCH 07/54] KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken Sean Christopherson
2021-06-23 14:16   ` Paolo Bonzini
2021-06-23 17:00     ` Jim Mattson
2021-06-23 17:11       ` Paolo Bonzini
2021-06-23 18:11         ` Jim Mattson
2021-06-23 18:49           ` Paolo Bonzini
2021-06-23 19:02             ` Jim Mattson
2021-06-23 19:53               ` Paolo Bonzini
2021-06-22 17:56 ` [PATCH 08/54] Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" Sean Christopherson
2021-06-25  8:52   ` Yu Zhang
2021-06-22 17:56 ` [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN Sean Christopherson
2021-06-23 14:36   ` Paolo Bonzini
2021-06-23 15:08     ` Sean Christopherson
2021-06-23 16:38       ` Paolo Bonzini
2021-06-23 22:04         ` Sean Christopherson
2021-06-25  9:51   ` Yu Zhang
2021-06-25 10:26     ` Paolo Bonzini
2021-06-25 13:08       ` Yu Zhang
2021-06-22 17:56 ` [PATCH 10/54] KVM: x86/mmu: Replace EPT shadow page shenanigans with simpler check Sean Christopherson
2021-06-23 15:49   ` Paolo Bonzini
2021-06-23 16:17     ` Sean Christopherson
2021-06-23 16:41       ` Paolo Bonzini
2021-06-23 16:54         ` Sean Christopherson
2021-06-22 17:56 ` [PATCH 11/54] KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches Sean Christopherson
2021-06-22 17:56 ` [PATCH 12/54] KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page() Sean Christopherson
2021-06-23 16:54   ` Paolo Bonzini
2021-06-22 17:56 ` [PATCH 13/54] KVM: x86/mmu: Rename unsync helper and update related comments Sean Christopherson
2021-06-22 17:56 ` [PATCH 14/54] KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER Sean Christopherson
2021-06-22 17:57 ` [PATCH 15/54] KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Sean Christopherson
2021-06-23 17:06   ` Paolo Bonzini
2021-06-23 20:49     ` Sean Christopherson
2021-06-22 17:57 ` [PATCH 16/54] KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs Sean Christopherson
2021-06-23 17:11   ` Paolo Bonzini
2021-06-23 19:36     ` Sean Christopherson
2021-06-22 17:57 ` [PATCH 17/54] KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper Sean Christopherson
2021-06-22 17:57 ` [PATCH 18/54] KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Sean Christopherson
2021-06-23 17:13   ` Paolo Bonzini
2021-06-22 17:57 ` [PATCH 19/54] KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs Sean Christopherson
2021-06-22 17:57 ` [PATCH 20/54] KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Sean Christopherson
2021-06-23  1:58   ` kernel test robot
2021-06-23 17:18   ` Paolo Bonzini
2021-06-22 17:57 ` [PATCH 21/54] KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context() Sean Christopherson
2021-06-22 17:57 ` [PATCH 22/54] KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role Sean Christopherson
2021-06-22 17:57 ` [PATCH 23/54] KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role Sean Christopherson
2021-06-22 17:57 ` [PATCH 24/54] KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans Sean Christopherson
2021-06-22 17:57 ` [PATCH 25/54] KVM: x86/mmu: Add helpers to query mmu_role bits Sean Christopherson
2021-06-23 20:02   ` Paolo Bonzini
2021-06-23 20:47     ` Sean Christopherson
2021-06-23 20:53       ` Paolo Bonzini
2021-06-22 17:57 ` [PATCH 26/54] KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0 Sean Christopherson
2021-06-22 17:57 ` [PATCH 27/54] KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active Sean Christopherson
2021-06-22 17:57 ` [PATCH 28/54] KVM: x86/mmu: Always Set new mmu_role immediately after checking old role Sean Christopherson
2021-06-22 17:57 ` [PATCH 29/54] KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits Sean Christopherson
2021-06-22 17:57 ` [PATCH 30/54] KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits Sean Christopherson
2021-06-22 17:57 ` [PATCH 31/54] KVM: x86/mmu: Drop vCPU param from reserved bits calculator Sean Christopherson
2021-06-22 17:57 ` [PATCH 32/54] KVM: x86/mmu: Use MMU's role to compute permission bitmask Sean Christopherson
2021-06-22 17:57 ` [PATCH 33/54] KVM: x86/mmu: Use MMU's role to compute PKRU bitmask Sean Christopherson
2021-06-22 17:57 ` [PATCH 34/54] KVM: x86/mmu: Use MMU's roles to compute last non-leaf level Sean Christopherson
2021-06-22 17:57 ` [PATCH 35/54] KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk Sean Christopherson
2021-06-22 17:57 ` [PATCH 36/54] KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata Sean Christopherson
2021-06-22 17:57 ` [PATCH 37/54] KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration Sean Christopherson
2021-06-22 17:57 ` [PATCH 38/54] KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers Sean Christopherson
2021-06-22 17:57 ` [PATCH 39/54] KVM: x86/mmu: Get nested MMU's root level from the MMU's role Sean Christopherson
2021-06-22 17:57 ` [PATCH 40/54] KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper Sean Christopherson
2021-06-22 17:57 ` [PATCH 41/54] KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls Sean Christopherson
2021-06-23 20:07   ` Paolo Bonzini
2021-06-23 20:53     ` Sean Christopherson
2021-06-22 17:57 ` [PATCH 42/54] KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 Sean Christopherson
2021-06-22 17:57 ` [PATCH 43/54] KVM: x86/mmu: Add helper to update paging metadata Sean Christopherson
2021-06-22 17:57 ` [PATCH 44/54] KVM: x86/mmu: Add a helper to calculate root from role_regs Sean Christopherson
2021-06-22 17:57 ` [PATCH 45/54] KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers Sean Christopherson
2021-06-22 17:57 ` [PATCH 46/54] KVM: x86/mmu: Use MMU's role to determine PTTYPE Sean Christopherson
2021-06-22 17:57 ` [PATCH 47/54] KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU Sean Christopherson
2021-06-23 20:13   ` Paolo Bonzini
2021-06-22 17:57 ` [PATCH 48/54] KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE Sean Christopherson
2021-06-22 17:57 ` [PATCH 49/54] KVM: x86: Enhance comments for MMU roles and nested transition trickiness Sean Christopherson
2021-06-22 17:57 ` [PATCH 50/54] KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic Sean Christopherson
2021-06-23 20:22   ` Paolo Bonzini
2021-06-23 20:58     ` Sean Christopherson
2021-06-22 17:57 ` [PATCH 51/54] KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT Sean Christopherson
2021-06-22 17:57 ` [PATCH 52/54] KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault Sean Christopherson
2021-06-22 17:57 ` [PATCH 53/54] KVM: x86/mmu: Get CR4.SMEP " Sean Christopherson
2021-06-22 17:57 ` [PATCH 54/54] KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on Sean Christopherson
2021-06-23 20:29 ` [PATCH 00/54] KVM: x86/mmu: Bug fixes and summer cleaning Paolo Bonzini
2021-06-23 21:06   ` Sean Christopherson
2021-06-23 21:33     ` Paolo Bonzini
2021-06-23 22:08       ` Sean Christopherson
2021-06-23 22:12         ` Paolo Bonzini
