Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()

From: Junaid Shahid <junaids@google.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()
Date: Tue, 30 Jun 2020 20:24:28 -0700	[thread overview]
Message-ID: <a8f60652-c419-58bc-fe78-67954fc6d4c1@google.com> (raw)
In-Reply-To: <20200630100742.1167961-1-vkuznets@redhat.com>

On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote:
> Undesired triple fault gets injected to L1 guest on SVM when L2 is
> launched with certain CR3 values. It seems the mmu_check_root()
> check in fast_pgd_switch() is wrong: first of all we don't know
> if 'new_pgd' is a GPA or a nested GPA and, in case it is a nested
> GPA, we can't check it with kvm_is_visible_gfn().
> 
> The problematic code path is:
> nested_svm_vmrun()
>    ...
>    nested_prepare_vmcb_save()
>      kvm_set_cr3(..., nested_vmcb->save.cr3)
>        kvm_mmu_new_pgd()
>          ...
>          mmu_check_root() -> TRIPLE FAULT
> 
> The mmu_check_root() check in fast_pgd_switch() seems to be
> superfluous even for non-nested case: when GPA is outside of the
> visible range cached_root_available() will fail for non-direct
> roots (as we can't have a matching one on the list) and we don't
> seem to care for direct ones.
> 
> Also, raising #TF immediately when a non-existent GFN is written to CR3
> doesn't seem to mach architecture behavior.
> 
> Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> - The patch fixes the immediate issue and doesn't seem to break any
>    tests even with shadow PT but I'm not sure I properly understood
>    why the check was there in the first place. Please review!
> ---
>   arch/x86/kvm/mmu/mmu.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 76817d13c86e..286c74d2ae8d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>   	 */
>   	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>   	    mmu->root_level >= PT64_ROOT_4LEVEL)
> -		return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
> -		       cached_root_available(vcpu, new_pgd, new_role);
> +		return cached_root_available(vcpu, new_pgd, new_role);
>   
>   	return false;
>   }
> 

The check does seem superfluous, so should be ok to remove. Though I think that fast_pgd_switch() really should be getting only L1 GPAs. Otherwise, there could be confusion between the same GPAs from two different L2s.

IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for L2) or L1 EPTPs should get to fast_pgd_switch(). But I am not familiar enough with SVM to see why an L2 GPA would end up there. From a cursory look, it seems that until "978ce5837c7e KVM: SVM: always update CR3 in VMCB", enter_svm_guest_mode() was calling kvm_set_cr3() only when using shadow paging, in which case I assume that nested_vmcb->save.cr3 would have been an L1 CR3 shadowing the L2 CR3, correct? But now kvm_set_cr3() is called even when not using shadow paging, which I suppose is how we are getting the L2 CR3. Should we skip calling fast_pgd_switch() in that particular case?

Thanks,
Junaid