Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Junaid Shahid <junaids@google.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()
Date: Wed, 01 Jul 2020 09:14:23 +0200
Message-ID: <87eepvbtbk.fsf@vitty.brq.redhat.com>
In-Reply-To: <a8f60652-c419-58bc-fe78-67954fc6d4c1@google.com>

Junaid Shahid <junaids@google.com> writes:

> On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote:
>> Undesired triple fault gets injected to L1 guest on SVM when L2 is
>> launched with certain CR3 values. It seems the mmu_check_root()
>> check in fast_pgd_switch() is wrong: first of all we don't know
>> if 'new_pgd' is a GPA or a nested GPA and, in case it is a nested
>> GPA, we can't check it with kvm_is_visible_gfn().
>> 
>> The problematic code path is:
>> nested_svm_vmrun()
>>    ...
>>    nested_prepare_vmcb_save()
>>      kvm_set_cr3(..., nested_vmcb->save.cr3)
>>        kvm_mmu_new_pgd()
>>          ...
>>          mmu_check_root() -> TRIPLE FAULT
>> 
>> The mmu_check_root() check in fast_pgd_switch() seems to be
>> superfluous even for non-nested case: when GPA is outside of the
>> visible range cached_root_available() will fail for non-direct
>> roots (as we can't have a matching one on the list) and we don't
>> seem to care for direct ones.
>> 
>> Also, raising #TF immediately when a non-existent GFN is written to CR3
>> doesn't seem to match architecture behavior.
>> 
>> Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> - The patch fixes the immediate issue and doesn't seem to break any
>>    tests even with shadow PT but I'm not sure I properly understood
>>    why the check was there in the first place. Please review!
>> ---
>>   arch/x86/kvm/mmu/mmu.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 76817d13c86e..286c74d2ae8d 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>>   	 */
>>   	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>>   	    mmu->root_level >= PT64_ROOT_4LEVEL)
>> -		return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
>> -		       cached_root_available(vcpu, new_pgd, new_role);
>> +		return cached_root_available(vcpu, new_pgd, new_role);
>>   
>>   	return false;
>>   }
>> 
>
> The check does seem superfluous, so should be ok to remove. Though I
> think that fast_pgd_switch() really should be getting only L1
> GPAs. Otherwise, there could be confusion between the same GPAs from
> two different L2s.
>
> IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for
> L2) or L1 EPTPs should get to fast_pgd_switch(). But I am not familiar
> enough with SVM to see why an L2 GPA would end up there. From a
> cursory look, it seems that until "978ce5837c7e KVM: SVM: always
> update CR3 in VMCB", enter_svm_guest_mode() was calling kvm_set_cr3()
> only when using shadow paging, in which case I assume that
> nested_vmcb->save.cr3 would have been an L1 CR3 shadowing the L2 CR3,
> correct? But now kvm_set_cr3() is called even when not using shadow
> paging, which I suppose is how we are getting the L2 CR3. Should we
> skip calling fast_pgd_switch() in that particular case?

Thank you for your thoughts, this is helpful indeed.

As far as I can see, nVMX calls kvm_mmu_new_pgd() directly in two cases:

1) nested_ept_init_mmu_context() -> kvm_init_shadow_ept_mmu() when
switching to L2 (and 'new_pgd' is EPTP)

2) nested_vmx_load_cr3() when !nested_ept (and 'new_pgd' is 'cr3')
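
To make clear why it matters what kind of address ends up as 'new_pgd',
here is a minimal sketch of a root cache keyed by address and role. It
is a toy model in plain C, not KVM code: every toy_* name, the cache
size and the integer role encoding are made up for illustration, and
the real cached_root_available() does more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_CACHED_ROOTS 3	/* arbitrary size for the sketch */

/* Toy stand-in for a cached root: the address it was built for plus a role. */
struct toy_root {
	uint64_t pgd;	/* page table root address used as the cache key */
	uint32_t role;	/* hypothetical encoding of the MMU role */
	bool valid;
};

struct toy_root_cache {
	struct toy_root roots[TOY_NUM_CACHED_ROOTS];
};

/* Remember a root in slot 0, shifting older entries down (oldest is dropped). */
static void toy_cache_insert(struct toy_root_cache *c, uint64_t pgd,
			     uint32_t role)
{
	assert(c != NULL);
	for (size_t i = TOY_NUM_CACHED_ROOTS - 1; i > 0; i--)
		c->roots[i] = c->roots[i - 1];
	c->roots[0] = (struct toy_root){ .pgd = pgd, .role = role, .valid = true };
}

/* A fast switch is possible only if both the address and the role match. */
static bool toy_cached_root_available(const struct toy_root_cache *c,
				      uint64_t pgd, uint32_t role)
{
	assert(c != NULL);
	for (size_t i = 0; i < TOY_NUM_CACHED_ROOTS; i++)
		if (c->roots[i].valid && c->roots[i].pgd == pgd &&
		    c->roots[i].role == role)
			return true;
	return false;
}
```

In this model nothing but the (pgd, role) pair tells entries apart, so
if an L2 GPA were used as the key, two different L2 guests with the
same CR3 value and the same role would hit the same entry. That is the
confusion you describe above.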

I think we need to do something similar for nSVM to make the root cache
work and work correctly:

1) supplement nested_svm_init_mmu_context() -> kvm_init_shadow_mmu()
with kvm_mmu_new_pgd()

2) stop doing kvm_mmu_new_pgd() from nested_prepare_vmcb_save() ->
kvm_set_cr3() when npt_enabled
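
Roughly, and only as a sketch of the intent rather than a patch: the
toy model below compares which address would reach the root cache
today and with the two changes above. The struct, the toy_* helpers
and the 'nested_paging' flag are hypothetical; I am also assuming that
the nSVM counterpart of EPTP is the nested page table root set up by
L1 (called 'nested_cr3' here), which I still need to verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs of one nested VMRUN, reduced to what matters here. */
struct toy_vmrun {
	bool nested_paging;	/* stands in for the condition in point 2) */
	uint64_t nested_cr3;	/* assumed: page table root L1 set up for L2 */
	uint64_t save_cr3;	/* CR3 value from the nested VMCB save area */
};

/* As described in this thread: kvm_set_cr3() always passes save.cr3 on. */
static uint64_t toy_current_new_pgd(const struct toy_vmrun *v)
{
	assert(v != NULL);
	return v->save_cr3;
}

/*
 * Proposed: with nested paging the MMU context init passes the nested
 * root and kvm_set_cr3() stays away from the root cache; otherwise
 * save.cr3 is used as before.
 */
static uint64_t toy_proposed_new_pgd(const struct toy_vmrun *v)
{
	assert(v != NULL);
	return v->nested_paging ? v->nested_cr3 : v->save_cr3;
}
```

The point of the second helper is that, under the assumption above,
the root cache would only ever be keyed by an address L1 controls.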

Let me try to experiment here a bit.

Thanks again!

-- 
Vitaly